'?' vs '+' in the Investigate Stage vs the Standardize Stage

Post by **Rahul Bharadwaj** » Fri Apr 06, 2012 2:15 am

Hi everyone,

We are trying to investigate and standardize names.

In the Investigate stage (word investigation using rule set ABC.set), the pattern generated for the name 'GULSAN A GROWER' is '?I?'.

In the Standardize stage (using the same rule set, ABC.set), the pattern generated for the same name is '+I+'.

Logically both patterns are the same, so why do they differ ('?I?' vs '+I+')?

Thanks and Regards
Rahul

Post by **rjdickson** » Fri Apr 06, 2012 2:43 pm

Yes, this is expected: ? in Investigation corresponds to + in Standardization.

Example 1:
Input: GULSAN A GROWER
Investigation Results: ?I?
Standardization Input Pattern: +I+

Example 2 (the first word appears twice):
Input: GULSAN GUSLAN A GROWER
Investigation Results: ??I?
Standardization Input Pattern: ++I+

I speculate that the reason is that in Standardization, ? represents one or more consecutive unknown words. This lets Standardization treat such words either as a group or as individual words. So the pattern statement

? | I | ?

would match both input patterns above.
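
To make the difference concrete, here is a small Python sketch of how a pattern operand `?` can match one or more consecutive unknown words while `+` matches exactly one. It is a simplified model, not QualityStage's matcher: the function names are hypothetical, each input token is represented by a single class character, and the whole input pattern must match (real pattern statements have more options).

```python
import re

# Simplified model (an assumption, not QualityStage's matcher):
# an input pattern has one class character per token, '+' = one unknown word.
# In a pattern statement, operands are separated by '|';
#   '+' matches exactly one unknown word,
#   '?' matches one or more consecutive unknown words,
#   any other operand (e.g. 'I') matches that class literally.


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    parts = []
    for operand in pattern.split("|"):
        operand = operand.strip()
        if operand == "?":
            parts.append(r"\++")  # one or more '+' tokens
        else:
            parts.append(re.escape(operand))  # '+' or a literal class letter
    return re.compile("".join(parts))


def matches(pattern: str, input_pattern: str) -> bool:
    # Whole-input match, a simplification of real pattern matching.
    return pattern_to_regex(pattern).fullmatch(input_pattern) is not None


for inp in ["+I+", "++I+"]:
    print(inp, matches("? | I | ?", inp), matches("+ | I | +", inp))
# +I+ True True
# ++I+ True False
```

In this model `? | I | ?` accepts both input patterns from the examples, while `+ | I | +` accepts only the single-word case.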

Post by **ray.wurlod** » Fri Apr 06, 2012 3:50 pm

Pattern Action Language uses + for a single unknown word and ? for one or more unknown words (unknown = unclassified). In Investigation there's no value in differentiating between them.
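
A toy Python model of what the two stages report for the examples in this thread. The only classification fact taken from the thread is that ABC.set puts the word A in class I; the dictionary `CLASSIFICATION` and the helper functions are hypothetical, and the model ignores numeric, punctuation and other built-in classes.

```python
# Hypothetical classification table: the thread only tells us that ABC.set
# classifies the word "A" as class I. Every other word is unclassified here.
CLASSIFICATION = {"A": "I"}


def token_classes(name: str, unknown_marker: str) -> str:
    # One character per whitespace-separated word: its class, or the marker
    # used for unclassified words.
    return "".join(CLASSIFICATION.get(word, unknown_marker) for word in name.split())


def investigate_pattern(name: str) -> str:
    # Word investigation reports each unclassified word as '?'.
    return token_classes(name, "?")


def standardize_input_pattern(name: str) -> str:
    # Standardization reports each unclassified word as '+'.
    return token_classes(name, "+")


for name in ["GULSAN A GROWER", "GULSAN GUSLAN A GROWER"]:
    print(name, investigate_pattern(name), standardize_input_pattern(name))
# GULSAN A GROWER ?I? +I+
# GULSAN GUSLAN A GROWER ??I? ++I+
```

Both stages classify each word the same way; only the marker for an unclassified word differs, because Standardization needs '+' (exactly one word) to stay distinct from the '?' pattern operand.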

Post by **Rahul Bharadwaj** » Mon Apr 09, 2012 12:31 am

Thanks, rjdickson and Ray, for the responses.

If my understanding is correct, the classification table/file defines which class a token/word falls into.
If no match is found, it should fall under one of the following classes (let's call them the 'unmatch-class-set'):
^
?
>
<
@
~
0
-
/
&
#
(
)

So, Rj and Ray, does that mean the Investigate stage defines the unmatch-class-set differently from the Standardize stage?
To put it another way: does the Investigate stage's unmatch-class-set lack a '+' class, while the Standardize stage has more intelligent unmatch-class-set definitions (even though both use the same rule set, and in turn the same Pattern-Action file)?

Please bear with me, as I am quite new to QualityStage.

Thanks and Regards
Rahul

Post by **ray.wurlod** » Mon Apr 09, 2012 1:10 am

Only alphabetic class-designator characters appear in classification tables. The non-alphabetic class-designators are built-in and not specific to any particular rule set.
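
As a rough illustration of that split, the Python sketch below checks hypothetical (token, class) entries before they go into a classification table: alphabetic designators are accepted, built-in non-alphabetic ones are rejected. The set of built-in designators is taken from the list earlier in this thread plus `+`; treating user classes as single letters is an assumption, and `check_entry` is not a QualityStage tool.

```python
# Non-alphabetic class designators listed earlier in this thread, plus '+'.
# Assumed here to be the built-in set shared by all rule sets.
BUILT_IN_CLASSES = set("^?+><@~0-/&#()")


def check_entry(token: str, class_code: str):
    """Return an error message for a hypothetical (token, class) entry, or None if it looks valid."""
    if class_code in BUILT_IN_CLASSES:
        return f"{token}: '{class_code}' is a built-in class and cannot be assigned in a classification table"
    # Assumption: user-defined classes are single alphabetic characters.
    if len(class_code) != 1 or not class_code.isalpha():
        return f"{token}: class designator must be a single letter"
    return None


for token, cls in [("A", "I"), ("GULSAN", "+"), ("GROWER", "ab")]:
    print(token, cls, check_entry(token, cls))
```

Because the built-in designators never come from a rule set's classification table, both stages see the same unclassified words; they differ only in which built-in marker they print for them.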