'?' Vs '+' in 'Investigate Stage' Vs 'Standardize Stage'

Rahul Bharadwaj
Premium Member
Posts: 24
Joined: Mon Jul 14, 2008 12:03 am
Location: Bangalore

Post by Rahul Bharadwaj »

Hi everyone,

We are trying to investigate and standardize names.

In the Investigate stage (word investigation using rule set ABC.set), the pattern generated for the name 'GULSAN A GROWER' is '?I?'.

In the Standardize stage (using the same rule set, ABC.set), the pattern generated for the same name is '+I+'.

Logically both patterns are the same, so why do they differ ('?I?' vs '+I+')?

Thanks and Regards
Rahul
rjdickson
Participant
Posts: 378
Joined: Mon Jun 16, 2003 5:28 am
Location: Chicago, USA

Post by rjdickson »

Yes, this is the expected result. A ? in Investigation is the same as a + in Standardization.

Example 1:
Input: GULSAN A GROWER
Investigation Results: ?I?
Standardization Input Pattern: +I+

Example 2: (the first word is repeated)
Input: GULSAN GULSAN A GROWER
Investigation Results: ??I?
Standardization Input Pattern: ++I+

I speculate that the reason for this is that, in Standardization, ? represents one or more consecutive words. This gives Standardization the ability to treat the words either as a group or as individual words. So, the pattern statement

? | I | ?
would match both input patterns above.
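
To make that concrete, here is a small Python sketch of the matching idea. It is only a toy model, not how QualityStage is implemented: it assumes the input pattern is a string with one class character per token (such as '+I+'), and the function names are made up for illustration. In it, ? consumes one or more consecutive + tokens and every other operand matches itself.

```python
import re


def pattern_to_regex(pattern):
    """Turn a pattern such as '? | I | ?' into a regex over class characters.

    Toy model (assumption): each token in the input pattern is one character.
    '?' stands for one or more consecutive '+' tokens; any other operand
    must match itself exactly.
    """
    parts = []
    for operand in (p.strip() for p in pattern.split("|")):
        if operand == "?":
            parts.append(r"\++")  # one or more unclassified words
        else:
            parts.append(re.escape(operand))  # literal class, including '+'
    return re.compile("".join(parts))


def pattern_matches(pattern, token_classes):
    """Return True when the whole input pattern is covered by the statement."""
    return pattern_to_regex(pattern).fullmatch(token_classes) is not None


print(pattern_matches("? | I | ?", "+I+"))   # GULSAN A GROWER
print(pattern_matches("? | I | ?", "++I+"))  # two unknown words before the initial
print(pattern_matches("+ | I | +", "++I+"))  # '+' is exactly one word
```

Under these assumptions, ? | I | ? covers both input patterns, while + | I | + covers only the first one.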
Regards,
Robert
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Pattern Action Language uses + for a single unknown word and ? for one or more consecutive unknown words (unknown = unclassified). In Investigation there's no value in differentiating between them.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Rahul Bharadwaj
Premium Member
Posts: 24
Joined: Mon Jul 14, 2008 12:03 am
Location: Bangalore

Post by Rahul Bharadwaj »

Thanks, rjdickson and Ray, for the responses.

If my understanding is correct, the classification table/file defines which class a token/word falls into.
If no match is found, the token should fall under one of the following classes (let's call them the 'unmatch-class-set'):
^
?
>
<
@
~
0
-
/
&
#
(
)


So, Rj and Ray, does that mean the Investigate stage has different definitions for the unmatch-class-set than the Standardize stage does?
To add more: does the Investigate stage's unmatch-class-set not define the '+' class, while the Standardize stage has a more intelligent set of unmatch-class-set definitions?
(This is even though both use the same rule set and, in turn, the same Pattern-Action file.)

Please bear with me, as I am quite new to QualityStage... :)

Thanks and Regards
Rahul
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Only alphabetic class-designator characters appear in classification tables. The non-alphabetic class-designators are built-in and not specific to any particular rule set.
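
A rough Python sketch of that split, for illustration only. The classification table entry below is hypothetical (I am assuming rule set ABC.set classifies 'A' as I, which would explain the I in the patterns above), and the meanings given to ^, > and < follow my reading of the built-in class list, so check them against the product documentation. The point is just that alphabetic classes come from the table, while the non-alphabetic ones are assigned from the shape of the token.

```python
import re

# Hypothetical classification table: token -> alphabetic class designator.
# Assumption: ABC.set classifies 'A' as I; the real table is not shown here.
CLASSIFICATION_TABLE = {"A": "I"}


def classify_token(token, table=CLASSIFICATION_TABLE):
    """Return a one-character class for a token (toy model, not the product)."""
    upper = token.upper()
    if upper in table:
        return table[upper]          # alphabetic class from the rule set
    if upper.isdigit():
        return "^"                   # numeric
    if upper.isalpha():
        return "+"                   # single unclassified alphabetic word
    if re.fullmatch(r"\d+[A-Z]+", upper):
        return ">"                   # leading numeric, e.g. 123A
    if re.fullmatch(r"[A-Z]+\d+", upper):
        return "<"                   # leading alpha, e.g. A123
    return "@"                       # anything else: complex mix


def input_pattern(text):
    """Build the per-token input pattern for a whitespace-separated string."""
    return "".join(classify_token(tok) for tok in text.split())


print(input_pattern("GULSAN A GROWER"))
```

Nothing in the built-in branch depends on the rule set, which is why those classes behave the same no matter which classification table is loaded.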