Tokenization of more than one word

Infosphere's Quality Product

BuddingDev
Premium Member
Posts: 43
Joined: Wed Feb 08, 2012 8:12 pm
Location: United States

Post by BuddingDev

I would like to know how to tokenize an entry that has more than one word in the classification file.

As the following example illustrates, THIS COMPANY and THIS ARTICLE both need to be standardized to the word INVALID. However, I am not able to achieve this because the word THIS is duplicated:

THIS COMPANY INVALID C
THIS ARTICLE INVALID C

During testing, the rule set throws an error saying that a duplicate entry THIS was found.
Can somebody suggest how to use two words in a classification table?
Thanks in advance.
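
The duplicate-entry error can be reproduced with a minimal Python model. It assumes, hypothetically, that each classification line is split on whitespace and that the first field is used as the lookup key. `load_table` is an illustrative name, not part of QualityStage, and the real rule-set parser may behave differently.

```python
# Hypothetical model of a classification table loader, NOT QualityStage's
# actual parser. Each line is split on whitespace and the first field is
# treated as the token (the lookup key).

def load_table(lines):
    table = {}
    for line in lines:
        fields = line.split()
        token, rest = fields[0], fields[1:]
        if token in table:
            # Both example lines start with THIS, so the second one collides.
            raise ValueError(f"duplicate entry {token}")
        table[token] = rest
    return table

entries = [
    "THIS COMPANY INVALID C",
    "THIS ARTICLE INVALID C",
]

try:
    load_table(entries)
except ValueError as err:
    print(err)  # prints "duplicate entry THIS"
```

Under this model, COMPANY and ARTICLE are never part of the key, so the two entries cannot be told apart.
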
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod

Surround the string with double quotes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
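
Below is a minimal Python sketch of the double-quote suggestion applied to the original entries. It assumes that a quoted phrase such as "THIS COMPANY" is read as a single token. Python's `shlex.split` is used only because it keeps double-quoted text together. `load_table` and `classify` are hypothetical helpers, and the greedy longest-match lookup in `classify` is an assumption about how multi-word tokens might be matched against input, not a description of the rule set's actual behaviour.

```python
import shlex

# Hypothetical sketch: a quoted phrase becomes one key, so the two THIS
# entries no longer collide. This models the idea only; it is not QualityStage code.

def load_table(lines):
    table = {}
    for line in lines:
        token, *rest = shlex.split(line)  # '"THIS COMPANY" INVALID C' -> ['THIS COMPANY', 'INVALID', 'C']
        if token in table:
            raise ValueError(f"duplicate entry {token}")
        table[token] = rest
    return table

def classify(text, table):
    """Greedy longest-match lookup of single- and multi-word tokens (illustrative only)."""
    words = text.split()
    max_len = max((len(key.split()) for key in table), default=1)
    out = []
    i = 0
    while i < len(words):
        # Try the longest possible phrase starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in table:
                out.append((phrase, table[phrase]))
                i += n
                break
        else:
            out.append((words[i], None))  # word not found in the table
            i += 1
    return out

entries = [
    '"THIS COMPANY" INVALID C',
    '"THIS ARTICLE" INVALID C',
]
table = load_table(entries)  # loads without a duplicate-entry error
print(classify("THIS COMPANY SELLS THIS ARTICLE", table))
# prints [('THIS COMPANY', ['INVALID', 'C']), ('SELLS', None), ('THIS ARTICLE', ['INVALID', 'C'])]
```

Trying the longest phrase first means THIS COMPANY is matched as a unit rather than as THIS followed by COMPANY.
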