I would like to know how to tokenize an entry that has more than one word in a classification file.
As the following example illustrates, THIS COMPANY and THIS ARTICLE both need to be standardized to the word INVALID. However, I am not able to achieve this because the word THIS is duplicated:
THIS COMPANY INVALID C
THIS ARTICLE INVALID C
The rule set throws an error during testing, saying that a duplicate entry THIS was found.
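To make the problem concrete, here is a rough Python sketch (not QualityStage syntax). It assumes the classification table is keyed on the first whitespace-delimited token of each row, which would explain the duplicate-entry error. The names `load_classification`, `PHRASES`, and `standardize` are hypothetical and only illustrate the behavior being asked for, namely matching two adjacent words as one unit.

```python
def load_classification(lines):
    """Parse rows where the FIRST column is assumed to be the lookup key."""
    table = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        token = fields[0]
        if token in table:
            # Both rows start with THIS, so the second one collides.
            raise ValueError(f"duplicate entry {token}")
        table[token] = fields[1:]
    return table


rows = ["THIS COMPANY INVALID C", "THIS ARTICLE INVALID C"]
try:
    load_classification(rows)
except ValueError as err:
    print(err)

# Desired behavior (hypothetical): look up two adjacent words as one phrase.
PHRASES = {
    ("THIS", "COMPANY"): "INVALID",
    ("THIS", "ARTICLE"): "INVALID",
}


def standardize(text):
    """Replace known two-word phrases; pass other words through unchanged."""
    words = text.upper().split()
    out = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in PHRASES:
            out.append(PHRASES[pair])
            i += 2  # consume both words of the phrase
        else:
            out.append(words[i])
            i += 1
    return out
```

For example, `standardize("this company ltd")` should treat THIS COMPANY as a single unit while leaving LTD alone.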
Can somebody suggest how to use a two-word entry in the classification table?
Thanks in advance.
Tokenization of more than one word