Standardization process failed

akonda · Post by **akonda** » Mon Oct 24, 2011 12:50 pm

I am trying to standardize international addresses using the Standardize stage. Below is the error I get when I use the JPAREA rule set. Can somebody please suggest a fix?

Error:

Standardization process failed, The classification table has duplicate entry.

Where can I see the duplicate entries? Is it possible to remove them, if there are any?

thanks
Arun

ray.wurlod · Post by **ray.wurlod** » Mon Oct 24, 2011 4:48 pm

Use the Rule Set management tool and look at the CLS file.

stuartjvnorton · Post by **stuartjvnorton** » Mon Oct 24, 2011 5:37 pm

Go to Standardization Rules\Japan\JPAREA in the Repository Explorer.
Double click the .SET file and then in the rule dialog, click the Test button.
If there is a duplicate term in the classification file, it will tell you which line it is on (and I think also the token itself).
Then close the dialog and open JPAREA.CLS and go to the line in question. The token at the start of the line is the issue: it can't be in the file more than once.

Look for the other occurrences of that token and you'll have two options:
1. Decide which one(s) you think it is safe to remove (your call as to which, but they were put there for a reason, so expect some effect from doing this).
2. Give it a new token type that is used by a new proc you'll have to write. The proc will use some context to work out which type it should be in that specific situation, and re-classify the token. Look at Multiple_Semantics in AUADDR for an example of this.

akonda · Post by **akonda** » Tue Oct 25, 2011 8:14 am

I went to the repository in DataStage Designer and tested the .SET file.

The same error occurs:

"Standardization process failed, The classification table has duplicate entry. Initialization of tokenization environment failed. c:/IBM/..../JPAREA.CLS"

It does not report any specific line. It also mentions initialization of the tokenization environment, and I am not sure where I have to do that.

ray.wurlod · Post by **ray.wurlod** » Tue Oct 25, 2011 10:05 am

How did you "test the SET file" and what happened?

akonda · Post by **akonda** » Tue Oct 25, 2011 10:56 am

When I clicked the Test button in the "Rule Management - JPAREA" window, it showed the error below.

"Standardization process failed, The classification table has duplicate entry. Initialization of tokenization environment failed. c:/IBM/..../JPAREA.CLS"

Let me know if this is not the correct way of testing.

stuartjvnorton · Post by **stuartjvnorton** » Tue Oct 25, 2011 5:19 pm

Hmm, when I just tried it out using an AU ruleset, it didn't show the line number (my bad).
What it did show was the actual offending token on the very next line of the error message.

Surely that gives you enough to find the other occurrences of that token and work from there.

As for how to test, the test dialog lets you pick up almost everything without having to create and run a job.

akonda · Post by **akonda** » Wed Oct 26, 2011 9:10 am

For me it is not showing any token information. What I don't understand is where I can initialize the token, because it says

"Initialization of tokenization environment failed."

stuartjvnorton · Post by **stuartjvnorton** » Wed Oct 26, 2011 4:32 pm

When it says it can't initialise the tokenisation environment, it means that it can't load the ruleset. As it states, it can't do that because you have a duplicate token in the classification file.

If for some reason it doesn't tell you the offending token, you'll just have to open up the CLS file and manually look for the duplicate.
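If the file is long, the manual search can be scripted. Here is a minimal Python sketch that reports every token appearing at the start of more than one line, with the line numbers. It rests on a few assumptions that are not confirmed in this thread: that entries are whitespace-separated with the token first, that lines starting with `;` are comments, that lines starting with `\` are header lines, and that matching is exact (case-sensitive). The sample entries are hypothetical, not taken from JPAREA.CLS.

```python
from collections import defaultdict


def find_duplicate_tokens(lines):
    """Return {token: [line numbers]} for tokens that start more than one line."""
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Assumption: blank lines, ';' comments and '\' header lines are not entries.
        if not stripped or stripped.startswith((";", "\\")):
            continue
        # Assumption: the token is the first whitespace-delimited field.
        token = stripped.split(None, 1)[0]
        seen[token].append(lineno)
    return {tok: nums for tok, nums in seen.items() if len(nums) > 1}


# Hypothetical sample data for illustration only.
sample = [
    "\\FORMAT\\ SORT=Y",
    "; hypothetical sample entries",
    "ST STREET T",
    "RD ROAD T",
    "ST SAINT S",
]

duplicates = find_duplicate_tokens(sample)
print(duplicates)
```

To run it against the real file, pass an open file object instead of `sample`. You will need to open the file with whatever encoding your installation uses, which I can't tell you from here.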

akonda · Post by **akonda** » Fri Oct 28, 2011 7:15 am

The CLS file is not in an understandable language. We raised a PMR with IBM to override the Japan ruleset.

ray.wurlod · Post by **ray.wurlod** » Fri Oct 28, 2011 9:39 am

The CLS file should be understandable (if you understand Japanese). It is a list with four columns:

1. a token (word) that might appear in your data
2. the standard form of that token
3. a letter designating the class of that token within the rule set
4. (optional) an uncertainty threshold number, such as 850

The JPAREA.CLS file in my installation is perfectly understandable. It may be that yours is corrupted.
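
To make the four-column layout concrete, here is a rough Python sketch that splits one entry into those columns. It assumes the columns are separated by whitespace and that neither the token nor the standard form contains spaces, which may not hold for every entry. `ClsEntry` and `parse_cls_line` are names made up for this illustration, and the sample lines are hypothetical.

```python
from typing import NamedTuple, Optional


class ClsEntry(NamedTuple):
    token: str                # word that might appear in the data
    standard_form: str        # standard form of that token
    token_class: str          # letter designating the class
    threshold: Optional[int]  # optional uncertainty threshold, e.g. 850


def parse_cls_line(line):
    """Split one classification entry into its three or four columns."""
    # Assumption: whitespace-separated fields with no embedded spaces.
    fields = line.split()
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 columns, got {len(fields)}: {line!r}")
    token, standard_form, token_class = fields[:3]
    # int() raises ValueError if the fourth column is not a whole number.
    threshold = int(fields[3]) if len(fields) == 4 else None
    return ClsEntry(token, standard_form, token_class, threshold)


# Hypothetical entries for illustration only.
with_threshold = parse_cls_line("ST STREET T 850")
without_threshold = parse_cls_line("RD ROAD T")
print(with_threshold)
print(without_threshold)
```

A line that fails to parse this way is not necessarily corrupt, since the assumptions above are simplifications, but a file where most lines fail is worth a closer look.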