Standardization process failed

Infosphere's Quality Product

Moderators: chulett, rschirm

akonda
Participant
Posts: 97
Joined: Wed Feb 28, 2007 6:15 am


Post by akonda »

I am trying to standardize international addresses using the Standardize stage. Below is the error I get when I use the JPAREA rule set. Can somebody please suggest a fix?

Error:

Standardization process failed, The classification table has duplicate entry.

Where can I see the duplicate entries? Is it possible to remove them, if there are any?

thanks
Arun
arun
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use the Rule Set management tool and look at the CLS file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Go to Standardization Rules\Japan\JPAREA in the Repository Explorer.
Double-click the .SET file and then, in the rule dialog, click the Test button.
If there is a duplicate term in the classification file, it will tell you which line it is on (and I think also the token itself).
Then close the dialog and open JPAREA.CLS and go to the line in question. The token at the start of the line is the issue: it can't be in the file more than once.

Look for the other occurrences of that token. You then have 2 options:
1. Decide which one[s] you think it is safe to remove (your call as to which, but they were put there for a reason, so expect some effect from doing this).
2. Give it a new token type that is used by a new proc you'll have to write. The proc will use some context to work out which type it should be in that specific situation, and re-classify the token. Look at Multiple_Semantics in AUADDR for an example of this.
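
The idea behind option 2 can be pictured roughly as below. This is only a Python illustration of context-based re-classification. It is not Pattern Action Language and not the actual Multiple_Semantics logic in AUADDR. The function name, the class letters ("X", "P", "T", "W"), the sample tokens and the rule itself are all hypothetical.

```python
def reclassify(tokens):
    """Resolve a hypothetical ambiguous class 'X' using the token's position.

    tokens is a list of (text, cls) pairs; a new list is returned.
    """
    resolved = []
    for index, (text, cls) in enumerate(tokens):
        if cls == "X":  # hypothetical "ambiguous" class given to the token in the CLS file
            followed_by_token = index + 1 < len(tokens)
            # Hypothetical rule: an ambiguous token followed by another token
            # is treated as a prefix ('P'); at the end it is a street type ('T').
            cls = "P" if followed_by_token else "T"
        resolved.append((text, cls))
    return resolved


# "ST JOHNS ST": the first ST becomes a prefix, the last one a street type.
print(reclassify([("ST", "X"), ("JOHNS", "W"), ("ST", "X")]))
```

The point is that the token appears only once in the classification file, under the new type, and the decision between its meanings moves into the rule logic.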
akonda
Participant
Posts: 97
Joined: Wed Feb 28, 2007 6:15 am

Post by akonda »

I went to the repository in DataStage Designer and tested the .SET file.

The same error occurs:

"Standardization process failed, The classification table has duplicate entry. Initialization of tokenization environment failed. c:/IBM/..../JPAREA.CLS"

But it does not report any specific line. It also says that initialization of the tokenization environment failed, and I am not sure where I have to do the token initialization.
arun
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

How did you "test the SET file" and what happened?
akonda
Participant
Posts: 97
Joined: Wed Feb 28, 2007 6:15 am

Post by akonda »

When I clicked the Test button in the "Rule Management - JPAREA" window, it showed the error below.

"Standardization process failed, The classification table has duplicate entry. Initialization of tokenization environment failed. c:/IBM/..../JPAREA.CLS"

Let me know if this is not the correct way of testing.
arun
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Hmm, when I just tried it out using an AU ruleset, it didn't show the line number (my bad).
What it did show was the actual offending token on the very next line of the error message.

Surely that gives you enough to find the other occurrences of that token and work from there.


As for how to test, the test dialog lets you pick up almost everything without having to create and run a job.
akonda
Participant
Posts: 97
Joined: Wed Feb 28, 2007 6:15 am

Post by akonda »

For me it is not showing any token information. What I don't understand is where I can initialize the token, because it says

"Initialization of tokenization environment failed."
arun
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

When it says it can't initialise the tokenisation environment, it means that it can't load the ruleset. As it states, it can't do that because you have a duplicate token in the classification file.

If for some reason it doesn't tell you the offending token, you'll just have to open up the CLS file and manually look for the duplicate.
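
If scanning the file by eye is impractical, a small script can do the search. The following is a minimal sketch, not anything shipped with the product. It assumes the token is the first whitespace-delimited field on each line, that blank lines and lines starting with `;` or `\` are not entries, and that tokens are compared exactly as written. The function names are made up for illustration, and the file encoding has to be supplied by you because it depends on the installation.

```python
from collections import defaultdict


def find_duplicate_tokens(lines):
    """Return {token: [line numbers]} for tokens that start more than one line."""
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Assumption: blank lines, ';' comment lines and '\' directive lines are not entries.
        if not stripped or stripped.startswith((";", "\\")):
            continue
        token = stripped.split()[0]  # assumption: the token has no embedded whitespace
        seen[token].append(lineno)
    return {token: numbers for token, numbers in seen.items() if len(numbers) > 1}


def find_duplicates_in_file(path, encoding):
    """Run the check on a file; the encoding must match how the CLS file was saved."""
    with open(path, encoding=encoding) as handle:
        return find_duplicate_tokens(handle)


# Hypothetical sample entries, not taken from JPAREA.CLS.
sample = [
    "; comment line",
    "ST STREET T",
    "AVE AVENUE T",
    "ST SAINT P",
]
print(find_duplicate_tokens(sample))  # {'ST': [2, 4]}
```

The line numbers it reports are the places to look at in the CLS file when deciding which occurrence to remove or re-type.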
akonda
Participant
Posts: 97
Joined: Wed Feb 28, 2007 6:15 am

Post by akonda »

The CLS file is not in an understandable language. We raised a PMR with IBM to override the JAPAN rule set.
arun
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The CLS file should be understandable (if you understand Japanese). It is a list with four columns:
  • a token (word) that might appear in your data
  • the standard form of that token
  • a letter designating the class of that token within the rule set
  • (optional) an uncertainty threshold number, such as 850
The JPAREA.CLS file in my installation is perfectly understandable. It may be that yours is corrupted.
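
To make the four-column layout concrete, here is a rough Python sketch that splits one entry into those columns. It assumes the columns are separated by whitespace and that no column contains embedded whitespace, which may not hold for every entry. The names `ClsEntry` and `parse_cls_line`, and the sample line, are hypothetical.

```python
from typing import NamedTuple, Optional


class ClsEntry(NamedTuple):
    token: str                # word that might appear in the data
    standard_form: str        # standard form of that token
    token_class: str          # letter designating the class within the rule set
    threshold: Optional[int]  # optional uncertainty threshold, such as 850


def parse_cls_line(line):
    """Split one classification entry into its three or four columns."""
    fields = line.split()  # assumption: whitespace-separated, no embedded spaces
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 columns, got {len(fields)}: {line!r}")
    token, standard_form, token_class = fields[:3]
    threshold = int(fields[3]) if len(fields) == 4 else None
    return ClsEntry(token, standard_form, token_class, threshold)


# Hypothetical entries, not taken from JPAREA.CLS.
print(parse_cls_line("ST STREET T 850"))
print(parse_cls_line("AVE AVENUE T"))
```

If the file shows up as unreadable characters rather than Japanese text, it may be worth checking that it is being opened with the encoding it was saved in before concluding that it is corrupted.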