Standardization process failed
I am trying to standardize international addresses using the Standardize stage. Below is the error I get when I use the JPAREA rule set. Can somebody please suggest a fix?
Error:
Standardization process failed. The classification table has a duplicate entry.
Where can I see the duplicate entries? Is it possible to remove them, if there are any?
thanks
Arun
Go to Standardization Rules\Japan\JPAREA in the Repository Explorer.
Double click the .SET file and then in the rule dialog, click the Test button.
If there is a duplicate term in the classification file, it will tell you which line it is on (and I think also the token itself).
Then close the dialog and open JPAREA.CLS and go to the line in question. The token at the start of the line is the issue: it can't be in the file more than once.
Look for the other occurrences of that token (the sketch after this list can help locate them) and you'll have two options:
1. decide which one[s] you think it is safe to remove (your call as to which, but they were put there for a reason, so expect some effect of doing this).
2. you'll have to give it a new token type that is used by a new proc you'll have to write. The proc will use some context to work out which type it should be in that specific situation, and re-classify the token. Look at Multiple_Semantics in AUADDR for an example of this.
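If you know the offending token, a tiny script can list every line where it appears as the first column of the classification file. This is only a sketch: the script name, the path handling and the UTF-8 encoding are assumptions, so check what encoding your JPAREA.CLS actually uses before relying on it.

# find_token.py - print every line of a CLS file whose first column matches a given token
import sys

cls_path, token = sys.argv[1], sys.argv[2]      # e.g. the path to JPAREA.CLS and the reported token
with open(cls_path, encoding="utf-8") as f:     # encoding is an assumption; adjust to your file
    for lineno, line in enumerate(f, start=1):
        parts = line.split()
        if parts and parts[0] == token:
            print(f"{lineno}: {line.rstrip()}")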
I went to the repository in DataStage Designer and tested the .SET file.
The same error occurs: "Standardization process failed. The classification table has a duplicate entry. Initialization of tokenization environment failed. c:/IBM/..../JPAREA.CLS"
But it does not point to any specific line.
I am also not sure where I have to do the token initialization.
When I clicked the Test button in the "Rule Management - JPAREA" window, it showed the error below.
"Standardization process failed. The classification table has a duplicate entry. Initialization of tokenization environment failed. c:/IBM/..../JPAREA.CLS"
Let me know if this is not the correct way of testing.
"Standardization process failed, The classificationt table has duplicate entry. Initialization of tokenization environment failed. c:/IBM/..../JPAREA.CLS
let me know if this is not the correct way of testing ??
Hmm, when I just tried it out using an AU ruleset, it didn't show the line number (my bad).
What it did show was the actual offending token on the very next line of the error message.
Surely that gives you enough to find the other occurrences of that token and work from there.
As for how to test, the test dialog lets you pick up almost everything without having to create and run a job.
When it says it can't initialise the tokenisation environment, it means that it can't load the ruleset. As it states, it can't do that because you have a duplicate token in the classification file.
If for some reason it doesn't tell you the offending token, you'll just have to open up the CLS file and manually look for the duplicate.
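If the token isn't reported at all, a short script can scan the whole classification file and flag any first-column token that occurs more than once. Again this is only a sketch: the file name, the encoding and the assumption that comment lines start with a semicolon are mine, not anything guaranteed by the rule set.

# find_dups.py - report tokens that appear more than once in the first column of a CLS file
from collections import defaultdict

occurrences = defaultdict(list)
with open("JPAREA.CLS", encoding="utf-8") as f:   # path and encoding are assumptions
    for lineno, line in enumerate(f, start=1):
        parts = line.split()
        if not parts or line.lstrip().startswith(";"):   # skip blanks and (assumed) comment lines
            continue
        occurrences[parts[0]].append(lineno)

for token, lines in occurrences.items():
    if len(lines) > 1:
        print(token, "appears on lines", lines)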
The CLS file should be understandable (if you understand Japanese). It is a list with four columns:
- a token (word) that might appear in your data
- the standard form of that token
- a letter designating the class of that token within the rule set
- (optional) an uncertainty threshold number, such as 850
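As a purely made-up illustration of that layout (these are not real JPAREA entries, and the class letters here are invented), a couple of lines might look like:

HOKKAIDOU HOKKAIDO P 850
SHI SHI T

The first-column token is what the duplicate check complains about: the same value appearing at the start of two different lines.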
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.