Teradata Multi-load fails with character error
Moderators: chulett, rschirm, roy
Teradata Multi-load fails with character error
Hi everybody,
I have multiload error.
I have UTF8 input -> Teradata multiload UTF8 default charset is ok BUT it fails with 6706 or 6705 error which says character error . As I know it is unicode error character such as U-FFFD or in ASCII 0x1a. How can I clear them from the txt?
I haven't get used to datastage well. Unfortunately I can not investigate from Teradata side with the translate_chk function. I don't have rights on the client side.
In this case I need Basic language solution.
Can somebody give me a good idea?
\
*Note - Subect Title Changed to be more descriptive - Andy*
I have multiload error.
I have UTF8 input -> Teradata multiload UTF8 default charset is ok BUT it fails with 6706 or 6705 error which says character error . As I know it is unicode error character such as U-FFFD or in ASCII 0x1a. How can I clear them from the txt?
I haven't get used to datastage well. Unfortunately I can not investigate from Teradata side with the translate_chk function. I don't have rights on the client side.
In this case I need Basic language solution.
Can somebody give me a good idea?
\
*Note - Subect Title Changed to be more descriptive - Andy*
Last edited by kurics40 on Thu Jan 21, 2010 6:40 pm, edited 1 time in total.
If you really think that's your solution, use Convert() along with the decimal value of the character to strip it from the string:
Next time please do a better job of choosing a thread subject. We know your question would be about "DataStage" so make it pertinent to your actual issue. This one could have been "Multiload character error" or something along those lines.
Code: Select all
Convert(CHAR(26),"",YourField)
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You don't eliminate them - they are your customer's data!
You figure out how to handle them, typically by specifying the correct character mappings in both DataStage and Teradata.
You figure out how to handle them, typically by specifying the correct character mappings in both DataStage and Teradata.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You can only "mend" what you call "bad" characters if you know what they are and what they should be. You can, of course, eliminate them (with permission).
Oconv() supports NLSmapname as its second argument - perhaps you could investiate this?
Oconv() supports NLSmapname as its second argument - perhaps you could investiate this?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Allright. Sorry for the comment. I see I did a mistake.
Okay lets investigate the issue:
There is one typical problematic row.
"357 243 267" =  makes the 99% the problem
cat 203row
|Suspect_GSFAISLPLFI2|FIC1303835|FIA1176256|SYTO INTERNATIONAL BV|SYTO INTERNATIONAL BV
od -c 203row - ASCII way
0000000 357 243 267 | S u s p e c t _ G S F A
0000020 I S L P L F I 2 | F I C 1 3 0 3
0000040 8 3 5 | F I A 1 1 7 6 2 5 6 | S
0000060 Y T O I N T E R N A T I O N A
0000100 L B V | S Y T O I N T E R N
0000120 A T I O N A L B V \n
0000133
od -x 203row
0000000 efa3 b77c 5375 7370 6563 745f 4753 4641
0000020 4953 4c50 4c46 4932 7c46 4943 3133 3033
0000040 3833 357c 4649 4131 3137 3632 3536 7c53
0000060 5954 4f20 494e 5445 524e 4154 494f 4e41
0000100 4c20 4256 7c53 5954 4f20 494e 5445 524e
0000120 4154 494f 4e41 4c20 4256 0a00
0000133
Teradata says:
6705 An illegally formed character string was
encountered during translation.
..
Unicode Any single-byte character (SBC).
It would be good if I can at least transform the unicode character till the | sign ASCII or LATIN if it is single-byte (damaged) Unicode.
Okay lets investigate the issue:
There is one typical problematic row.
"357 243 267" =  makes the 99% the problem
cat 203row
|Suspect_GSFAISLPLFI2|FIC1303835|FIA1176256|SYTO INTERNATIONAL BV|SYTO INTERNATIONAL BV
od -c 203row - ASCII way
0000000 357 243 267 | S u s p e c t _ G S F A
0000020 I S L P L F I 2 | F I C 1 3 0 3
0000040 8 3 5 | F I A 1 1 7 6 2 5 6 | S
0000060 Y T O I N T E R N A T I O N A
0000100 L B V | S Y T O I N T E R N
0000120 A T I O N A L B V \n
0000133
od -x 203row
0000000 efa3 b77c 5375 7370 6563 745f 4753 4641
0000020 4953 4c50 4c46 4932 7c46 4943 3133 3033
0000040 3833 357c 4649 4131 3137 3632 3536 7c53
0000060 5954 4f20 494e 5445 524e 4154 494f 4e41
0000100 4c20 4256 7c53 5954 4f20 494e 5445 524e
0000120 4154 494f 4e41 4c20 4256 0a00
0000133
Teradata says:
6705 An illegally formed character string was
encountered during translation.
..
Unicode Any single-byte character (SBC).
It would be good if I can at least transform the unicode character till the | sign ASCII or LATIN if it is single-byte (damaged) Unicode.
Last edited by kurics40 on Thu Jan 21, 2010 5:07 pm, edited 1 time in total.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I don't know the answer, because it will depend on how your data are actually encoded. And there's no way I can know this, so I suggest a more general approach.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
possible solution
Hi all,
Might be I found a solution:
http://www.teradata.com/DownloadCenter/ ... 123-1.aspx
The other can be to write Inmod Multiload.
If somebody knows a DataStage solution for this it would be great.
Might be I found a solution:
http://www.teradata.com/DownloadCenter/ ... 123-1.aspx
The other can be to write Inmod Multiload.
If somebody knows a DataStage solution for this it would be great.
Workaround
As some of my colleague asked more workarounds:
You can create error table in teradata of any table:
Syntax
CREATE ERROR TABLE FOR tbl_name;
What I recognized that many quick and dirty developers try to load the data in ansi mode because it gives no error. If the data is in UTF8 it is highly not recommended to load in ANSI. UTF8 can represent a character in 1 or 2 or 3 or 4 bytes as a multibyte character. If ANSI latin mode mapps each byte as a valid character than it screws up the data in irreversible way in the target, unless the UTF8 data content matches in a special scenario with ANSI latin characterset.
You can create error table in teradata of any table:
Syntax
CREATE ERROR TABLE FOR tbl_name;
What I recognized that many quick and dirty developers try to load the data in ansi mode because it gives no error. If the data is in UTF8 it is highly not recommended to load in ANSI. UTF8 can represent a character in 1 or 2 or 3 or 4 bytes as a multibyte character. If ANSI latin mode mapps each byte as a valid character than it screws up the data in irreversible way in the target, unless the UTF8 data content matches in a special scenario with ANSI latin characterset.