It uses semicolon as a delimiter for the columns and appears to have a semicolon after the last column as well as a LF character.
The file appears to have the following escaped (using a backslash): null/empty columns, semicolons & line-feeds that appear in the data.
My parallel job is structured as:
[sequential file stage] -> [transformer stage] -> [netezza connector stage]
My import process spits out warnings (resulting in skipped rows) whenever it sees either a semicolon or a LF within a string value. Could be that one or the other is failing it, not really sure.
I have attempted various iterations of convert and ereplace within the transformer stage and each time I see the following warnings and the problem records are skipped.
- Delimiter for field "field name" not found: input: {some data from the row}, at offset: x
- Field "field name" has import error and no default value: data: {some data from the row}, at offset x
The rest of the data will load- just not rows that are broken due to embedded delimiter or LF. (all broken rows appear to have both occurrences as represented below)
I have tried using a server job instead of a parallel job along with the "Contains terminators" set to 'Yes' but this fails (no rows get loaded).
It seems as if I should be able to tell DataStage to ignore characters immediately following the backslash, but I cannot figure out where/how to do this.
Code: Select all
1 78000274787.0;0;1;1;2;2;\ ;1;89334;\ ;3;-1;+12025550101;2019-08-21 19:40:41.541;...
2 78000274769.0;1;1;1;1;4;er failures\; nested exception is: \$
3 5589 ^Icom.cisco.wfapi.WFExecutionException: Too many transfer failures;3;-1;2025550101;
4 78000274788.0;0;1;1;1;1;\ ;3;-1;2025550101;2;-1;29748;2019-08-21 19:40:45.863;
5 78000274790.0;0;1;1;3;1;\ ;1;87574;\ ;2;-1;29830;2019-08-21 19:40:57.714;2019-08-21 19:46:12.265;
Line 1: Good data, imports fine
Line 2: there is an escaped semicolon (after the word 'failures') and an escaped LF at the end
Line3: Is the continuation of the data that was pushed to the next line due to the LF in the row above.
Line 4: Good data, imports fine
Line 5: Good data, imports fine
This represents what the line-endings look like, in case that's important. Note the '$' which is the LF representation when viewing with Vim.
Code: Select all
1 ... ;\ ;N;f;f;f;f;f;-1;\ ;000000000068790706EE000A00000000;0;-1;$
2 78000274769.0;1;1;1;1;4;er failures\; nested exception is: \$
3 ... ;\ ;N;f;f;f;f;f;-1;0000000000060FBC0214926200000000;\ ;0;-1;$
4 ... ;\ ;N;f;f;f;f;f;-1;0000000000060F960214927200000000;\ ;0;-1;$
5 ... ;\ ;N;f;f;f;f;f;-1;000000000068790D06EE004100000000;000000000068790D06EE003F00000000;0;-1;$
As a last resort I am thinking of running the file through a Sed pre-process, but I keep thinking that DS can handle this simple task.
Thank you in advance.