Newline character getting loaded into DB Column

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

VCInDSX
Premium Member
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Newline character getting loaded into DB Column

Post by VCInDSX »

Hi Group,
I have an RCP DB Loading job that reads from an input sequential file and parses it using a supplied schema file and then loads into a target database table (SQL Server 2005). The server is NLS-Enabled and the NLS map is set to UTF-8 at the job level.

This is working well in a Windows DS 8.X server. Recently this was migrated to a Linux 8.X server and I am facing a strange behavior where the last column in the input file is loaded into the database table with an extra NewLine character.

I tried switching the input Sequential file stage's format to use "DOS line terminator" format and then "UNIX line terminator", in vain.

All the columns in the schema file are VARCHAR(X) except one which is an NVARCHAR(X)

Is there any other setting that might be triggering the parser to grab the newline as a data item into the column?

Thanks in advance for your time,
-V
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you dump your sequential file straight into another sequential file are the files identical?
VCInDSX
Premium Member
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Hi ArndW,
Thanks for the followup.
My original design was
Seq. file ===> Transformer ==> ODBC Stage
||
||
||
\/
Rejects File


I replaced the final ODBC stage with a target Seq File stage and made sure it is using the correct NLS map (UTF-8). The output generated matches the input file's contents.

Thanks
-V
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

And you are certain that this extra newline is not in the original file? If the file is a UNIX file it will only have 1 terminator (so don't use DOS-mode). Where in the last column is the new line - at the end of data (if so, what position and how long is the CHAR() definition) or is the last column empty except for the new line?
VCInDSX
Premium Member
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

The source file is generated on a Windows box and when viewed in a binary editor contains CR+LF combo (0D 0A)

The file is residing on a network mount drive and is read by a Datastage job that is running in a LINUX grid server.

The last column is declared to be of VARCHAR(60) and the schema file has it defined as string[max=60]

Some records have additional empty spaces in the last column before the newline, but some don't. Irrespective of the presence or absence of the spaces, the newline is always loaded into the db column.

Maybe I am missing some other property where one could specify that the job should not treat the newline character combo (0D 0A) as a data item of the last column.


Thanks,
-V
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

In that case you should read with DOS format
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai
Contact:

Post by Minhajuddin »

You should change the input file format of the sequential file you are reading to read as the file as a DOS file.
Minhajuddin

<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
VCInDSX
Premium Member
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Apologies for the delayed response.

Tried that Minhajuddin and no positive results. It still loads the newline character into the target DB column.
-V
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Please confirm HOW you "tried that".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai
Contact:

Post by Minhajuddin »

You should be reading your source file as DOS formatted file.
If you are already doing that, you better check by dumping the data into another flat file (Replace the db by a sequential file, and set the file to be created in Unix format).
Now, check your file again in a Hex editor, Do you see the xOD xOA at the end of each line? If you do then there is something wrong with the way you are reading the file.
Minhajuddin

<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
VCInDSX
Premium Member
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Ray & Minhajuddin,
Thanks for the followup. Here is what I did as the latest test - by replacing the target from DB to Sequential file.

In the Sequential File stage that is used to read the source file i have set the "Record delimiter string" = "DOS format"
Right-Click the "Record Level" item in the "Format Tab"
Navigate to the "Format As" pop-up menu
Select "DOS line terminator"

On the target sequential file stage, i set the "UNIX line terminator" for the format property.

After running the job, i see th "0D 0A" in the hex viewer in output file.

What else could be missing?

Thanks again for your time and input.
-V
VCInDSX
Premium Member
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Ray & Minhajuddin,
Thanks for the followup. Here is what I did as the latest test - by replacing the target from DB to Sequential file.

In the Sequential File stage that is used to read the source file i have set the "Record delimiter string" = "DOS format"
Right-Click the "Record Level" item in the "Format Tab"
Navigate to the "Format As" pop-up menu
Select "DOS line terminator"

On the target sequential file stage, i set the "UNIX line terminator" for the format property.

After running the job, i see th "0D 0A" in the hex viewer in output file.

What else could be missing?

Thanks again for your time and input.
-V
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

In your test job, if you do a "diff filea fileb" are they identical?
VCInDSX
Premium Member
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

The diff returns one line as a difference - for the header row. Strangely the header seems to have converted to a mixed case on the target and I have no idea how and why.
The header is completely lower case in the source, whereas the target is mixed.
E.g
Source is as follows
empid|~empname|~address1

Target has come out as
EmpID|~EmpName|~Address1

But the contents appear to have been copied verbatim as the diff did not find any difference.

Is there a simple hex editor that one could use in the Linux environemnt? I used textpad and see that they both have 0D 0A pairs.

Thanks again for your time
-V
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I use "cat -v" on UNIX. I know it might sound repetitive - but please check the creation date/timestamp on the output file to ensure you are lokoing at the right one. Also, check the path in the source file in the job to make sure you are reading from the correct place.

You shouldn't be getting identical files if reading with DOS termination and writing with UNIX termination.
Post Reply