Newline character getting loaded into DB Column

VCInDSX · Post by **VCInDSX** » Tue May 06, 2008 11:53 pm

Hi Group,
I have an RCP DB Loading job that reads from an input sequential file and parses it using a supplied schema file and then loads into a target database table (SQL Server 2005). The server is NLS-Enabled and the NLS map is set to UTF-8 at the job level.

This is working well in a Windows DS 8.X server. Recently this was migrated to a Linux 8.X server and I am facing a strange behavior where the last column in the input file is loaded into the database table with an extra NewLine character.

I tried switching the input Sequential file stage's format to use "DOS line terminator" format and then "UNIX line terminator", in vain.

All the columns in the schema file are VARCHAR(X) except one which is an NVARCHAR(X)

Is there any other setting that might be triggering the parser to grab the newline as a data item into the column?

Thanks in advance for your time,

ArndW · Post by **ArndW** » Wed May 07, 2008 12:35 am

If you dump your sequential file straight into another sequential file are the files identical?

VCInDSX · Post by **VCInDSX** » Wed May 07, 2008 1:22 am

Hi ArndW,
Thanks for the followup.
My original design was
Seq. file ===> Transformer ==> ODBC Stage
||
||
||
\/
Rejects File

I replaced the final ODBC stage with a target Seq File stage and made sure it is using the correct NLS map (UTF-8). The output generated matches the input file's contents.

Thanks

ArndW · Post by **ArndW** » Wed May 07, 2008 2:09 am

And you are certain that this extra newline is not in the original file? If the file is a UNIX file it will only have 1 terminator (so don't use DOS-mode). Where in the last column is the new line - at the end of data (if so, what position and how long is the CHAR() definition) or is the last column empty except for the new line?

VCInDSX · Post by **VCInDSX** » Wed May 07, 2008 3:38 am

The source file is generated on a Windows box and when viewed in a binary editor contains CR+LF combo (0D 0A)

The file is residing on a network mount drive and is read by a Datastage job that is running in a LINUX grid server.

The last column is declared to be of VARCHAR(60) and the schema file has it defined as string[max=60]

Some records have additional empty spaces in the last column before the newline, but some don't. Irrespective of the presence or absence of the spaces, the newline is always loaded into the db column.

Maybe I am missing some other property where one could specify that the job should not treat the newline character combo (0D 0A) as a data item of the last column.

Thanks,

ArndW · Post by **ArndW** » Wed May 07, 2008 4:43 am

In that case you should read with DOS format

Minhajuddin · Post by **Minhajuddin** » Wed May 07, 2008 10:52 am

You should change the input file format of the sequential file you are reading to read as the file as a DOS file.

VCInDSX · Post by **VCInDSX** » Fri May 09, 2008 12:28 am

Apologies for the delayed response.

Tried that Minhajuddin and no positive results. It still loads the newline character into the target DB column.

ray.wurlod · Post by **ray.wurlod** » Fri May 09, 2008 4:28 pm

Please confirm HOW you "tried that".

Minhajuddin · Post by **Minhajuddin** » Sat May 10, 2008 10:30 am

You should be reading your source file as DOS formatted file.
If you are already doing that, you better check by dumping the data into another flat file (Replace the db by a sequential file, and set the file to be created in Unix format).
Now, check your file again in a Hex editor, Do you see the xOD xOA at the end of each line? If you do then there is something wrong with the way you are reading the file.

VCInDSX · Post by **VCInDSX** » Mon May 12, 2008 1:27 am

Ray & Minhajuddin,
Thanks for the followup. Here is what I did as the latest test - by replacing the target from DB to Sequential file.

In the Sequential File stage that is used to read the source file i have set the "Record delimiter string" = "DOS format"
Right-Click the "Record Level" item in the "Format Tab"
Navigate to the "Format As" pop-up menu
Select "DOS line terminator"

On the target sequential file stage, i set the "UNIX line terminator" for the format property.

After running the job, i see th "0D 0A" in the hex viewer in output file.

What else could be missing?

Thanks again for your time and input.

VCInDSX · Post by **VCInDSX** » Mon May 12, 2008 1:28 am

Ray & Minhajuddin,
Thanks for the followup. Here is what I did as the latest test - by replacing the target from DB to Sequential file.

In the Sequential File stage that is used to read the source file i have set the "Record delimiter string" = "DOS format"
Right-Click the "Record Level" item in the "Format Tab"
Navigate to the "Format As" pop-up menu
Select "DOS line terminator"

On the target sequential file stage, i set the "UNIX line terminator" for the format property.

After running the job, i see th "0D 0A" in the hex viewer in output file.

What else could be missing?

Thanks again for your time and input.

ArndW · Post by **ArndW** » Mon May 12, 2008 3:03 am

In your test job, if you do a "diff filea fileb" are they identical?

VCInDSX · Post by **VCInDSX** » Mon May 12, 2008 9:40 am

The diff returns one line as a difference - for the header row. Strangely the header seems to have converted to a mixed case on the target and I have no idea how and why.
The header is completely lower case in the source, whereas the target is mixed.
E.g
Source is as follows
empid|~empname|~address1

Target has come out as
EmpID|~EmpName|~Address1

But the contents appear to have been copied verbatim as the diff did not find any difference.

Is there a simple hex editor that one could use in the Linux environemnt? I used textpad and see that they both have 0D 0A pairs.

Thanks again for your time

ArndW · Post by **ArndW** » Mon May 12, 2008 9:50 am

I use "cat -v" on UNIX. I know it might sound repetitive - but please check the creation date/timestamp on the output file to ensure you are lokoing at the right one. Also, check the path in the source file in the job to make sure you are reading from the correct place.

You shouldn't be getting identical files if reading with DOS termination and writing with UNIX termination.