Sequential File Stage in Parallel and Server
I have a peculiar problem while reading a comma separated text file. The records in my flat file are as follows:
Abe,"123,Main. St.",,12345
Willy,"""Crazy Guy""",,56745
Betty,987 Silver Ave,,98765
The above example has 4 columns. I am able to read the file correctly when I use the sequential file stage in Server.
But the data is skewed when I try reading it in Parallel.
Server Seq File Stage Format setting:
Delimiter: ,
Quote Character: "
Missing Column Action: Pad with SQL Null
Default Padding: #
Parallel Seq File Stage Format setting:
Final Delimiter: end
Delimiter: comma
Quote: Double
Not sure if I need to add anything else to get this data read using the parallel seq file stage.
Anyone come across this before?
Any comments or suggestions welcome.
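For reference, the sample follows the common CSV convention in which a doubled quote inside a quoted field stands for one literal quote, so each record should yield four columns. A quick check with Python's csv module (shown only to illustrate the expected parse, not the DataStage behaviour):

```python
import csv
import io

# The three sample records from the post
data = '''Abe,"123,Main. St.",,12345
Willy,"""Crazy Guy""",,56745
Betty,987 Silver Ave,,98765
'''

# A standard CSV parser treats "" inside a quoted field as one literal "
for row in csv.reader(io.StringIO(data)):
    print(row)
# ['Abe', '123,Main. St.', '', '12345']
# ['Willy', '"Crazy Guy"', '', '56745']
# ['Betty', '987 Silver Ave', '', '98765']
```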
DD
Success is right around the corner
Well, with the format settings specified above I am able to read the record containing "123,Main. St.",
but I am unable to read this one: Willy,"""Crazy Guy""",,56745
When I view the data, this is what I get (using pipes just for representation):
C1    | C2             | C3          | C4     |
------+----------------+-------------+--------+
Abe   | 123,Main. St.  |             | 12345  |
Willy |                | CrazyGuy""" | ,56745 |
Betty | 987 Silver Ave |             | 98765  |
chulett wrote: Data is skewed? Example, please. Is your problem just with the "123 Main St" field with the extra comma? ...
DD
Success is right around the corner
OK, I've seen that before, and I don't recall whether there is a true solution or even a good workaround. For some reason PX really doesn't handle embedded quotes in a quote-delimited string well; I don't know why.
Hopefully somebody else will chime in here.
-craig
"You can never have too many knives" -- Logan Nine Fingers
DD,
I'm not aware of a way to have consecutive quotes treated as a single quote character automatically.
Have you thought about importing the record as a single field and running it through a Transformer to convert consecutive quote characters to a single quote character?
Then you could use a Column Import stage to split it into your desired structure.
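The two-stage idea above boils down to one character-level rule: inside a quoted field, a doubled quote stands for a single literal quote. A rough sketch of that rule in Python (illustrative only; the real job would express this in a Transformer derivation plus a Column Import stage):

```python
def split_record(line, delim=',', quote='"'):
    """Split one record, treating a doubled quote inside a
    quoted field as a single literal quote character."""
    fields, buf, in_quotes = [], [], False
    i = 0
    while i < len(line):
        c = line[i]
        if c == quote:
            if in_quotes and i + 1 < len(line) and line[i + 1] == quote:
                buf.append(quote)          # "" inside quotes -> one literal "
                i += 1                     # skip the second quote of the pair
            else:
                in_quotes = not in_quotes  # opening or closing quote
        elif c == delim and not in_quotes:
            fields.append(''.join(buf))    # unquoted delimiter ends the field
            buf = []
        else:
            buf.append(c)
        i += 1
    fields.append(''.join(buf))
    return fields
```

Applied to the problem record, `split_record('Willy,"""Crazy Guy""",,56745')` yields `['Willy', '"Crazy Guy"', '', '56745']`.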
Thanks, everyone!
But the problem is that I get those files on a daily basis, and once they are pushed to our servers our process is executed, so I was looking for something that can be automated.
I'm currently using a Server job to read that file and everything is running great. I just wanted to know if anyone has been able to read such a file using a Sequential File stage in a parallel job.
relati wrote: DD,
Here's a work around if you can manipulate the source file outside of DataStage. Replace """ with " or nothing in the source file, save it, and try importing the data again.
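Since the files arrive on a schedule, that replacement could be scripted into the push process rather than done by hand. A minimal sketch (the function name and paths are made up for illustration):

```python
# Hypothetical pre-processing step: rewrite each line, replacing the
# three-quote sequence (""") with a single quote, before the job reads it.
def replace_triple_quotes(src_path, dst_path):
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            dst.write(line.replace('"""', '"'))
```

With this, `Willy,"""Crazy Guy""",,56745` becomes `Willy,"Crazy Guy",,56745`, which the Parallel stage can read with Quote = Double; note the literal quotes in the field value are lost, just as with the manual edit.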
DD
Success is right around the corner
We have this fun coming up very soon, and maybe a

    sed 's/""//g'

would do the trick? That way you only ever get rid of two double quotes together... Getting ,"", would lead to ,, which should be OK; the biggest problem would be ""xyz,abc""... but maybe that's less common at least?
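Checking the pair-removal idea against the thread's own examples (Python here as a stand-in for the shell command, purely illustrative):

```python
# Delete each pair of consecutive double quotes; a lone quote survives
drop_quote_pairs = lambda line: line.replace('""', '')

print(drop_quote_pairs('Willy,"""Crazy Guy""",,56745'))  # Willy,"Crazy Guy",,56745
print(drop_quote_pairs('a,"",b'))                        # a,,b
print(drop_quote_pairs('""xyz,abc""'))                   # xyz,abc  (comma now unprotected)
```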