Sequential File Stage in Parallel and Server

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


ddevdutt
Participant
Posts: 47
Joined: Wed Aug 22, 2007 2:38 pm

Sequential File Stage in Parallel and Server

Post by ddevdutt »

I have a peculiar problem while reading a comma-separated text file. The records in my flat file are as follows:

Abe,"123,Main. St.",,12345
Willy,"""Crazy Guy""",,56745
Betty,987 Silver Ave,,98765

The above example has 4 columns. I am able to read the file correctly when I use the sequential file stage in Server.
But the data is skewed when I try reading it in Parallel.
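For reference (assuming the usual CSV convention that a doubled quote inside a quoted field stands for a literal quote character), those rows should parse into four columns like this:

C1    | C2             | C3 | C4
Abe   | 123,Main. St.  |    | 12345
Willy | "Crazy Guy"    |    | 56745
Betty | 987 Silver Ave |    | 98765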

Server Seq File Stage Format setting:
Delimiter: ,
Quote Character: "
Missing Column Action: Pad with SQL Null
Default Padding: #

Parallel Seq File Stage Format setting:
Final Delimiter: end
Delimiter: comma
Quote: Double

Not sure if I need to add anything else to get this data read using the parallel seq file stage.

Anyone come across this before?
Any comments or suggestions welcome.
DD

Success is right around the corner
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Data is skewed? Example, please. Is your problem just with the "123 Main St" field with the extra comma?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ddevdutt
Participant
Posts: 47
Joined: Wed Aug 22, 2007 2:38 pm

Post by ddevdutt »

Well, with the format settings specified above I am able to read the record with "123,Main. St."
but I am unable to read this one: Willy,"""Crazy Guy""",,56745

When I view the data, this is what I get (I'm using pipes just for representation):

C1    | C2             | C3          | C4     |
------+----------------+-------------+--------+
Abe   | 123,Main. St.  |             | 12345  |
Willy |                | CrazyGuy""" | ,56745 |
Betty | 987 Silver Ave |             | 98765  |


chulett wrote:Data is skewed? Example, please. Is your problem just with the "123 Main St" field with the extra comma? ...
DD

Success is right around the corner
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK, that I've seen before, and I don't recall if there is a true solution for this or even a good workaround. For some reason PX really doesn't handle those embedded quotes well in a quote-delimited string, don't know why.

Hopefully somebody else will chime in here.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ddevdutt
Participant
Posts: 47
Joined: Wed Aug 22, 2007 2:38 pm

Post by ddevdutt »

Thanks Craig.

Has anyone else seen or faced this type of problem before? Again, any suggestions or comments are welcome.
DD

Success is right around the corner
jcthornton
Premium Member
Posts: 79
Joined: Thu Mar 22, 2007 4:58 pm
Location: USA

Post by jcthornton »

DD,

I'm not aware of a way to have consecutive quotes treated as a single quote character automatically.

Have you thought about importing the record as a single field and running it through a Transformer to convert consecutive quote characters to a single quote character?

Then you could use a Column Import stage to split it into your desired structure.
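If you want to prototype that conversion step outside DataStage first, the string manipulation is roughly the sed below (note I'm dropping each escaped "" pair outright rather than collapsing it to a single quote, so the leftover quotes still mark the field boundaries for the Column Import stage; the file name is just a placeholder):

Code:

   sed 's/""//g' daily_feed.csv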
relati
Participant
Posts: 17
Joined: Mon Oct 01, 2007 12:44 pm

Post by relati »

DD,

Here's a workaround if you can manipulate the source file outside of DataStage. Replace """ with " or nothing in the source file, save it, and try importing the data again.
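If you can touch the file like that, a one-liner along these lines would do the replacement (file names are made up):

Code:

   sed 's/"""/"/g' source.csv > source_fixed.csv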
ddevdutt
Participant
Posts: 47
Joined: Wed Aug 22, 2007 2:38 pm

Post by ddevdutt »

Thanks, everyone!

But the problem is that I get those files on a daily basis and once those files are pushed to our servers, our process is executed. I was actually looking for something that can be automated.

I'm currently using a Server job to read that file and everything is running great. I just wanted to know if anyone was able to read such a file using a Sequential file stage in a parallel job :)
relati wrote:DD,
Here's a workaround if you can manipulate the source file outside of DataStage. Replace """ with " or nothing in the source file, save it, and try importing the data again.
DD

Success is right around the corner
throbinson
Charter Member
Charter Member
Posts: 299
Joined: Wed Nov 13, 2002 5:38 pm
Location: USA

Post by throbinson »

You could use the filter option of the Sequential file read and tr -d "\"" those bad boys outta there. ALL of 'em.
You'll have to change the format definition to quotes=none.
This is not optimal but kinda works. A ringing endorsement of the technique! Caveat emptor, baby.
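To see what that filter does to the sample rows from this thread:

Code:

   $ tr -d '"' <<'EOF'
   Abe,"123,Main. St.",,12345
   Willy,"""Crazy Guy""",,56745
   EOF
   Abe,123,Main. St.,,12345
   Willy,Crazy Guy,,56745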
ddevdutt
Participant
Posts: 47
Joined: Wed Aug 22, 2007 2:38 pm

Post by ddevdutt »

I'll give that a try and let you know if that works :)

If I set quotes=none, then another problem arises: I have fields like "123,Main. St.", and that data is going to get skewed since the field delimiter is a comma.
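A quick check outside DataStage shows what I mean; once the quotes are stripped, the Abe row has five comma-separated values instead of four:

Code:

   $ echo 'Abe,"123,Main. St.",,12345' | tr -d '"' | awk -F',' '{print NF}'
   5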
DD

Success is right around the corner
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

We have this fun coming up very soon, and maybe a

Code:

   sed 's/""//g'
Would do the trick? That way you only ever get rid of two double quotes together...

Getting ,"", would lead to ,, which should be OK; the biggest problem would be ""xyz,abc""... but that may be less common at least?
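A quick test against the sample rows in this thread (plus a made-up row for the ""xyz,abc"" case) suggests it behaves that way:

Code:

   $ printf '%s\n' 'Abe,"123,Main. St.",,12345' 'Willy,"""Crazy Guy""",,56745' 'Joe,""xyz,abc"",,11111' | sed 's/""//g'
   Abe,"123,Main. St.",,12345
   Willy,"Crazy Guy",,56745
   Joe,xyz,abc,,11111

The quoted comma in the first row survives, the escaped quotes in the second collapse to something quote=double can read, and the third shows the case above where the now-unprotected comma would skew the columns again.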