Sequential File Stage in Parallel and Server
I have a peculiar problem while reading a comma separated text file. The records in my flat file are as follows:
Abe,"123,Main. St.",,12345
Willy,"""Crazy Guy""",,56745
Betty,987 Silver Ave,,98765
The above example has 4 columns. I am able to read the file correctly when I use the sequential file stage in Server.
But the data is skewed when I try reading it in Parallel.
Server Seq File Stage Format setting:
Delimiter: ,
Quote Character: "
Missing Column Action: Pad with SQL Null
Default Padding: #
Parallel Seq File Stage Format setting:
Final Delimiter: end
Delimiter: comma
Quote: Double
Not sure if I need to add anything else to get this data read using the parallel seq file stage.
Anyone come across this before?
Any comments or suggestions welcome.
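For reference, the sample follows the common CSV convention in which a doubled quote inside a quoted field stands for one literal quote, so each record should yield four columns. A quick check with Python's csv module (shown only to illustrate the expected parse, not the DataStage behaviour):

```python
import csv
import io

# The three sample records from the post
data = '''Abe,"123,Main. St.",,12345
Willy,"""Crazy Guy""",,56745
Betty,987 Silver Ave,,98765
'''

# A standard CSV parser treats "" inside a quoted field as one literal "
for row in csv.reader(io.StringIO(data)):
    print(row)
# ['Abe', '123,Main. St.', '', '12345']
# ['Willy', '"Crazy Guy"', '', '56745']
# ['Betty', '987 Silver Ave', '', '98765']
```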
DD
Success is right around the corner
Well, with the format settings specified above I am able to read the record containing "123,Main. St.",
but I am unable to read this one: Willy,"""Crazy Guy""",,56745
When I view the data, this is what I get (using pipes just for representation):
C1    | C2             | C3          | C4     |
------+----------------+-------------+--------+
Abe   | 123,Main. St.  |             | 12345  |
Willy |                | CrazyGuy""" | ,56745 |
Betty | 987 Silver Ave |             | 98765  |
chulett wrote: Data is skewed? Example, please. Is your problem just with the "123 Main St" field with the extra comma? ...
DD
Success is right around the corner
OK, I've seen that before, and I don't recall whether there is a true solution or even a good workaround. For some reason PX really doesn't handle embedded quotes in a quote-delimited string well; I don't know why.
Hopefully somebody else will chime in here.
-craig
"You can never have too many knives" -- Logan Nine Fingers
DD,
I'm not aware of a way to have consecutive quotes treated as a single quote character automatically.
Have you thought about importing the record as a single field and running it through a Transformer to convert consecutive quote characters to a single quote character?
Then you could use a Column Import stage to split it into your desired structure.
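The two-stage idea above boils down to one character-level rule: inside a quoted field, a doubled quote stands for a single literal quote. A rough sketch of that rule in Python (illustrative only; the real job would express this in a Transformer derivation plus a Column Import stage):

```python
def split_record(line, delim=',', quote='"'):
    """Split one record, treating a doubled quote inside a
    quoted field as a single literal quote character."""
    fields, buf, in_quotes = [], [], False
    i = 0
    while i < len(line):
        c = line[i]
        if c == quote:
            if in_quotes and i + 1 < len(line) and line[i + 1] == quote:
                buf.append(quote)          # "" inside quotes -> one literal "
                i += 1                     # skip the second quote of the pair
            else:
                in_quotes = not in_quotes  # opening or closing quote
        elif c == delim and not in_quotes:
            fields.append(''.join(buf))    # unquoted delimiter ends the field
            buf = []
        else:
            buf.append(c)
        i += 1
    fields.append(''.join(buf))
    return fields
```

Applied to the problem record, `split_record('Willy,"""Crazy Guy""",,56745')` yields `['Willy', '"Crazy Guy"', '', '56745']`.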
Thanks, everyone!
But the problem is that I get those files on a daily basis, and once they are pushed to our servers our process is executed, so I was looking for something that can be automated.
I'm currently using a Server job to read that file and everything is running great. I just wanted to know if anyone has been able to read such a file using a Sequential File stage in a parallel job.
relati wrote: DD,
Here's a work around if you can manipulate the source file outside of DataStage. Replace """ with " or nothing in the source file, save it, and try importing the data again.
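Since the files arrive on a schedule, that replacement could be scripted into the push process rather than done by hand. A minimal sketch (the function name and paths are made up for illustration):

```python
# Hypothetical pre-processing step: rewrite each line, replacing the
# three-quote sequence (""") with a single quote, before the job reads it.
def replace_triple_quotes(src_path, dst_path):
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            dst.write(line.replace('"""', '"'))
```

With this, `Willy,"""Crazy Guy""",,56745` becomes `Willy,"Crazy Guy",,56745`, which the Parallel stage can read with Quote = Double; note the literal quotes in the field value are lost, just as with the manual edit.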
DD
Success is right around the corner
We have this fun coming up very soon, and maybe a

    sed 's/""//g'

would do the trick? That way you only ever get rid of two double quotes together... Getting ,"", would lead to ,, which should be OK; the biggest problem would be ""xyz,abc""... but maybe that's less common at least?
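Checking the pair-removal idea against the thread's own examples (Python here as a stand-in for the shell command, purely illustrative):

```python
# Delete each pair of consecutive double quotes; a lone quote survives
drop_quote_pairs = lambda line: line.replace('""', '')

print(drop_quote_pairs('Willy,"""Crazy Guy""",,56745'))  # Willy,"Crazy Guy",,56745
print(drop_quote_pairs('a,"",b'))                        # a,,b
print(drop_quote_pairs('""xyz,abc""'))                   # xyz,abc  (comma now unprotected)
```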