How to process a sequential file containing quotes.

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

hiteshm
Participant
Posts: 20
Joined: Tue May 10, 2005 5:32 am

How to process a sequential file containing quotes.

Post by hiteshm »

I need to load a comma separated sequential file where each column is wrapped in a pair of quotes e.g.

"Field 1", "Field 2", "Field 3"

But in some columns there are quotes within the data, e.g.

"Field 1", "New ""Field" 2"", "Field 3"

When the above csv example is loaded into Excel, it correctly parses the data as below:-

Field 1 New "Field 2" Field 3

Within the DataStage sequential stage I have specified the delimiter as commas and quote = double.

But DataStage does not seem to be able to handle to above scenario and rejects the row.

I've thought of filtering the file using an awk command for example, but shouldn't DataStage be able to handle this?
Raftsman
Premium Member
Premium Member
Posts: 335
Joined: Thu May 26, 2005 8:56 am
Location: Ottawa, Canada

Post by Raftsman »

I have had this happen and in order to get it working, I substituted CSV for TXT and tab delimited. I found that the comma delimited sometime give me a headache.
Jim Stewart
csri
Participant
Posts: 99
Joined: Wed Jul 11, 2007 8:02 am

Re: How to process a sequential file containing quotes.

Post by csri »

Not sure if there is a better way but the following approach should help:

1. Use .csv or .txt file as the input file
2. set delimiter as "," (comma) and quote charater as "000" (no character) in the input sequential file stage
3. using a transformer remove the first and last quotes for each field using expression:
Right(Left(DSLink.Str, Len(DSLink.Str)-1),Len(DSLink.Str)-2)

In this example DSLink is the input link and Str is the filed. It worked for me in my test.

Hope this helps. :wink:
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

This sure seems like something DataStage should be able to do automatically. It is a common way to handle quotes in databases. And if a Microsuck tool can do it, then Datastage should be able to do it with it's eyes closed.

Brad.
It is not that I am addicted to coffee, it's just that I need it to survive.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I believe it works in Server but not in PX for some reason. Search for topics on the subject by Phil Hibbs and his problems with a rusty 6" nail.
-craig

"You can never have too many knives" -- Logan Nine Fingers
crouse
Charter Member
Charter Member
Posts: 204
Joined: Sun Oct 05, 2003 12:59 pm
Contact:

Post by crouse »

I don't believe it's ever worked in Server version either. Still doesn't on 7.5.2
Craig Rouse
Griffin Resouces, Inc
www.griffinresources.com
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hmmm... maybe it was the fact that he had a Server job that could produce the file but no PX job could read it. [shrug]
-craig

"You can never have too many knives" -- Logan Nine Fingers
hiteshm
Participant
Posts: 20
Joined: Tue May 10, 2005 5:32 am

Post by hiteshm »

Thank to everyone for their comments.

With regards to csri suggestion, what would happen if the column contained commas?

eg,

Address
--------------
"5 My street, Big town"


I am still surprised why DataStage cannot automatically handle these situations!!!
hiteshm
Participant
Posts: 20
Joined: Tue May 10, 2005 5:32 am

Post by hiteshm »

Hi,

Also just to add to my previous comments,

if the data contains quotes, then the csv file will be saved with double double quotes. Has shown is my first post.

Correct me if I'm wrong, but using crsi suggestions, then double double quotes would be loaded, instead of the original a single double quote.

(I hope the above makes sense!).
csri
Participant
Posts: 99
Joined: Wed Jul 11, 2007 8:02 am

Post by csri »

I tried writting to a sequential file and I have set the quote character to "000" on the target seqeuntial file too.

I had two fields in the input record:
input record: "A",""AA""""
output record: A,"AA"""

If you have a comma within the data then this approach doesn't work. :(
hiteshm
Participant
Posts: 20
Joined: Tue May 10, 2005 5:32 am

Post by hiteshm »

I have found a possible solution, by running the following awk script on the filter option of the sequential file stage :-

awk 'BEGIN{FS=",";OFS=","}{gsub("\"\"","",$2)}1'

This script basically strips out the quotes within column 2.

Though this has following drawbacks :-

- I am unable to use a reject link from the sequential stage since the table scehma is not defined. The actual error is 'sfrejects: A schema must be set in APT_FileExportOperator before use.'

- The above script only processes column 2, therefore for each column the gsub function would need to be specified.

p.s. thank you Shankar for the script!
csri
Participant
Posts: 99
Joined: Wed Jul 11, 2007 8:02 am

Post by csri »

If your second field value is ""New' Field2"" What will be your output? Does your solution handle if there are two double quotes at the begining or at the end of the filed value as in the example.
girish312
Participant
Posts: 3
Joined: Tue Sep 12, 2006 4:42 pm

Were you able to reslve the answer to schema def issue here?

Post by girish312 »

hiteshm wrote:I have found a possible solution, by running the following awk script on the filter option of the sequential file stage :-

awk 'BEGIN{FS=",";OFS=","}{gsub("""","",$2)}1'

This script basically strips out the quotes within column 2.

Though this has following drawbacks :-

- I am unable to use a reject link from the sequential stage since the table scehma is not defined. The actual error is 'sfrejects: A schema must be set in APT_FileExportOperator before use.'

- The above script only processes column 2, therefore for each column the gsub function would need to be specified.

p.s. thank you Shankar for the script!
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:? Interesting first post.

girish312 - you need to include your questions in the body of the post, no-one can see your subject until they reply. And most won't even notice it, I'd wager.
girish312 wrote:Were you able to reslve the answer to schema def issue here?
-craig

"You can never have too many knives" -- Logan Nine Fingers
sivanandha
Premium Member
Premium Member
Posts: 11
Joined: Sat Mar 12, 2011 2:31 am
Location: united states

Post by sivanandha »

HI, though its an old post - i found a solution and thought worth sharing.
Am in DS 8.5 (hope this works with 8.x)

Set the Sequential Stage format to comma separated without quotes.
next got to each column in that stage, double clicking that field will take you to column metadata.
for other fields (those which dont have issues) add field level quote=double
for fields having this issue set field level quote=none && String Type Character Set=ASCII


this worked great for me
Thanks!
Sivanandha
Post Reply