How to make a record for each comma separated field

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Marley777
Premium Member
Posts: 316
Joined: Tue Jan 27, 2004 3:26 pm

How to make a record for each comma separated field

Post by Marley777 »

Hi, thanks for reading

Need to split a comma delimited record into rows.

If I have

field1,field2,field3

How can I end up with 3 records?

field1
field2
field3
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Horizontal Pivot.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Marley777
Premium Member
Posts: 316
Joined: Tue Jan 27, 2004 3:26 pm

Post by Marley777 »

Horizontal Pivot didn't work because it needs multiple columns. We have one field that contains

field1,field2,field3

How can we parse this one field and get 3 records?

field1
field2
field3
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Parse the record into three columns when you read it, then pivot.
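
jwiles suggests doing the split at the read; if it's easier to keep the read as a single column, the same three-way parse can be done in a Transformer with Field() instead. A minimal sketch, assuming the input column is called in.rec (the names are illustrative):

  col1 = Field(in.rec, ",", 1)
  col2 = Field(in.rec, ",", 2)
  col3 = Field(in.rec, ",", 3)

col1 through col3 then feed the pivot.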
- james wiles


All generalizations are false, including this one - Mark Twain.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If you had said "comma delimited field" rather than "comma delimited record" right up front, I would have expanded a bit more on my answer. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Marley777
Premium Member
Posts: 316
Joined: Tue Jan 27, 2004 3:26 pm

Post by Marley777 »

Sorry, it's a comma delimited field, not a comma delimited record.

We won't know the number of comma separated fields. We just know that we have a comma delimited field, and each comma delimited value within the field should be parsed into its own record.


field1,field2,field3 etc...

will need to be
field1
field2
field3
etc...
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

No worries.

Since there's not a fixed number of columns, some possibilities come to mind:

- If you're using IS 8.5, a transformer could probably loop through the field and create the multiple records (see the sketch after this list)
- 8.1 or earlier: If there's a reasonable max number of columns in that field, you could use a transformer with constrained multiple outputs and funnel those together. Ugly, but it works
- A BuildOp could handle this easily

Each of these actually does the pivot for you, so there's no need for a pivot stage.
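
A minimal sketch of the 8.5 looping option mentioned above, assuming the single input column is called in.rec (the column and link names are illustrative, not from the post):

  Loop Condition:    @ITERATION <= DCount(in.rec, ",")
  Output derivation: out.value = Field(in.rec, ",", @ITERATION)

The loop writes one output row per iteration, so a record holding field1,field2,field3 comes out as three rows with no separate pivot stage.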

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You could also continue down James' original path as long as you know what the max occurrence would be. You then always generate that max number of fields, pivot out to the maximum number of records and then filter out any unneeded / null records post-pivot.
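
A rough sketch of that path, assuming a maximum of 10 values and an input column called in.rec (both assumptions are for illustration only):

  Transformer derivations (always the full set):
    col1  = Field(in.rec, ",", 1)
    col2  = Field(in.rec, ",", 2)
    ...
    col10 = Field(in.rec, ",", 10)

  Pivot col1 through col10 into a single output column, then drop the empties downstream with a constraint such as:
    out.value <> ''

Field() should return an empty string for occurrences that aren't present, so the surplus records fall out at the filter.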
-craig

"You can never have too many knives" -- Logan Nine Fingers
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

I considered that as well...easy enough with the transformer. My reservation with it is the overhead of the extra records to be dropped. Depends on the input volume and the ratio of average population to max population.

If this is a one-time job, the overhead may be worth the easier design.
- james wiles


All generalizations are false, including this one - Mark Twain.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

An awk solution will be more dynamic with less overhead.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Marley777
Premium Member
Posts: 316
Joined: Tue Jan 27, 2004 3:26 pm

Post by Marley777 »

Thanks everyone for your ideas. Everyone was very helpful as usual. A person in our shop is good with Perl and is writing a program for what we need. We try not to do things outside of DataStage, but sometimes it makes sense to take another route.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Well, for awk, you could have put the one-liner in the filter command of the sequential file stage. No external scripts required.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Marley777
Premium Member
Posts: 316
Joined: Tue Jan 27, 2004 3:26 pm

Post by Marley777 »

One-liner - what's the one line? ...if you don't mind.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

What column number is this in the entire record and how is the record delimited?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Marley777
Premium Member
Posts: 316
Joined: Tue Jan 27, 2004 3:26 pm

Post by Marley777 »

The input record contains one field, and it would be a varchar 255 field. The one input field contains value1,value2,value3 etc. We will never know ahead of time how many comma delimited values the input field contains, so each value in the field would become its own record.


value1
value2
value3
etc..
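
For reference, the sort of awk one-liner DSguru2B describes putting in the sequential file stage's filter command might look something like this (a sketch only, assuming the whole record is just that single comma-delimited field):

  awk -F',' '{ for (i = 1; i <= NF; i++) print $i }'

-F',' makes the comma the field separator, and the loop prints each field on its own line, so one input record becomes one output record per value with no assumption about how many values there are.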