How to make a record for each comma separated field

Posted: Fri Feb 25, 2011 9:16 am
by Marley777
Hi, thanks for reading

Need to split a comma delimited record into rows.

If I have

field1,field2,field3

How can I end up with 3 records?

field1
field2
field3

Posted: Fri Feb 25, 2011 9:19 am
by chulett
Horizontal Pivot.

Posted: Fri Feb 25, 2011 9:21 am
by Marley777
Horizontal Pivot didn't work because it needs multiple columns. We have one field that contains

field1,field2,field3

How can we parse this one field and get 3 records

field1
field2
field3

Posted: Fri Feb 25, 2011 9:37 am
by jwiles
Parse the record into three columns when you read it, then pivot.

Posted: Fri Feb 25, 2011 9:42 am
by chulett
If you had said "comma delimited field" rather than "comma delimited record" right up front, I would have expanded a bit more on my answer. :wink:

Posted: Fri Feb 25, 2011 9:46 am
by Marley777
Sorry, it's a comma delimited field, not a comma delimited record.

We won't know the number of comma separated values. We just know that we have a comma delimited field, and each comma delimited value within the field should be parsed into its own record.


field1,field2,field3 etc...

will need to be
field1
field2
field3
etc...

Posted: Fri Feb 25, 2011 9:57 am
by jwiles
No worries.

Since there's not a fixed number of columns, some possibilities that come to mind:

- If you're using IS 8.5, a transformer could probably loop through the field and create the multiple records
- 8.1 or earlier: If there's a reasonable max number of columns in that field, you could use a transformer with constrained multiple outputs and funnel those together. Ugly but it works
- A BuildOp could handle this easily

Each of these actually does the pivot for you, so there's no need for a pivot stage.

Regards,

Posted: Fri Feb 25, 2011 10:00 am
by chulett
You could also continue down James' original path as long as you know what the max occurrence would be. You then always generate that max number of fields, pivot out to the maximum number of records, and then filter out any unneeded / null records post pivot.
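Outside of DataStage, the pad-to-max-then-filter idea can be sketched with awk and grep (a hedged illustration of the logic only, not DataStage syntax; `MAX=4` and `input.txt` are assumed placeholders):

```shell
# Two-step sketch of the approach:
# 1) always emit MAX lines per record, with blanks where no value exists
#    (the "generate the max number of fields, then pivot" step)
# 2) filter out the blank lines afterwards (the post-pivot filter step)
awk -F',' -v MAX=4 '{ for (i = 1; i <= MAX; i++) print $i }' input.txt | grep -v '^$'
```

As noted below, the cost of this design is the extra blank records created and then dropped for sparsely populated rows.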

Posted: Fri Feb 25, 2011 10:25 am
by jwiles
I considered that as well...easy enough with the transformer. My reservation with it is the overhead of the extra records to be dropped. Depends on the input volume and the ratio of average population to max population.

If this is a one-time job, the overhead may be worth the easier design.

Posted: Fri Feb 25, 2011 11:24 am
by DSguru2B
An awk solution will be more dynamic with less overhead.

Posted: Fri Feb 25, 2011 11:45 am
by Marley777
Thanks everyone for your ideas. Everyone was very helpful as usual. A person in our shop is good with Perl, and is writing a program for what we need. We try not to do things outside of DataStage, but sometimes it makes sense to take another route.

Posted: Fri Feb 25, 2011 12:53 pm
by DSguru2B
Well, for awk, you could have put the one liner in the filter command of the sequential file stage. No external scripts required.
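For illustration, the sort of awk one-liner meant here might look like the following (a sketch assuming the comma delimited field is the entire record; in the sequential file stage's filter command, the stage supplies the input, so the `input.txt` redirection would be dropped):

```shell
# Split on commas and print each value on its own line,
# turning one delimited field into one record per value.
awk -F',' '{ for (i = 1; i <= NF; i++) print $i }' input.txt
```

Because the number of fields (`NF`) is computed per record, this handles any number of comma separated values without knowing the count ahead of time.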

Posted: Fri Feb 25, 2011 1:05 pm
by Marley777
One-liner - what's the one line? ...if you don't mind.

Posted: Fri Feb 25, 2011 1:07 pm
by DSguru2B
What column number is this in the entire record and how is the record delimited?

Posted: Fri Feb 25, 2011 1:22 pm
by Marley777
The input record contains one field, and it would be a varchar 255 field. The one input field contains value1,value2,value3 etc.. We will never know ahead of time how many comma delimited values the input field contains. So each value in the field would become its own record.


value1
value2
value3
etc..
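Given that layout, where the single varchar field is the whole record, the filter reduces to translating commas into newlines; a minimal sketch (`input.txt` is an assumed placeholder, and the stage's filter command would read from the stage's input instead):

```shell
# Replace each comma with a newline so every value becomes its own record.
tr ',' '\n' < input.txt
```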