How to achieve logic for horizontal pivot

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
skp
Premium Member
Premium Member
Posts: 135
Joined: Wed Dec 26, 2007 1:56 am
Location: India

How to achieve logic for horizontal pivot

Post by skp »

Hi,

My source is like below:
Col1 Col2
a b
c d
e f
g h

Output should be like below:
Column
a|c|e|g
b|d|f|h

Help me on this please.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

What exact release are you on? (functionality differs)
Are there any other columns in the record? If so, what should happen with them?
How many records are there? Some solutions might not work if you had millions to process.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
skp
Premium Member
Premium Member
Posts: 135
Joined: Wed Dec 26, 2007 1:56 am
Location: India

Post by skp »

There are no other columns only two columns we will get from source but output should be one column but two different records 1 for first column and second one for second column
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Again, what revision are you on? How many records approximately?
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
skp
Premium Member
Premium Member
Posts: 135
Joined: Wed Dec 26, 2007 1:56 am
Location: India

Post by skp »

working on 9.1 version and there is no limit has been given by business for the number of records
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

1) Setup two LongVarChar stage variables in a transformer, both initialized to empty string ("").

2) As each record comes in, use the Field function to strip out the correct portion and append it to the existing Stage Variable.

3) Put the two stage variables on separate output links, and put a constraint of LastRow() on the links so they only output one record at the very end. You can combine them back into one stream with a Funnel stage (set to "ordered" so they stay in order).

I'm still concerned that if you have a ton of records, you'll blow through the limits of a single variable. If that happens I'd probably look at using UNIX commands to reformat the data. You might also be able to "build" a sequential file with the correct Record options, but I don't have time to look into that.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
skp
Premium Member
Premium Member
Posts: 135
Joined: Wed Dec 26, 2007 1:56 am
Location: India

Post by skp »

Thanks a lot Andy... I will try this and let you know any problems...
skp
Premium Member
Premium Member
Posts: 135
Joined: Wed Dec 26, 2007 1:56 am
Location: India

Post by skp »

Hi Andy,

how to stri out the correct portion using field function please help me on this.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If your In.Record = "a b"

Then Field(In.Record,' ',1) will be "a"
and Field(In.Record,' ',2) will be "b".

The first parameter is the string,
the second parameter is the field delimiter,
and the third parameter is the field number to extract.
skp
Premium Member
Premium Member
Posts: 135
Joined: Wed Dec 26, 2007 1:56 am
Location: India

Post by skp »

Andy,

My input record like below:

Col1
a
b
c
d


each record come in next line not in the same line.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Build the output lines in VarChar stage variables.
Output them on separate output links keeping only the final version of each, using either LastRecordInGroup() function or downstream RemoveDuplicates stages. Base these on a "generated" constant "key".
Bring them together using a Funnel stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rameshrr3
Premium Member
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

Try awk based unix script .

This awk script will work even if your source file has an arbitrary number of columns ( you currently have 2 delimited by what i think is a space character)

Code: Select all

 awk -F" " 
  '{for (f = 1; f <= NF; f++) a[NR, f] = $f} 
   NF > nf { nf = NF } 
   END 
   { for (f = 1; f <= nf; f++) 
      for (r = 1; r <= NR; r++) 
      printf a[r, f] (r==NR ? RS : FS)}'  <YourSourceFileName>
You can call this script as a filter command in Sequential File stage or create a script and call it with your file name as argument in an External source stage.

Thanks
Ramesh
Post Reply