Vertical Pivot Logic

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kashif007
Premium Member
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Vertical Pivot Logic

Post by kashif007 »

Hi

I want to achieve the following logic in my job:

Input Data:

Col1 Col2
A Value1
A Value2
A Value3
B Value4
B Value5
C Value6
C Value7
C Value8
C Value9

Output Data:

Col1 col2 col3 col4 col5
A Value1 Value2 Value3
B Value4 Value5
C Value6 Value7 Value8 Value9

I tried to do vertical pivot using Transfomer but was not successfull. Can anyone tell me how can I achieve this logic.
Thanks
Regards
Kashif Khan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Partition and sort by the key column, and generate a key change column in the Sort stage. Downstream of that, accumulate your list of non-key columns in stage variables, initializing to null on key change. Downstream of that select the last of each group using a Remove Duplicates stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
wahi80
Participant
Posts: 214
Joined: Thu Feb 07, 2008 4:37 pm

Post by wahi80 »

Another workaround could be to write a query to convert the columns to rows
kashif007
Premium Member
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Post by kashif007 »

I did as you said Ray. But your process only removes the duplicates. But I still can't figure out how do I convert the vertical rows into horizontal columns.
Regards
Kashif Khan
jhmckeever
Premium Member
Premium Member
Posts: 301
Joined: Thu Jul 14, 2005 10:27 am
Location: Melbourne, Australia
Contact:

Post by jhmckeever »

... accumulate your list of non-key columns in stage variables, initializing to null on key change.
Ray described how to use a transformer to perform the pivot you require. Did you follow the instructions, particularly the part quoted above, closely?
<b>John McKeever</b>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
jhmckeever
Premium Member
Premium Member
Posts: 301
Joined: Thu Jul 14, 2005 10:27 am
Location: Melbourne, Australia
Contact:

Post by jhmckeever »

Here's a summary of how your stage variables would be accumulated over the course of your example input data: ('-' means NULL)

Code: Select all

INPUT COLUMNS           STAGE VARIABLES
----------------------  --------------------------------------
Key  Value   keyChange  svCol1  svCol2  svCol3  svCol4  svCol5
A    Value1  1          Value1  -       -       -       -
A    Value2  0          Value1  Value2  -       -       -
A    Value3  0          Value1  Value2  Value3  -       -
B    Value4  1          Value4  -       -       -       -
B    Value5  0          Value4  Value5  -       -       -
C    Value6  1          Value6  -       -       -       -
C    Value7  0          Value6  Value7  -       -       -
C    Value8  0          Value6  Value7  Value8  -       -
C    Value9  0          Value6  Value7  Value8  Value9  -
You'll see how using the 'keep last' approach in the Remove Duplicates stage deliveres the output you require.

J.
<b>John McKeever</b>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
kashif007
Premium Member
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Post by kashif007 »

The maximum possible occurance of Non-key Col2 would be 15. So I created 15 stage variables and in each stage variable I have the following logic:

If keyChange = '1' Then SetNull() Else Col2

Then I map them to all 15 output columns in the transformer. After the remove duplicate logic, I get the same data in all the 15 columns. Following is an example:

col1(key) col2(sv1) col3(sv2) col4(sv3) col5(sv4) col6(sv5) col7(sv6).................till 15 occurance
A Value3 Value3 Value3 Value3 Value3 Value3
B Value5 Value5 Value5 Value5 Value5 Value5
C Value9 Value9 Value9 Value9 Value9 Value9

One value pertains for all the occurances. What you have drawn in the picture is absolutely correct but I am not able to apply the logic as you have described. I am still missing something, please advice.

Thanks
Regards
Kashif Khan
jhmckeever
Premium Member
Premium Member
Posts: 301
Joined: Thu Jul 14, 2005 10:27 am
Location: Melbourne, Australia
Contact:

Post by jhmckeever »

Try the variables below. I've put some absurd indenting in to try and make the logic clearer. I'm not guaranteeing this to be the optimum solution, just something that should work :-)

Code: Select all

svColId = If inLink.KeyChange Then 1 Else svColId + 1

svCol1 = If keyChange Then inLink.Value Else svCol1

svCol2 = If   inLink.keyChange 
         Then SetNull() 
         Else If   svColId=2
              Then inLink.Value 
              Else svCol2

svCol3 = If   inLink.keyChange 
         Then SetNull() 
         Else If   svColId=3
              Then inLink.Value 
              Else svCol3

etc.
This (hopefully!) delivers:

Code: Select all

INPUT COLUMNS           STAGE VARIABLES
----------------------  ----------------------------------------------
Key  Value   keyChange  svColId svCol1  svCol2  svCol3  svCol4  svCol5
A    Value1  1          1       Value1  -       -       -       -
A    Value2  0          2       Value1  Value2  -       -       -
A    Value3  0          3       Value1  Value2  Value3  -       -
B    Value4  1          1       Value4  -       -       -       -
B    Value5  0          2       Value4  Value5  -       -       -
C    Value6  1          1       Value6  -       -       -       -
C    Value7  0          2       Value6  Value7  -       -       -
C    Value8  0          3       Value6  Value7  Value8  -       -
C    Value9  0          4       Value6  Value7  Value8  Value9  -
This is off the top of my head so no warranty implied ... :-)

J.
<b>John McKeever</b>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
kashif007
Premium Member
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Post by kashif007 »

Brilliant Brilliant Brilliant. Works beautifully. Thanks
Regards
Kashif Khan
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

You could also do it by replace the transform suggested above with a transform containing only 3 or 4 stage variables and a downstream column import stage.

You would still need to know the max number of output columns though.

Set up stage variables in your transform to:
1. Count the number of incoming columns per key value (using the key change. Eg svCount = If Link.keyChange then 1 else svCount+1 (initialise the stage variable to 1)
2. Build the number of columns. eg svBuildColumns = If Link.keyChange then Link.PivotValue else svBuildColumns :'|': Link.PivotValue - where '|' is your delimiter of choice
3. An output stage variable. Because the column import stage expects the number of delimiters to be consistent, you would need to build a consistent string of columns by padding the output with your delimiter value. eg svOutput = svBuildColumns : str('|', MaxColumns - svCount)

Downstream from the transform set up a column import stage which passes through the key values, and splits your pivoted output column from the transform into separate columns based on the delimiter you chose.

Not sure that this design has any better performance, but I think setting it up is easier, especially when you have a larger number of columns to pivot.
SettValleyConsulting
Premium Member
Premium Member
Posts: 72
Joined: Thu Sep 04, 2003 5:01 am
Location: UK & Europe

Another method ...

Post by SettValleyConsulting »

Another way to skin the Vertical Pivot cat...

Sort and add a KeyChange column as in previous suggestions.

Add a transformer that increments a stage variable and resets on key change giving a Counter field in the output:

If KeyChange=1 then svCOunt = 1 Else svCount = svCount + 1

Add a switch stage with as many output links as the maximum instances of Value that you need. This will Switch on the counter column and send first row down link 1, second down link 2 etc. Also renames the Value field to Value1 in Link1, Value2 in Link2 etc.

All outputs from the Switch feed into a Merge that merges on your key and moves

Link1.Value1 -> Value1,
Link2.Value2 -> Value2

etc.


Not claiming any particular performance benefits for this technique, but it does not generate any supernumary rows that need to be deduped out, and does more of the work using native EE Stages rather than a parallel Transformer so might be worth a try.


Phil Clarke.
Post Reply