Vertical Pivot Logic
Moderators: chulett, rschirm, roy
Vertical Pivot Logic
Hi
I want to achieve the following logic in my job:
Input Data:
Col1 Col2
A Value1
A Value2
A Value3
B Value4
B Value5
C Value6
C Value7
C Value8
C Value9
Output Data:
Col1 col2 col3 col4 col5
A Value1 Value2 Value3
B Value4 Value5
C Value6 Value7 Value8 Value9
I tried to do vertical pivot using Transfomer but was not successfull. Can anyone tell me how can I achieve this logic.
Thanks
I want to achieve the following logic in my job:
Input Data:
Col1 Col2
A Value1
A Value2
A Value3
B Value4
B Value5
C Value6
C Value7
C Value8
C Value9
Output Data:
Col1 col2 col3 col4 col5
A Value1 Value2 Value3
B Value4 Value5
C Value6 Value7 Value8 Value9
I tried to do vertical pivot using Transfomer but was not successfull. Can anyone tell me how can I achieve this logic.
Thanks
Regards
Kashif Khan
Kashif Khan
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Partition and sort by the key column, and generate a key change column in the Sort stage. Downstream of that, accumulate your list of non-key columns in stage variables, initializing to null on key change. Downstream of that select the last of each group using a Remove Duplicates stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 301
- Joined: Thu Jul 14, 2005 10:27 am
- Location: Melbourne, Australia
- Contact:
Ray described how to use a transformer to perform the pivot you require. Did you follow the instructions, particularly the part quoted above, closely?... accumulate your list of non-key columns in stage variables, initializing to null on key change.
<b>John McKeever</b>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
-
- Premium Member
- Posts: 301
- Joined: Thu Jul 14, 2005 10:27 am
- Location: Melbourne, Australia
- Contact:
Here's a summary of how your stage variables would be accumulated over the course of your example input data: ('-' means NULL)
You'll see how using the 'keep last' approach in the Remove Duplicates stage deliveres the output you require.
J.
Code: Select all
INPUT COLUMNS STAGE VARIABLES
---------------------- --------------------------------------
Key Value keyChange svCol1 svCol2 svCol3 svCol4 svCol5
A Value1 1 Value1 - - - -
A Value2 0 Value1 Value2 - - -
A Value3 0 Value1 Value2 Value3 - -
B Value4 1 Value4 - - - -
B Value5 0 Value4 Value5 - - -
C Value6 1 Value6 - - - -
C Value7 0 Value6 Value7 - - -
C Value8 0 Value6 Value7 Value8 - -
C Value9 0 Value6 Value7 Value8 Value9 -
J.
<b>John McKeever</b>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
The maximum possible occurance of Non-key Col2 would be 15. So I created 15 stage variables and in each stage variable I have the following logic:
If keyChange = '1' Then SetNull() Else Col2
Then I map them to all 15 output columns in the transformer. After the remove duplicate logic, I get the same data in all the 15 columns. Following is an example:
col1(key) col2(sv1) col3(sv2) col4(sv3) col5(sv4) col6(sv5) col7(sv6).................till 15 occurance
A Value3 Value3 Value3 Value3 Value3 Value3
B Value5 Value5 Value5 Value5 Value5 Value5
C Value9 Value9 Value9 Value9 Value9 Value9
One value pertains for all the occurances. What you have drawn in the picture is absolutely correct but I am not able to apply the logic as you have described. I am still missing something, please advice.
Thanks
If keyChange = '1' Then SetNull() Else Col2
Then I map them to all 15 output columns in the transformer. After the remove duplicate logic, I get the same data in all the 15 columns. Following is an example:
col1(key) col2(sv1) col3(sv2) col4(sv3) col5(sv4) col6(sv5) col7(sv6).................till 15 occurance
A Value3 Value3 Value3 Value3 Value3 Value3
B Value5 Value5 Value5 Value5 Value5 Value5
C Value9 Value9 Value9 Value9 Value9 Value9
One value pertains for all the occurances. What you have drawn in the picture is absolutely correct but I am not able to apply the logic as you have described. I am still missing something, please advice.
Thanks
Regards
Kashif Khan
Kashif Khan
-
- Premium Member
- Posts: 301
- Joined: Thu Jul 14, 2005 10:27 am
- Location: Melbourne, Australia
- Contact:
Try the variables below. I've put some absurd indenting in to try and make the logic clearer. I'm not guaranteeing this to be the optimum solution, just something that should work
This (hopefully!) delivers:
This is off the top of my head so no warranty implied ...
J.
Code: Select all
svColId = If inLink.KeyChange Then 1 Else svColId + 1
svCol1 = If keyChange Then inLink.Value Else svCol1
svCol2 = If inLink.keyChange
Then SetNull()
Else If svColId=2
Then inLink.Value
Else svCol2
svCol3 = If inLink.keyChange
Then SetNull()
Else If svColId=3
Then inLink.Value
Else svCol3
etc.
Code: Select all
INPUT COLUMNS STAGE VARIABLES
---------------------- ----------------------------------------------
Key Value keyChange svColId svCol1 svCol2 svCol3 svCol4 svCol5
A Value1 1 1 Value1 - - - -
A Value2 0 2 Value1 Value2 - - -
A Value3 0 3 Value1 Value2 Value3 - -
B Value4 1 1 Value4 - - - -
B Value5 0 2 Value4 Value5 - - -
C Value6 1 1 Value6 - - - -
C Value7 0 2 Value6 Value7 - - -
C Value8 0 3 Value6 Value7 Value8 - -
C Value9 0 4 Value6 Value7 Value8 Value9 -
J.
<b>John McKeever</b>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
You could also do it by replace the transform suggested above with a transform containing only 3 or 4 stage variables and a downstream column import stage.
You would still need to know the max number of output columns though.
Set up stage variables in your transform to:
1. Count the number of incoming columns per key value (using the key change. Eg svCount = If Link.keyChange then 1 else svCount+1 (initialise the stage variable to 1)
2. Build the number of columns. eg svBuildColumns = If Link.keyChange then Link.PivotValue else svBuildColumns :'|': Link.PivotValue - where '|' is your delimiter of choice
3. An output stage variable. Because the column import stage expects the number of delimiters to be consistent, you would need to build a consistent string of columns by padding the output with your delimiter value. eg svOutput = svBuildColumns : str('|', MaxColumns - svCount)
Downstream from the transform set up a column import stage which passes through the key values, and splits your pivoted output column from the transform into separate columns based on the delimiter you chose.
Not sure that this design has any better performance, but I think setting it up is easier, especially when you have a larger number of columns to pivot.
You would still need to know the max number of output columns though.
Set up stage variables in your transform to:
1. Count the number of incoming columns per key value (using the key change. Eg svCount = If Link.keyChange then 1 else svCount+1 (initialise the stage variable to 1)
2. Build the number of columns. eg svBuildColumns = If Link.keyChange then Link.PivotValue else svBuildColumns :'|': Link.PivotValue - where '|' is your delimiter of choice
3. An output stage variable. Because the column import stage expects the number of delimiters to be consistent, you would need to build a consistent string of columns by padding the output with your delimiter value. eg svOutput = svBuildColumns : str('|', MaxColumns - svCount)
Downstream from the transform set up a column import stage which passes through the key values, and splits your pivoted output column from the transform into separate columns based on the delimiter you chose.
Not sure that this design has any better performance, but I think setting it up is easier, especially when you have a larger number of columns to pivot.
-
- Premium Member
- Posts: 72
- Joined: Thu Sep 04, 2003 5:01 am
- Location: UK & Europe
Another method ...
Another way to skin the Vertical Pivot cat...
Sort and add a KeyChange column as in previous suggestions.
Add a transformer that increments a stage variable and resets on key change giving a Counter field in the output:
If KeyChange=1 then svCOunt = 1 Else svCount = svCount + 1
Add a switch stage with as many output links as the maximum instances of Value that you need. This will Switch on the counter column and send first row down link 1, second down link 2 etc. Also renames the Value field to Value1 in Link1, Value2 in Link2 etc.
All outputs from the Switch feed into a Merge that merges on your key and moves
Link1.Value1 -> Value1,
Link2.Value2 -> Value2
etc.
Not claiming any particular performance benefits for this technique, but it does not generate any supernumary rows that need to be deduped out, and does more of the work using native EE Stages rather than a parallel Transformer so might be worth a try.
Phil Clarke.
Sort and add a KeyChange column as in previous suggestions.
Add a transformer that increments a stage variable and resets on key change giving a Counter field in the output:
If KeyChange=1 then svCOunt = 1 Else svCount = svCount + 1
Add a switch stage with as many output links as the maximum instances of Value that you need. This will Switch on the counter column and send first row down link 1, second down link 2 etc. Also renames the Value field to Value1 in Link1, Value2 in Link2 etc.
All outputs from the Switch feed into a Merge that merges on your key and moves
Link1.Value1 -> Value1,
Link2.Value2 -> Value2
etc.
Not claiming any particular performance benefits for this technique, but it does not generate any supernumary rows that need to be deduped out, and does more of the work using native EE Stages rather than a parallel Transformer so might be worth a try.
Phil Clarke.