Vertical Pivot Logic

kashif007 · Post by **kashif007** » Wed May 07, 2008 2:51 pm

Hi

I want to achieve the following logic in my job:

Input Data:

Col1 Col2
A Value1
A Value2
A Value3
B Value4
B Value5
C Value6
C Value7
C Value8
C Value9

Output Data:

Col1 col2 col3 col4 col5
A Value1 Value2 Value3
B Value4 Value5
C Value6 Value7 Value8 Value9

I tried to do vertical pivot using Transfomer but was not successfull. Can anyone tell me how can I achieve this logic.
Thanks

ray.wurlod · Post by **ray.wurlod** » Wed May 07, 2008 4:08 pm

Partition and sort by the key column, and generate a key change column in the Sort stage. Downstream of that, accumulate your list of non-key columns in stage variables, initializing to null on key change. Downstream of that select the last of each group using a Remove Duplicates stage.

wahi80 · Post by **wahi80** » Wed May 07, 2008 4:37 pm

Another workaround could be to write a query to convert the columns to rows

kashif007 · Post by **kashif007** » Wed May 07, 2008 5:53 pm

I did as you said Ray. But your process only removes the duplicates. But I still can't figure out how do I convert the vertical rows into horizontal columns.

jhmckeever · Post by **jhmckeever** » Wed May 07, 2008 6:38 pm

... accumulate your list of non-key columns in stage variables, initializing to null on key change.

Ray described how to use a transformer to perform the pivot you require. Did you follow the instructions, particularly the part quoted above, closely?

jhmckeever · Post by **jhmckeever** » Wed May 07, 2008 6:55 pm

Here's a summary of how your stage variables would be accumulated over the course of your example input data: ('-' means NULL)

Code: Select all

INPUT COLUMNS           STAGE VARIABLES
----------------------  --------------------------------------
Key  Value   keyChange  svCol1  svCol2  svCol3  svCol4  svCol5
A    Value1  1          Value1  -       -       -       -
A    Value2  0          Value1  Value2  -       -       -
A    Value3  0          Value1  Value2  Value3  -       -
B    Value4  1          Value4  -       -       -       -
B    Value5  0          Value4  Value5  -       -       -
C    Value6  1          Value6  -       -       -       -
C    Value7  0          Value6  Value7  -       -       -
C    Value8  0          Value6  Value7  Value8  -       -
C    Value9  0          Value6  Value7  Value8  Value9  -

You'll see how using the 'keep last' approach in the Remove Duplicates stage deliveres the output you require.

J.

kashif007 · Post by **kashif007** » Wed May 07, 2008 9:25 pm

The maximum possible occurance of Non-key Col2 would be 15. So I created 15 stage variables and in each stage variable I have the following logic:

If keyChange = '1' Then SetNull() Else Col2

Then I map them to all 15 output columns in the transformer. After the remove duplicate logic, I get the same data in all the 15 columns. Following is an example:

col1(key) col2(sv1) col3(sv2) col4(sv3) col5(sv4) col6(sv5) col7(sv6).................till 15 occurance
A Value3 Value3 Value3 Value3 Value3 Value3
B Value5 Value5 Value5 Value5 Value5 Value5
C Value9 Value9 Value9 Value9 Value9 Value9

One value pertains for all the occurances. What you have drawn in the picture is absolutely correct but I am not able to apply the logic as you have described. I am still missing something, please advice.

Thanks

jhmckeever · Post by **jhmckeever** » Wed May 07, 2008 10:25 pm

Try the variables below. I've put some absurd indenting in to try and make the logic clearer. I'm not guaranteeing this to be the optimum solution, just something that should work

Code: Select all

svColId = If inLink.KeyChange Then 1 Else svColId + 1

svCol1 = If keyChange Then inLink.Value Else svCol1

svCol2 = If   inLink.keyChange 
         Then SetNull() 
         Else If   svColId=2
              Then inLink.Value 
              Else svCol2

svCol3 = If   inLink.keyChange 
         Then SetNull() 
         Else If   svColId=3
              Then inLink.Value 
              Else svCol3

etc.

This (hopefully!) delivers:

Code: Select all

INPUT COLUMNS           STAGE VARIABLES
----------------------  ----------------------------------------------
Key  Value   keyChange  svColId svCol1  svCol2  svCol3  svCol4  svCol5
A    Value1  1          1       Value1  -       -       -       -
A    Value2  0          2       Value1  Value2  -       -       -
A    Value3  0          3       Value1  Value2  Value3  -       -
B    Value4  1          1       Value4  -       -       -       -
B    Value5  0          2       Value4  Value5  -       -       -
C    Value6  1          1       Value6  -       -       -       -
C    Value7  0          2       Value6  Value7  -       -       -
C    Value8  0          3       Value6  Value7  Value8  -       -
C    Value9  0          4       Value6  Value7  Value8  Value9  -

This is off the top of my head so no warranty implied ...

J.

kashif007 · Post by **kashif007** » Thu May 08, 2008 12:13 pm

Brilliant Brilliant Brilliant. Works beautifully. Thanks

ShaneMuir · Post by **ShaneMuir** » Thu May 08, 2008 11:56 pm

You could also do it by replace the transform suggested above with a transform containing only 3 or 4 stage variables and a downstream column import stage.

You would still need to know the max number of output columns though.

Set up stage variables in your transform to:
1. Count the number of incoming columns per key value (using the key change. Eg svCount = If Link.keyChange then 1 else svCount+1 (initialise the stage variable to 1)
2. Build the number of columns. eg svBuildColumns = If Link.keyChange then Link.PivotValue else svBuildColumns :'|': Link.PivotValue - where '|' is your delimiter of choice
3. An output stage variable. Because the column import stage expects the number of delimiters to be consistent, you would need to build a consistent string of columns by padding the output with your delimiter value. eg svOutput = svBuildColumns : str('|', MaxColumns - svCount)

Downstream from the transform set up a column import stage which passes through the key values, and splits your pivoted output column from the transform into separate columns based on the delimiter you chose.

Not sure that this design has any better performance, but I think setting it up is easier, especially when you have a larger number of columns to pivot.

SettValleyConsulting · Mon May 12, 2008 9:57 am

Another way to skin the Vertical Pivot cat...

Sort and add a KeyChange column as in previous suggestions.

Add a transformer that increments a stage variable and resets on key change giving a Counter field in the output:

If KeyChange=1 then svCOunt = 1 Else svCount = svCount + 1

Add a switch stage with as many output links as the maximum instances of Value that you need. This will Switch on the counter column and send first row down link 1, second down link 2 etc. Also renames the Value field to Value1 in Link1, Value2 in Link2 etc.

All outputs from the Switch feed into a Merge that merges on your key and moves

Link1.Value1 -> Value1,
Link2.Value2 -> Value2

etc.

Not claiming any particular performance benefits for this technique, but it does not generate any supernumary rows that need to be deduped out, and does more of the work using native EE Stages rather than a parallel Transformer so might be worth a try.

Phil Clarke.

DSXchange

Vertical Pivot Logic

Vertical Pivot Logic

Another method ...