Pivoting rows with data in the column header

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
UAUITSBI
Premium Member
Premium Member
Posts: 117
Joined: Thu Aug 13, 2009 3:31 pm
Location: University of Arizona

Pivoting rows with data in the column header

Post by UAUITSBI »

Hello,

I have a source file (Tab delimited) with the below format:

Code: Select all

Year       Section      Question   Question_Nbr    41     271     81
1314       TIME          XYZ               Q1           67%   70%   55%
1314       TIME          ABC               Q2           63%   40%   45%
1314       TIME          DEF               Q3           57%   70%   65%



Desired Output:

Code: Select all

Year       Section      Question   Question_Nbr    SITE_NBR     RESPONSE
1314       TIME          XYZ                Q1                 41                67%
1314       TIME          XYZ                Q2                 41                63%
1314       TIME          XYZ                Q3                 41                57%
1314       TIME          ABC                Q1                 271              70%
1314       TIME          ABC                Q2                 271              40%
1314       TIME          ABC                Q3                 271              70%
1314       TIME          DEF                Q1                 81                55%
1314       TIME          DEF                Q2                 81                45%
1314       TIME          DEF                Q3                 81                65%




In the source file, the data is included in the header meaning 41, 271, 81 are site numbers and are part of data not the columns.

Inorder to derive the desire output as I don't know number of columns I used DCOUNT to determine the columns, used the @ITERATION variable to determine the loop number from DCOUNT and then used Field function to derive the required fields but was not able to relate the Site numbers (41, 271, 81) to their relative questions and couldn't accomplish the desired output as shown above.

This is just a sample data in the source file I have about 250 columns which are site numbers and there will be source files in the future with unknown number of columns.

Please advice.
Any suggestions are appreciated.
Thanks in advance !!
boxtoby
Premium Member
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

Hi,

Basically, you need to add the header row to each row of data and then you can manipulate however you like.

Read the file with the "First Line is Column Names" set to false.

Use a transformer to recognise the header row with something like Year = "Year" and send this row as a reference link to a look up stage.

The other rows are the source link in to the lookup. Use a column called "lkup_key" in both feeds and set the value to "1" to join the rows in the look up stage.

You can then map the required header rows in the lookup stage and pivot the data in a subsequent transformer.

Hope that helps.

Bob.
Bob Oxtoby
UAUITSBI
Premium Member
Premium Member
Posts: 117
Joined: Thu Aug 13, 2009 3:31 pm
Location: University of Arizona

Post by UAUITSBI »

Hi Bob,

Thanks for the suggestion. But before getting to the solution you suggested I am stuck at dividing this one big column into individual columns by using the delimiter (tab).

The issue is as the number of columns are unknown I am reading the entire row into one column and then used DCOUNT function to determine number of columns, passed the count to the @ITERATION variable

Code: Select all

 @ITERATION <= DCOUNT 
and then used:
To determine the 1st column

Code: Select all

 Field(DSLink2.ABC, char(9), @ITERATION, 1) 

To determine the 2nd column

Code: Select all

 Field(DSLink2.ABC, char(9), @ITERATION+1, 1) 


As there would be n number of columns (In the current source file = 250 columns) I am not sure if this is the right method. Is there a dynamic way to derive the columns ?

Please let me know if I am over looking something in your suggestion.

Thanks !!
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar »

1: Define two Stage Variables as svLoopControl and svColumns
2: svColumns = Delimited Column names and svLoopControl = Dcount function to get the total number of columns from svColumns with Delimiter Value.

3: Define two Loop Variable as lvCount and lvColumns.
4: lvCount = lvCount + 1 and lvColumns = (Field(svColumns,'Delimiter Value',ivCount))

5: In Iteration Loop While = lvCount =< svLoopcontrol

Hope this helps.
UAUITSBI
Premium Member
Premium Member
Posts: 117
Joined: Thu Aug 13, 2009 3:31 pm
Location: University of Arizona

Post by UAUITSBI »

Thanks for the reply mobashshar. I was implementing these steps before but was not able to generate the columns dynamically so that there would be no need to relate back to the responses manually as I have about 250 columns in each file and this number can vary per source file.

I have achieved this by following Bob's suggestion and then used Pivot stage to pivot the columns into rows. But had to deal with all the columns manually I was hoping if there would be some RCP logic that could be incorporated.

Thanks again to both of you for the help !!
Post Reply