Need help with pivoting columns

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
sagar deshmukh
Participant
Posts: 103
Joined: Fri Jan 18, 2008 12:55 am
Location: chennai

Need help with pivoting columns

Post by sagar deshmukh »

Hi I have a condition where....
from input i am getting two columns as below :-

ID NAME
1 XX
1 YY
1 XY
2 SS
2 DD
2 PP

I WANT OUT PUT AS ----------->

ID NAME
1 XX#YY#XY
2 SS#DD#PP


HOW CAN I ACHIEVE THIS....???


I AM USING DS SERVER EDITION.

*Note: Title edited to be more descriptive - Content Editor*
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Did you try searching?

There are loads of similar (or identical) posts.

General Answer: Use stage variables
hamzaqk
Participant
Posts: 249
Joined: Tue Apr 17, 2007 5:50 am
Location: islamabad

Post by hamzaqk »

search for horizontal pivoting.............
Teradata Certified Master V2R5
kailas
Participant
Posts: 21
Joined: Mon Nov 17, 2008 11:49 pm
Location: bangalore

Re: Need a help

Post by kailas »

Use the following unix script .This script will satisfy ur requirement. Make the output of this script i.e (outputFile.txt) as input in the data stage job.
Here InputFile.txt will be
ID NAME
1 XX
1 YY
1 XY
2 SS
2 DD
2 PP



awk -F ' *, *' ' END {
while (++i <= c) print order[i], row[order[i]]
}
{
row[$1] = row[$1] ? row[$1] OFS $2 : $2
while (!temp[$1]++) order[++c] = $1
}' InputFile.txt > outputFile.txt
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Ur was a city in ancient Babylon.

The second person possessive pronoun in English is spelled "your", not "ur". Please strive for a professional standard of written English on DSXchange, because even good English is difficult enough for those whose first language is not English. Further, DSXchange is not an SMS-enabled device.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DaleK
Premium Member
Premium Member
Posts: 68
Joined: Fri Jun 27, 2003 8:33 am
Location: Orlando

Post by DaleK »

since we have limited number of programmers with UNIX scripting knowledge.

We use a hash file as a solution to this type of problem.

the flat file is read in and it looks up to a hash file that has column 1 as the key. If key found on hash file then we write out to the same hash file where the second column derivation is <Hasfile 1.column 2> : '#' : Input file.column2

Note the look up hash file is the same hash file as the output in this job.
we use an empty file in the job to create the empty lookup hash file

Works fine for us the couple of times we need to use it and the process was contained in the job so oncall support person could see what was going on since running unix scripts from datastage is somewhat hidden to the blurry eyed programmer at 3:00AM.

warning - this will be slower then running a unix script.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

or simply sort the incoming data and append to existing using stage variable.

Something like

sort input -> tx -> hashed file

and in tx do
yourStgField = if (prevKey = Key)
then inField Else
yourStgField : delim : inField
where yourStgField is a stage variable.
sagar deshmukh
Participant
Posts: 103
Joined: Fri Jan 18, 2008 12:55 am
Location: chennai

Post by sagar deshmukh »

Hi Sainath,

How it will work whhen the key is changed?
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Code: Select all

yourStgField = if (prevKey = Key) 
then inField Else 
yourStgField : delim : inField 
girija
Participant
Posts: 89
Joined: Fri Mar 24, 2006 1:51 pm
Location: Hartford

Post by girija »

ID NAME
1 XX
1 YY
1 XY
2 SS
2 DD
2 PP

I WANT OUT PUT AS ----------->

ID NAME
1 XX#YY#XY
2 SS#DD#PP
=================================

transformer :

1. Define three stage variable

===> stgCount number intial value 0
===> stgApndVal vachar (...) initial value '~~~~~~~~~'
===> stgKey

2. Derivation

===> if link.ID <> stgKey THEN 1 ELSE stgCount + 1
===> if link.NAME <> stgKey THEN NAME ELSE stgApndVal:'#':NAME
===> link.ID

3. Add new output field APND_NAME and assign stgApndVal to it
4. Add new output field SRL_NO and assign stgCount to it

Code: Select all


    transformer -----> remove duplicate ---> target

remove duplicate stage partion same (since you already partion in ID at the input of transformer) but sort ID and SRL_NO , Key is ID and take the last.
girija
Participant
Posts: 89
Joined: Fri Mar 24, 2006 1:51 pm
Location: Hartford

Post by girija »

Sorry, I won't see this is a Server Job.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That's because you hijacked someone else's thread rather than starting a new topic for your new question.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sbass1
Premium Member
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

Sainath.Srinivasan wrote:or simply sort the incoming data and append to existing using stage variable.

Something like

sort input -> tx -> hashed file

and in tx do
yourStgField = if (prevKey = Key)
then inField Else
yourStgField : delim : inField
where yourStgField is a stage variable.
Sainath's reply gave me a few ideas...

My first attempt was to try doing the above without an intermediate hashed file. Instead, I used the RowProcGetPreviousValue and RowProcCompareWithPreviousValue transforms to detect when the key had changed.

The problems were:

1) When the key changed, I was always "one row ahead". I want to output when I'm on the last key in the group, not the first key in the next group.

2) This code didn't work for the last record, since I don't know the end-of-file until after the fact. See viewtopic.php?t=126410&highlight=

So I can see why Sainath uses a hashed file for output: unconditionally output all rows to the hashed file, overwriting each previous value with the concatenated stage variable, which is building up for each row. When the key changes, viola, your last.key and pivoted data was just saved to the hashed file.

However, I also decided to approach from a different tack...

I'm very familiar with SAS programming, which has the concept of first.key and last.key automatic variables. Since DS doesn't have these variables, I decided to construct them.

SAS also has the concept of "interleaving tables", which is a merge by row position rather than by keys. With this in mind, and by saving the Row Number to permanent storage, I was able to create "Look Ahead" and "Look Behind" variables (LookBehind is a bit redundant given RowProcGetPreviousValue). With these I can then derive first.key and last.key boolean flags. I can do this for sub-groups as well.

I've saved my proof of concept job (DSX file you can import) here:
http://docs.google.com/Doc?docid=dcdxxj ... v8dm&hl=en

I've saved the test data I used in this job here:
http://docs.google.com/Doc?docid=dcdxxj ... k3g5&hl=en. The test data is purposely stored unsorted.

This was really just a proof of concept, an academic exercise as much as anything. It might perform like a dog with large data volumes - I haven't tested it. But, it does solve the original poster's problem, and perhaps is a generic approach to first.key and last.key processing.

Or I may have just reinvented a perfectly round wheel if I'd searched the forums a bit harder :wink:

HTH...
Last edited by sbass1 on Wed Apr 08, 2009 9:14 pm, edited 1 time in total.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

I have not been through your links so far. But do very much appreciate your effort.

Alternatively you can append a dummy line to incoming file, locating the presence of which in your input stream will determine end-of-file.
Post Reply