Need help with pivoting columns

sagar deshmukh · Post by **sagar deshmukh** » Mon Apr 06, 2009 4:49 am

Hi I have a condition where....
from input i am getting two columns as below :-

ID NAME
1 XX
1 YY
1 XY
2 SS
2 DD
2 PP

I WANT OUT PUT AS ----------->

ID NAME
1 XX#YY#XY
2 SS#DD#PP

HOW CAN I ACHIEVE THIS....???

I AM USING DS SERVER EDITION.

*Note: Title edited to be more descriptive - Content Editor*

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Mon Apr 06, 2009 4:53 am

Did you try searching?

There are loads of similar (or identical) posts.

General Answer: Use stage variables

hamzaqk · Post by **hamzaqk** » Mon Apr 06, 2009 5:35 am

search for horizontal pivoting.............

kailas · Post by **kailas** » Mon Apr 06, 2009 6:10 am

Use the following unix script .This script will satisfy ur requirement. Make the output of this script i.e (outputFile.txt) as input in the data stage job.
Here InputFile.txt will be
ID NAME
1 XX
1 YY
1 XY
2 SS
2 DD
2 PP

awk -F ' *, *' ' END {
while (++i <= c) print order[i], row[order[i]]
}
{
row[$1] = row[$1] ? row[$1] OFS $2 : $2
while (!temp[$1]++) order[++c] = $1
}' InputFile.txt > outputFile.txt

ray.wurlod · Post by **ray.wurlod** » Mon Apr 06, 2009 7:22 am

Ur was a city in ancient Babylon.

The second person possessive pronoun in English is spelled "your", not "ur". Please strive for a professional standard of written English on DSXchange, because even good English is difficult enough for those whose first language is not English. Further, DSXchange is not an SMS-enabled device.

DaleK · Post by **DaleK** » Mon Apr 06, 2009 1:54 pm

since we have limited number of programmers with UNIX scripting knowledge.

We use a hash file as a solution to this type of problem.

the flat file is read in and it looks up to a hash file that has column 1 as the key. If key found on hash file then we write out to the same hash file where the second column derivation is <Hasfile 1.column 2> : '#' : Input file.column2

Note the look up hash file is the same hash file as the output in this job.
we use an empty file in the job to create the empty lookup hash file

Works fine for us the couple of times we need to use it and the process was contained in the job so oncall support person could see what was going on since running unix scripts from datastage is somewhat hidden to the blurry eyed programmer at 3:00AM.

warning - this will be slower then running a unix script.

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Tue Apr 07, 2009 6:12 am

or simply sort the incoming data and append to existing using stage variable.

Something like

sort input -> tx -> hashed file

and in tx do
yourStgField = if (prevKey = Key)
then inField Else
yourStgField : delim : inField
where yourStgField is a stage variable.

sagar deshmukh · Post by **sagar deshmukh** » Tue Apr 07, 2009 6:34 am

Hi Sainath,

How it will work whhen the key is changed?

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Tue Apr 07, 2009 10:07 am

Code: Select all

yourStgField = if (prevKey = Key) 
then inField Else 
yourStgField : delim : inField

girija · Post by **girija** » Tue Apr 07, 2009 12:00 pm

ID NAME
1 XX
1 YY
1 XY
2 SS
2 DD
2 PP

I WANT OUT PUT AS ----------->

ID NAME
1 XX#YY#XY
2 SS#DD#PP
=================================

transformer :

1. Define three stage variable

===> stgCount number intial value 0
===> stgApndVal vachar (...) initial value '~~~~~~~~~'
===> stgKey

2. Derivation

===> if link.ID <> stgKey THEN 1 ELSE stgCount + 1
===> if link.NAME <> stgKey THEN NAME ELSE stgApndVal:'#':NAME
===> link.ID

3. Add new output field APND_NAME and assign stgApndVal to it
4. Add new output field SRL_NO and assign stgCount to it

Code: Select all


    transformer -----> remove duplicate ---> target

remove duplicate stage partion same (since you already partion in ID at the input of transformer) but sort ID and SRL_NO , Key is ID and take the last.

girija · Post by **girija** » Tue Apr 07, 2009 12:40 pm

Sorry, I won't see this is a Server Job.

ray.wurlod · Post by **ray.wurlod** » Tue Apr 07, 2009 1:17 pm

That's because you hijacked someone else's thread rather than starting a new topic for your new question.

sbass1 · Post by **sbass1** » Wed Apr 08, 2009 4:12 am

Sainath.Srinivasan wrote:or simply sort the incoming data and append to existing using stage variable.

Something like

sort input -> tx -> hashed file

and in tx do
yourStgField = if (prevKey = Key)
then inField Else
yourStgField : delim : inField
where yourStgField is a stage variable.

Sainath's reply gave me a few ideas...

My first attempt was to try doing the above without an intermediate hashed file. Instead, I used the RowProcGetPreviousValue and RowProcCompareWithPreviousValue transforms to detect when the key had changed.

The problems were:

1) When the key changed, I was always "one row ahead". I want to output when I'm on the last key in the group, not the first key in the next group.

2) This code didn't work for the last record, since I don't know the end-of-file until after the fact. See viewtopic.php?t=126410&highlight=

So I can see why Sainath uses a hashed file for output: unconditionally output all rows to the hashed file, overwriting each previous value with the concatenated stage variable, which is building up for each row. When the key changes, viola, your last.key and pivoted data was just saved to the hashed file.

However, I also decided to approach from a different tack...

I'm very familiar with SAS programming, which has the concept of first.key and last.key automatic variables. Since DS doesn't have these variables, I decided to construct them.

SAS also has the concept of "interleaving tables", which is a merge by row position rather than by keys. With this in mind, and by saving the Row Number to permanent storage, I was able to create "Look Ahead" and "Look Behind" variables (LookBehind is a bit redundant given RowProcGetPreviousValue). With these I can then derive first.key and last.key boolean flags. I can do this for sub-groups as well.

I've saved my proof of concept job (DSX file you can import) here:
http://docs.google.com/Doc?docid=dcdxxj ... v8dm&hl=en

I've saved the test data I used in this job here:
http://docs.google.com/Doc?docid=dcdxxj ... k3g5&hl=en. The test data is purposely stored unsorted.

This was really just a proof of concept, an academic exercise as much as anything. It might perform like a dog with large data volumes - I haven't tested it. But, it does solve the original poster's problem, and perhaps is a generic approach to first.key and last.key processing.

Or I may have just reinvented a perfectly round wheel if I'd searched the forums a bit harder

HTH...

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Wed Apr 08, 2009 4:59 am

I have not been through your links so far. But do very much appreciate your effort.

Alternatively you can append a dummy line to incoming file, locating the presence of which in your input stream will determine end-of-file.

Need help with pivoting columns

Need help with pivoting columns

Re: Need a help