DSXchange: DataStage and IBM Websphere Data Integration Forum
View next topic
View previous topic
Add To Favorites
Author Message
kpushkaraj
Participant



Joined: 20 Feb 2011
Posts: 4

Points: 35

Post Posted: Tue Dec 06, 2011 8:32 am Reply with quote    Back to top    

DataStage® Release: 8x
Job Type: Parallel
OS: Windows
Hi

I/P

COL1 COL2
123 1066|1084|2849|2849.......
456 1|2|ADF|....

O/P desired

COL1 COL2
123 1066
123 1084
123 2849
123 2849
456 1
456 2
456 ADF

I was trying to do this in TR with feild function and stage variable i cant get it , Can you guys please help

Thanks in advance
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 36583
Location: Denver, CO
Points: 186480

Post Posted: Tue Dec 06, 2011 10:00 am Reply with quote    Back to top    

Please show us what you tried and what isn't working about it. Also let us know if you have 8.5 or higher as that means there is a 'looping' solution for this.

_________________
-craig

It's a scheme of devices to get at low prices all goods from cough mixtures to cables
Which tickled the sailors by treating retailers as though they were all veg-e-tables
Rate this response:  
Not yet rated
kpushkaraj
Participant



Joined: 20 Feb 2011
Posts: 4

Points: 35

Post Posted: Tue Dec 06, 2011 10:14 am Reply with quote    Back to top    

I am using the 8.1

SV1 = COL1

SV2 =If COl1 <> SV1 Then Field(DSLink14.BKEY,'|', ) Else SV2

something like this , but i dont think this is the reight approach .Can you please guide me to right way
Rate this response:  
Not yet rated
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 36583
Location: Denver, CO
Points: 186480

Post Posted: Tue Dec 06, 2011 11:09 am Reply with quote    Back to top    

It's not until you get to 8.5 that you can create multiple output records from one input record, which is the 'looping' that I mentioned. Before that, what you need to do is turn your one big delimited column into separate columns and you can use the Field() function for that or the Column Import stage, I do believe. Once you have all of the values into individual columns you can use the Pivot stage to do a horizontal (columns to rows) pivot to create multiple rows per 'key' value.

You will need to know the maximum possible values in your delimited string and code for that, then remove any 'empty' rows post pivot.

_________________
-craig

It's a scheme of devices to get at low prices all goods from cough mixtures to cables
Which tickled the sailors by treating retailers as though they were all veg-e-tables
Rate this response:  
Not yet rated
kpushkaraj
Participant



Joined: 20 Feb 2011
Posts: 4

Points: 35

Post Posted: Tue Dec 06, 2011 12:02 pm Reply with quote    Back to top    

I dont know what my COL2 data length will be ?

It may be from 123|345fs|are45| 2423| ....n It's not fixed no.of columns.
Rate this response:  
Not yet rated
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 36583
Location: Denver, CO
Points: 186480

Post Posted: Tue Dec 06, 2011 2:24 pm Reply with quote    Back to top    

I understand but you need to pick a maximum and work with that. Profile the data you are receiving and see what the max occurance is and then use something higher to be safe. That or upgrade to 8.5 where all those shenanigans become unnecessary.

_________________
-craig

It's a scheme of devices to get at low prices all goods from cough mixtures to cables
Which tickled the sailors by treating retailers as though they were all veg-e-tables
Rate this response:  
Not yet rated
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 49931
Location: Canberra, Australia
Points: 271372

Post Posted: Tue Dec 06, 2011 3:04 pm Reply with quote    Back to top    

There is a simple solution using a server job, in which you convert the "|" characters to value marks then write to a hashed file. Then you read back from the hashed file normalized on the multi-valu ...

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong
Delivering Outcomes Together
Rate this response:  
Not yet rated
marchana
Participant



Joined: 13 Jul 2010
Posts: 18

Points: 144

Post Posted: Mon Dec 19, 2011 11:26 pm Reply with quote    Back to top    

Here is the way i did,

From the database itself, i took one more column which contains the number generated by row_number () function limited to the length of the column data.

id, data, no
1, ad|fh|ju, 1
1, ad|fh|ju, 2
1, ad|fh|ju, 3...........

then i took count of the data column with respect to the delimiter and i put condition in the transformer no <= count .Finally i used field function to get the data separately since we know that single coulmn has to be loaded with the value.

Hope this will helpful for you.
Rate this response:  
Not yet rated
Kirtikumar
Participant



Joined: 15 Oct 2004
Posts: 435
Location: Pune, India
Points: 3587

Post Posted: Tue Dec 20, 2011 2:58 am Reply with quote    Back to top    

Assuming this is coming from a sequential file and " " (space) is delimiter between Id and the | delimited string, we can use the following script.

Code:
cat FileName.txt | awk -F "|" ' { FirstCol=substr($1,1,index($1, " ")-1); print $1; for(i=2;i<=NF;i++) printf("%s %s\n",FirstCol,$i); } '

Even if the input is from other source like DB, Dataset etc, we can use the one of the DS stage which allows to run Unix command on standard input. I think it is - External filter stage.

_________________
Regards,
S. Kirtikumar.
Rate this response:  
Not yet rated
vamsi.4a6
Participant



Joined: 22 Jan 2012
Posts: 233

Points: 1998

Post Posted: Fri May 11, 2012 9:25 pm Reply with quote    Back to top    

Kirtikumar wrote:
Even if the input is from other source like DB, Dataset etc, we can use the one of the DS stage which allows to run Unix command on standard input. I think it is - External filter stage.


I hope we can not use above unix command in External filter stage.Please correct me if i am wrong?
Rate this response:  
Not yet rated
boggavarapu
Participant



Joined: 27 Jul 2009
Posts: 15

Points: 104

Post Posted: Sat May 12, 2012 9:39 am Reply with quote    Back to top    

I tried this and got the solution.

I first used field function to split the data in col2 to different columns.

then i used pivot stage to get the desired out put..

SEQ>>TRF>>PVT>>SEQ

TRF:
Field(DSLink3.col2,',',1)
Field(DSLink3.col2,',',2)
Field(DSLink3.col2,',',3)
PVT:
col2:= col2,col3,col4

_________________
srini
Rate this response:  
Not yet rated
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 49931
Location: Canberra, Australia
Points: 271372

Post Posted: Sat May 12, 2012 4:49 pm Reply with quote    Back to top    

How do you handle the arbitratry number of values?

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong
Delivering Outcomes Together
Rate this response:  
Not yet rated
rameshrr3



Group memberships:
Premium Members

Joined: 10 May 2004
Posts: 519
Location: BRENTWOOD, TN
Points: 5676

Post Posted: Tue May 15, 2012 11:14 am Reply with quote    Back to top    

I know the variable pivot was discussed in an earlier post .
It can be done perfectly if one knows the Maximum number of delimited fields possible in this file . ( It can also be gotten as teh output of a shell script or prior job that counts the delimiters for the "Sub record " field Smile

Then use a row generator that generates an integer cycle ( which cycles from 1 to Max Delim count ) for each row, combine it with the input data using a key col with data = "1" , and for each row use field function with delim instance specified by current sequence value from row generator , apply constraints in transformer such that rowgen seq value <= max delim count from delimited field) ,it should work. For server jobs use approach Ray suggested ( havent tried the server job approach myself, but the parallel job approach has worked well for me )
Rate this response:  
Not yet rated
Display posts from previous:       

Add To Favorites
View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum



Powered by phpBB © 2001, 2002 phpBB Group
Theme & Graphics by Daz :: Portal by Smartor
All times are GMT - 6 Hours