stage variables

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ds1user1
Participant
Posts: 20
Joined: Tue Apr 04, 2006 11:01 am

stage variables

Post by ds1user1 »

Hi can someone help me in defining stage variables for a simple logic.

I am having the input as:

No Pos
1 A
1 A
1 A
1 A
2 B
2 C
3 D
3 D
3 D
4 C

And i need o/p to be

No Pos
1 A
1 A
1 A
1 A
3 D
3 D
3 D
4 C

Reject
2 B
2 C

I am pretty new to datastage.Anyhelp is appreciated.

Thanks
siri
deployDS
Premium Member
Premium Member
Posts: 45
Joined: Thu Mar 09, 2006 9:36 am

Re: stage variables

Post by deployDS »

Please explain your logic. What is the basis for rejecting those 2 records?
ds1user1
Participant
Posts: 20
Joined: Tue Apr 04, 2006 11:01 am

Re: stage variables

Post by ds1user1 »

deployDS wrote:Please explain your logic. What is the basis for rejecting those 2 records?
I am rejecting those two rows because the POS is not same for that particular ID.

For the given ID the positions should be same orelse reject that complete set.

Thanks
pavankvk
Participant
Posts: 202
Joined: Thu Dec 04, 2003 7:54 am

Post by pavankvk »

i have another solution for this problem.i will not use stage variables though.

Since u said ur input will have same IDs and Pos and u need to filter out the odd ones, i suggest you use remove duplicate stage with keys id and pos to identify all the records that you want to keep. Collect them into a dataset.

Now take ur original input and do a left join and check for records which are in the input and not in ur desired dataset. that eliminates the odd records from ur file and u can have a final output with the desired data.

hth
ds1user1
Participant
Posts: 20
Joined: Tue Apr 04, 2006 11:01 am

Post by ds1user1 »

pavankvk wrote:i have another solution for this problem.i will not use stage variables though.

hth
Its not working that way pavan.

Thanks
pavankvk
Participant
Posts: 202
Joined: Thu Dec 04, 2003 7:54 am

Post by pavankvk »

ds1user1 wrote:
pavankvk wrote:i have another solution for this problem.i will not use stage variables though.

hth
Its not working that way pavan.

Thanks
i am sorry..i will give another logic..even i realised it just now..thats not duplicate stage,i need to use aggregator
pavankvk
Participant
Posts: 202
Joined: Thu Dec 04, 2003 7:54 am

Post by pavankvk »

Can you try this.

use an aggregator on id to filter out the single occurances.since they will be part of the final out put.

this identifies records like 4,C

now filter out these unique occurances from ur input and then use aggregator again on id and pos and to get records with count=1.this identifies the records like

2,B
2,C

Now for the final output concat 4,C records and records with ID,POS count > 1.

that shud give the final output.
pavankvk
Participant
Posts: 202
Joined: Thu Dec 04, 2003 7:54 am

Post by pavankvk »

i tested this and its working..it may not be a elagant solution..but i go with it because i dont like to use stage variables.
ds1user1
Participant
Posts: 20
Joined: Tue Apr 04, 2006 11:01 am

Post by ds1user1 »

pavankvk wrote:i tested this and its working..it may .

Hi Pavan,

I am having like 20000 rows in the input and like 30 columns.The method you had shown me seems to be little bit complicated thats y i preferred asking for stage variables.Anyway can u just give me some more idea like while aggregating what should be my key and how to filter the data.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

An approach using stage variables may be found here
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ds1user1
Participant
Posts: 20
Joined: Tue Apr 04, 2006 11:01 am

Post by ds1user1 »

ray.wurlod wrote:An approach using stage variables may be found here
I am unable to see the replies.

Thanks
pavankvk
Participant
Posts: 202
Joined: Thu Dec 04, 2003 7:54 am

Post by pavankvk »

ok,

give me the details like ur key columns..better give the lay out and tell me what u r trying to do..

its not complicated..if u can give details,i can explain step by step
pavankvk
Participant
Posts: 202
Joined: Thu Dec 04, 2003 7:54 am

Post by pavankvk »

Here is ur file

ID,POS
1,A
1,A
1,A
1,A
2,B
2,C
3,D
3,D
3,D
4,C

Now to identify single occurence records like 4,C use an aggregator stage.
In the stage specify the
group as ID
Aggregation type: count rows

give some name to output column..say ID_Count

Have a transformer after the aggregator and check for ID_count=1 and ID_Count > 1 in the constraint. u get 2 links as output from transformer.

Take the link with id_count=1 and do a left outer with the input. Left being input stream.You can use a copy stage to get ur input stream again for join.
After the left outer join,u can check for null value in the non key columns of the right stream and then output the non-null stream to a funnel.----> 1

take the null stream to a aggregator again and this time, ur keys are ID,POS.
now do the same as u did earlier..check for count=1 and count>1

This time count =1 will be 2,b and 2,c.

Inner Join the count>1 with the input to get all the records ----> 2

Funnel 1 and 2 to get the final output and collect the rejects in another file.
ds1user1
Participant
Posts: 20
Joined: Tue Apr 04, 2006 11:01 am

Post by ds1user1 »

pavankvk wrote:ok,i can explain step by step
Thanks pavan

I am having 30 columns.Out of these,i am mainly concerned with 2 columns,but in output i should have all these columns.

ID is the main key column.

POS is the column on which my transformation depends.

I am having multiples rows for the ID like he use to hold different POS in the company.

I need to check that for a particular ID there should be only one POS allotted.

if any ID is assigned with more than one POS then i had to capture these rows in the Reject file.

Here for the ID=1 the POS in all the rows is A thats y it is there in the output.Whereas for ID=2 the POS is B and C which are different so i had to reject this whole set.

I think this is clear.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

ds1user1 wrote:
ray.wurlod wrote:An approach using stage variables may be found here
I am unable to see the replies.

Thanks
For less than $1 per week (which goes towards paying for this site's bandwidth) you can.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply