stage variables
Moderators: chulett, rschirm, roy
stage variables
Hi can someone help me in defining stage variables for a simple logic.
I am having the input as:
No Pos
1 A
1 A
1 A
1 A
2 B
2 C
3 D
3 D
3 D
4 C
And i need o/p to be
No Pos
1 A
1 A
1 A
1 A
3 D
3 D
3 D
4 C
Reject
2 B
2 C
I am pretty new to datastage.Anyhelp is appreciated.
Thanks
siri
I am having the input as:
No Pos
1 A
1 A
1 A
1 A
2 B
2 C
3 D
3 D
3 D
4 C
And i need o/p to be
No Pos
1 A
1 A
1 A
1 A
3 D
3 D
3 D
4 C
Reject
2 B
2 C
I am pretty new to datastage.Anyhelp is appreciated.
Thanks
siri
Re: stage variables
Please explain your logic. What is the basis for rejecting those 2 records?
Re: stage variables
I am rejecting those two rows because the POS is not same for that particular ID.deployDS wrote:Please explain your logic. What is the basis for rejecting those 2 records?
For the given ID the positions should be same orelse reject that complete set.
Thanks
i have another solution for this problem.i will not use stage variables though.
Since u said ur input will have same IDs and Pos and u need to filter out the odd ones, i suggest you use remove duplicate stage with keys id and pos to identify all the records that you want to keep. Collect them into a dataset.
Now take ur original input and do a left join and check for records which are in the input and not in ur desired dataset. that eliminates the odd records from ur file and u can have a final output with the desired data.
hth
Since u said ur input will have same IDs and Pos and u need to filter out the odd ones, i suggest you use remove duplicate stage with keys id and pos to identify all the records that you want to keep. Collect them into a dataset.
Now take ur original input and do a left join and check for records which are in the input and not in ur desired dataset. that eliminates the odd records from ur file and u can have a final output with the desired data.
hth
Can you try this.
use an aggregator on id to filter out the single occurances.since they will be part of the final out put.
this identifies records like 4,C
now filter out these unique occurances from ur input and then use aggregator again on id and pos and to get records with count=1.this identifies the records like
2,B
2,C
Now for the final output concat 4,C records and records with ID,POS count > 1.
that shud give the final output.
use an aggregator on id to filter out the single occurances.since they will be part of the final out put.
this identifies records like 4,C
now filter out these unique occurances from ur input and then use aggregator again on id and pos and to get records with count=1.this identifies the records like
2,B
2,C
Now for the final output concat 4,C records and records with ID,POS count > 1.
that shud give the final output.
pavankvk wrote:i tested this and its working..it may .
Hi Pavan,
I am having like 20000 rows in the input and like 30 columns.The method you had shown me seems to be little bit complicated thats y i preferred asking for stage variables.Anyway can u just give me some more idea like while aggregating what should be my key and how to filter the data.
Thanks
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
An approach using stage variables may be found here
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Here is ur file
ID,POS
1,A
1,A
1,A
1,A
2,B
2,C
3,D
3,D
3,D
4,C
Now to identify single occurence records like 4,C use an aggregator stage.
In the stage specify the
group as ID
Aggregation type: count rows
give some name to output column..say ID_Count
Have a transformer after the aggregator and check for ID_count=1 and ID_Count > 1 in the constraint. u get 2 links as output from transformer.
Take the link with id_count=1 and do a left outer with the input. Left being input stream.You can use a copy stage to get ur input stream again for join.
After the left outer join,u can check for null value in the non key columns of the right stream and then output the non-null stream to a funnel.----> 1
take the null stream to a aggregator again and this time, ur keys are ID,POS.
now do the same as u did earlier..check for count=1 and count>1
This time count =1 will be 2,b and 2,c.
Inner Join the count>1 with the input to get all the records ----> 2
Funnel 1 and 2 to get the final output and collect the rejects in another file.
ID,POS
1,A
1,A
1,A
1,A
2,B
2,C
3,D
3,D
3,D
4,C
Now to identify single occurence records like 4,C use an aggregator stage.
In the stage specify the
group as ID
Aggregation type: count rows
give some name to output column..say ID_Count
Have a transformer after the aggregator and check for ID_count=1 and ID_Count > 1 in the constraint. u get 2 links as output from transformer.
Take the link with id_count=1 and do a left outer with the input. Left being input stream.You can use a copy stage to get ur input stream again for join.
After the left outer join,u can check for null value in the non key columns of the right stream and then output the non-null stream to a funnel.----> 1
take the null stream to a aggregator again and this time, ur keys are ID,POS.
now do the same as u did earlier..check for count=1 and count>1
This time count =1 will be 2,b and 2,c.
Inner Join the count>1 with the input to get all the records ----> 2
Funnel 1 and 2 to get the final output and collect the rejects in another file.
Thanks pavanpavankvk wrote:ok,i can explain step by step
I am having 30 columns.Out of these,i am mainly concerned with 2 columns,but in output i should have all these columns.
ID is the main key column.
POS is the column on which my transformation depends.
I am having multiples rows for the ID like he use to hold different POS in the company.
I need to check that for a particular ID there should be only one POS allotted.
if any ID is assigned with more than one POS then i had to capture these rows in the Reject file.
Here for the ID=1 the POS in all the rows is A thats y it is there in the output.Whereas for ID=2 the POS is B and C which are different so i had to reject this whole set.
I think this is clear.
Thanks
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
For less than $1 per week (which goes towards paying for this site's bandwidth) you can.ds1user1 wrote:I am unable to see the replies.ray.wurlod wrote:An approach using stage variables may be found here
Thanks
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.