Unique record counter-Hash partitioning before transformer
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 139
- Joined: Fri Apr 11, 2008 1:02 pm
Unique record counter-Hash partitioning before transformer
Hi
Before posting this, i tried myself and searched a lot but couldnt find a solution.
my scenario is,
Using an SQL i'm retriving around 80 columns and passing it to a transformer, in the transformer i need to assign a number in incremental order for every unique record that comes in based on a particular column.
but the catch is, i'm doing a hash paritioning in the input link of the transformer, so the number of rows that comes to each partition may not be the same, in that case the usual logic using numpartitions, @partitionnum is not working...
any help on this is appreciated
thanks
MJ
Before posting this, i tried myself and searched a lot but couldnt find a solution.
my scenario is,
Using an SQL i'm retriving around 80 columns and passing it to a transformer, in the transformer i need to assign a number in incremental order for every unique record that comes in based on a particular column.
but the catch is, i'm doing a hash paritioning in the input link of the transformer, so the number of rows that comes to each partition may not be the same, in that case the usual logic using numpartitions, @partitionnum is not working...
any help on this is appreciated
thanks
MJ
Last edited by Maximus_Jack on Tue Apr 15, 2014 1:18 pm, edited 1 time in total.
-
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore
I am not sure If I understood it correctly, you need to count the number of unique records base on a column for the entire input data ?
If so, in the transformer as you are doing hash partitioning, have a stage variable to count the unique records per partition pass it to output and finally have a aggregator to find max of this value then reaggregare in sequential mode to find the sum of max values.
If so, in the transformer as you are doing hash partitioning, have a stage variable to count the unique records per partition pass it to output and finally have a aggregator to find max of this value then reaggregare in sequential mode to find the sum of max values.
- Zulfi
-
- Premium Member
- Posts: 139
- Joined: Fri Apr 11, 2008 1:02 pm
sorry if my explanation was not clear, now i have edited my post, hope it helps..
a number has to be generated for every unique value that comes in based on a particular, this is one of the requirement i need to handle, i dont want to create a seperate job for doing this functionality alone.. because the other columns has to go through a lot of transformation in the transformer, so i'm striving to get this done as part of this job itself.. is there any way?
thanks
MJ
a number has to be generated for every unique value that comes in based on a particular, this is one of the requirement i need to handle, i dont want to create a seperate job for doing this functionality alone.. because the other columns has to go through a lot of transformation in the transformer, so i'm striving to get this done as part of this job itself.. is there any way?
thanks
MJ
-
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore
-
- Premium Member
- Posts: 139
- Joined: Fri Apr 11, 2008 1:02 pm
-
- Premium Member
- Posts: 139
- Joined: Fri Apr 11, 2008 1:02 pm
-
- Participant
- Posts: 47
- Joined: Fri Sep 23, 2005 6:01 pm
Hi,
If i am not wrong, your reuqirement is to assign a number to unique value for a particular column. But it doesn't need to be in order.
if that is the case, when read the data from table create a column with rownum. copy the data in two links. In One link copy the rownum & unique value column and use remove duplicate based on the unique value. then join the data with other link. Or you can do that in the query itself using row_over() function.
Thanks,
Bharathappriyan
If i am not wrong, your reuqirement is to assign a number to unique value for a particular column. But it doesn't need to be in order.
if that is the case, when read the data from table create a column with rownum. copy the data in two links. In One link copy the rownum & unique value column and use remove duplicate based on the unique value. then join the data with other link. Or you can do that in the query itself using row_over() function.
Thanks,
Bharathappriyan
-
- Premium Member
- Posts: 536
- Joined: Thu Oct 11, 2007 1:48 am
- Location: Bangalore
Hi,
My advice is define a stage veriable in Transformer stage.Hash partition the input column and compare the current record with the previous record,if both are different then increment the counter veriable otherwise let it be same.
My advice is define a stage veriable in Transformer stage.Hash partition the input column and compare the current record with the previous record,if both are different then increment the counter veriable otherwise let it be same.
Thanks
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: