Unique record counter-Hash partitioning before transformer

Maximus_Jack · Post by **Maximus_Jack** » Tue Apr 15, 2014 11:33 am

Hi
Before posting this, i tried myself and searched a lot but couldnt find a solution.

my scenario is,
Using an SQL i'm retriving around 80 columns and passing it to a transformer, in the transformer i need to assign a number in incremental order for every unique record that comes in based on a particular column.

but the catch is, i'm doing a hash paritioning in the input link of the transformer, so the number of rows that comes to each partition may not be the same, in that case the usual logic using numpartitions, @partitionnum is not working...

any help on this is appreciated

thanks
MJ

zulfi123786 · Post by **zulfi123786** » Tue Apr 15, 2014 12:17 pm

I am not sure If I understood it correctly, you need to count the number of unique records base on a column for the entire input data ?

If so, in the transformer as you are doing hash partitioning, have a stage variable to count the unique records per partition pass it to output and finally have a aggregator to find max of this value then reaggregare in sequential mode to find the sum of max values.

Maximus_Jack · Post by **Maximus_Jack** » Tue Apr 15, 2014 1:21 pm

sorry if my explanation was not clear, now i have edited my post, hope it helps..

a number has to be generated for every unique value that comes in based on a particular, this is one of the requirement i need to handle, i dont want to create a seperate job for doing this functionality alone.. because the other columns has to go through a lot of transformation in the transformer, so i'm striving to get this done as part of this job itself.. is there any way?

thanks
MJ

zulfi123786 · Post by **zulfi123786** » Tue Apr 15, 2014 1:44 pm

If you dont want to settle with holes in the generated column then you can either run the job on sequential mode (which you wouldn't as if it was then this thread would not have existed :D ) or use NextSurrogateKey() with a block size of 1 which again kind of serializes your processing.

Maximus_Jack · Post by **Maximus_Jack** » Tue Apr 15, 2014 4:48 pm

Hi all...

any help on this .... thanks MJ

Maximus_Jack · Post by **Maximus_Jack** » Thu Apr 17, 2014 9:15 pm

Hi All... is there any solution for this ???

thanks
MJ

bharathappriyan · Post by **bharathappriyan** » Fri Apr 18, 2014 12:55 pm

Hi,
If i am not wrong, your reuqirement is to assign a number to unique value for a particular column. But it doesn't need to be in order.
if that is the case, when read the data from table create a column with rownum. copy the data in two links. In One link copy the rownum & unique value column and use remove duplicate based on the unique value. then join the data with other link. Or you can do that in the query itself using row_over() function.

Thanks,
Bharathappriyan

prasson_ibm · Post by **prasson_ibm** » Tue Apr 22, 2014 4:14 am

Hi,

My advice is define a stage veriable in Transformer stage.Hash partition the input column and compare the current record with the previous record,if both are different then increment the counter veriable otherwise let it be same.

ray.wurlod · Post by **ray.wurlod** » Tue Apr 22, 2014 2:32 pm

I'd use the ROWNUM or equivalent function in the SQL.