Unique record counter - hash partitioning before transformer

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Maximus_Jack
Premium Member
Posts: 139
Joined: Fri Apr 11, 2008 1:02 pm

Unique record counter - hash partitioning before transformer

Post by Maximus_Jack »

Hi
Before posting this, I tried it myself and searched a lot but couldn't find a solution.

My scenario is this: using an SQL query I'm retrieving around 80 columns and passing them to a Transformer. In the Transformer I need to assign a number, in incremental order, to every unique record that comes in, based on a particular column.

But the catch is, I'm doing hash partitioning on the input link of the Transformer, so the number of rows that reaches each partition may not be the same. In that case the usual logic using @NUMPARTITIONS and @PARTITIONNUM is not working...

any help on this is appreciated

thanks
MJ
Last edited by Maximus_Jack on Tue Apr 15, 2014 1:18 pm, edited 1 time in total.
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

I am not sure if I understood it correctly: do you need to count the number of unique records, based on a column, for the entire input data?

If so, since you are doing hash partitioning into the Transformer, have a stage variable count the unique records per partition and pass it to the output, then have an Aggregator find the max of this value per partition, and finally re-aggregate in sequential mode to find the sum of those maximums.
- Zulfi
Maximus_Jack
Premium Member
Posts: 139
Joined: Fri Apr 11, 2008 1:02 pm

Post by Maximus_Jack »

Sorry if my explanation was not clear; I have now edited my post, hope it helps.

A number has to be generated for every unique value that comes in, based on a particular column. This is one of the requirements I need to handle, and I don't want to create a separate job for this functionality alone, because the other columns have to go through a lot of transformation in the Transformer. So I'm striving to get this done as part of this job itself. Is there any way?

thanks
MJ
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

If you don't want to settle for holes in the generated column, then you can either run the job in sequential mode (which you wouldn't, as if you could, this thread would not exist :D ) or use NextSurrogateKey() with a block size of 1, which again more or less serializes your processing.
- Zulfi
Maximus_Jack
Premium Member
Posts: 139
Joined: Fri Apr 11, 2008 1:02 pm

Post by Maximus_Jack »

Hi all...

any help on this .... thanks MJ
Maximus_Jack
Premium Member
Posts: 139
Joined: Fri Apr 11, 2008 1:02 pm

Post by Maximus_Jack »

Hi All... is there any solution for this ???

thanks
MJ
bharathappriyan
Participant
Posts: 47
Joined: Fri Sep 23, 2005 6:01 pm

Post by bharathappriyan »

Hi,
If I am not wrong, your requirement is to assign a number to each unique value of a particular column, but it doesn't need to be in order.
If that is the case, when you read the data from the table, create a column with ROWNUM. Copy the data into two links. In one link keep the ROWNUM and the unique value column and use a Remove Duplicates stage based on the unique value, then join that data back with the other link. Or you can do it in the query itself using the ROW_NUMBER() OVER() function.
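A minimal SQL sketch of that idea, folding both steps (number the distinct values, then join back to the detail rows) into the query. The names src_table and key_col are hypothetical placeholders, not anything from the actual job:

-- Number each distinct value of the key column once, then join those
-- numbers back to the full detail rows.
SELECT d.*,
       u.unique_rec_no
FROM   src_table d
       JOIN (SELECT key_col,
                    ROW_NUMBER() OVER (ORDER BY key_col) AS unique_rec_no
             FROM   (SELECT DISTINCT key_col FROM src_table) v
            ) u
         ON d.key_col = u.key_col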

Thanks,
Bharathappriyan
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,

My advice is to define a stage variable in the Transformer stage. Hash partition on the input column and compare the current record with the previous record; if they are different, increment the counter variable, otherwise leave it the same.
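Purely to illustrate the logic of that compare-with-previous counter (in the job it would be stage variable derivations running within each hash partition), here is the same idea as a SQL window expression, again with hypothetical src_table and key_col names:

-- Flag a 1 whenever the key value changes from the previous row, then
-- keep a running total of the flags: that total is the incrementing
-- number assigned per unique value.
SELECT key_col,
       SUM(change_flag) OVER (ORDER BY key_col) AS unique_rec_no
FROM  (SELECT key_col,
              CASE WHEN key_col = LAG(key_col) OVER (ORDER BY key_col)
                   THEN 0 ELSE 1
              END AS change_flag
       FROM   src_table) s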
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I'd use the ROWNUM or equivalent function in the SQL.
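For example (a sketch only, with the same hypothetical src_table and key_col names), one "equivalent function" for this requirement is DENSE_RANK(), which hands every distinct key value the next number without a separate dedupe-and-join step:

-- DENSE_RANK() assigns the same number to all rows sharing a key value
-- and moves to the next number for each new distinct value.
SELECT t.*,
       DENSE_RANK() OVER (ORDER BY key_col) AS unique_rec_no
FROM   src_table t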
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.