how to check the set of 50 rows

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

ravikiran2712
Participant
Posts: 38
Joined: Thu Nov 04, 2004 10:36 am

how to check the set of 50 rows

Post by ravikiran2712 »

Happy Thanksgiving, guys.
In one of my jobs I am getting 50 rows for each store/date combination into a Transformer. I have to check the 50 rows against one condition, and I can output them only if 80% of the 50 rows satisfy it. The problem is that I cannot do this with a single Transformer stage: I can only decide whether to output the 50 rows after I have read all of them, so I would have to buffer the rows instead of writing them to the output. I need suggestions on how to use the Transformer stage for this purpose. If anyone has a better idea, I can change the logic.
Thank you,
Ravi
xcb
Premium Member
Posts: 66
Joined: Wed Mar 05, 2003 6:03 pm
Location: Brisbane, Australia

Post by xcb »

Hi Ravi,

I would suggest you do this as a two-part process. The first part takes your input data, passes it through a Transformer, and gives a rank/weight to each record based on your condition. Then load this rank along with your keys (store/date) into a UV table.

The second part again takes your input data into a Transformer, this time with a lookup to the pre-loaded UV table. Use a GROUP BY on the UV table with a SUM on the rank to get the total rank for each store/date combination. Within the Transformer, match your input data to the lookup data and place a constraint on the output link that only passes rows through if the total rank from the lookup meets the 80% threshold (for example, a total of at least 40 if each row that satisfies the condition is ranked 1 and there are 50 rows per group).

If you have a large volume of data you may find the lookup is slow, so it would be worthwhile to aggregate the data before you load the lookup table.
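
The two-part idea can be sketched outside DataStage to show the logic. This is only a rough illustration in Python, not DataStage code: the `store` and `date` field names, the `passes` callback, and the in-memory dictionary standing in for the UV lookup table are all assumptions made for the example.

```python
from collections import defaultdict


def filter_groups(rows, passes, threshold_pct=80):
    """Keep only rows whose store/date group meets the threshold.

    rows          -- iterable of dicts with hypothetical 'store' and 'date' keys
    passes        -- hypothetical callback: True if a row satisfies the condition
    threshold_pct -- required percentage of passing rows per group
    """
    rows = list(rows)

    # Part 1: rank each row (1 = passes, 0 = fails) and aggregate per key.
    # The dict plays the role of the pre-aggregated lookup table.
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of ranks, row count]
    for row in rows:
        key = (row["store"], row["date"])
        totals[key][0] += 1 if passes(row) else 0
        totals[key][1] += 1

    # Part 2: re-read the input, look up the group total, apply the constraint.
    # Integer arithmetic avoids floating-point surprises at exactly 80%.
    kept = []
    for row in rows:
        rank_sum, count = totals[(row["store"], row["date"])]
        if rank_sum * 100 >= threshold_pct * count:
            kept.append(row)
    return kept
```

The input is read twice, which mirrors the two jobs described above. Aggregating before the lookup keeps the lookup at one entry per store/date combination rather than one per row.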

Sorry if this isn't what you were after, but at least it should give you another approach to try.
Cameron Boog
xcb
Premium Member
Posts: 66
Joined: Wed Mar 05, 2003 6:03 pm
Location: Brisbane, Australia

Post by xcb »

Sorry, I just noticed that this is for PX. I don't know if my solution will work there.
Cameron Boog
ravikiran2712
Participant
Posts: 38
Joined: Thu Nov 04, 2004 10:36 am

can we do it without storing

Post by ravikiran2712 »

Hi, thanks for your reply.
Can we do the job without storing the data in a database? That takes a lot of time.
xcb
Premium Member
Posts: 66
Joined: Wed Mar 05, 2003 6:03 pm
Location: Brisbane, Australia

Post by xcb »

I don't think so, at least not doing it the way that I have suggested. I've never done any parallel work so there may be a better and more elegant solution out there for that architecture.
Cameron Boog
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

Input -> Transformer -> Sort -> Transformer -> Output.

First Transformer: use a counter method (search the forum for it) that handles your condition. Append the counter value to each record.

Sort: order the data so that, within each key, the record with the maximum counter value comes first.

Second Transformer: handle the logic of "if the first record for this key has a counter value at or above the threshold, output the records for this key to output 1 using a constraint."
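
To make the flow concrete, here is a rough Python sketch of the three steps. It is an illustration of the logic only, not DataStage code. It assumes the input arrives grouped by store/date, that the fields are named `store` and `date`, and that 80% of 50 rows means a minimum count of 40; the `passes` callback and the `counter` column name are hypothetical.

```python
def add_running_count(rows, passes):
    """First Transformer: append a running count of passing rows per key.

    Assumes rows arrive grouped by (store, date); the counter resets
    whenever the key changes.
    """
    prev_key, count = None, 0
    for row in rows:
        key = (row["store"], row["date"])
        if key != prev_key:
            prev_key, count = key, 0
        if passes(row):
            count += 1
        yield {**row, "counter": count}


def sort_counter_first(rows):
    """Sort stage: within each key, put the highest counter value first."""
    return sorted(rows, key=lambda r: (r["store"], r["date"], -r["counter"]))


def emit_qualifying(rows, min_count=40):
    """Second Transformer: decide on the first row of each key, then
    pass or drop every row of that key accordingly."""
    prev_key, keep = None, False
    for row in rows:
        key = (row["store"], row["date"])
        if key != prev_key:
            prev_key = key
            keep = row["counter"] >= min_count  # first row holds the group total
        if keep:
            yield row
```

Because the sort puts the group's final count on the first row, the second step never needs to buffer a group. Note that the output rows carry the extra `counter` column and come out in counter order rather than their original order.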

Good luck.