Find the highest value in a group

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
tbtcust
Premium Member
Premium Member
Posts: 228
Joined: Tue Mar 04, 2008 9:07 am

Find the highest value in a group

Post by tbtcust »

Hello all,

I have a set of records that I am sorting and creating a change key. I want to select the record that has the highest value in a different column (Col_2 below). See example below. Is there a way to select that record in the group without having to resort? Thanks in advance for any help.

Input after the sort and key change:

keyChange, Col_1, Col_2, Col_3
=-=-=-=--=-=-=-
1, AAAA, 1, Data
0, AAAA, 3, Data
0, AAAA, 2, Data
1, BBBB, 1, Data
0, BBBB, 2, Data

Desired output:

Col_1, Col_2
=-=-=-=--=-=-=-
AAAA, 3, Data
BBBB, 2, Data
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You don't really need to Sort or leverage a Key Change column, just run the data through an Aggregator. Group by Col_1, max(Col_2) and do whatever is appropriate for Col_3... depending on what "Data" means.
-craig

"You can never have too many knives" -- Logan Nine Fingers
tbtcust
Premium Member
Premium Member
Posts: 228
Joined: Tue Mar 04, 2008 9:07 am

Post by tbtcust »

Thank you for the reply chulett.

Col_3 is not available on the mapping tab. Is this normal? Is there a way to map it in the Aggregator? Col_3 is different across the records in the group.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You'd have to clarify what "Col_3 is different across the records in the group" means. Can you supply examples with real data, input and output?
-craig

"You can never have too many knives" -- Logan Nine Fingers
tbtcust
Premium Member
Premium Member
Posts: 228
Joined: Tue Mar 04, 2008 9:07 am

Post by tbtcust »

thanks chulett

Empl_rec, Daily_Change_Ver_Num, Changed Data
03456, 1, Empl_Status
03456, 3, Empl_Status, Leave_of_Absone_Date, New_Salary
03456, 2, Empl_Status , Leave_of_Absence_Date
45633, 1, New_Address
45633, 2, New_Address, Empl_Status


There may be multiple changes to a record between runes. Daily_Change_Ver_Num represents which one happed first, second, etc. The last version has all the changes from the previous versions.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So your desired output would be...

Empl_rec, Daily_Change_Ver_Num, Changed Data
03456, 3, "Empl_Status, Leave_of_Absone_Date, New_Salary"
45633, 2, "New_Address, Empl_Status"

Correct? And your changed data field, is it in fact one field containing a comma delimited list of the changes made? I specifically put quotes around the values assuming that was the case. If so, you should be able to have the Aggregator take the max() value from that column as well.
-craig

"You can never have too many knives" -- Logan Nine Fingers
tbtcust
Premium Member
Premium Member
Posts: 228
Joined: Tue Mar 04, 2008 9:07 am

Post by tbtcust »

Sorry. Yes that would be correct.
tbtcust
Premium Member
Premium Member
Posts: 228
Joined: Tue Mar 04, 2008 9:07 am

Post by tbtcust »

Hello chulett.

I have abandoned the Aggregator and Group by solution and went back to the sort and key option, which seems to be working.

Thanks so much.
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

As Craig noted, you don't need to re-sort (unless you want to run the Aggregator stage in Sort mode). What IS important is that the data are partitioned using a key-based algorithm on the grouping column(s).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply