Page 1 of 1

Group by clause implementation in Datastage

Posted: Fri Feb 03, 2017 3:15 am
by rumu
Dear Members,

I have a requirement as below:

Select first row
From TEXT
Where TEXT.OID = 2498
And TEXT.TEXT in ('Discharge', 'Expired')
Group by TEXT.OID
Order by TEXT.STARTDTIME Asc

The source TEXT is Flat file.There are mutiple STARTDTIME for same OID and we have to pick up the least one .
We designed as below:
SEQFile Stage---->RDC-----Xfrm-----Target dataset

In RDC,we used OID as key and Partitioning tab
did HashPartitioning on OID and Sort on STARTDTIME.Not used Sort on OID.

We received proper output.Wanted to check with you if this approach is fine or we need to sort OID also along with Partitioning.

Thanks,
Rumu

Posted: Fri Feb 03, 2017 8:23 am
by chulett
So... RDC is the "Remove Duplicates" stage, yes? :?

Yes, one would need to sort on all of the "group by" fields, in this case OID. I would wager if you did more extensive testing on larger data sets you would no longer receive 'proper output'.

Posted: Fri Feb 03, 2017 9:03 am
by rumu
Thanks Chulett,noted this point.

Posted: Wed Feb 08, 2017 6:52 pm
by rameshrr3
Only one group - since TEXT.OID is filtered on single value ? Whats the point - curious

Posted: Thu Feb 09, 2017 8:13 am
by chulett
What's the point of what?

Posted: Thu Feb 09, 2017 11:41 am
by rameshrr3
From TEXT
Where TEXT.OID = 2498
And TEXT.TEXT in ('Discharge', 'Expired')
Group by TEXT.OID

If they were using SQL , a group by would not be needed

Posted: Thu Feb 09, 2017 12:16 pm
by chulett
When people post examples they rarely fully match reality (if at all) so I'll typically answer with general / generic advice. Did much the same here so they can apply it to whatever their specific situation is, as needed.