Aggregator stage constraint thus use of Remove Duplicates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
TonyInFrance
Premium Member
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Aggregator stage constraint thus use of Remove Duplicates

Post by TonyInFrance »

I have this dataset of 2 columns - ID Number and Loading Date.

I need to consider the most recent Loading Date in case of duplicate IDs. However the MAX function in my aggregator stage insists on storing the result in a DOUBLE datatype whereas my source is a timestamp. Is there a workaround?

I have also tried to achieve the result by using a Remove Duplicates stage where I am first sorting my data on ID (ascending) , Loading Date (ascending) and then trying to retain the last record for each repetitive column, my key being ID. However it is not working here and it is retaining identical occurences of ID, albeit with different dates whereas I need only the most recent date for the same ID. Here's what I'm getting:


ID LOADING_DATE

0000000000001542 15/11/2008 22:30:50
0000000000001542 05/01/2009 23:01:25
0000000000002459 30/07/2009 09:51:59
0000000000002459 04/06/2009 18:54:41
0000000000005494 14/10/2008 15:00:11
0000000000005494 30/07/2009 09:51:59
0000000000005690 04/06/2009 18:54:41
0000000000005690 30/07/2009 09:51:59
0000000000007668 30/07/2009 09:51:59
0000000000007668 12/01/2009 10:54:26

I can't seem to figure out where I'm going wrong. Any ideas people?

Thanks

S. BASU
Tony
BI Consultant - Datastage
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Aggregator stage constraint thus use of Remove Duplicate

Post by chulett »

BASU wrote:However the MAX function in my aggregator stage insists on storing the result in a DOUBLE datatype whereas my source is a timestamp. Is there a workaround?
Enable the Preserve Type property.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Partition your data by ID using a key-based partitioning algorithm (Hash).

Take some time to learn about partitioning.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
TonyInFrance
Premium Member
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

Works like a charm. Thanks Ray.

Craig - I am yet to verify your method ( I need to sign up for a Premium account to read your post in full which I shall do over this weekend) but I'm sure it will work. Cheers...:-)
Tony
BI Consultant - Datastage
TonyInFrance
Premium Member
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Re: Aggregator stage constraint thus use of Remove Duplicate

Post by TonyInFrance »

chulett wrote:Enable the Preserve Type property.
Craig - Where do I do this? In the aggregator stage I have Preserve Partitioning. Is this what you mean by Preserve Type property?

Regards

S. BASU

Edit:
I read through this thread:
viewtopic.php?t=135260&start=0&postdays ... e+property

The problem is I can't find the subproperty preserve type.

Anything I have missing in my installation?
Tony
BI Consultant - Datastage
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Unfortunately, I don't have any way to check this and provide more details, hopefully someone else can.
-craig

"You can never have too many knives" -- Logan Nine Fingers
TonyInFrance
Premium Member
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

For further info:
In my Aggregator stage, Tab 'Stage' sub tab 'Properties' under Options I have only 2 possible heads - Allow Null Output (which I have set to false) and Method (which I have set to Sort since the number of groups I have is more than 1000)
There is no Preserve Type option here or under any other tab / sub tab in this stage.
Tony
BI Consultant - Datastage
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

I'm pretty sure that Preserve Type doesn't apply to dates and timestamps. Max and Min aggregations only work on numeric data. For a Date, I think it uses the ICONV value of the date. Not sure what it uses for Timestamps. Maybe the fractional part of the Double is the proportion of a day that the time represents, which I think can be easily turned back into a Timestamp afterwards.
Phil Hibbs | Capgemini
Technical Consultant
TonyInFrance
Premium Member
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

Phil

Thanks for the heads up. However what I read in this thread:
viewtopic.php?t=135260&start=0&postdays ... e+property

is quite the opposite. They claim here that this property is the one to use in order to solve this problem. Apparently it is possible to see this property (Preserve Type) somewhere in the Aggregator stage.
Tony
BI Consultant - Datastage
TonyInFrance
Premium Member
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

Got it guys. Don't bother anymore.
The property exists in the right hand window.

However even setting this property to true does not work.
I have exactly the same message as before:

Aggregator_65: Error when checking operator: When binding output interface field "MAX_DAT_CHARGT" to field "MAX_DAT_CHARGT": No default type conversion from type "timestamp" to type "dfloat".
Tony
BI Consultant - Datastage
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Please stick to your own topic for discussion of this issue rather than fishing around for related posts made by others.

No need to repeat what is here in other posts. Also, marking this post as having a 'workaround' is probably keeping people from visiting it again...
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply