| Author |
Message |
TonyInFrance

Group memberships: Premium Members
Joined: 27 May 2008
Posts: 200
Location: Paris
Points: 2349
|
|
| DataStage® Release: 8x |
| Job Type: Parallel |
| OS: Unix |
|
I have this dataset of 2 columns - ID Number and Loading Date.
I need to consider the most recent Loading Date in case of duplicate IDs. However the MAX function in my aggregator stage insists on storing the result in a DOUBLE datatype whereas my source is a timestamp. Is there a workaround?
I have also tried to achieve the result by using a Remove Duplicates stage where I am first sorting my data on ID (ascending) , Loading Date (ascending) and then trying to retain the last record for each repetitive column, my key being ID. However it is not working here and it is retaining identical occurences of ID, albeit with different dates whereas I need only the most recent date for the same ID. Here's what I'm getting:
ID
LOADING_DATE
0000000000001542
15/11/2008 22:30:50
0000000000001542
05/01/2009 23:01:25
0000000000002459
30/07/2009 09:51:59
0000000000002459
04/06/2009 18:54:41
0000000000005494
14/10/2008 15:00:11
0000000000005494
30/07/2009 09:51:59
0000000000005690
04/06/2009 18:54:41
0000000000005690
30/07/2009 09:51:59
0000000000007668
30/07/2009 09:51:59
0000000000007668
12/01/2009 10:54:26
I can't seem to figure out where I'm going wrong. Any ideas people?
Thanks
S. BASU
|
_________________ Tony
BI Consultant - Datastage |
|
|
|
 |
chulett
 since January 2006
Group memberships: Premium Members, Inner Circle, Server to Parallel Transition Group
Joined: 12 Nov 2002
Posts: 36803
Location: Denver, CO
Points: 187577
|
|
|
|
|
|
|
BASU wrote:
However the MAX function in my aggregator stage insists on storing the result in a DOUBLE datatype whereas my source is a timestamp. Is there a workaround?
Enable the
Preserve ...
|
_________________ -craig
"I accidentally swallowed some Scrabble tiles, my next trip to the bathroom could spell disaster."
|
|
|
|
 |
ray.wurlod
Participant
Group memberships: Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group
Joined: 23 Oct 2002
Posts: 50173
Location: Canberra, Australia
Points: 272727
|
|
|
|
|
|
Partition your data by ID using a key-based partitioning algorithm (Hash).
Take some time to learn about partitioning.
|
_________________ RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong
Delivering Outcomes Together |
|
|
|
 |
TonyInFrance

Group memberships: Premium Members
Joined: 27 May 2008
Posts: 200
Location: Paris
Points: 2349
|
|
|
|
|
|
Works like a charm. Thanks Ray.
Craig - I am yet to verify your method ( I need to sign up for a Premium account to read your post in full which I shall do over this weekend) but I'm sure it will work. Cheers...
|
_________________ Tony
BI Consultant - Datastage |
|
|
|
 |
TonyInFrance

Group memberships: Premium Members
Joined: 27 May 2008
Posts: 200
Location: Paris
Points: 2349
|
|
|
|
 |
chulett
 since January 2006
Group memberships: Premium Members, Inner Circle, Server to Parallel Transition Group
Joined: 12 Nov 2002
Posts: 36803
Location: Denver, CO
Points: 187577
|
|
|
|
|
|
|
Unfortunately, I don't have any way to check this and provide more details, hopefully someone else can.
|
_________________ -craig
"I accidentally swallowed some Scrabble tiles, my next trip to the bathroom could spell disaster."
|
|
|
|
 |
TonyInFrance

Group memberships: Premium Members
Joined: 27 May 2008
Posts: 200
Location: Paris
Points: 2349
|
|
|
|
|
|
For further info:
In my Aggregator stage, Tab 'Stage' sub tab 'Properties' under Options I have only 2 possible heads - Allow Null Output (which I have set to false) and Method (which I have set to Sort since the number of groups I have is more than 1000)
There is no Preserve Type option here or under any other tab / sub tab in this stage.
|
_________________ Tony
BI Consultant - Datastage |
|
|
|
 |
PhilHibbs

Group memberships: Premium Members
Joined: 29 Sep 2004
Posts: 1041
Location: Nottingham, UK
Points: 12327
|
|
|
|
|
|
|
I'm pretty sure that Preserve Type doesn't apply to dates and timestamps. Max and Min aggregations only work on numeric data. For a Date, I think it uses the ICONV value of the date. Not sure what it uses for Timestamps. Maybe the fractional part of the Double is the proportion of a day that the time represents, which I think can be easily turned back into a Timestamp afterwards.
|
_________________ Phil Hibbs | Capgemini
Technical Consultant
Google+ Data Tools Page
|
|
|
|
 |
TonyInFrance

Group memberships: Premium Members
Joined: 27 May 2008
Posts: 200
Location: Paris
Points: 2349
|
|
|
|
 |
TonyInFrance

Group memberships: Premium Members
Joined: 27 May 2008
Posts: 200
Location: Paris
Points: 2349
|
|
|
|
|
|
Got it guys. Don't bother anymore.
The property exists in the right hand window.
However even setting this property to true does not work.
I have exactly the same message as before:
Aggregator_65: Error when checking operator: When binding output interface field "MAX_DAT_CHARGT" to field "MAX_DAT_CHARGT": No default type conversion from type "timestamp" to type "dfloat".
|
_________________ Tony
BI Consultant - Datastage |
|
|
|
 |
chulett
 since January 2006
Group memberships: Premium Members, Inner Circle, Server to Parallel Transition Group
Joined: 12 Nov 2002
Posts: 36803
Location: Denver, CO
Points: 187577
|
|
|
|
|
|
Please stick to your own topic for discussion of this issue rather than fishing around for
related posts made by others.
No need to repeat what is here in other posts. Also, marking this post as having a 'workaround' is probably keeping people from visiting it again...
|
_________________ -craig
"I accidentally swallowed some Scrabble tiles, my next trip to the bathroom could spell disaster."
|
|
|
|
 |
|
|