Capture Multiple changes

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
pavans
Participant
Posts: 116
Joined: Sun Sep 10, 2006 7:33 am

Capture Multiple changes

Post by pavans »

Hi

This is more like fixing the existing data.

Source: This is the data I am getting from source.

rno, bk, active_frm, active_to, flag, org, unit
1, 28023, 29/08/2003, 05/12/2003, N, 25598, 25598
2, 28023, 06/12/2003, 10/12/2003, N, 25598, 25598
3, 28023, 11/12/2003, 03/03/2004, N, 1697959, 1697959
4, 28023, 04/03/2004, 08/03/2004, N, 1697959, 1697959
5, 28023, 09/03/2004, 11/03/2004, N, 1697959, 1697959
6, 28023, 12/03/2004, 29/12/2004, N, 1697959, 1697959
7, 28023, 30/12/2004, 07/03/2007, N, 26427, 5683717
8, 28023, 08/03/2007, 02/04/2007, N, 26427, 26427
9, 28023, 03/04/2007, 18/04/2007, N, 26427, 5683717
10, 28023, 19/04/2007, 04/12/2009, N, 26427, 26427
11, 28023, 05/12/2009, 31/12/2999, Y, -3, -3

Rows 1 and 2 have no changes in the org and unit columns, so I need to pick the latest active_to and close the first record:

i.e., sk, bk, active_frm, active_to, flag, org, unit
100, 28023, 29/08/2003, 10/12/2003, N, 25598, 25598

Rows 3, 4, 5 and 6 have no changes in the org and unit columns, so again I pick the latest active_to and close the first record:

101, 28023, 11/12/2003, 29/12/2004, N, 1697959, 1697959


Rows 7, 8, 9, 10 and 11 are already correct, so I just have to insert them as they are.

My target should look like:

sk, bk, active_frm, active_to, flag, org, unit
100, 28023, 29/08/2003, 10/12/2003, N, 25598, 25598
101, 28023, 11/12/2003, 29/12/2004, N, 1697959, 1697959
102, 28023, 30/12/2004, 07/03/2007, N, 26427, 5683717
103, 28023, 08/03/2007, 02/04/2007, N, 26427, 26427
104, 28023, 03/04/2007, 18/04/2007, N, 26427, 5683717
105, 28023, 19/04/2007, 04/12/2009, N, 26427, 26427
106, 28023, 05/12/2009, 31/12/2999, Y, -3, -3

What I did:
1. Created a checksum over bk, org and unit and compared it using stage variables.
This works when the identical records are adjacent, but rows 7 and 9 also get the same checksum even though they are not adjacent, which breaks the approach.

Any advice is greatly appreciated.

Thanks in advance.
Thanks,
Pavan
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am

Post by ArndW »

You are doing a variant of data historisation here. The checksum over your change-data columns just needs to be compared with the previous record in the same group and you are set.

If your input is sorted on the key (column "bk") and by the effective date (column "active_frm"), then you would create your checksum column from the values in columns "flag", "org" and "unit" and compare it with the checksum of the previous record only.

One approach is to use a Transformer stage, keeping the previous row's values in stage variables and comparing the current row's checksum to the previous row's to detect value changes.
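The previous-row comparison described above can be sketched outside DataStage. This is a minimal Python illustration (not DataStage code) of the stage-variable logic: rows are assumed sorted by bk and active_frm, and a row is merged into the open record whenever its change-data columns match the previous row's.

```python
# Illustrative sketch of the Transformer stage-variable approach:
# compare the current row's change columns (bk, flag, org, unit)
# to the previous row's, and extend active_to while they match.

rows = [
    # (bk, active_frm, active_to, flag, org, unit) in ISO date form
    (28023, "2003-08-29", "2003-12-05", "N", 25598, 25598),
    (28023, "2003-12-06", "2003-12-10", "N", 25598, 25598),
    (28023, "2003-12-11", "2004-03-03", "N", 1697959, 1697959),
    (28023, "2004-03-04", "2004-03-08", "N", 1697959, 1697959),
    (28023, "2004-03-09", "2004-03-11", "N", 1697959, 1697959),
    (28023, "2004-03-12", "2004-12-29", "N", 1697959, 1697959),
    (28023, "2004-12-30", "2007-03-07", "N", 26427, 5683717),
    (28023, "2007-03-08", "2007-04-02", "N", 26427, 26427),
    (28023, "2007-04-03", "2007-04-18", "N", 26427, 5683717),
    (28023, "2007-04-19", "2009-12-04", "N", 26427, 26427),
    (28023, "2009-12-05", "2999-12-31", "Y", -3, -3),
]

def collapse(rows):
    out = []
    prev_key = None  # plays the role of the "previous row" stage variable
    for bk, frm, to, flag, org, unit in rows:
        key = (bk, flag, org, unit)
        if key == prev_key:
            # No change vs. the previous row: extend the open record's active_to
            out[-1] = out[-1][:2] + (to,) + out[-1][3:]
        else:
            # Change detected: close the open record and start a new one
            out.append((bk, frm, to, flag, org, unit))
            prev_key = key
    return out

collapsed = collapse(rows)
# Produces 7 rows, matching the target: rows 1-2 and 3-6 are merged,
# while rows 8 and 10 (same org/unit but not adjacent) stay separate.
```

Because only adjacent rows are compared, the non-adjacent duplicates (the poster's rows 7 and 9) are correctly kept apart, which is exactly where the plain checksum-per-group approach failed.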

Another option would be to use the Historisation Operator, but that is a separate (non-free) product and certainly overkill for solving just this task.
rschirm
Premium Member
Posts: 27
Joined: Fri Dec 13, 2002 2:53 pm

Post by rschirm »

You might address it this way.

1. Read the data in and sort by bk, org, unit, flag (maybe)
2. Feed it to an Aggregator stage grouping by bk, org, unit, flag (maybe)
3. Output the grouping values bk, org, unit, flag (maybe), Min(active_frm), Max(active_to)
4. Add a surrogate key and write to the output.
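As a rough illustration (Python, not DataStage), the sort-then-aggregate steps above amount to grouping on the full key and taking Min/Max of the dates. Note that, as written, this merges rows with the same key even when they are not adjacent in time, which is the situation the original poster flagged for rows 7 and 9:

```python
# Sketch of the Aggregator approach: group on (bk, org, unit, flag)
# and output Min(active_frm) / Max(active_to) per group.
from collections import OrderedDict

rows = [
    # (bk, active_frm, active_to, flag, org, unit) -- poster's rows 7-9
    (28023, "2004-12-30", "2007-03-07", "N", 26427, 5683717),
    (28023, "2007-03-08", "2007-04-02", "N", 26427, 26427),
    (28023, "2007-04-03", "2007-04-18", "N", 26427, 5683717),
]

groups = OrderedDict()
for bk, frm, to, flag, org, unit in rows:
    key = (bk, org, unit, flag)
    if key not in groups:
        groups[key] = [frm, to]
    else:
        # ISO date strings compare correctly as text
        groups[key][0] = min(groups[key][0], frm)
        groups[key][1] = max(groups[key][1], to)

# Only two groups come out: the first and third input rows share a key
# and are merged despite the different period in between -- the caveat
# that makes the row-by-row comparison preferable for this data.
```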
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am

Post by priyadarshikunal »

I would prefer the row-by-row comparison using stage variables, as already mentioned by Arnd.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
Post Reply