Capture Multiple changes

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
pavans
Participant
Posts: 116
Joined: Sun Sep 10, 2006 7:33 am

Capture Multiple changes

Post by pavans »

Hi

This is more like fixing the existing data.

Source: This is the data I am getting from source.

rno, bk, active_frm, active_to, flag, org, unit
1, 28023, 29/08/2003, 05/12/2003, N, 25598, 25598
2, 28023, 06/12/2003, 10/12/2003, N, 25598, 25598
3, 28023, 11/12/2003, 03/03/2004, N, 1697959, 1697959
4, 28023, 04/03/2004, 08/03/2004, N, 1697959, 1697959
5, 28023, 09/03/2004, 11/03/2004, N, 1697959, 1697959
6, 28023, 12/03/2004, 29/12/2004, N, 1697959, 1697959
7, 28023, 30/12/2004, 07/03/2007, N, 26427, 5683717
8, 28023, 08/03/2007, 02/04/2007, N, 26427, 26427
9, 28023, 03/04/2007, 18/04/2007, N, 26427, 5683717
10, 28023, 19/04/2007, 04/12/2009, N, 26427, 26427
11, 28023, 05/12/2009, 31/12/2999, Y, -3, -3

Rows 1 and 2 have no changes in the org and unit columns, so I need to pick the latest active_to and close the first record:

i.e., sk, bk, active_frm, active_to, flag, org, unit
100, 28023, 29/08/2003, 10/12/2003, N, 25598, 25598

Rows 3, 4, 5 and 6 have no changes in the org and unit columns, so again I pick the latest active_to and close the first record:

101, 28023, 11/12/2003, 29/12/2004, N, 1697959, 1697959


Rows 7, 8, 9, 10 and 11 are already correct, so I just have to insert them as they are.

My target should look like:

sk, bk, active_frm, active_to, flag, org, unit
100, 28023, 29/08/2003, 10/12/2003, N, 25598, 25598
101, 28023, 11/12/2003, 29/12/2004, N, 1697959, 1697959
102, 28023, 30/12/2004, 07/03/2007, N, 26427, 5683717
103, 28023, 08/03/2007, 02/04/2007, N, 26427, 26427
104, 28023, 03/04/2007, 18/04/2007, N, 26427, 5683717
105, 28023, 19/04/2007, 04/12/2009, N, 26427, 26427
106, 28023, 05/12/2009, 31/12/2999, Y, -3, -3

What I did:
1. Created a checksum over bk, org and unit and compared it using stage variables.
This works when the identical records are adjacent, but rows 7 and 9 also get the same checksum even though they are not adjacent, which breaks the approach.

Any advice is greatly appreciated.

Thanks in advance.
Thanks,
Pavan
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am

Post by ArndW »

You are doing a variant of data historisation here. The checksum over your change-data columns just needs to be compared with the previous record in the same group and you are set.

If your input is sorted on the key (column "bk") and by the effective date (column "active_frm"), then you would create your checksum column from the values in columns "flag", "org" and "unit" and compare it with the checksum of the previous record only.

One approach is to use a Transformer stage, keeping the previous row's values in stage variables and comparing the current row's checksum to the previous row's to detect value changes.
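The previous-row comparison described above can be sketched outside DataStage. This is a minimal Python illustration (not DataStage code) of the stage-variable logic: rows are assumed sorted by bk and active_frm, and a row is merged into the open record whenever its change-data columns match the previous row's.

```python
# Illustrative sketch of the Transformer stage-variable approach:
# compare the current row's change columns (bk, flag, org, unit)
# to the previous row's, and extend active_to while they match.

rows = [
    # (bk, active_frm, active_to, flag, org, unit) in ISO date form
    (28023, "2003-08-29", "2003-12-05", "N", 25598, 25598),
    (28023, "2003-12-06", "2003-12-10", "N", 25598, 25598),
    (28023, "2003-12-11", "2004-03-03", "N", 1697959, 1697959),
    (28023, "2004-03-04", "2004-03-08", "N", 1697959, 1697959),
    (28023, "2004-03-09", "2004-03-11", "N", 1697959, 1697959),
    (28023, "2004-03-12", "2004-12-29", "N", 1697959, 1697959),
    (28023, "2004-12-30", "2007-03-07", "N", 26427, 5683717),
    (28023, "2007-03-08", "2007-04-02", "N", 26427, 26427),
    (28023, "2007-04-03", "2007-04-18", "N", 26427, 5683717),
    (28023, "2007-04-19", "2009-12-04", "N", 26427, 26427),
    (28023, "2009-12-05", "2999-12-31", "Y", -3, -3),
]

def collapse(rows):
    out = []
    prev_key = None  # plays the role of the "previous row" stage variable
    for bk, frm, to, flag, org, unit in rows:
        key = (bk, flag, org, unit)
        if key == prev_key:
            # No change vs. the previous row: extend the open record's active_to
            out[-1] = out[-1][:2] + (to,) + out[-1][3:]
        else:
            # Change detected: close the open record and start a new one
            out.append((bk, frm, to, flag, org, unit))
            prev_key = key
    return out

collapsed = collapse(rows)
# Produces 7 rows, matching the target: rows 1-2 and 3-6 are merged,
# while rows 8 and 10 (same org/unit but not adjacent) stay separate.
```

Because only adjacent rows are compared, the non-adjacent duplicates (the poster's rows 7 and 9) are correctly kept apart, which is exactly where the plain checksum-per-group approach failed.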

Another option would be to use the Historisation Operator, but that is a separate (non-free) product and certainly overkill for solving just this task.
rschirm
Premium Member
Posts: 27
Joined: Fri Dec 13, 2002 2:53 pm

Post by rschirm »

You might address it this way.

1. Read the data in and sort by bk, org, unit, flag (maybe)
2. Feed it to an Aggregator stage grouping by bk, org, unit, flag (maybe)
3. Output the grouping values bk, org, unit, flag (maybe), Min(active_frm), Max(active_to)
4. Add a surrogate key and write to the output.
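As a rough illustration (Python, not DataStage), the sort-then-aggregate steps above amount to grouping on the full key and taking Min/Max of the dates. Note that, as written, this merges rows with the same key even when they are not adjacent in time, which is the situation the original poster flagged for rows 7 and 9:

```python
# Sketch of the Aggregator approach: group on (bk, org, unit, flag)
# and output Min(active_frm) / Max(active_to) per group.
from collections import OrderedDict

rows = [
    # (bk, active_frm, active_to, flag, org, unit) -- poster's rows 7-9
    (28023, "2004-12-30", "2007-03-07", "N", 26427, 5683717),
    (28023, "2007-03-08", "2007-04-02", "N", 26427, 26427),
    (28023, "2007-04-03", "2007-04-18", "N", 26427, 5683717),
]

groups = OrderedDict()
for bk, frm, to, flag, org, unit in rows:
    key = (bk, org, unit, flag)
    if key not in groups:
        groups[key] = [frm, to]
    else:
        # ISO date strings compare correctly as text
        groups[key][0] = min(groups[key][0], frm)
        groups[key][1] = max(groups[key][1], to)

# Only two groups come out: the first and third input rows share a key
# and are merged despite the different period in between -- the caveat
# that makes the row-by-row comparison preferable for this data.
```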
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am

Post by priyadarshikunal »

I would prefer the row-by-row comparison using stage variables, as already mentioned by Arnd.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
Post Reply