Hi All,
Can the CDC stage handle duplicates? We have a scenario where duplicate values are coming in for the KEY column, and the job produces different results each time it is run.
The job joins on the KEY column and produces multiple records for the same key, but the change codes are not consistent across runs.
The KEY column is hash partitioned and sorted on both input links.
Can anyone please help me understand why it would produce different results?
Regards
Kumar
Duplicate Key values in CDC Stage
Not sure, but this is a scenario I have noticed as well.
When there are duplicates in the source (after) input, the first record gets identified as a copy (assuming the data is also present in the reference/before input) and the second record gets identified as an insert. Not sure why it does that. It would be nice to understand exactly how the CDC stage works. Also, I think it is better not to have duplicates in the source and reference, given that we are trying to identify changes. Do let us know if you come across any solutions...
Cheers,
RBK
I'm not sure if this is documented or if it is just something I know from experience.
The change capture stage requires unique keys on its inputs.
This makes perfect sense if you think about the classic two-file match logic that probably happens under the covers, where a key match causes the next record to be read from each file before the next key comparison.
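To make the two-file match idea concrete, here is a minimal Python sketch of a sort-merge change capture over two key-sorted inputs. It is a simplified model, not IBM's actual implementation: records are hypothetical `(key, value)` tuples and the output is reduced to copy, edit, insert and delete. Because a key match consumes one record from each input, a second "after" record with the same key has nothing left to match against and falls out as an insert.

```python
from typing import List, Tuple

# Hypothetical simplified record: (key, value). Not the real DataStage record layout.
Record = Tuple[str, str]
Change = Tuple[str, str, str]  # (change_type, key, value)


def change_capture(before: List[Record], after: List[Record]) -> List[Change]:
    """Two-file match over inputs already sorted by key.

    Assumes unique keys on each input; with duplicates the result
    depends on the order in which the duplicates arrive.
    """
    out: List[Change] = []
    i = j = 0
    while i < len(before) and j < len(after):
        bkey, bval = before[i]
        akey, aval = after[j]
        if bkey == akey:
            # A key match consumes one record from EACH input.
            out.append(("copy" if bval == aval else "edit", akey, aval))
            i += 1
            j += 1
        elif bkey < akey:
            out.append(("delete", bkey, bval))  # only in before
            i += 1
        else:
            out.append(("insert", akey, aval))  # only in after
            j += 1
    # Leftovers on either side have no partner.
    out.extend(("delete", k, v) for k, v in before[i:])
    out.extend(("insert", k, v) for k, v in after[j:])
    return out


if __name__ == "__main__":
    before = [("K1", "a")]
    # Same duplicate after records, two different arrival orders:
    print(change_capture(before, [("K1", "a"), ("K1", "b")]))
    # [('copy', 'K1', 'a'), ('insert', 'K1', 'b')]
    print(change_capture(before, [("K1", "b"), ("K1", "a")]))
    # [('edit', 'K1', 'b'), ('insert', 'K1', 'a')]
```

Only the arrival order of the duplicates differs between the two calls, yet the change codes differ. In a partitioned job the relative order of records that tie on the sort key is not guaranteed to be the same on every run, which may explain the inconsistent change codes described in the original question.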
Having said that... it is still possible to handle multiple version changes for a given key in a single job execution utilizing the change capture stage.
It just takes a little creativity to turn the duplicate keys into the unique keys that the stage requires.
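One possible way to do this, sketched in Python under stated assumptions: sort on the key plus a column that orders the versions, number the records within each key group, and then use KEY together with that sequence number as the change key so every (key, sequence) pair is unique. The column names `KEY`, `UPD_TS` and `VAL`, the helper `add_key_sequence` and the sample rows are all hypothetical. In a DataStage job the same numbering could be done with a Transformer stage variable that compares the current key to the previous one, but that is an assumption about job design, not something stated in this thread.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List


def add_key_sequence(rows: List[Dict[str, str]], key: str, tiebreak: str) -> List[Dict[str, object]]:
    """Return copies of rows with a 'key_seq' column numbering rows within each key group.

    Sorting on (key, tiebreak) makes the numbering deterministic, provided the
    hypothetical tiebreak column actually distinguishes the duplicates.
    """
    ordered = sorted(rows, key=itemgetter(key, tiebreak))
    out: List[Dict[str, object]] = []
    for _, group in groupby(ordered, key=itemgetter(key)):
        for seq, row in enumerate(group, start=1):
            out.append({**row, "key_seq": seq})
    return out


if __name__ == "__main__":
    # Hypothetical sample rows with a duplicate KEY value.
    after_rows = [
        {"KEY": "K1", "UPD_TS": "2024-01-02", "VAL": "b"},
        {"KEY": "K1", "UPD_TS": "2024-01-01", "VAL": "a"},
        {"KEY": "K2", "UPD_TS": "2024-01-01", "VAL": "c"},
    ]
    for row in add_key_sequence(after_rows, key="KEY", tiebreak="UPD_TS"):
        print(row["KEY"], row["key_seq"], row["VAL"])
    # K1 1 a
    # K1 2 b
    # K2 1 c
```

The tiebreak column has to genuinely separate the duplicates: Python's `sorted` is stable, so ties keep their input order, and in a partitioned job that input order is exactly what can vary between runs. Whether version n on the after input should be compared with version n on the before input is a business-logic decision this sketch does not make.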
Mike
CDC or Change Capture
IDK why it's common to refer to the Change Capture stage as the CDC stage; it creates quite a bit of confusion with the CDC Transaction Stage.