DSXchange: DataStage and IBM Websphere Data Integration Forum
phanikumar
Participant



Joined: 20 Sep 2011
Posts: 60
Location: INDIA
Points: 536

Posted: Thu Jun 07, 2018 4:18 pm

DataStage® Release: 9x
Job Type: Parallel
OS: Unix
Hi All,

Can the CDC stage handle duplicates? We have a scenario where duplicate values are coming in for the KEY column, and the job produces different results when it is run multiple times.

The job joins on the KEY column and produces multiple records for the same key, but the change codes are not consistent from run to run.

The KEY column is hash partitioned and sorted on both input links.

Can anyone please help me understand why it would produce different results?

Regards
Kumar
rbk
Participant



Joined: 23 Oct 2013
Posts: 23
Location: India
Points: 414

Posted: Fri Jun 08, 2018 1:09 am

Not sure, but this is a scenario that I have noticed as well.

In cases where there are duplicates in the source (after) link, I have seen the first record identified as a copy (assuming the data is also present in the reference/before link) and the second record identified as an insert. I'm not sure why it does that, and it would be nice to understand exactly how the CDC stage works. I also think it is better not to have duplicates in either the source or the reference, considering that we are trying to identify changes. Do let us know if you come across any solutions...

_________________
Cheers,
RBK
Mike



Group memberships:
Premium Members

Joined: 03 Mar 2002
Posts: 1017
Location: Omaha, NE
Points: 6551

Posted: Fri Jun 08, 2018 4:23 pm

I'm not sure if this is documented or if it is just something I know from experience.

The change capture stage requires unique keys on its inputs.

This makes perfect sense if you think about the classic two-file match logic that probably happens under the covers: a key match results in the next record being read from each file before the next key comparison.
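
To make that concrete, here is a minimal Python sketch of a two-file match over key-sorted inputs. It is only an illustration of the logic described above, assuming that is roughly what the stage does; the function name `change_capture`, the change-code labels and the sample rows are all hypothetical and are not taken from DataStage.

```python
def change_capture(before, after):
    """Two-file match over two key-sorted lists of (key, value) rows.

    Returns (change_code, key, value) tuples. The code labels are
    illustrative only, not DataStage's internal change codes.
    """
    out = []
    i = j = 0
    while i < len(before) and j < len(after):
        b_key, b_val = before[i]
        a_key, a_val = after[j]
        if b_key == a_key:
            # Key match: emit copy or edit, then advance BOTH inputs.
            out.append(("copy" if b_val == a_val else "edit", a_key, a_val))
            i += 1
            j += 1
        elif b_key < a_key:
            # Key exists only in the before input.
            out.append(("delete", b_key, b_val))
            i += 1
        else:
            # Key exists only in the after input (or its partner was
            # already consumed by an earlier match).
            out.append(("insert", a_key, a_val))
            j += 1
    # Whatever is left on one side has no partner on the other.
    out.extend(("delete", k, v) for k, v in before[i:])
    out.extend(("insert", k, v) for k, v in after[j:])
    return out


before = [(1, "a"), (2, "b")]
# Same duplicate rows for key 1, arriving in two different orders.
after_run1 = [(1, "a"), (1, "x"), (2, "b")]
after_run2 = [(1, "x"), (1, "a"), (2, "b")]

result_run1 = change_capture(before, after_run1)
result_run2 = change_capture(before, after_run2)
```

With these inputs, `result_run1` flags key 1 as a copy followed by an insert, while `result_run2` flags it as an edit followed by an insert. If a sort on the key alone does not fix the relative order of duplicate rows, that would be enough to explain change codes that vary between runs.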

Having said that... it is still possible to handle multiple version changes for a given key in a single job execution utilizing the change capture stage.

It just takes a little creativity to turn the duplicate keys into the unique keys that the stage requires.
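
One possible reading of that idea, sketched in Python: order the rows deterministically (key plus a tiebreaker column) and number the occurrences within each key, so that the pair (key, occurrence number) is unique. The helper name `add_occurrence_number` and the choice of the value column as tiebreaker are assumptions for illustration, not necessarily the approach meant here; in a job the equivalent might be a secondary sort column plus a generated sequence column ahead of the Change Capture stage.

```python
from itertools import groupby
from operator import itemgetter


def add_occurrence_number(rows):
    """Turn (key, value) rows into ((key, n), value) rows.

    n counts occurrences of each key in a deterministic order
    (sorted by key, then by value as a tiebreaker), so the composite
    key (key, n) is unique even when the original key repeats.
    """
    ordered = sorted(rows, key=itemgetter(0, 1))
    unique = []
    for key, group in groupby(ordered, key=itemgetter(0)):
        for n, (_, value) in enumerate(group, start=1):
            unique.append(((key, n), value))
    return unique


# The same duplicate rows in two different arrival orders
# produce the same uniquely keyed output.
rows_run1 = [(1, "a"), (1, "x"), (2, "b")]
rows_run2 = [(1, "x"), (2, "b"), (1, "a")]
keyed_run1 = add_occurrence_number(rows_run1)
keyed_run2 = add_occurrence_number(rows_run2)
```

Applying the same numbering to both the before and after inputs gives the comparison unique keys to match on, and the result no longer depends on the order in which duplicates happen to arrive.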

Mike
rameshrr3



Group memberships:
Premium Members

Joined: 10 May 2004
Posts: 609
Location: BRENTWOOD, TN
Points: 6937

Posted: Wed Jun 13, 2018 2:13 pm

I don't know why it's so common to refer to the Change Capture stage as the CDC stage; it creates quite a bit of confusion with the CDC Transaction stage.
qt_ky



Group memberships:
Premium Members

Joined: 03 Aug 2011
Posts: 2819
Location: USA
Points: 21356

Posted: Thu Jun 14, 2018 5:07 am

I agree. It is a common misnomer. Clearly, the Change Capture stage would be abbreviated CC. CDC is different.

_________________
Choose a job you love, and you will never have to work a day in your life. - Confucius
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42765
Location: Denver, CO
Points: 220367

Posted: Thu Jun 14, 2018 7:18 am

So basically it's CDD, or Change Data Detection? That's what I've known it as, and as noted it's a distinctly different process from Change Data Capture.

_________________
-craig

Research shows that 6 out of 7 dwarves aren't happy
qt_ky




Posted: Fri Jun 15, 2018 5:33 am

Yes, the Change Capture stage performs change data detection, but watch out... because "CDD" is another IBM product acronym for Change Data Delivery!

_________________
Choose a job you love, and you will never have to work a day in your life. - Confucius
chulett


Posted: Fri Jun 15, 2018 6:21 am

Great. Now we need ACD - Acronym Collision Detection.

_________________
-craig

Research shows that 6 out of 7 dwarves aren't happy
All times are GMT - 6 Hours