RCP - CDC Stage - Partitoning and Sorting

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

koti9
Participant
Posts: 52
Joined: Wed Nov 09, 2005 10:51 am

RCP - CDC Stage - Partitoning and Sorting

Post by koti9 »

Hello all,

We are trying to do one single job with CDC Stage(comparing file and db) in it and also RCP Enabled. Did anyone try that, if yes, how did you handle the Hash Partitioning and sorting in it? Any help would be greatly appreciated.

Thanks & Regards
Koti
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

That most likely depends on your plan for handling a variable number of key fields.

I opted for using a single concatenated key called KEY, so the partitioning and sorting was simply by KEY.

I found a generic RCP-enabled shared container to be more useful than a generic RCP-enabled job. Gives you more flexibility on prepping the new incoming data, especially if you need to do complex transformations.

Mike
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

I second the thought of having single concatanated field for key. You can even have the key field location or start end position parameterized.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

viewtopic.php?t=153932

This post should give you little more detail on how to do it for files.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
koti9
Participant
Posts: 52
Joined: Wed Nov 09, 2005 10:51 am

Post by koti9 »

Hi Mike,

Thanks for your input, I too have dynamic number of keys based on table, and i want to follow your approach, exactly how did you generate the concatenated key (Through Parameter Set Values???), if yes , can you give me more details. Just to make things spicy, we want it be NLS Compatiable as we get different languages..So all three in one job(RCP,NLS and using CDC Stage)


Thanks & Regards
Koti
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

If you have a dynamic list of keys, then you need to be able to identify those key fields. Using your DBs system tables should allow you to do that.

Using the system tables to identify the key values you can then generate an SQL statement which concatenates those values into a single output column as well as naming any other fields that you require. You then pass that SQL as either a parameter, or write it to a file and pass the file name to your DB stage.

As long as your job is set up to receive those foreign language chars (ie NLS is enabled, you're using NVarChar or Varchar (Unicode), using the correct character set mapping etc) then NLS should not be an issue.
koti9
Participant
Posts: 52
Joined: Wed Nov 09, 2005 10:51 am

Post by koti9 »

Hi Shane/Mike,

Once again Thanks for your responses,
I got a break through with CDC and RCP.But it is working only with one key field(that i pass through parameter set). Here is my design.

Code: Select all

SeqFile----Sort-----CDC----Transformer----Modify-----Oralce
                     |
                     |
                     |
                   Oracle
I was able to pass the sort keys, CDC Keys via (parameter set value files),
But if i am trying for multiple use multiple keys(with nulls in some of them), Its failing.As long as the parameter set value file has some column in varaible name, its working. Did you try this method of RCP??

Let me know your thoughts...I can provide the error message also...i already searched error the forum for the same, with zero hits...

Thanks & Regards
Koti
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

<edited your post to add code tags to preserve whitespace>
-craig

"You can never have too many knives" -- Logan Nine Fingers
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Nice job so far Koti.

I tend to like using parameter set values files to support my generic multi-instance RCP jobs since it provides a nice container for organizing everything that needs parameterizing.

Now you can appreciate the challenge of handling a variable number of keys. And you also have the additional requirement of handling potential NULL key values.

Let me ask a question to start the thought process for handling your keys:
What is the maximum number of keys that you need to be able to support?

Mike
koti9
Participant
Posts: 52
Joined: Wed Nov 09, 2005 10:51 am

Post by koti9 »

Thanks Mike,

We are planning max of 5 keys as of now...


Thanks
Koti
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Can you just quickly explain how you are setting your key value field for both the DB and the Seq file?

Are you passing a set of column names from a parameter set?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

And Koti, when you do - please stop quoting everything you are replying to. Save that for when it makes sense, otherwise learn to love the Reply to topic link. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
koti9
Participant
Posts: 52
Joined: Wed Nov 09, 2005 10:51 am

Post by koti9 »

Sorry Craig, Will follow that...


Shane,

Thats right i am sending the Key columns via parameter set value file.

I was able get to a temporary work around, i am passing separate schema file just with keys and using ColumnExport Stage to concatenate dynamically.....As i can't mention the key in my original schema file...
So my Sequential File have original schema file and Column Export stage have another schema file with keys.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

koti9 wrote:As i can't mention the key in my original schema file
Sorry, but... why can you not?
-craig

"You can never have too many knives" -- Logan Nine Fingers
koti9
Participant
Posts: 52
Joined: Wed Nov 09, 2005 10:51 am

Post by koti9 »

Hi Craig,

There is no provision to mention the key in the schema file.
I think it is like that for a while....


Thanks & Regards
Koti
Post Reply