Lookup where input and output schemas will be undefined

Post questions here related to DataStage Enterprise/PX Edition in areas such as parallel job design, parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

wblack
Premium Member
Posts: 30
Joined: Thu Sep 23, 2010 7:55 am

Post by wblack

We have a situation where we want to use a database lookup table as a reference in a DataStage job. We want to create a module where the input and output schemas are undefined, with the exception of the input field(s) used in the lookup and the resulting output field(s). We attempted to do this using a shared container with a Lookup stage, but we were forced to define the input and output schemas. Next, we tried developing a parallel routine and a custom operator, but there's no built-in way to interface with a database table. Can you tell us what the best practice(s) are to accomplish this?
William Black
blewip
Participant
Posts: 81
Joined: Wed Nov 10, 2004 10:55 am
Location: London

Post by blewip

Well, RCP (Runtime Column Propagation) should do this for a Join, and I assume for a Lookup too.

You will need to define the lookup keys; however, the rest can be RCP'd through.
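To make the idea concrete, here is a minimal Python sketch of the behavior described above, modeled outside DataStage: the lookup declares only its key and the returned value column, and every other input column is carried through untouched. This is not DataStage code or API; the function name `rcp_lookup`, the column names, and returning `None` for an unmatched key are all hypothetical choices for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def rcp_lookup(rows: List[Row], reference: Dict[Any, Any],
               key: str, result_col: str) -> List[Row]:
    """Conceptual model (hypothetical, not DataStage API) of a lookup with RCP:
    only `key` and `result_col` are named; all other input columns
    are propagated to the output unchanged."""
    out = []
    for row in rows:
        new_row = dict(row)  # propagate every incoming column as-is
        # Add the looked-up value; None stands in for "no match" in this model.
        new_row[result_col] = reference.get(row[key])
        out.append(new_row)
    return out


rows = [{"A": "x", "B": "1", "C": 10}, {"A": "y", "B": "2", "C": 20}]
ref = {"x": "found-x"}
result = rcp_lookup(rows, ref, key="A", result_col="VAL")
print(result)
# [{'A': 'x', 'B': '1', 'C': 10, 'VAL': 'found-x'}, {'A': 'y', 'B': '2', 'C': 20, 'VAL': None}]
```

Columns B and C are never mentioned by the lookup, yet they arrive at the output, which is the point of letting RCP carry the undeclared columns.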
Modern Life is Rubbish - Blur
wblack
Premium Member
Posts: 30
Joined: Thu Sep 23, 2010 7:55 am

Post by wblack

So you are saying that Runtime Column Propagation will let you pass through all columns when the input and output schemas aren't known, even if the job only works on a select number of columns? Do the columns have to be the same throughout the entire job to use RCP? Does the entire job have to be set to use RCP, or can only a portion of a job use it?
William Black
blewip
Participant
Posts: 81
Joined: Wed Nov 10, 2004 10:55 am
Location: London

Post by blewip

You can enable RCP for the job and then use it only where you need it.

So you could explicitly define some columns at the start and use RCP to pick up the rest (or all of them). You can then pick up extra columns as you go through the job, either explicitly defined or via RCP.
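A rough Python model of the mixed approach described above, again purely conceptual and not DataStage code: each hypothetical `stage` call checks its explicitly declared columns, optionally propagates everything else (the `rcp` flag, standing in for enabling RCP on a link), and can add new columns of its own. The function, its parameters, and the column names are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]


def stage(rows: List[Row], declared: Sequence[str], rcp: bool,
          derive: Callable[[Row], Row] = lambda r: {}) -> List[Row]:
    """Conceptual model of one stage's output link (hypothetical, not DataStage code).

    declared: columns explicitly defined on the link (must exist in the input).
    rcp:      True propagates undeclared columns; False drops them.
    derive:   returns extra columns this stage adds explicitly.
    """
    out = []
    for row in rows:
        missing = [c for c in declared if c not in row]
        if missing:
            raise KeyError(f"declared columns missing from input: {missing}")
        # With RCP, keep everything; without it, keep only declared columns.
        new_row = dict(row) if rcp else {c: row[c] for c in declared}
        new_row.update(derive(row))  # add any newly defined columns
        out.append(new_row)
    return out


source = [{"ID": 1, "NAME": "a", "REGION": "EU"}]
# Only ID is defined explicitly; RCP carries NAME and REGION along.
s1 = stage(source, declared=["ID"], rcp=True)
# Add an explicitly defined FLAG column; everything else still flows via RCP.
s2 = stage(s1, declared=["ID"], rcp=True, derive=lambda r: {"FLAG": r["ID"] > 0})
# With RCP off, only the declared columns survive.
s3 = stage(s2, declared=["ID", "FLAG"], rcp=False)
print(s2)  # [{'ID': 1, 'NAME': 'a', 'REGION': 'EU', 'FLAG': True}]
print(s3)  # [{'ID': 1, 'FLAG': True}]
```

Turning `rcp` off in the last call shows the contrast: only the explicitly defined columns reach the output.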
Modern Life is Rubbish - Blur
wblack
Premium Member
Posts: 30
Joined: Thu Sep 23, 2010 7:55 am

Post by wblack

OK, I have a simple parallel job that consists of a Row Generator stage (two columns, A and B, Char, length 1) that feeds into a shared container and then into a Peek stage. In the shared container I have an input, a Lookup stage, and an output, where I try to match on a key (a single char) to get back a value. Are you saying I don't have to explicitly specify columns A and B going into the Lookup stage if I use RCP?
William Black