Join Stage omitting data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
hrthomson
Participant
Posts: 25
Joined: Fri Aug 20, 2004 9:57 am
Location: Ottawa, Canada

Join Stage omitting data

Post by hrthomson »

I have a job that works in our UA environment but since we've moved to Prod it's causing us issues. I am using a Join stage to bring together data from several datasets. For the last column, despite doing a successful look-up, it's putting NULL (which we've replaced with asterisks).

Has anyone seen this issue before? We're doing a left-outer join with approx 743k records and 11 data sources (2 flat files and 1 data set).
what a man is contributes much more to his happiness than what he has, or how he is regarded by others.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Any kind of outer join can legitimately return null when there's no key on the probe input corresponding to the current key on the driver input. The difference is likely to be in your data rather than anywhere else.

Be very sure that your data are partitioned and sorted correctly (and that APT_NO_SORT_INSERTION is not set).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ajay.vaidyanathan
Participant
Posts: 53
Joined: Fri Apr 18, 2008 8:13 am
Location: United States

Join Stage omitting data

Post by ajay.vaidyanathan »

Hi,
As Ray has rightly said, any OUTER JOIN will fetch you NULL if you are referrening to a particular column from the Reference Table (Table on which you are performing the Outer Join).

If you already have a Transformer Stage after you have performed your Outer Join, then probably use a NullToValue() for the particular column you are referring to from the Reference Table*

Then once you have got your "CONVERTED" value, use it for your future purposes (Whatever it is - Insertion,Transformation, etc...)
Regards
Ajay
hrthomson
Participant
Posts: 25
Joined: Fri Aug 20, 2004 9:57 am
Location: Ottawa, Canada

Post by hrthomson »

Thanks guys, but the look ups should not be turning nulls. There are legit values in the fields for the key. I ended up having to split my data set into smaller data sets - one for each field (so structure was Key Column, Value Column). Bizarre. I tried playing with the parallel and sequential settings to no avail.

This join worked fine with a smaller set of data, but failed with our production data set of 700k + rows. Could it be a resource issue?
what a man is contributes much more to his happiness than what he has, or how he is regarded by others.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

More likely to be a problem in a row whose row number is more than 700K.
What's the data type of the key? If VarChar, are they identically trimmed? If not, are the data types identical on stream and reference inputs?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply