Join Stage omitting data

hrthomson · Post by **hrthomson** » Thu May 06, 2010 11:15 am

I have a job that works in our UA environment but since we've moved to Prod it's causing us issues. I am using a Join stage to bring together data from several datasets. For the last column, despite doing a successful look-up, it's putting NULL (which we've replaced with asterisks).

Has anyone seen this issue before? We're doing a left-outer join with approx 743k records and 11 data sources (2 flat files and 1 data set).

ray.wurlod · Post by **ray.wurlod** » Thu May 06, 2010 4:55 pm

Any kind of outer join can legitimately return null when there's no key on the probe input corresponding to the current key on the driver input. The difference is likely to be in your data rather than anywhere else.

Be very sure that your data are partitioned and sorted correctly (and that APT_NO_SORT_INSERTION is not set).

ajay.vaidyanathan · Post by **ajay.vaidyanathan** » Fri May 07, 2010 6:32 am

Hi,
As Ray has rightly said, any OUTER JOIN will fetch you NULL if you are referrening to a particular column from the Reference Table (Table on which you are performing the Outer Join).

If you already have a Transformer Stage after you have performed your Outer Join, then probably use a NullToValue() for the particular column you are referring to from the Reference Table*

Then once you have got your "CONVERTED" value, use it for your future purposes (Whatever it is - Insertion,Transformation, etc...)

hrthomson · Post by **hrthomson** » Fri May 07, 2010 7:42 am

Thanks guys, but the look ups should not be turning nulls. There are legit values in the fields for the key. I ended up having to split my data set into smaller data sets - one for each field (so structure was Key Column, Value Column). Bizarre. I tried playing with the parallel and sequential settings to no avail.

This join worked fine with a smaller set of data, but failed with our production data set of 700k + rows. Could it be a resource issue?

ray.wurlod · Post by **ray.wurlod** » Fri May 07, 2010 5:04 pm

More likely to be a problem in a row whose row number is more than 700K.
What's the data type of the key? If VarChar, are they identically trimmed? If not, are the data types identical on stream and reference inputs?