What is the best practice for joining Nullable keys

balajimadhav · Post by **balajimadhav** » Tue Jul 22, 2014 3:49 am

Hello,

I have a job design where source and target are oracle tables. There are two additional tables to be joined with the source. Job design is source ->join1->join2->copy->target. In both the joins the key column identified is nullable and the existing sample records has no null values in the data though these join keys has been defined as nulls. At the pre join step, data has been partitioned (hash) and sorted as per the join keys in each join stage as the join key varies between join1 and join2.

I would like to clarify how to handle this case as per the best practice as i have applied modify stage on both links of join considering the rule that key column should not be null.Point to be noted is the target column to be mapped is also null in target.

Also this scenario can be handled without modify stage keeping all keys right from source to target as nullable and i get the same result in both cases (with and without modify stages). My assumption is DS will drop records with null values in join keys and even i tried to simulate the join keys as null and still i could see the data is processing successful

Recommending the table definition to be changed for nullability of key columns is not possible in current scenario and hence no changes could be made to tables. Expected data volume in each table is around 100K records in each table

please share your thoughts. i have checked all previous mail threads but couldnt get this clarified and hence created a new post

chulett · Post by **chulett** » Tue Jul 22, 2014 7:18 am

In my experience, one would be converting any NULL values to an in-band value that you know doesn't naturally occur in the data before the join and then converting them back afterwards. Assuming you want the NULL values to be considered equal and be joined.

balajimadhav · Post by **balajimadhav** » Tue Jul 22, 2014 7:44 am

Thanks for clarifying. As i understand in case if null values not in scope for my join then the null to not null conversion is not required. Also how does DS handles when two nullable keys with data being NULL when joined? i observed that it joins and produces the result as NULL. shouldn't it drop the record?

ray.wurlod · Post by **ray.wurlod** » Tue Jul 22, 2014 4:49 pm

Please note, too, that sometimes you have to specify nullable, for example when the input link is the outer side of an outer join or if you are effecting a full outer join.