I join a Seq file with another Seq file using join stage and in transformer I check IsNull(SeqFile2.Col1) to find out if
the lookup is found or not and it's not working. Do we need to do this in a different way when we join a seq file with
another seq file ? All columns in my lookup seq file is Not Nullable.
All my lookup columns are Varchar and funny thing about this is that the field is defined as Not Nullable in Database and the job did not fail but there are no values in the column when you query the table.
Correction in my job design that final stage is DB2 connector not a file.
In a database, an empty string is not a null. So if your job is writing empty strings when the join fails, and to be honest I can't remember if that is the case for string types, then you won't get "is not nullable" errors.
If a varchar from a failed join is blank rather than NULL, then you need to pick... or or introduce... a column that cannot be blank or zero in order to check whether the join worked or not. I was lucky on the last project in that all internal data records had a surrogate key that could never be zero so we just used that, and I always introduced a surrogate rownumber key on supplier input data that had not been loaded into the data yet. I used the Sequental File automatic row number column, but had to add 1 to it because it numbered rows starting at 0 (bad idea, IBM!)
Thanks for the reply. I understand the difference between empty string and null but my question is that when we don't find a matching record in the right file what do we get ? As I mentioned in my original post all columns are not nullable so I don't think introducing another column with some default value is going to make a difference. When I query the database I don't find any empty space. Could be some unprintable character but how come when I don't find a look up it returns un printable characters ? Am I misiing something here ?
There's nothing better than using some debugging techniques. One I've used is to have failed joins go to a reject link and file, so I can examine the data for reasons.
You didn't mention what type of join you are using, and you use the term "lookup" in your posts. Is it possible that you might have success with using a Lookup stage instead of Join?
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
If you declare your numeric columns coming from the sequential files as being nullable (even though they aren't), then after your left outer join you will get null values for the joined columns when the join doesn't find a match.
I don't think that's correct. I've always got 0 for a failed join even when the column is defined as nullable. What you describe may be correct for Server jobs, I'm a bit rusty on them, but not for Parallel.
Last edited by PhilHibbs on Fri Oct 05, 2012 4:01 am, edited 1 time in total.
Phil - it is correct; I just wrote a test job to make sure that it is so.
The SQL logic of a left outer states that for each left input row there will be an output row, those fields coming from the right side of the join when unmatched are set to <null>. If, in DataStage, this field is not nullable then the default value is used, i.e. 0 for numbers, "" for strings. If the fields are nullable then null values are used. The same logic and methodology applies to lookup stages as well.
Then there is something very strange going on here because I am sure that that is not how it behaved on my last project. Could some configuration option affect this behaviour?
No, not as far as I know. I've used this behaviour for years with joins/lookups to detect if there was a match -- I'd use a non-nullable column, declare it as nullable on the input link to the join stage and, of course, in the output link and later test with IsNull() in a transform stage. The only time it will be a null value is if the join/lookup didn't find a match.
I could not figure out the explanation for the behaviour so as a workaround I checked the column from the right side file in stage variable using NullToZero function and then checked the stage variable in the column derivation using IsAlpha function. It worked