Sequential File to Seq File join behaviour...

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm

Sequential File to Seq File join behaviour...

Post by kaps »

I noticed something and want to make sure that my understanding is correct.

Job design is :

Code: Select all

                SeqFile2
	         	|
	         	|
SeqFile1---Join Stage----Transformer---Remove Dup----File
I join a Seq file with another Seq file using join stage and in transformer I check IsNull(SeqFile2.Col1) to find out if
the lookup is found or not and it's not working. Do we need to do this in a different way when we join a seq file with
another seq file ? All columns in my lookup seq file is Not Nullable.

Please advise.
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

Numeric values can't be NULL. If the join fails then the numeric values from SeqFile2 will be 0, not NULL. I think string values can be NULL.
Phil Hibbs | Capgemini
Technical Consultant
kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm

Post by kaps »

All my lookup columns are Varchar and funny thing about this is that the field is defined as Not Nullable in Database and the job did not fail but there are no values in the column when you query the table.

Correction in my job design that final stage is DB2 connector not a file.
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

In a database, an empty string is not a null. So if your job is writing empty strings when the join fails, and to be honest I can't remember if that is the case for string types, then you won't get "is not nullable" errors.

If a varchar from a failed join is blank rather than NULL, then you need to pick... or or introduce... a column that cannot be blank or zero in order to check whether the join worked or not. I was lucky on the last project in that all internal data records had a surrogate key that could never be zero so we just used that, and I always introduced a surrogate rownumber key on supplier input data that had not been loaded into the data yet. I used the Sequental File automatic row number column, but had to add 1 to it because it numbered rows starting at 0 (bad idea, IBM!)
Phil Hibbs | Capgemini
Technical Consultant
kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm

Post by kaps »

Thanks for the reply. I understand the difference between empty string and null but my question is that when we don't find a matching record in the right file what do we get ? As I mentioned in my original post all columns are not nullable so I don't think introducing another column with some default value is going to make a difference. When I query the database I don't find any empty space. Could be some unprintable character but how come when I don't find a look up it returns un printable characters ? Am I misiing something here ?
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

There's nothing better than using some debugging techniques. One I've used is to have failed joins go to a reject link and file, so I can examine the data for reasons.

You didn't mention what type of join you are using, and you use the term "lookup" in your posts. Is it possible that you might have success with using a Lookup stage instead of Join?
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

Hi

1. What type of join you used?

2. Let us know the linking order?
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
mouthou
Participant
Posts: 208
Joined: Sun Jul 04, 2004 11:57 pm

Re: Sequential File to Seq File join behaviour...

Post by mouthou »

try the old fashiobned method of sorting the source and reference data before Join, and see the correctness of the lookup.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you declare your numeric columns coming from the sequential files as being nullable (even though they aren't), then after your left outer join you will get null values for the joined columns when the join doesn't find a match.
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

I don't think that's correct. I've always got 0 for a failed join even when the column is defined as nullable. What you describe may be correct for Server jobs, I'm a bit rusty on them, but not for Parallel.
Last edited by PhilHibbs on Fri Oct 05, 2012 4:01 am, edited 1 time in total.
Phil Hibbs | Capgemini
Technical Consultant
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Phil - it is correct; I just wrote a test job to make sure that it is so.

The SQL logic of a left outer states that for each left input row there will be an output row, those fields coming from the right side of the join when unmatched are set to <null>. If, in DataStage, this field is not nullable then the default value is used, i.e. 0 for numbers, "" for strings. If the fields are nullable then null values are used. The same logic and methodology applies to lookup stages as well.
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

Then there is something very strange going on here because I am sure that that is not how it behaved on my last project. Could some configuration option affect this behaviour?
Phil Hibbs | Capgemini
Technical Consultant
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

No, not as far as I know. I've used this behaviour for years with joins/lookups to detect if there was a match -- I'd use a non-nullable column, declare it as nullable on the input link to the join stage and, of course, in the output link and later test with IsNull() in a transform stage. The only time it will be a null value is if the join/lookup didn't find a match.
kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm

Post by kaps »

I could not figure out the explanation for the behaviour so as a workaround I checked the column from the right side file in stage variable using NullToZero function and then checked the stage variable in the column derivation using IsAlpha function. It worked :-)
Post Reply