Join Fails

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

Yes, Ray, I have done partitioning and sorting on both input links. Do you have any other suggestions I could verify? I can paste the OSH in case you want to take a look at it.

Thanks for your responses!!
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

What are the constraints you have defined within the lookup? Specifically, the Lookup Failure constraint?

Have you verified that all rows are matching in the lookup? How?

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

It's defined to drop the rows on lookup failure, so the logic behaves like an inner join.
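
For anyone following along, here is a rough Python sketch of what "drop on lookup failure" amounts to; the key name and sample rows are made up for illustration, and this is just the general idea rather than anything DataStage-specific.

Code:

# Minimal sketch: a keyed reference lookup where rows that fail the lookup
# are dropped, which gives the same result as an inner join on the key.
# KEY_FIELD and the sample values are hypothetical.

reference = {"A1": {"cat_code": "X"}, "B2": {"cat_code": "Y"}}   # reference link, keyed

stream = [
    {"KEY_FIELD": "A1", "val": 1},
    {"KEY_FIELD": "C3", "val": 2},   # no match -> dropped (lookup failure = drop)
]

output = []
for row in stream:
    match = reference.get(row["KEY_FIELD"])
    if match is None:
        continue                     # "drop" on lookup failure, i.e. inner-join behaviour
    output.append({**row, **match})

print(output)                        # only the A1 row survives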
ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

Here is the OSH for the join stage.

Code:

 
## General options
[ident('oc_sap_corr_rcpi_cat_code'); jobmon_ident('oc_sap_corr_rcpi_cat_code')]
## Outputs
0> [] 'oc_sap_corr_rcpi_cat_code:ln_read_oracle_sap_corr_usage_codex.v'
;
#### STAGE: Join_785.DSLink723_Part
## Operator
hash -key KEY_FIELD -cs
## General options
[ident('Join_785.DSLink723_Part')]
## Inputs
0< [] 'Copy_788:DSLink723.v'
## Outputs
0> [] 'Copy_788:DSLink723_Part.v'
;
#### STAGE: Join_785.DSLink723_Sort
## Operator
tsort -key KEY_FIELD -cs -asc -nonStable
## General options
[ident('Join_785.DSLink723_Sort')]
## Inputs
0< [] 'Copy_788:DSLink723_Part.v'
## Outputs
0> [] 'Copy_788:DSLink723_Part_Sort.v'
;
#### STAGE: Join_785.DSLink721_Part
## Operator
hash -key KEY_FIELD -cs
## General options
[ident('Join_785.DSLink721_Part')]
## Inputs
0< [] 'Copy_791:DSLink721.v'
## Outputs
0> [] 'Copy_791:DSLink721_Part.v'
;
#### STAGE: Join_785.DSLink721_Sort
## Operator
tsort -key KEY_FIELD -cs -asc -nonStable
## General options
[ident('Join_785.DSLink721_Sort')]
## Inputs
0< [] 'Copy_791:DSLink721_Part.v'
## Outputs
0> [] 'Copy_791:DSLink721_Part_Sort.v'
;
#################################################################
#### STAGE: Join_785
## Operator
innerjoin
## Operator options
-key 'KEY_FIELD'
## General options
[ident('Join_785'); jobmon_ident('Join_785')]
## Inputs
0< [] 'Copy_788:DSLink723_Part_Sort.v'
1< [] 'Copy_791:DSLink721_Part_Sort.v'
## Outputs
0> [modify (
keep
***********  ALL COLUMNS**********
)] 'Join_785:ln_data_1.v'
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

My guess is that this has nothing to do with the stage; it relates to the data.

You haven't confirmed whether the source is a file, a database, or something else.

For test purposes, write just 2 rows of data into dataset files and then use those files.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

This is happening due to the Oracle Connector stage I am using. When I join data from the Oracle Connector and a dataset, it fails for most of the records as I described earlier. When I do the same with a dataset on both input links it works fine, so I'm not sure what is wrong with the Oracle Connector stage, although I have removed the spaces using trim.

Could this be a resource issue?
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

When you use a lookup instead of a join and are matching all rows, are you using datasets only or the Oracle connector? What is the key type you are using in the lookup? Are you using a sparse or normal lookup?

When you use datasets only, where did the data in the datasets come from? Is it the source data that was loaded to Oracle, or is it data that was extracted from Oracle? What about key nullability? If your dataset key is non-nullable, but the Oracle key column is nullable, you may need to handle nulls when reading from Oracle, even if you specify non-nullable in the column metadata.

A join stage requires that the key values are identical (byte-for-byte) and in the same partition in order for one row to match another. If you are trimming keys, are you doing so BEFORE you partition and sort the rows?

You may find that you need to manually compare the keys from two rows you believe should have matched but didn't, to discover what the difference is.
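
A quick way to do that outside of DataStage is to dump the two key values byte for byte; here is a minimal Python sketch (the sample values are invented, a trailing space being a typical culprit).

Code:

# Minimal sketch: compare two key values byte for byte to expose differences
# that are invisible at a glance (trailing spaces, NULs, case, length).
# The sample values are hypothetical.

left  = "ABC123 "        # e.g. the key as it came from the Oracle connector
right = "ABC123"         # e.g. the key as it sits in the dataset

print(repr(left), repr(right))            # repr() makes trailing spaces visible
print(left.encode("utf-8").hex(" "))      # hex dump of each key
print(right.encode("utf-8").hex(" "))

for i, (a, b) in enumerate(zip(left, right)):
    if a != b:
        print(f"first difference at byte {i}: {a!r} vs {b!r}")
        break
else:
    if len(left) != len(right):
        print(f"lengths differ: {len(left)} vs {len(right)}")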

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

Well, when I run on a single-node configuration it still produces the same results. I have verified the data byte by byte and it matches. One more issue I can see: when I read the dataset from within this job it reads only around 4 million records, but when I remove the other stages and just read it, it reads around 4.9 million records. Does this have anything to do with memory? I can see memory usage peaking at 99.6%, and it is joining only about 25K records.
jwiles
Premium Member
Premium Member
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

If you are certain that all of the rows which should match are in both inputs to the join, then you have one of the following situations:

1) Data is out of order
2) Data is incorrectly partitioned (apparently moot as you have the same results in a single-node configuration)
3) Data does not match exactly (all data, whitespace included)

As #2 is apparently eliminated, #1 and #3 are left to continue investigating. The questions I asked in my earlier posts are meant to help both identify and eliminate potential causes.

Join will only read an input when there is the possibility of a match to another input (i.e. if the key value on input 1 is < the key value on input 2, it reads input 1 until key1 >= key2). Outer joins will read all rows from the Left (left outer), Right (right outer) or all (full outer) inputs, but in the case of left or right outer, may not read all of the other inputs (depending on key values). An inner join may quit reading when one of the inputs hits EOF, as there is no longer a chance of a match to the other inputs.
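
In rough pseudocode terms, that inner-join read pattern looks like the sketch below; this is a plain Python illustration of sort-merge semantics (duplicate keys ignored for brevity), not the actual PX implementation.

Code:

# Minimal sketch of sort-merge inner-join reading: both inputs must be sorted
# on the key, and reading stops as soon as either input is exhausted because
# no further matches are possible. Illustration only, not PX internals.

def inner_join(left, right, key):
    i = j = 0
    out = []
    while i < len(left) and j < len(right):      # EOF on either side ends the join
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1                               # advance the side with the smaller key
        elif lk > rk:
            j += 1
        else:
            out.append({**left[i], **right[j]})  # keys match exactly
            i += 1
            j += 1
    return out

left  = [{"k": "A"}, {"k": "B"}, {"k": "D"}]     # already sorted on k
right = [{"k": "B"}, {"k": "C"}, {"k": "D"}]
print(inner_join(left, right, "k"))              # matches on B and D only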

Running out of memory can cause your jobs to run slower as processes are paged in and out of swap, as well as potentially causing failures due to unavailable resources.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

Yes, James, I am aware of all three possibilities. Strangely, when I replace the join with a lookup it works fine. The second thing in that environment is that when I try to read the dataset from one job it reads fewer records than the dataset actually holds, while another job reads the right number of records. I'm not sure why it stops reading and joining when it should.

Could it be that the source data is read very slowly, at around 100 rows/sec for roughly 5 million records, causing the join to fail?
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

ahmedwaseem2000 wrote: when I try to read the dataset from one job it reads fewer records than the dataset actually holds, while another job reads the right number of records. I'm not sure why it stops reading and joining when it should.
Go to Dataset Management, or use the admin command, to check how many records the dataset actually has.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

I have done that through orchadmin. The number of records displayed while writing to the dataset tallies with what one of the jobs reads, but the other job reads fewer records than the total count.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

ahmedwaseem2000 wrote: the other job reads fewer records than the total count
Is that job reading the data from the same file?

I am not sure, but could it be related to partitioning?
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

All I can think of is that resources are faltering. I'm not sure what is causing this, though.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

If it were a resource issue, it would likely lead to a different error; that is what I faced in my environment.

A wrong-partitioning kind of issue will lead you to wrong results (counts).

To test the join, use the same table as both source and reference and run the job. Let's see what happens.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.