Join Fails

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

Yes, Ray, I have done partitioning and sorting on both input links. Do you have any other suggestions I could verify? I can paste the OSH in case you want to take a look at it.

Thanks for your responses!!
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

What are the constraints you have defined within the lookup? Specifically, the Lookup Failure constraint?

Have you verified that all rows are matching in the lookup? How?

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

It's defined to drop the rows on lookup failure, so the logic behaves like an inner join.
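
For anyone following along, here is a rough Python sketch of what "drop on lookup failure" amounts to; the key name and sample rows are made up for illustration, and this is just the general idea rather than anything DataStage-specific.

Code:

# Minimal sketch: a keyed reference lookup where rows that fail the lookup
# are dropped, which gives the same result as an inner join on the key.
# KEY_FIELD and the sample values are hypothetical.

reference = {"A1": {"cat_code": "X"}, "B2": {"cat_code": "Y"}}   # reference link, keyed

stream = [
    {"KEY_FIELD": "A1", "val": 1},
    {"KEY_FIELD": "C3", "val": 2},   # no match -> dropped (lookup failure = drop)
]

output = []
for row in stream:
    match = reference.get(row["KEY_FIELD"])
    if match is None:
        continue                     # "drop" on lookup failure, i.e. inner-join behaviour
    output.append({**row, **match})

print(output)                        # only the A1 row survives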
ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

Here is the OSH for the join stage.

Code:

 
## General options
[ident('oc_sap_corr_rcpi_cat_code'); jobmon_ident('oc_sap_corr_rcpi_cat_code')]
## Outputs
0> [] 'oc_sap_corr_rcpi_cat_code:ln_read_oracle_sap_corr_usage_codex.v'
;
#### STAGE: Join_785.DSLink723_Part
## Operator
hash -key KEY_FIELD -cs
## General options
[ident('Join_785.DSLink723_Part')]
## Inputs
0< [] 'Copy_788:DSLink723.v'
## Outputs
0> [] 'Copy_788:DSLink723_Part.v'
;
#### STAGE: Join_785.DSLink723_Sort
## Operator
tsort -key KEY_FIELD -cs -asc -nonStable
## General options
[ident('Join_785.DSLink723_Sort')]
## Inputs
0< [] 'Copy_788:DSLink723_Part.v'
## Outputs
0> [] 'Copy_788:DSLink723_Part_Sort.v'
;
#### STAGE: Join_785.DSLink721_Part
## Operator
hash -key KEY_FIELD -cs
## General options
[ident('Join_785.DSLink721_Part')]
## Inputs
0< [] 'Copy_791:DSLink721.v'
## Outputs
0> [] 'Copy_791:DSLink721_Part.v'
;
#### STAGE: Join_785.DSLink721_Sort
## Operator
tsort -key KEY_FIELD -cs -asc -nonStable
## General options
[ident('Join_785.DSLink721_Sort')]
## Inputs
0< [] 'Copy_791:DSLink721_Part.v'
## Outputs
0> [] 'Copy_791:DSLink721_Part_Sort.v'
;
#################################################################
#### STAGE: Join_785
## Operator
innerjoin
## Operator options
-key 'KEY_FIELD'
## General options
[ident('Join_785'); jobmon_ident('Join_785')]
## Inputs
0< [] 'Copy_788:DSLink723_Part_Sort.v'
1< [] 'Copy_791:DSLink721_Part_Sort.v'
## Outputs
0> [modify (
keep
***********  ALL COLUMNS**********
)] 'Join_785:ln_data_1.v'
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

My guess is that this has nothing to do with the stage; it relates to the data.

You haven't confirmed whether the source is a file, a database, or something else.

For test purposes, write just 2 rows of data into dataset files and then use those files.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

This is happening due to the Oracle Connector stage I am using. When I join data from the Oracle Connector and a dataset, it fails for most of the records as I described earlier. When I do the same with a dataset on both input links it works fine, so I'm not sure what is wrong with the Oracle Connector stage, although I have removed the spaces using trim.

Could this be a resource issue?
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

When you use a lookup instead of a join and are matching all rows, are you using datasets only or the Oracle connector? What is the key type you are using in the lookup? Are you using a sparse or normal lookup?

When you use datasets only, where did the data in the datasets come from? Is it the source data that was loaded to Oracle, or is it data that was extracted from Oracle? What about key nullability? If your dataset key is non-nullable, but the Oracle key column is nullable, you may need to handle nulls when reading from Oracle, even if you specify non-nullable in the column metadata.

A join stage requires that the key values are identical (byte-for-byte) and in the same partition in order for one row to match another. If you are trimming keys, are you doing so BEFORE you partition and sort the rows?

You may find that you need to manually compare the keys from two rows you believe should have matched but didn't, to discover what the difference is.
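
A quick way to do that outside of DataStage is to dump the two key values byte for byte; here is a minimal Python sketch (the sample values are invented, a trailing space being a typical culprit).

Code:

# Minimal sketch: compare two key values byte for byte to expose differences
# that are invisible at a glance (trailing spaces, NULs, case, length).
# The sample values are hypothetical.

left  = "ABC123 "        # e.g. the key as it came from the Oracle connector
right = "ABC123"         # e.g. the key as it sits in the dataset

print(repr(left), repr(right))            # repr() makes trailing spaces visible
print(left.encode("utf-8").hex(" "))      # hex dump of each key
print(right.encode("utf-8").hex(" "))

for i, (a, b) in enumerate(zip(left, right)):
    if a != b:
        print(f"first difference at byte {i}: {a!r} vs {b!r}")
        break
else:
    if len(left) != len(right):
        print(f"lengths differ: {len(left)} vs {len(right)}")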

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

Well, when I run on a single-node configuration it still produces the same results. I have verified the data byte by byte and it matches. One more issue I can see: when I read the dataset from within this job it reads only around 4 million records, but when I remove the other stages and just read it, it reads around 4.9 million records. Does this have anything to do with memory? I can see memory usage peaking at 99.6%, and it is joining only about 25K records.
jwiles
Premium Member
Premium Member
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

If you are certain that all of the rows which should match are in both inputs to the join, then you have one of the following situations:

1) Data is out of order
2) Data is incorrectly partitioned (apparently moot as you have the same results in a single-node configuration)
3) Data does not match exactly (all data, whitespace included)

As #2 is apparently eliminated, #1 and #3 are left to continue investigating. The questions I asked in my earlier posts are meant to help both identify and eliminate potential causes.

Join will only read an input when there is the possibility of a match to another input (i.e. if the key value on input 1 is < the key value on input 2, it reads input 1 until key1 >= key2). Outer joins will read all rows from the Left (left outer), Right (right outer) or all (full outer) inputs, but in the case of left or right outer, may not read all of the other inputs (depending on key values). An inner join may quit reading when one of the inputs hits EOF, as there is no longer a chance of a match to the other inputs.
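
In rough pseudocode terms, that inner-join read pattern looks like the sketch below; this is a plain Python illustration of sort-merge semantics (duplicate keys ignored for brevity), not the actual PX implementation.

Code:

# Minimal sketch of sort-merge inner-join reading: both inputs must be sorted
# on the key, and reading stops as soon as either input is exhausted because
# no further matches are possible. Illustration only, not PX internals.

def inner_join(left, right, key):
    i = j = 0
    out = []
    while i < len(left) and j < len(right):      # EOF on either side ends the join
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1                               # advance the side with the smaller key
        elif lk > rk:
            j += 1
        else:
            out.append({**left[i], **right[j]})  # keys match exactly
            i += 1
            j += 1
    return out

left  = [{"k": "A"}, {"k": "B"}, {"k": "D"}]     # already sorted on k
right = [{"k": "B"}, {"k": "C"}, {"k": "D"}]
print(inner_join(left, right, "k"))              # matches on B and D only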

Running out of memory can cause your jobs to run slower as processes are paged in and out of swap, as well as potentially causing failures due to unavailable resources.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

Yes, James, I am aware of all three possibilities. Strangely, when I replace the join with a lookup it works fine. The second thing in that environment is that when I try to read the dataset from one job it reads fewer records than the dataset actually holds, while another job reads the right number of records. I'm not sure why it stops reading and joining when it should.

Could it be that the source data is read very slowly, at around 100 rows/sec for roughly 5 million records, causing the join to fail?
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

ahmedwaseem2000 wrote: when I try to read the dataset from one job it reads fewer records than the dataset actually holds, while another job reads the right number of records. I'm not sure why it stops reading and joining when it should.
Go to Dataset Management, or use the admin command, to check how many records the dataset actually has.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

I have done that through orchadmin. The number of records displayed while writing to the dataset tallies with what one of the jobs reads, but the other job reads fewer records than the total count.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

ahmedwaseem2000 wrote: the other job reads fewer records than the total count
Is that job reading the data from the same file?

I am not sure, but could it be related to partitioning?
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
ahmedwaseem2000
Premium Member
Posts: 145
Joined: Fri Sep 15, 2006 12:23 pm

Post by ahmedwaseem2000 »

All I can think of is that resources are faltering. I'm not sure what is causing this, though.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

If it were a resource issue, it would likely lead to a different error; that is what I faced in my environment.

A wrong-partitioning kind of issue will lead you to wrong results (counts).

To test the join, use the same table as both source and reference and run the job. Let's see what happens.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.