Page 1 of 1

Problem joining four tables

Posted: Thu Dec 07, 2017 4:00 pm
by perspicax
New to DS. I am trying join 4 tables using the Join stage. The table structure are as follows

Tab_A
ID1 -->Key
xyz

TAB_B
ID1 --> Common Key for B & A
ID2 --> Common Key for B & C
ID3 --> Common Key for B & D
lmn

TAB_C
ID2 -->Key
pqr

TAB_D
ID3 -->
abc

In SQL we would join as follows

Select
*
from
TAB_A A, TAB_B B, TAB_C C, TAB_D D
where A.ID1= B.ID1
and B.ID2 = C.ID2
and B.ID3 = C.ID3

So Under properties of Join stage, I added 3 Repetitive Key as it is a 4 table join. I do not see any keys to chose from drop down to choose key for each one. So I manually type the column name to be used as Join key for each key.

But this throws following error when I run the job:

Join_XX: Error when checking operator: Key field "ID1" was not found in the view-adapted input schema

There are multiple errors like above for each key column.

Is there anything I am doing wrong?

Please let me know

Thanks

Posted: Thu Dec 07, 2017 5:20 pm
by chulett
Do the join in the source SQL just like you posted. The Join stage is really meant for disparate sources or multiple databases where the join needs to happen inside the job. Unless this is just an educational exercise?

Posted: Thu Dec 07, 2017 5:37 pm
by perspicax
Two tables are from Oracle, 1 table from SQL server and one is a file. So these are disparate sources. I am trying to achieve using Join operator/stage what the above sql would do

Posted: Fri Dec 08, 2017 5:50 am
by thompsonp
Your sql is wrong.
and B.ID3 = C.ID3 should be and B.ID3 = D.ID3

back to DataStage and are you trying to do all this is one join stage?

The inputs need to be partitioned and sorted correctly so it's not possible to mix several joins using different keys in one join stage. You'll need 3 join stages with one key column in each.

Either use multiple join stages or depending on the volume of data you might be better using lookups to avoid having to repartition and sort the data several times.

Posted: Fri Dec 08, 2017 4:33 pm
by perspicax
Yes you are correct about the join.

Yes, I was able to make it work with 3 join stages. So we cannot use one join stage at all for this scenario?

When you say, input needs to be partitioned and sorted, do you mean, Choosing a 'collector type' (one of ordered, roundrobin, sort merge) for each input link and enabling 'perform sort' check box?

Posted: Fri Dec 08, 2017 8:03 pm
by ray.wurlod
You can use one Join stage, but the Join key columns must be identically named (and have the same data type) to be eligible. You could use alias names in the extraction SQL to achieve this, or a Copy or Modify stage in the data streams.

Posted: Mon Dec 11, 2017 3:37 pm
by perspicax
My scenario doesn't work then? The join key column is different for different set of tables. For A & B it is ID1 which is identically named and have same data types, similarly for B & C it is ID2 and B & D is ID3.

The single join stage on more than two tables work only when all the tables involved are being joined on one common key?

Thanks

Posted: Mon Dec 11, 2017 7:59 pm
by chulett
Correct. And Ray noted the details for that.