joining key columns

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Re: joning key columns

Post by kumar_s »

dxp wrote:
ray.wurlod wrote:While it's true that the Join stage can have more than two input links, they do require a common key, as track_star has pointed out.
.
Hi ray.wurlod,

is there any specific reason why this facility is not provided in join stage.
because right now in my work, i frequently encounter this situation of joining 3 or more tables. i have to use no. of joins in a job. and this will affect the performance, because no. of database connections will be more as join stages increase.

any thoughts!!?
Yes you can try inserting into Enhancment wish list. But i guess still the database connection remains 3 :roll:

regards
kumar
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Re: joning key columns

Post by ray.wurlod »

dxp wrote:is there any specific reason why this facility is not provided in join stage.
You'd have to ask that of the manufacturer.
Have you considered a Lookup stage with two reference inputs as an alternative?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dxp
Charter Member
Charter Member
Posts: 48
Joined: Sat Oct 01, 2005 10:23 am

Re: joning key columns

Post by dxp »

ray.wurlod wrote:
dxp wrote:is there any specific reason why this facility is not provided in join stage.
You'd have to ask that of the manufacturer.
Have you considered a Lookup stage with two reference inputs as an alternative?

yes, considered. but if the data is huge from query, then it will become very expensive from memory point of view, because in lookup, whole data should be in cache before performing operations.

and if i want to use 'Left Only' join...then additional 2 stages i have to use.
where as, if i use join stage, only one extra 'filter' stage is enough.
track_star
Participant
Posts: 60
Joined: Sat Jan 24, 2004 12:52 pm
Location: Mount Carmel, IL

Post by track_star »

Personally, I think that the cascaded joins is the way to go, but you can try using the merge operator if you don't want to go that route. It can give you the same result set as the lookup and join, and uses about the same memory footprint that the join stage does. As Ray said, you can also try using a lookup operator. Be cautious with it though, as the lookup does a memory map and can cause some memory-related issues.

The join stage will actually work with more than two inputs, it just isn't the way it was designed to work. I believe that it was modeled after standard database joins, which typically involve only a left and right input.
sun rays
Charter Member
Charter Member
Posts: 57
Joined: Wed Jun 08, 2005 3:35 pm
Location: Denver, CO

Re: joning key columns

Post by sun rays »

dxp wrote:
ray.wurlod wrote:While it's true that the Join stage can have more than two input links, they do require a common key, as track_star has pointed out.
.
Hi ray.wurlod,

is there any specific reason why this facility is not provided in join stage.
because right now in my work, i frequently encounter this situation of joining 3 or more tables. i have to use no. of joins in a job. and this will affect the performance, because no. of database connections will be more as join stages increase.

any thoughts!!?
May be because, if you use user defined query the database optimiser is resonsible for selecting which table to join first and which ones later.
But I don't think datastage is intelligent enough to know which execution order would result in great performance, so it is left to you, to decide the order of joins based on the number of records resulting in the output set of a join.
I guess the performance would increase if you first join the tables which results in less number of output records then use an other stage to join the result set of first join.
dxp
Charter Member
Charter Member
Posts: 48
Joined: Sat Oct 01, 2005 10:23 am

Post by dxp »

Hi All,

thank for the ideas.

anyway in my situation, i am using Teradata DB.
its limitation with DataStage is if the query is more thank 1Kilo Byte, then the job wont execute.
our team asked the Ascential people. they clarified that we need a patch to fix this bug.
any way what we are doing is, segmenting the total query with joins in 'userdefined sql'.


dxp.
track_star
Participant
Posts: 60
Joined: Sat Jan 24, 2004 12:52 pm
Location: Mount Carmel, IL

Post by track_star »

Why not just apply the patch?
dxp
Charter Member
Charter Member
Posts: 48
Joined: Sat Oct 01, 2005 10:23 am

Post by dxp »

track_star wrote:Why not just apply the patch?
Hi track_star,

this thread started with a different question. just read from start.
the question not about length of query but of joining A,B,C tables, of which A,B r having 'columnA' as key column and tables A,C having 'columnC' as key column.

:!:
track_star
Participant
Posts: 60
Joined: Sat Jan 24, 2004 12:52 pm
Location: Mount Carmel, IL

Post by track_star »

Indeed it did--I was simply responding to your statement that there is an issue with Teradata user-defined queries and the angle that was presented by Ray. It's a valid angle to take if the performance is there.

Another thing to check is the spelling and case of the column names on the input links of the join stage. The GUI looks for EXACT matches between the input links on the join stage to determine key columns.
dxp
Charter Member
Charter Member
Posts: 48
Joined: Sat Oct 01, 2005 10:23 am

Post by dxp »

track_star wrote: Another thing to check is the spelling and case of the column names on the input links of the join stage. The GUI looks for EXACT matches between the input links on the join stage to determine key columns.
yeah track_star,

' spelling and case of the column names ' is also a case i surprised when i first came to know that.


dxp.
apraman
Participant
Posts: 47
Joined: Mon Sep 12, 2005 5:26 am

Post by apraman »

This post is been interesting.

Please let forum know how the initial objective is achieved.
Arun
dxp
Charter Member
Charter Member
Posts: 48
Joined: Sat Oct 01, 2005 10:23 am

Post by dxp »

apraman wrote:This post is been interesting.

Please let forum know how the initial objective is achieved.
initial objective:------->

Teradata_Stage1 & Teradata_Stage2....are having 'ColumnA' as key column
Teradata_Stage1 & Teradata_Stage3....are having 'ColumnB' as key column

ColumnA and ColumnB are of different data types. they r not equal in any sense.

is it possible to join Teradata_Stage1, Teradata_Stage2 & Teradata_Stage3 using single Join Stage in Parallel jobs?

------------------------------------------------------;

straight answer:...........not possible using single Join Stage in Parallel Jobs.

--------------------------------------------------------;

alternative1:.............

join first 2 Teradata_Stages with ColumnA as common column.
then join this output with 3rd stage, keeping ColumnB as common column.


alternative2:..........
u can write 'userdefined sql' in one Teradata_stage, if the query size is less than 1Kb. if it is more than this size u have to apply a 'patch' from Teradata.
---------------------------------------------------------------------------------

dxp.
kanchub79
Participant
Posts: 2
Joined: Tue Oct 18, 2005 7:19 am

Re: joining key columns

Post by kanchub79 »

It Seems you got to use same alias names for all the key columns in the user defined SQL in the tables,Then you can see it in the key column drop down list.

LEFT OUTER JOIN:
---------------------
Then got the link ordering tab and try to see that the table which has to be pulled with all the columns remains as the left.

INNER JOIN:
--------------
This has nothing to do with ordering of tables.

Hope this works fine.
Please Correct me if I understood wrongly.
Thanks.
Thanks,
saravan
track_star
Participant
Posts: 60
Joined: Sat Jan 24, 2004 12:52 pm
Location: Mount Carmel, IL

Post by track_star »

Rao, you shouldn't be so hasty in your comments. Please see the osh script I have below, which was compiled on a 7.5 instance of PX. It clearly shows that you can have more than two inputs to a join. dxp's issue is that he doesn't have a common key in all three tables. If he did, he could join all three in one join stage. Since he doesn't, he has to either cascade the joins or use some user-defined SQL to produce one output table that can be joined to the other one.

#################################################################
#### STAGE: Join_3
## Operator
leftouterjoin
## Operator options
-key 'col1'

## General options
[ident('Join_3'); jobmon_ident('Join_3')]
## Inputs
0< 'Row_Generator_9:DSLink4.v'
1< 'Row_Generator_10:DSLink5.v'
2< 'Row_Generator_11:DSLink6.v'
## Outputs
0> [] 'Join_3:DSLink8.v'
;

#################################################################
#### STAGE: Row_Generator_9
## Operator
generator
## Operator options
-schema record
(
col1:string;
col2:string;
)
-records 10

## General options
[ident('Row_Generator_9'); jobmon_ident('Row_Generator_9')]
## Outputs
0> [] 'Row_Generator_9:DSLink4.v'
;

#################################################################
#### STAGE: Row_Generator_10
## Operator
generator
## Operator options
-schema record
(
col1:string;
col3:string;
)
-records 10

## General options
[ident('Row_Generator_10'); jobmon_ident('Row_Generator_10')]
## Outputs
0> [] 'Row_Generator_10:DSLink5.v'
;

#################################################################
#### STAGE: Row_Generator_11
## Operator
generator
## Operator options
-schema record
(
col1:string;
col4:string;
)
-records 10

## General options
[ident('Row_Generator_11'); jobmon_ident('Row_Generator_11')]
## Outputs
0> [] 'Row_Generator_11:DSLink6.v'
;

#################################################################
#### STAGE: Peek_12
## Operator
peek
## Operator options
-nrecs 10
-name

## General options
[ident('Peek_12'); jobmon_ident('Peek_12')]
## Inputs
0< 'Join_3:DSLink8.v'
;
Post Reply