Weired Join Stage problem.

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

xinhuang66
Participant
Posts: 161
Joined: Wed Aug 02, 2006 4:30 am

Weired Join Stage problem.

Post by xinhuang66 »

I found a very weired things in Join stage, I couldn't understand why DS works like that .

Sequencial file stage 1
Varchar(3)
001
002
003

Sequencial file stage 2
varchar(5), varchar(5)
0010, apple
001, pear

Inner join 1 and 2 will get
001, apple
001, pear

However, if I increase sequencial file1 from varchar(3) to varchar(5), DS will give me right answer..

Anybody can help me explain why this can happen ? or I need to set some environment variables to get rid of such things.

thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

My guess is that it takes the minimum definition of the 2 varchar inputs and uses that for the join. If you change your 2nd sequential file first column to varchar(3) your join is correct.
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

As I understand it, datatype matters in a join, as does data format. For example, '003 ' is different from '003'.

I know this is the case in a filter. If the field is 5 characters and has a 'A', then the filter clause needs to be 'A '. I would assume the same rule applies in a join stage.

... course that may be specific to 'char' and not varchar. We don't use many varchars, especially for fields that are keys or often used in where clauses, so to be honest we have rarely run into this.

Maybe the bigger issue is matching on datatype...

Brad.
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

Join stage will compare the field names as well as the meta data. Since the lengths are different (3 and 5) the field with length 5 is truncated to 3. hence the output. Out of curiosity what happens with RCP off in this case with different lengths? Can u try that out. should not make a difference, but can u post the result here.
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

ArndW wrote: If you change your 2nd sequential file first column to varchar(3) your join is correct. ...
But the result would be the same and possibly a warning telling that the data is truncated in the source stage. wouldn't it?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

But if "0010" gets truncated to 3 character it would be "001", hence the initial result.
xinhuang66
Participant
Posts: 161
Joined: Wed Aug 02, 2006 4:30 am

Post by xinhuang66 »

We didn't Enable RCP.

How come DS will automatically truncate the column length to the smaller one ? it will generate heaps of wrong results ...

I just can't believe it....
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

The simple and sure shot work around is increase the length from 3 to 5 in a copy or transformer stage and it will work. I have had this problem and this is working fine.
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

Is there a chance that there is something wrong with your job design? I just ran the above example and it gives me the right answer.
Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
Rubu
Premium Member
Premium Member
Posts: 82
Joined: Sun Feb 27, 2005 9:09 pm
Location: Bangalore

Post by Rubu »

Right!

I agree to Josh. Even I could not really recreate the problem. Whereas I got the following warning which may put some light in understanding why its happening.

When checking operator: When binding output schema variable "leftRec": When binding output interface field "id1" to field "Id1": Implicit conversion from source type "string[max=5]" to result type "string[max=3]": Possible truncation of variable length string.

I feel that the VARCHAR(5) column is getting truncated to VarChar(3). And hence the wierd result.

having said that I accept the fact that I have failed to generate the wrong output :)
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

If you change the link odering then below error also will disappear.
This is because your "leftRec" of Inner join is with VarChar(3) and you have mapped that to the output result column, which is VarChar(5).
Rubu wrote: Whereas I got the following warning which may put some light in understanding why its happening.

When checking operator: When binding output schema variable "leftRec": When binding output interface field "id1" to field "Id1": Implicit conversion from source type "string[max=5]" to result type "string[max=3]": Possible truncation of variable length string.

I feel that the VARCHAR(5) column is getting truncated to VarChar(3). And hence the wierd result.
Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
xinhuang66
Participant
Posts: 161
Joined: Wed Aug 02, 2006 4:30 am

Post by xinhuang66 »

JoshGeorge wrote:Is there a chance that there is something wrong with your job design? I just ran the above example and it gives me the right answer.
Hi Josh,

I guess that is possible because of the input link sequences, Can you just swap your input link to see what will happen?
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

You are the OP , pls. test and see for yourself. We have already posted our findings to help you on this regard.
xinhuang66 wrote:
JoshGeorge wrote:Is there a chance that there is something wrong with your job design? I just ran the above example and it gives me the right answer.
Hi Josh,

I guess that is possible because of the input link sequences, Can you just swap your input link to see what will happen?
Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
harshada
Premium Member
Premium Member
Posts: 92
Joined: Tue May 29, 2007 8:40 am

Post by harshada »

We had a similar kind of problem , but our join keys were Char(8) and Varchar(8) . THe result as has laready been mentioned is weird.
So we made both the join keys data type same and have followed the same principle in every job that has a JOIN Stage , to keep join keys data type same ( even lenght)
harshada
Premium Member
Premium Member
Posts: 92
Joined: Tue May 29, 2007 8:40 am

Post by harshada »

sorry about the previous post

'' We had a similar kind of problem , but our join keys were Char8 and Varchar8 . THe result as has already been mentioned is weird. "
Post Reply