DSXchange: DataStage and IBM Websphere Data Integration Forum
View next topic
View previous topic
Add To Favorites
This topic has been marked "Resolved."
Author Message
dougcl



Group memberships:
Premium Members

Joined: 24 Jun 2010
Posts: 64

Points: 825

Post Posted: Wed Jul 28, 2010 2:52 pm Reply with quote    Back to top    

DataStage® Release: 8x
Job Type: Parallel
OS: Unix
Hi folks, there are many range lookup problems posted here, so I realize I am adding to a pile, but I haven't seen this particular issue resolved so I am posting it before submitting a ticket.

I have a Varchar Key and a date on the stream:

StreamKey
StreamDate

I have the same key, and a pair of dates, and a desired data column on the reference:
RefKey
RefDate1
RefDate2
RefData

I want the stream to get the RefData value when StreamKey = RefKey and StreamDate is between RefDate1 and RefDate2.

FYI, RefKey and RefDate1 uniquely define a row in the ref data, and none of the date ranges overlap. So for each StreamKey/StreamDate pair, only one match is possible in the reference. I have the reference link partitioned "entire."

I have tried the following:
1) Match the StreamKey to the RefKey on the reference lookup link, and do a range lookup on StreamDate on the stream link. I got the error.

2) I tried a range lookup on both the StreamKey and the StreamDate on the stream link using the strategy suggested by vskr72 here http://www.dsxchange.com/viewtopic.php?t=133588
(by duplicating RefKey in the reference link and treating both lookups as ranges on the stream link.)

I got the error. I got the error either way.

Any help is appreciated. At this point, I suppose the most general question is, does range lookup work at all when the range is specified on the stream link? Is there a known bug on Solaris?

Thanks,
Doug
kris007


since March 2006

Group memberships:
Premium Members

Joined: 24 Jan 2006
Posts: 1088
Location: Riverside, RI
Points: 7446

Post Posted: Wed Jul 28, 2010 3:26 pm Reply with quote    Back to top    

You can join just the StreamKey and RefKey and bring all the columns downstream into a Transformer or Filter Stage and then apply the logic
Quote:
StreamDate is between RefDate1 and RefDate2


That should work fine.

_________________
Kris

Where's the "Any" key?-Homer Simpson
Rate this response:  
Not yet rated
dougcl



Group memberships:
Premium Members

Joined: 24 Jun 2010
Posts: 64

Points: 825

Post Posted: Wed Jul 28, 2010 4:24 pm Reply with quote    Back to top    

kris007 wrote:
You can join just the StreamKey and RefKey and bring all the columns downstream into a Transformer or Filter Stage and then apply the logic
Quote:
StreamDate is between RefDate1 and RefDate2


That should work fine.


Hi thanks. Joining the StreamKey and the RefKey requires sorting and repartitioning the stream. Also, it would create an explosion in row count. I would like to use a lookup. Is it correct to infer from your response that range lookups on the stream link don't work?
Rate this response:  
Not yet rated
kris007


since March 2006

Group memberships:
Premium Members

Joined: 24 Jan 2006
Posts: 1088
Location: Riverside, RI
Points: 7446

Post Posted: Wed Jul 28, 2010 5:44 pm Reply with quote    Back to top    

Sorry for misleading you. What I meant was to use LookUp Stage to join just the StreamKey and RefKey and then continue. Range lookups in Parallel jobs are done differently than server jobs and I couldn't figure out a way so far to do the range lookups on the stream link. I don't think it is possible.

_________________
Kris

Where's the "Any" key?-Homer Simpson
Rate this response:  
Not yet rated
dougcl



Group memberships:
Premium Members

Joined: 24 Jun 2010
Posts: 64

Points: 825

Post Posted: Wed Jul 28, 2010 11:23 pm Reply with quote    Back to top    

Thanks for the clarification. I'm submitting a ticket to see what IBM has to say. In the meantime I think I'll try your idea of using the lookup to match the keys. Up until now, I've always had unique rows in the lookup, so I guess you're saying multiple rows from the reference link just lead to multiple rows on the output like a join would. Speaking of which, something else I tried was to go into the "multiple rows returned from link" area and place a condition in there on the date range. Of course that didn't work either as reference link items are not available in the expression there. Seems like that would have been a nice, intuitive answer. As it is, I can't think of a case where the expression there would be useful. Only being able to apply a constraint to the stream while multiple rows are being returned on the reference makes no sense to me.
Rate this response:  
Not yet rated
dougcl



Group memberships:
Premium Members

Joined: 24 Jun 2010
Posts: 64

Points: 825

Post Posted: Fri Jul 30, 2010 12:07 pm Reply with quote    Back to top    

Hi guys, I am trying this idea of doing a lookup on the key, and then using a filter to apply the date logic, but I don't think the lookup is working. If the reference returns more than one row, does it operate like a join, or does it ignore the duplicates? My rowcount going in is the same as the rowcount coming out. It should be much larger.

Thanks,
Doug
Rate this response:  
Not yet rated
kris007


since March 2006

Group memberships:
Premium Members

Joined: 24 Jan 2006
Posts: 1088
Location: Riverside, RI
Points: 7446

Post Posted: Fri Jul 30, 2010 12:25 pm Reply with quote    Back to top    

dougcl wrote:
If the reference returns more than one row, does it operate like a join, or does it ignore the duplicates? Doug


If you select the option "multiple rows returned from link" it will work like a join stage. If you did not select that option the lookup stage will drop duplicate records based on key which in your case RefKey

Quote:
My rowcount going in is the same as the rowcount coming out. It should be much larger.

Which link are you talking about here. The reference vs Output or Stream vs Output?

_________________
Kris

Where's the "Any" key?-Homer Simpson
Rate this response:  
Not yet rated
dougcl



Group memberships:
Premium Members

Joined: 24 Jun 2010
Posts: 64

Points: 825

Post Posted: Fri Jul 30, 2010 2:14 pm Reply with quote    Back to top    

kris007 wrote:

If you select the option "multiple rows returned from link" it will work like a join stage. If you did not select that option the lookup stage will drop duplicate records based on key which in your case RefKey


Oh duh. I see that. I had the drop down blank. It seems to be working now. thanks,
Doug
Rate this response:  
Not yet rated
dougcl



Group memberships:
Premium Members

Joined: 24 Jun 2010
Posts: 64

Points: 825

Post Posted: Thu Aug 05, 2010 1:42 pm Reply with quote    Back to top    

Hi folks, I think I have this solved. Apparently the error message is correct. If you are doing a range lookup, the reference link must be sorted on the lookup keys. Partition entire first, then sort.

Doug
Rate this response:  
Not yet rated
Display posts from previous:       

Add To Favorites
View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum



Powered by phpBB © 2001, 2002 phpBB Group
Theme & Graphics by Daz :: Portal by Smartor
All times are GMT - 6 Hours