| Author |
Message |
dougcl

Group memberships: Premium Members
Joined: 24 Jun 2010
Posts: 64
Points: 825
|
|
| DataStage® Release: 8x |
| Job Type: Parallel |
| OS: Unix |
|
Hi folks, there are many range lookup problems posted here, so I realize I am adding to a pile, but I haven't seen this particular issue resolved so I am posting it before submitting a ticket.
I have a Varchar Key and a date on the stream:
StreamKey
StreamDate
I have the same key, and a pair of dates, and a desired data column on the reference:
RefKey
RefDate1
RefDate2
RefData
I want the stream to get the RefData value when StreamKey = RefKey and StreamDate is between RefDate1 and RefDate2.
FYI, RefKey and RefDate1 uniquely define a row in the ref data, and none of the date ranges overlap. So for each StreamKey/StreamDate pair, only one match is possible in the reference. I have the reference link partitioned "entire."
I have tried the following:
1) Match the StreamKey to the RefKey on the reference lookup link, and do a range lookup on StreamDate on the stream link. I got the error.
2) I tried a range lookup on both the StreamKey and the StreamDate on the stream link using the strategy suggested by
vskr72
here http://www.dsxchange.com/viewtopic.php?t=133588
(by duplicating RefKey in the reference link and treating both lookups as ranges on the stream link.)
I got the error. I got the error either way.
Any help is appreciated. At this point, I suppose the most general question is, does range lookup work at all when the range is specified on the stream link? Is there a known bug on Solaris?
Thanks,
Doug
|
|
|
|
|
 |
kris007
 since March 2006
Group memberships: Premium Members
Joined: 24 Jan 2006
Posts: 1088
Location: Riverside, RI
Points: 7446
|
|
|
|
|
|
You can join just the StreamKey and RefKey and bring all the columns downstream into a Transformer or Filter Stage and then apply the logic
| Quote: |
| StreamDate is between RefDate1 and RefDate2 |
That should work fine.
|
_________________ Kris
Where's the "Any" key?-Homer Simpson
|
|
|
|
 |
dougcl

Group memberships: Premium Members
Joined: 24 Jun 2010
Posts: 64
Points: 825
|
|
|
|
|
|
| kris007 wrote: |
You can join just the StreamKey and RefKey and bring all the columns downstream into a Transformer or Filter Stage and then apply the logic
| Quote: |
| StreamDate is between RefDate1 and RefDate2 |
That should work fine. |
Hi thanks. Joining the StreamKey and the RefKey requires sorting and repartitioning the stream. Also, it would create an explosion in row count. I would like to use a lookup. Is it correct to infer from your response that range lookups on the stream link don't work?
|
|
|
|
|
 |
kris007
 since March 2006
Group memberships: Premium Members
Joined: 24 Jan 2006
Posts: 1088
Location: Riverside, RI
Points: 7446
|
|
|
|
|
|
|
Sorry for misleading you. What I meant was to use LookUp Stage to join just the StreamKey and RefKey and then continue. Range lookups in Parallel jobs are done differently than server jobs and I couldn't figure out a way so far to do the range lookups on the stream link. I don't think it is possible.
|
_________________ Kris
Where's the "Any" key?-Homer Simpson
|
|
|
|
 |
dougcl

Group memberships: Premium Members
Joined: 24 Jun 2010
Posts: 64
Points: 825
|
|
|
|
|
|
|
Thanks for the clarification. I'm submitting a ticket to see what IBM has to say. In the meantime I think I'll try your idea of using the lookup to match the keys. Up until now, I've always had unique rows in the lookup, so I guess you're saying multiple rows from the reference link just lead to multiple rows on the output like a join would. Speaking of which, something else I tried was to go into the "multiple rows returned from link" area and place a condition in there on the date range. Of course that didn't work either as reference link items are not available in the expression there. Seems like that would have been a nice, intuitive answer. As it is, I can't think of a case where the expression there would be useful. Only being able to apply a constraint to the stream while multiple rows are being returned on the reference makes no sense to me.
|
|
|
|
|
 |
dougcl

Group memberships: Premium Members
Joined: 24 Jun 2010
Posts: 64
Points: 825
|
|
|
|
|
|
Hi guys, I am trying this idea of doing a lookup on the key, and then using a filter to apply the date logic, but I don't think the lookup is working. If the reference returns more than one row, does it operate like a join, or does it ignore the duplicates? My rowcount going in is the same as the rowcount coming out. It should be much larger.
Thanks,
Doug
|
|
|
|
|
 |
kris007
 since March 2006
Group memberships: Premium Members
Joined: 24 Jan 2006
Posts: 1088
Location: Riverside, RI
Points: 7446
|
|
|
|
|
|
| dougcl wrote: |
| If the reference returns more than one row, does it operate like a join, or does it ignore the duplicates? Doug |
If you select the option "multiple rows returned from link" it will work like a join stage. If you did not select that option the lookup stage will drop duplicate records based on key which in your case RefKey
| Quote: |
| My rowcount going in is the same as the rowcount coming out. It should be much larger. |
Which link are you talking about here. The reference vs Output or Stream vs Output?
|
_________________ Kris
Where's the "Any" key?-Homer Simpson
|
|
|
|
 |
dougcl

Group memberships: Premium Members
Joined: 24 Jun 2010
Posts: 64
Points: 825
|
|
|
|
|
|
| kris007 wrote: |
If you select the option "multiple rows returned from link" it will work like a join stage. If you did not select that option the lookup stage will drop duplicate records based on key which in your case RefKey |
Oh duh. I see that. I had the drop down blank. It seems to be working now. thanks,
Doug
|
|
|
|
|
 |
dougcl

Group memberships: Premium Members
Joined: 24 Jun 2010
Posts: 64
Points: 825
|
|
|
|
|
|
Hi folks, I think I have this solved. Apparently the error message is correct. If you are doing a range lookup, the reference link must be sorted on the lookup keys. Partition entire first, then sort.
Doug
|
|
|
|
|
 |
|
|