How to join data with a BETWEEN function

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dveltman1
Participant
Posts: 12
Joined: Thu Jul 02, 2015 6:12 am

How to join data with a BETWEEN function

Post by dveltman1 »

Hi all,

I would like to left outer join data from a 12million+ rows table onto my stream data, with a DATE from the stream into START_DATE and END_DATE in the 12m+ table.

select A.*, B.*
from A, B
left outer join B on A.key = B.key
where A.DATE >= B.START_DATE
and A.DATE < B.END_DATE

In a Join and Merge stage I cannot add any more constraints.
In a Lookup stage I could not join the data, only lookup.

1. What is the best way to get the 12m+ data? Connector/Lookup_file/...
2. How to join with a point in between 2 dates? Except SQL overwrite.

Thanks!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The LOOKUP stage will do the range lookup. How many bytes per row on the 12Mio reference?
dveltman1
Participant
Posts: 12
Joined: Thu Jul 02, 2015 6:12 am

Post by dveltman1 »

Columns: 24
Bytes Used: 2.199 GB
Bytes Allocated: 2.227 GB
Skew: 0.010000
Row Count: 18,023,892
% Organized: -
Distribution Keys:
Distribution Key Count:
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

That should be no problem at all for a lookup. Are you using all 24 columns as keys or data in the lookup? If not, remove any unused columns in the SELECT to reduce your data volume. Sort the data in the SELECT and use the range lookup options in the lookup stage.
dveltman1
Participant
Posts: 12
Joined: Thu Jul 02, 2015 6:12 am

Post by dveltman1 »

I was trying to get the 12mio rows with a Lookup_file. But I can't use it for a range lookup when I'm joining with a Connector stage.
I cannot get the right input link on stream so I can define the range.
Any idea's?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I have to admit I don't understand quite what you are saying.

Use a LOOKUP stage in order to specify range lookups.
dveltman1
Participant
Posts: 12
Joined: Thu Jul 02, 2015 6:12 am

Post by dveltman1 »

Source 1 is connector stage and stream.
Source 2 is Lookup_File and reference.

The stream needs to do a range lookup to the Lookup_File, with the lookup stage.

But I can't perform the range lookup with that combination.
It does work with 2 connector stages, but I would like to use the Lookup_file because I would like to load it once and re-use it in other jobs.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Do you mean a database connector stage? And a lookup fileset stage for the lookup? If not, could you post what the data source and exact stage type for the reference link to the lookup stage is?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

dveltman1 wrote:But I can't perform the range lookup with that combination.
You should be able to; the range lookup is performed against a virtual data set loaded into memory no matter what the supporting stage type is (except, possibly, Lookup File Set stage).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply