between condition Problem in PX

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sanjay
Premium Member
Premium Member
Posts: 203
Joined: Fri Apr 23, 2004 2:22 am

between condition Problem in PX

Post by sanjay »

Hi All

I have implemented between condition using sparse lookup . It is very expensive it hits the database everytime . Is there any other way to implement between condition in PX . can i use Join stage . if say then how ?

Sanjay
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Most PX comparison stages (eg. lookups, joins, change data capture) work by bringing all the data from the primary stream and all the data from the reference stream onto the ETL server where the comparison action is performed. In the case of a lookup the reference data is loaded into memory for even faster performance.

Sparse lookups are a way around this, however as you have found they are very expensive. I have found sparse lookups to offer considerably worse performance in a parallel job then they would in a server job and we are steering clear of them.

The method we use is to do a lookup without the betweens criteria and the multiple rows returned from link functionality of the lookup stage will expand the output data to cover all possibilities. We then use a filter stage to remove those outside the betweens criteria. Sometimes we preload the reference table into a partitioned dataset so jobs do not have to load it on the fly.

There is a custom stage floating around built by an Ascential consultant that wraps the lookup and filter into a single stage, keep in mind that this stage does essentially the same thing, it still brings all reference rows back to the ETL server and filters out those it does not need. It may not be any faster then a lookup and filter stage combination.

Because parallel jobs try to do everything in memory and in parallel for maximum efficiency there is no easy way to do a range lookup within the RDBMS engine, your best bet is to bring it onto the ETL server or have it preloaded in a sandbox dataset.
gh_amitava
Participant
Posts: 75
Joined: Tue May 13, 2003 4:14 am
Location: California
Contact:

Post by gh_amitava »

Hi,

If your referenced data is high in volume then it is always better to use "Join" stage. :)

Regards
Amitava
Post Reply