sparse Lookup v/s Merge

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
abhinavsuri
Premium Member
Premium Member
Posts: 62
Joined: Thu Dec 28, 2006 11:54 pm

sparse Lookup v/s Merge

Post by abhinavsuri »

Hi
I have a job which is reading from oracle source table and looking up with a another oracle table. the data in the reference table is extremely large as compared to source.
I want to know which option wud be more efficient ,a sparselookup or a merge stage.
nsm
Premium Member
Premium Member
Posts: 139
Joined: Mon Feb 09, 2004 8:58 am

Post by nsm »

You can use any one. Sparse will be fine if you don't want to go through sorting before you use merge.

nsm.
nsm
Premium Member
Premium Member
Posts: 139
Joined: Mon Feb 09, 2004 8:58 am

Post by nsm »

You can use any one. Sparse will be fine if you don't want to go through sorting before you use merge.

nsm.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

It depends on how large your source data is. If it has hundreds of thousands of rows then a sparse lookup is going to hit your reference database with hundreds of thousands of queries. A merge stage is going to pull millions of rows out of the Oracle database. A third option is to bulk load the input data into Oracle and do a join between the two tables - this may be the fastest with the smallest impact on the database.
abhinavsuri
Premium Member
Premium Member
Posts: 62
Joined: Thu Dec 28, 2006 11:54 pm

Post by abhinavsuri »

actually i want to have a reject file just in case there are no matching records in lookup(normally there should be no rejects )

I tried sparse lookup but it takes very long and gives me undesired number of rows in output.

I am using the merge stage as of now. Do I need to sort the data before using merge? Also, I read that there should be no duplicates for merge to take place correctly. However,my data is such that the column based on which I am merging can have the same value in hundreds of rows. Will this cause a problem?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Merge stage requires de-duplicated data in the "update" input(s), and in the "master" input if there are more than one update inputs. Merge stage requires that all inputs be sorted on the Key column(s). Getting the right results also requires that all inputs be partitioned on the Key column(s).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply