Page 1 of 1

sparse Lookup v/s Merge

Posted: Mon Nov 17, 2008 8:48 am
by abhinavsuri
Hi
I have a job which is reading from oracle source table and looking up with a another oracle table. the data in the reference table is extremely large as compared to source.
I want to know which option wud be more efficient ,a sparselookup or a merge stage.

Posted: Mon Nov 17, 2008 8:58 am
by nsm
You can use any one. Sparse will be fine if you don't want to go through sorting before you use merge.

nsm.

Posted: Mon Nov 17, 2008 8:59 am
by nsm
You can use any one. Sparse will be fine if you don't want to go through sorting before you use merge.

nsm.

Posted: Mon Nov 17, 2008 4:37 pm
by vmcburney
It depends on how large your source data is. If it has hundreds of thousands of rows then a sparse lookup is going to hit your reference database with hundreds of thousands of queries. A merge stage is going to pull millions of rows out of the Oracle database. A third option is to bulk load the input data into Oracle and do a join between the two tables - this may be the fastest with the smallest impact on the database.

Posted: Tue Nov 18, 2008 3:29 am
by abhinavsuri
actually i want to have a reject file just in case there are no matching records in lookup(normally there should be no rejects )

I tried sparse lookup but it takes very long and gives me undesired number of rows in output.

I am using the merge stage as of now. Do I need to sort the data before using merge? Also, I read that there should be no duplicates for merge to take place correctly. However,my data is such that the column based on which I am merging can have the same value in hundreds of rows. Will this cause a problem?

Posted: Tue Nov 18, 2008 3:04 pm
by ray.wurlod
Merge stage requires de-duplicated data in the "update" input(s), and in the "master" input if there are more than one update inputs. Merge stage requires that all inputs be sorted on the Key column(s). Getting the right results also requires that all inputs be partitioned on the Key column(s).