performance query

dnat · Post by **dnat** » Thu Jun 05, 2008 7:47 am

Hi,

I have a job where i extract about 50 million records from a table.
I join the records with another table which also has around 40 million records on key "A". Later i split these records based on a conditions into 3 links.
Again i have to join these three links (each individually) with a table with Key "B". Since i thought that sort would take more scracth space and the need of repartitioning based on Key B(as initially the records would be partitioned based on Key A), i made the whole job as sequential.

Can anyone suggest me a better way of dealing this. I am worried as the data i am handling is huge.

Thanks!

wesd · Post by **wesd** » Thu Jun 05, 2008 8:19 am

Which DB engine are you using?

ray.wurlod · Post by **ray.wurlod** » Thu Jun 05, 2008 2:35 pm

Unless the records are large, 40~50 million records is not huge.

If you are on SMP hardware, don't worry about re-partitioning - it will be done through shared memory.

dnat · Post by **dnat** » Fri Jun 06, 2008 12:16 am

I am using Oracle DB

Thanks Ray for your response. What about sorting here.Will it not consume more scratch space with the sort stage. This 40-50 million is going to increase to 140-150 million by the next release.
Since i considered 50 million is huge, i am using join stage everywhere..What about using lookup stage here, will it improve the performance.

Thanks!

ray.wurlod · Post by **ray.wurlod** » Sun Jun 08, 2008 11:36 pm

Why not include an ORDER BY clause in the extraction SQL then, in your Sort stage, you can set the sort mode to "don't sort (already sorted)". No scratch space is taken by such a Sort stage.

swapnilverma · Post by **swapnilverma** » Mon Jun 09, 2008 1:02 am

Running in sequential is not a good choice

SQL order by will slower down the job performance.
As DS Sort is much faster.

If you have good amount of space in scratch disk use SORT stage.

if the Key A is indexed in table you can go for SQL join as well.
This will solve ur space issue.

Plz let us know your design deatils as well as the resource details like
no of nodes, space etc.

ray.wurlod · Post by **ray.wurlod** » Mon Jun 09, 2008 2:24 am

swapnilverma wrote:SQL order by will slower down the job performance.
As DS Sort is much faster.

It is very dangerous to make completely general assertions like that. A database sort assisted by a B-tree index will outpace anything DataStage can do, because the index is stored in already-sorted order. All the "sort" has to do is to traverse the leaf nodes of the B-tree.