performance query

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

performance query

Post by dnat »

Hi,

I have a job where i extract about 50 million records from a table.
I join the records with another table which also has around 40 million records on key "A". Later i split these records based on a conditions into 3 links.
Again i have to join these three links (each individually) with a table with Key "B". Since i thought that sort would take more scracth space and the need of repartitioning based on Key B(as initially the records would be partitioned based on Key A), i made the whole job as sequential.


Can anyone suggest me a better way of dealing this. I am worried as the data i am handling is huge.

Thanks!
wesd
Participant
Posts: 22
Joined: Mon Aug 16, 2004 8:56 pm

Post by wesd »

Which DB engine are you using?
Wes Dumey
Senior Consultant
Data Warehouse Projects
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Unless the records are large, 40~50 million records is not huge.

If you are on SMP hardware, don't worry about re-partitioning - it will be done through shared memory.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

Post by dnat »

I am using Oracle DB

Thanks Ray for your response. What about sorting here.Will it not consume more scratch space with the sort stage. This 40-50 million is going to increase to 140-150 million by the next release.
Since i considered 50 million is huge, i am using join stage everywhere..What about using lookup stage here, will it improve the performance.

Thanks!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Why not include an ORDER BY clause in the extraction SQL then, in your Sort stage, you can set the sort mode to "don't sort (already sorted)". No scratch space is taken by such a Sort stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
swapnilverma
Participant
Posts: 135
Joined: Tue Aug 14, 2007 4:27 am
Location: Mumbai

Post by swapnilverma »

Running in sequential is not a good choice

SQL order by will slower down the job performance.
As DS Sort is much faster.

If you have good amount of space in scratch disk use SORT stage.

if the Key A is indexed in table you can go for SQL join as well.
This will solve ur space issue.


Plz let us know your design deatils as well as the resource details like
no of nodes, space etc.

:arrow: :arrow: :arrow:
Thanks
Swapnil

"Whenever you find whole world against you just turn around and Lead the world"
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

swapnilverma wrote:SQL order by will slower down the job performance.
As DS Sort is much faster.
It is very dangerous to make completely general assertions like that. A database sort assisted by a B-tree index will outpace anything DataStage can do, because the index is stored in already-sorted order. All the "sort" has to do is to traverse the leaf nodes of the B-tree.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply