Better Performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
georgesebastian
Participant
Posts: 83
Joined: Tue Dec 19, 2006 8:38 am

Better Performance

Post by georgesebastian »

Hi Everyone,
I have a design issue.
In the source DRS I need to get data from 28 tables.The structure of the tables are same but having very huge amount of data. (all the rows are distinct)

I would kindly like to know which one will give me a better performance
My options are

1.Using a UNION in the source DRS
2.Using a UNION ALL in the source DRS
3.Using 28 sepreate DRS in the source and using LINK COLLECTOR to combine the data.

I thought it would be the UNION ALL since it merges data from the Database itself but
I read in one of the DSxchange post that its LINK COLLECTOR which will give a better performance than UNION ALL

viewtopic.php?t=113306&highlight=Link+Collector
(please refer to the post from balasumit21 )

Please advise

Thanks
George Sebastian
:wink:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Define "performance" in an ETL context. Only then might one be able to address whether whatever it is can be made "better" (whatever that is).

Please do not cite rows/sec as your performance metric. Search DSXchange for many rants from me about why it is meaningless.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

union all is more performant than UNION because a UNION is a distinct.

you have a lot of screws for to find the optimal performance, there are many ways and there is not best of all for each situation. To find the best way for your case you need some investigation. Sometimes the bottleneck for performance is not the way of reading but the way of writing and its syncronisation.
Wolfgang Hürter
Amsterdam
georgesebastian
Participant
Posts: 83
Joined: Tue Dec 19, 2006 8:38 am

Post by georgesebastian »

Hi all,

What about using 28 DRS and after that the using a Sequencial file(with append option) insted of using Link collector.
which one will be a better design

Please advise

Thanks
George
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

WoMaWil wrote:union all is more performant than UNION because a UNION is a distinct.
Specifically, a UNION ALL simply returns the results of all queries. A UNION must first get the results of all queries and then so a 'SORT DISTINCT' on the results, so it obviously will take longer and more processing resources.

You make the choice based on whether you need duplicates removed from the result set, not on which one is 'faster'.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply