Explicit Sort Stage vs. tsort Operator?

kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm

Post by kaps »

Hi

I am joining two data sets using a Join stage. Both are hash partitioned on the join key, but neither is sorted. I believe the parallel framework inserts a tsort operator when the input data is not sorted.
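
For illustration only, here is a minimal Python sketch (not DataStage code) of why a join of this kind wants key-sorted inputs: with both sides sorted, matching rows can be found in a single forward pass. The function name `merge_join` and the sample rows are hypothetical, and this is an assumption about the general sort-merge technique rather than a description of the Join stage internals.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) tuples that are ALREADY sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key too small, advance left
        elif lk > rk:
            j += 1          # right key too small, advance right
        else:
            # Find the run of right-hand rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    out.append((lk, left[i][1], r[1]))
                i += 1
            j = j_end
    return out


# Unsorted inputs must be sorted first, which is the role the sort plays before a join.
left = sorted([(3, "c"), (1, "a"), (2, "b")])
right = sorted([(2, "x"), (3, "y"), (3, "z")])
print(merge_join(left, right))
```

If either input were left unsorted, the single pass would skip matches, which is why a sort has to appear upstream one way or another.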

I see in some posts that it's better to add the Sort stage explicitly, but I am not sure of the reason. To me, an explicit Sort stage and an inserted tsort operator are both going to sort in the same way. Correct me if I am wrong...

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

All three methods (an explicit Sort stage, a sort specified on the input link, and a sort inserted by the framework) use the same tsort operator.

By using an explicit Sort stage you get more control over the amount of memory allocated for sorting, and you can generate Key Change columns if that's important to your processing.
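
As a rough picture of what a Key Change column gives you, here is a small Python sketch (not DataStage code). It assumes the flag is 1 on the first row of each key group and 0 on the rest; the function name `add_key_change` and the column name `keyChange` are chosen for illustration.

```python
def add_key_change(rows, key):
    """Given rows (dicts) already sorted on `key`, flag the first row of each group."""
    out = []
    prev = object()  # sentinel that never equals a real key value
    for row in rows:
        k = row[key]
        # 1 when the key differs from the previous row, otherwise 0
        out.append({**row, "keyChange": 1 if k != prev else 0})
        prev = k
    return out


rows = [{"k": "A", "v": 1}, {"k": "A", "v": 2}, {"k": "B", "v": 3}]
for r in add_key_change(rows, "k"):
    print(r)
```

Downstream logic can then detect group boundaries by testing the flag instead of comparing each row with its predecessor.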

You also get the ability to handle already-sorted data ("don't sort (previously sorted)" for example).
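
The benefit of declaring a key as previously sorted can be sketched in Python (again, not DataStage code). The assumption here is that the option lets the sort trust the existing order on that key and only order the remaining keys within each group, so one group at a time is held in memory rather than the whole input. The name `sort_within_groups` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def sort_within_groups(rows, sorted_key, sub_key):
    """Rows arrive already ordered on `sorted_key`; sort only by `sub_key` inside each group."""
    for _, group in groupby(rows, key=itemgetter(sorted_key)):
        # Only the current group is materialised and sorted.
        yield from sorted(group, key=itemgetter(sub_key))


rows = [
    {"cust": 1, "date": 3},
    {"cust": 1, "date": 1},
    {"cust": 2, "date": 2},
    {"cust": 2, "date": 1},
]
print(list(sort_within_groups(rows, "cust", "date")))
```
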
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm

Post by kaps »

Thanks Ray. So I don't have to put a Sort stage before the Join stage and sort on the key field, as long as I'm not concerned about controlling memory allocation or the other options, because DataStage is going to do the sort anyway. Correct?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You don't have to, but I include it amongst my "best practices" to do so, especially where the early join keys are already sorted.