Hi
I am joining two data sets using a Join stage. Both are hash partitioned on the join key, but neither is sorted. I believe the parallel framework inserts a tsort operator when the data is not sorted.
I have seen posts saying it is better to add an explicit Sort stage, but I am not sure why. To me, an explicit Sort stage and an inserted tsort operator should sort the data in the same way. Correct me if I am wrong.
Thanks
Explicit Sort Stage vs. tsort Operator?
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
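All three methods (an explicit Sort stage, a sort specified on the stage's input link, and a sort inserted by the framework) use the same tsort operator.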
By using an explicit Sort stage you get more control over the amount of memory allocated for sorting, and you can generate Key Change columns if that's important to your processing.
You also get the ability to handle already-sorted data (the "Don't Sort (Previously Sorted)" sort key mode, for example).
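For anyone unfamiliar with Key Change columns, here is a rough sketch of the idea in Python. It is not DataStage code and not how tsort is implemented; the function name add_key_change, the column name keyChange and the sample rows are all made up for illustration. It assumes the rows are already sorted on the key and flags the first row of each key group with 1 and every other row with 0.

```python
def add_key_change(sorted_rows, key):
    """Return copies of rows (assumed already sorted on `key`) with a
    hypothetical 'keyChange' column: 1 on the first row of each key
    group, 0 on the remaining rows of that group."""
    previous = object()  # sentinel that never equals a real key value
    flagged_rows = []
    for row in sorted_rows:
        current = row[key]
        flagged = dict(row)  # copy so the input rows are left untouched
        flagged["keyChange"] = 1 if current != previous else 0
        flagged_rows.append(flagged)
        previous = current
    return flagged_rows


# Illustrative data only: unsorted rows keyed on cust_id.
rows = [
    {"cust_id": 3, "amount": 10},
    {"cust_id": 1, "amount": 25},
    {"cust_id": 3, "amount": 5},
    {"cust_id": 2, "amount": 40},
    {"cust_id": 1, "amount": 15},
]

# Sort on the key first, then flag where the key value changes.
sorted_rows = sorted(rows, key=lambda r: r["cust_id"])
flagged = add_key_change(sorted_rows, "cust_id")

for r in flagged:
    print(r["cust_id"], r["amount"], r["keyChange"])
# keyChange values in order: 1, 0, 1, 1, 0
```

A downstream step can then detect group boundaries by testing the flag instead of comparing each row's key with the previous one.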
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.