DataStage Sort Stage vs Inline Link Sort

rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

Folks,

The Sort stage in 8.x works much better than it did in 7.x. We ran some benchmarks which indicated that, for our test file and environment, the Sort stage was about twice as fast as the inline link sort, presumably because the Sort stage can allocate more memory to the sort.

Is everyone seeing this kind of performance difference? Are there drawbacks to using the sort stage over the inline sort?

Rob
Rob Wierdsma
Toronto, Canada
bartonbishop.com
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

They're the same tsort operator under the covers. The stage just makes it more 'visible' in the job and gives you control over the parameters the sort can use.
Last edited by chulett on Thu Dec 05, 2013 8:12 am, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rwierdsm

Post by rwierdsm »

Thanks for your response Craig.

Does the sort stage default to a higher amount of allocated memory? We saw significantly better performance in the stage. Is there some risk in using too much memory when lots of sort stages are invoked at the same time?

Rob
chulett

Post by chulett »

Not sure about the default... others will have to answer that. As to the risk, sure, there's always that kind of resource issue risk when doing lots of anything. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The stage defaults to the same amount of memory as the inline link sort. The difference is that you can change it in the stage.

The global memory for tsort operators is set by the environment variable APT_TSORT_STRESS_BLOCKSIZE.

Yes, you can demand more memory than the system can provide. You can even do this at the default setting. The symptom is a lot of temporary files with "sort" as part of their name in the scratchdisk.
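
To check for the symptom described above, something like the following could be run against the scratch disk. It is only a rough sketch in Python: the function names are made up for illustration, the directory has to be whatever your configuration file lists as scratchdisk, and the only thing assumed about the temporary files is that "sort" appears somewhere in their name.

```python
from pathlib import Path


def find_sort_temp_files(scratch_dir):
    """Return (path, size_in_bytes) pairs for files under scratch_dir
    whose file name contains 'sort' (case-insensitive)."""
    results = []
    for path in Path(scratch_dir).rglob("*"):
        if path.is_file() and "sort" in path.name.lower():
            results.append((path, path.stat().st_size))
    return results


def total_megabytes(files):
    """Sum the sizes returned by find_sort_temp_files, in megabytes."""
    return sum(size for _, size in files) / (1024 * 1024)


# Example use (the path is a placeholder, not a real default):
#   files = find_sort_temp_files("/path/to/your/scratchdisk")
#   print(len(files), "sort temp files,", round(total_megabytes(files), 1), "MB")
```

A growing count or total while a job runs would suggest the sorts are spilling to disk rather than fitting in memory.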
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rwierdsm

Post by rwierdsm »

From the IBM documentation for 8.5:

===========
Restrict memory usage
This is set to 20 by default. It causes the Sort stage to restrict itself to the specified number of megabytes of virtual memory on a processing node.

The number of megabytes specified should be smaller than the amount of physical memory on a processing node. For Windows systems, the value for Restrict Memory Usage should not exceed 500.
==============

This number can be modified on the Properties tab. I was not able to find an indication of how much memory the inline sort uses; however, based on our benchmarks, it would be considerably less.
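
The limits in the quoted documentation, and the earlier question about many sorts running at once, can be turned into a quick sanity check. This is a back-of-the-envelope sketch in Python, not anything DataStage provides: the function names are hypothetical, and the worst-case figure assumes that every concurrent sort on a node uses its full allowance and that sorts share no memory.

```python
DEFAULT_RESTRICT_MEMORY_MB = 20  # default quoted from the 8.5 documentation
WINDOWS_MAX_MB = 500             # Windows ceiling quoted from the same passage


def check_restrict_memory(value_mb, node_physical_mb, is_windows=False):
    """Return a list of warnings for a proposed Restrict Memory Usage value."""
    warnings = []
    if value_mb >= node_physical_mb:
        warnings.append("value should be smaller than physical memory on the node")
    if is_windows and value_mb > WINDOWS_MAX_MB:
        warnings.append("value should not exceed 500 on Windows")
    return warnings


def worst_case_sort_memory_mb(concurrent_sorts, value_mb=DEFAULT_RESTRICT_MEMORY_MB):
    """Upper bound on sort memory for one node.

    Assumption (not from the documentation): each concurrent sort on the
    node may use its full allowance and none of it is shared.
    """
    return concurrent_sorts * value_mb


# Ten sorts at the default setting on one node:
#   worst_case_sort_memory_mb(10)
# The same ten sorts raised to 256 MB each:
#   worst_case_sort_memory_mb(10, 256)
```

Raising the value on one Sort stage is cheap; raising it on every sort in a job that runs alongside others is where the total is worth adding up first.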

Rob