Effective partition type for sorted input in Transformer

kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,
I agree, but my point is to understand the difference between the two methods.

In the first option, rows get blocked for sorting. Then all rows are processed by the respective logic one by one.

In the second option, rows get blocked for sorting. Then all rows are processed by the respective logic in parallel.

Won't that be the single point of benefit?

regards
kumar
apraman
Participant
Posts: 47
Joined: Mon Sep 12, 2005 5:26 am

Post by apraman »

thompsonp wrote: Apraman,

I may have misunderstood your question, but are you saying that the logic in your transformer requires the input data to be sorted?
Thanks Thompson,

You got my point. I would prefer not to use an extra Sort stage, so I am sorting on the Input tab of the Transformer.

Once I have sorted on the input link,
and if I preserve the sorted order on the output link of the Transformer,
is it necessary to sort again while collecting for the Sequential File?
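
On the collecting question: if every partition is already sorted on the key, a full re-sort should not be needed. An order-preserving merge of the partitions is enough, which as far as I know is what the Sort Merge collection method does, whereas a round-robin style collector interleaves the partitions and loses the global order. A minimal Python sketch of the idea follows; the partition contents and function names are made up for illustration and this is not DataStage code.

```python
import heapq
from itertools import chain, zip_longest

# Hypothetical data: two partitions, each already sorted on the key.
partition_0 = [1, 4, 7]
partition_1 = [2, 3, 9]

def collect_round_robin(*partitions):
    """Take one row from each partition in turn; global order is lost."""
    missing = object()
    interleaved = chain.from_iterable(zip_longest(*partitions, fillvalue=missing))
    return [row for row in interleaved if row is not missing]

def collect_sort_merge(*partitions):
    """Merge already-sorted partitions; no full re-sort is required."""
    return list(heapq.merge(*partitions))

print(collect_round_robin(partition_0, partition_1))  # [1, 2, 4, 3, 7, 9]
print(collect_sort_merge(partition_0, partition_1))   # [1, 2, 3, 4, 7, 9]
```

The merge only compares the current head row of each partition, so it is much cheaper than sorting the collected data again.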
Arun
apraman
Participant
Posts: 47
Joined: Mon Sep 12, 2005 5:26 am

Post by apraman »

ray.wurlod wrote: Partitioning is completely irrelevant for output links.

If you do need to sort, it doesn't matter whether you use a Sort stage or sorting as a property of the input link; it will block rows. It must block rows. Think about it.

The Sort stage gives a little more flexibility and control over consumption of memory than is available for input link sorting.
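
To illustrate why any sort has to block rows, here is a small Python sketch (the function names and the log are only for illustration): the smallest row might be the last one to arrive, so nothing can be emitted until the whole input has been read.

```python
def traced_source(rows, log):
    """Yield rows one at a time, recording when each one is read."""
    for row in rows:
        log.append(f"read {row}")
        yield row

def sort_rows(rows, log):
    """A sort cannot emit its first row until it has seen its last input row."""
    ordered = sorted(rows)  # consumes the whole input before returning
    for row in ordered:
        log.append(f"emit {row}")
        yield row

log = []
result = list(sort_rows(traced_source([3, 1, 2], log), log))
print(result)  # [1, 2, 3]
print(log)     # every "read" entry appears before the first "emit" entry
```

This holds whether the sort is a separate stage or a property of the input link; only where the blocking happens changes.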
Thanks Ray

I do need to sort the data.
But you seem to prefer including a Sort stage.

Could you please explain why the Sort stage is needed here?
Arun
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's not a need, it's a preference, because you can allocate more memory to the sorting operation when using a Sort stage. As far as I can recall, memory per partition is limited to 20MB on an input link sort, but is configurable as a property when using the Sort stage. The more that can be done in memory (provided you have enough memory), the faster the sort should finish.
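
A rough Python sketch of why the memory allowance matters, assuming a generic external-sort scheme (this is not how DataStage implements its sort, and the row counts are made up): when the data does not fit in the allowance, it is sorted in runs that are merged afterwards, and a larger allowance means fewer runs.

```python
import heapq

def sort_with_memory_limit(rows, max_rows_in_memory):
    """Sort rows using runs of at most max_rows_in_memory rows.

    Returns (sorted_rows, number_of_runs). In a real engine the runs that
    do not fit in memory would be spilled to scratch disk; here they are
    plain lists so that the sketch stays self-contained.
    """
    runs = []
    for start in range(0, len(rows), max_rows_in_memory):
        runs.append(sorted(rows[start:start + max_rows_in_memory]))
    merged = list(heapq.merge(*runs))
    return merged, len(runs)

rows = [9, 2, 7, 4, 1, 8, 3, 6, 5, 0]
small, small_runs = sort_with_memory_limit(rows, max_rows_in_memory=3)
large, large_runs = sort_with_memory_limit(rows, max_rows_in_memory=10)
print(small_runs, large_runs)  # 4 1
```

Both calls return the same sorted rows; the difference is how many intermediate runs have to be written and merged along the way.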
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
apraman
Participant
Posts: 47
Joined: Mon Sep 12, 2005 5:26 am

Post by apraman »

ray.wurlod wrote: It's not a need, it's a preference
Thanks Ray

Suppose the whole job runs sequentially: all the stages execute sequentially because I have allotted a single node from the start, and the source is a single OS file.

Will including a Sort stage matter in that case?
And if I place the Sort stage after the Transformer, given that there may or may not be a reduction of records on the Transformer's output link,
will that be a better option?
Arun
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Impossible to say. Presumably there will be no loss of rows through the Transformer stage, so my guess most of the time would be "no difference".

Another possibility, of course, is to invoke the UNIX sort command in an after-job subroutine, perhaps ExecSH. In that case, you don't need any form of sorting within DataStage.
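
As a rough illustration of the sort options involved, assuming a comma-delimited output file keyed on its first field (the file names and layout are hypothetical), the command would look something like `sort -t, -k1,1 -o sorted.csv unsorted.csv`. The Python sketch below simply drives that same command on a throwaway file; in an actual job the command text would be supplied to the after-job subroutine instead.

```python
import os
import subprocess
import tempfile

def sort_file_with_unix_sort(input_path, output_path, delimiter=",", key_field=1):
    """Sort a delimited text file on one field using the system sort command."""
    command = [
        "sort",
        "-t", delimiter,                    # field delimiter
        "-k", f"{key_field},{key_field}",   # sort on this field only
        "-o", output_path,                  # write the result to this file
        input_path,
    ]
    # LC_ALL=C gives byte-wise ordering that does not depend on the locale.
    env = dict(os.environ, LC_ALL="C")
    subprocess.run(command, check=True, env=env)

# Hypothetical usage on a throwaway file.
work_dir = tempfile.mkdtemp()
unsorted_path = os.path.join(work_dir, "unsorted.csv")
sorted_path = os.path.join(work_dir, "sorted.csv")
with open(unsorted_path, "w") as handle:
    handle.write("b,2\nc,3\na,1\n")

sort_file_with_unix_sort(unsorted_path, sorted_path)
with open(sorted_path) as handle:
    print(handle.read())
```

This needs a UNIX-like system with `sort` on the PATH, and it only helps when the sorted order is needed in the final file rather than inside the job.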
apraman
Participant
Posts: 47
Joined: Mon Sep 12, 2005 5:26 am

Post by apraman »

ray.wurlod wrote: Impossible to say. Presumably there will be no loss of rows through the Transformer stage....
Suppose the Transformer retrieves only a particular set of records based on a constraint. In that case there will be a reduction of records.

Will placing the Sort stage after the Transformer be the better option then?
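
A small Python sketch of the trade-off, assuming the constraint itself does not depend on the rows arriving in sorted order (the row layout and names are made up): filtering first gives the same result while the sort handles fewer rows.

```python
def filter_then_sort(rows, keep, key):
    """Apply the constraint first, then sort only the surviving rows."""
    kept = [row for row in rows if keep(row)]
    return sorted(kept, key=key), len(kept)

def sort_then_filter(rows, keep, key):
    """Sort everything, then apply the constraint."""
    ordered = sorted(rows, key=key)
    return [row for row in ordered if keep(row)], len(rows)

# Hypothetical rows: (customer_id, status)
rows = [(5, "A"), (2, "X"), (9, "A"), (1, "X"), (7, "A"), (3, "X")]
is_active = lambda row: row[1] == "A"
by_id = lambda row: row[0]

late_sort, rows_sorted_late = filter_then_sort(rows, is_active, by_id)
early_sort, rows_sorted_early = sort_then_filter(rows, is_active, by_id)
print(late_sort == early_sort)              # True
print(rows_sorted_late, rows_sorted_early)  # 3 6
```

If the Transformer logic does need sorted input (key-change detection, for example), the sort has to stay in front of it regardless of how many rows the constraint later drops.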
Arun
apraman
Participant
Posts: 47
Joined: Mon Sep 12, 2005 5:26 am

Post by apraman »

Apart from the Transformer, some other stages, such as Change Capture and Difference, need input sorted on the key field.

For those stages, if I sort on the input links ("before" and "after")
instead of using a Sort stage after each of the two passive stages,
what issues are likely to arise in terms of performance with respect to execution time?
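
For context on why these stages want key-sorted input, here is a minimal Python sketch of a one-pass comparison of two key-sorted inputs. The function, the change labels, and the sample rows are illustrative only and not the stage's actual implementation. Either way a sort runs on each input, so the blocking cost is the same; as Ray noted earlier in the thread, the difference between a link sort and a Sort stage is mainly the control over memory.

```python
def change_capture(before, after):
    """Compare two lists of (key, value) rows, each sorted on a unique key.

    Yields (change, key) pairs in one pass over both inputs, which only
    works because both sides arrive in the same key order.
    """
    i = j = 0
    while i < len(before) and j < len(after):
        b_key, b_val = before[i]
        a_key, a_val = after[j]
        if b_key == a_key:
            yield ("copy" if b_val == a_val else "edit", b_key)
            i += 1
            j += 1
        elif b_key < a_key:
            yield ("delete", b_key)  # key exists only in "before"
            i += 1
        else:
            yield ("insert", a_key)  # key exists only in "after"
            j += 1
    for key, _ in before[i:]:
        yield ("delete", key)
    for key, _ in after[j:]:
        yield ("insert", key)

before = [(1, "a"), (2, "b"), (4, "d")]
after = [(1, "a"), (2, "B"), (3, "c")]
changes = list(change_capture(before, after))
print(changes)  # [('copy', 1), ('edit', 2), ('insert', 3), ('delete', 4)]
```

With unsorted inputs the same comparison would need a lookup structure holding one whole side in memory, which is the cost the sorted-input requirement avoids.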
Arun