Hi,
I agree, but my point is to understand the difference between the two methods.
In the first option, rows are blocked for sorting, and then all rows are processed by the respective logic one by one.
In the second option, rows are blocked for sorting, and then all rows are processed by the respective logic in parallel.
Won't that be the one point of benefit?
regards
kumar
Effective partition type for sorted input in Transformer
Thanks Thompson,
thompsonp wrote:
Apraman,
I may have misunderstood your question, but are you saying that the logic in your transformer requires the input data to be sorted?
You got my point. I would rather not use an extra Sort stage, so I am sorting on the Input tab of the Transformer.
Once I have sorted on the input link,
and if I preserve the sorted order on the output link of the Transformer,
is it necessary to sort again while collecting for the Sequential File stage?
Arun
Thanks Ray,
ray.wurlod wrote:
Partitioning is completely irrelevant for output links.
If you do need to sort, it doesn't matter whether you use a Sort stage or sorting as a property of the input link; it will block rows. It must block rows. Think about it.
The Sort stage gives a little more flexibility and control over consumption of memory than is available for input link sorting.
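To make the "it must block rows" point concrete, here is a minimal Python sketch. It is not DataStage code; `blocking_sort` and `source` are made-up names for illustration. The smallest key might be on the very last input row, so a sort cannot emit its first row until it has read all of them, whichever way the sort is configured.

```python
from typing import Iterable, Iterator, List


def blocking_sort(rows: Iterable[dict], key: str) -> Iterator[dict]:
    """Yield rows in key order; nothing is emitted until the input is exhausted."""
    buffered = list(rows)                      # consumes every input row first
    buffered.sort(key=lambda r: r[key])
    yield from buffered


def source(log: List[str]) -> Iterator[dict]:
    """Hypothetical upstream link that records when each row is read."""
    for k in [3, 1, 2]:
        log.append(f"read {k}")
        yield {"id": k}


events: List[str] = []
for row in blocking_sort(source(events), "id"):
    events.append(f"emit {row['id']}")

print(events)
# ['read 3', 'read 1', 'read 2', 'emit 1', 'emit 2', 'emit 3']
```

All three reads happen before the first emit, which is the blocking behaviour described above.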
I do need to sort the data.
But you prefer that a Sort stage be included.
Would you please explain why the Sort stage is needed here?
Arun
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
It's not a need, it's a preference, because you can allocate more memory to the sorting operation when using a Sort stage. As far as I can recall, memory per partition is limited to 20MB on an input link sort, but is configurable as a property when using the Sort stage. The more that can be done in memory (provided you have enough memory), the faster the sort should finish.
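As a rough illustration of why the memory allowance matters, the sketch below shows a generic external merge sort in Python. It is an assumption about how memory-limited sorts behave in general, not a description of DataStage internals; `external_sort` and `max_rows_in_memory` are hypothetical names, and the budget is counted in rows rather than megabytes to keep it short. When the data fits in the budget nothing touches disk; when it does not, sorted runs are spilled to scratch files and merged afterwards.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List, Tuple


def _spill(run: List[int]) -> str:
    """Write one sorted run to a scratch file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as fh:
        fh.writelines(f"{v}\n" for v in run)
    return path


def _read_run(path: str) -> Iterator[int]:
    with open(path) as fh:
        for line in fh:
            yield int(line)


def external_sort(values: Iterable[int], max_rows_in_memory: int) -> Tuple[List[int], int]:
    """Return (sorted values, number of runs spilled to disk)."""
    run_paths: List[str] = []
    buffer: List[int] = []
    for v in values:
        if len(buffer) == max_rows_in_memory:   # budget exhausted: spill a run
            run_paths.append(_spill(sorted(buffer)))
            buffer = []
        buffer.append(v)
    if not run_paths:                           # everything fitted in memory
        return sorted(buffer), 0
    run_paths.append(_spill(sorted(buffer)))    # final partial run
    try:
        merged = list(heapq.merge(*(_read_run(p) for p in run_paths)))
    finally:
        for p in run_paths:
            os.remove(p)
    return merged, len(run_paths)


data = [9, 3, 7, 1, 8, 2, 6, 0, 5, 4]
small = external_sort(data, max_rows_in_memory=4)    # forced to spill
large = external_sort(data, max_rows_in_memory=100)  # stays in memory
print(small[1], large[1])
```

Both calls return the same sorted rows; the difference is only in how much scratch disk I/O is needed, which is where a larger memory setting would be expected to help.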
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Thanks Ray,
ray.wurlod wrote:
It's not a need, it's a preference
Suppose the job is sequential: all the stages run sequentially, say because I have allotted a single node from the start, with single OS files as the source.
Will including a Sort stage matter then?
And what if I place the Sort stage after the Transformer, considering that there may or may not be a reduction in records on the Transformer's output link?
Would that be a better option?
Arun
Impossible to say. Presumably there will be no loss of rows through the Transformer stage, so my guess most of the time would be "no difference".
Another possibility, of course, is to invoke the UNIX sort command in an after-job subroutine, perhaps ExecSH. In that case, you don't need any form of sorting within DataStage.
ray.wurlod wrote:
Impossible to say. Presumably there will be no loss of rows through the Transformer stage....
If the Transformer retrieves only a particular set of records based on a constraint, there will be a reduction in records.
Will placing the Sort stage after the Transformer be the better option?
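The trade-off in this question can be sketched in a few lines of Python. This is an illustration under one important assumption: the constraint looks at each row on its own and does not depend on the rows arriving in sorted order. The `region` column and the `keep` function are invented for the example. Under that assumption, filtering first gives the same output while the sort handles fewer rows.

```python
# Hypothetical input: 20 rows, of which 5 satisfy the constraint.
rows = [{"id": i, "region": "EU" if i % 4 == 0 else "US"} for i in range(20, 0, -1)]


def keep(row: dict) -> bool:
    """Stand-in for a Transformer constraint that inspects a single row."""
    return row["region"] == "EU"


# Option A: sort everything, then apply the constraint.
sort_then_filter = [r for r in sorted(rows, key=lambda r: r["id"]) if keep(r)]

# Option B: apply the constraint first, then sort only the survivors.
filtered = [r for r in rows if keep(r)]
filter_then_sort = sorted(filtered, key=lambda r: r["id"])

print(len(rows), len(filtered))              # rows sorted in A vs. in B
print(sort_then_filter == filter_then_sort)  # same result either way
```

If the Transformer's own logic needs sorted input, as discussed earlier in the thread, this reordering is not available and the sort has to stay upstream.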
Arun
Other than the Transformer, some stages, such as Change Capture and Difference, need input sorted on the key field.
In those stages, if I sort on the input links ("before" and "after")
instead of using a Sort stage after the two passive stages,
what issues will arise in terms of performance, with respect to execution time?
Arun
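For context on why stages like these want key-sorted input, here is a hedged Python sketch of a generic sorted-merge comparison. It is not the actual Change Capture implementation; `change_capture` and the string labels are made up for illustration. With both inputs sorted on the key, one forward pass over each is enough to classify every row, regardless of where the sorting was done upstream.

```python
from typing import Any, Dict, Iterator, List, Tuple

Row = Dict[str, Any]


def change_capture(before: List[Row], after: List[Row], key: str) -> Iterator[Tuple[str, Row]]:
    """Compare two inputs that are both already sorted on `key`."""
    i = j = 0
    while i < len(before) and j < len(after):
        b, a = before[i], after[j]
        if b[key] == a[key]:
            if b != a:                 # same key, different values
                yield ("edit", a)
            i += 1
            j += 1
        elif b[key] < a[key]:          # key only in "before"
            yield ("delete", b)
            i += 1
        else:                          # key only in "after"
            yield ("insert", a)
            j += 1
    for b in before[i:]:
        yield ("delete", b)
    for a in after[j:]:
        yield ("insert", a)


before = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
after = [{"id": 1, "v": "a"}, {"id": 3, "v": "x"}, {"id": 4, "v": "d"}]
changes = list(change_capture(before, after, "id"))
print([(code, row["id"]) for code, row in changes])
# [('delete', 2), ('edit', 3), ('insert', 4)]
```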