conceptual difference between partition and repartition

Enzopre
Participant
Posts: 57
Joined: Thu Feb 07, 2013 2:04 pm
Location: Italy

Post by Enzopre »

Hi,

I would like to know the difference between partitioning and repartitioning in parallel jobs.

As I understand it, partitioning splits the data into separate sets, and each set is handled by a separate instance of the job's stages, running on a separate node.

Repartitioning lets us redistribute the data between stages.

But under that definition of partitioning, is the data also repartitioned between stages, or not?
Last edited by Enzopre on Sat Mar 02, 2013 3:25 pm, edited 1 time in total.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Partitioning occurs when the upstream stage is sequential and the downstream stage executes in parallel mode.

Re-partitioning occurs when the upstream stage is executing in parallel mode and the downstream stage is executing in parallel mode, but specifies a different partitioning algorithm from that used in the upstream stage (implicitly or explicitly).

If the downstream stage does not specify a different partitioning algorithm then no re-partitioning occurs.
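
To make the three cases concrete, here is a rough conceptual sketch in Python. It is not DataStage code and does not reflect how the parallel engine is implemented; the function names (hash_partition, repartition), the column names, and the toy hash are all hypothetical, chosen only to illustrate the idea.

```python
def hash_partition(rows, key, num_nodes):
    """Split one stream of rows across nodes by hashing a key column.

    Conceptual stand-in for sequential -> parallel partitioning.
    The toy hash (sum of character codes) is an assumption for illustration.
    """
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        node = sum(ord(ch) for ch in str(row[key])) % num_nodes
        partitions[node].append(row)
    return partitions


def repartition(partitions, key, num_nodes):
    """Redistribute rows that are already partitioned, using a different key.

    Conceptual stand-in for parallel -> parallel with a changed algorithm:
    rows may have to move from one node to another.
    """
    all_rows = [row for part in partitions for row in part]
    return hash_partition(all_rows, key, num_nodes)


# A sequential source: a single stream of rows.
rows = [
    {"cust": "A", "region": "EU"},
    {"cust": "B", "region": "EU"},
    {"cust": "C", "region": "NA"},
    {"cust": "D", "region": "NA"},
]

# Case 1: sequential upstream, parallel downstream -> partitioning.
by_cust = hash_partition(rows, "cust", 2)

# Case 2: parallel upstream, parallel downstream, different key -> re-partitioning.
by_region = repartition(by_cust, "region", 2)

# Case 3: parallel upstream, parallel downstream, same partitioning
# -> nothing moves; each node keeps the rows it already has.
unchanged = by_cust

for name, parts in [("by_cust", by_cust), ("by_region", by_region)]:
    for node, part in enumerate(parts):
        print(name, "node", node, [(r["cust"], r["region"]) for r in part])
```

After the first step each node holds a mix of regions; after the second, every row for a given region sits on the same node, which is what a downstream stage keyed on region would need.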
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Download the IBM Redbook PDF on InfoSphere DataStage Parallel Framework Standard Practices. It has lots of great information, including a whole chapter on partitioning.
-craig

"You can never have too many knives" -- Logan Nine Fingers