Partitioning

jwiles · Wed Jul 03, 2013 12:08 am

xinhuang:

1) I've personally not heard of this issue from support, but I'll defer to them and engineering. It sounds as if it's something specific to your situation, as I have not encountered it before.

2-4) The requirement for key partitioning DOES NOT mean that the developer has to explicitly specify a key partitioner instead of Auto partitioning in their job design. In case you missed it at the very top of my previous post, I explained what the Auto Partition option does in simple terms.

2) Join and RemDup require that data has been key partitioned in order to work as designed (that is, to produce the results they are designed to produce). If data has not been properly key partitioned, you can miss matches in Join or duplicates in RemDup because rows that should match or are duplicates can end up in different partitions.
Aggregator is a little different, but key partitioning is the best choice 99% of the time whether you choose it or Auto partitioning chooses it for you. For the other 1%--only if you are an advanced DataStage developer and have a clear understanding of how Aggregator is working would you need to choose something else.
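
To make point 2 concrete, here is a minimal Python sketch of why a per-partition RemDup can miss duplicates when rows are not key partitioned. It is not how the DataStage engine is implemented: the function names (`round_robin`, `hash_partition`, `remove_duplicates`, `run_parallel`) and the sample rows are hypothetical, and `crc32` only stands in for whatever hash function the engine actually uses.

```python
from zlib import crc32


def round_robin(rows, n):
    """Deal rows to n partitions in turn, ignoring key values (not a key partitioner)."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, n, key):
    """Send every row with the same key value to the same partition (a key partitioner)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is an assumption used for illustration only.
        parts[crc32(str(row[key]).encode()) % n].append(row)
    return parts


def remove_duplicates(partition, key):
    """Keep the first row per key value, looking only at rows in this one partition."""
    seen = set()
    kept = []
    for row in partition:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


def run_parallel(parts, key):
    """Run remove_duplicates on each partition independently and collect the output."""
    return [row for part in parts for row in remove_duplicates(part, key)]


# Hypothetical sample data: ids 1 and 2 each appear twice.
rows = [
    {"id": 1, "v": "a"},
    {"id": 2, "v": "b"},
    {"id": 1, "v": "c"},
    {"id": 3, "v": "d"},
    {"id": 2, "v": "e"},
]

not_keyed = sorted(r["id"] for r in run_parallel(round_robin(rows, 2), "id"))
keyed = sorted(r["id"] for r in run_parallel(hash_partition(rows, 2, "id"), "id"))

print(not_keyed)  # id 2 survives twice: its two rows landed in different partitions
print(keyed)      # one row per id
```

The same reasoning applies to Join: two rows with the same key can only be matched if they arrive in the same partition, which is what a key partitioner (whether chosen explicitly or by Auto) provides.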

3) No one ever said EXPLICIT key partitioning was mandatory...if they did, they are mistaken. I think this is where you are having an issue: You seem to believe that someone has said the developers must EXPLICITLY choose a key partitioning option instead of Auto. That is not the case.

Regarding Chandra's jobs: I did mention in my reply to Chandra that Auto partition was likely choosing Hash partitioning (which is a key partitioner) for his jobs. Did you miss this? Because Auto is choosing a key partitioner for him, his jobs are working without problems. Also, it was not stated that choosing Auto partitioning would automatically cause problems, just that it MAY not always result in the most efficiently-running jobs. In Chandra's case, as in most situations, Auto partitioning works great and jobs are apparently meeting their performance expectations.

4) No, it's not wrong. Why would it be?

I don't know your actual experience level with DataStage. My impression is that you don't have a clear enough understanding of how partition parallelism works in DataStage for all of this discussion to make sense to you. It can be a difficult topic to understand, especially when trying to piece it together from forum posts.

This topic in the Information Server documentation, Parallel Processing in Information Server, may be of help to you. I also recommend that you read this RedBook, especially Chapter 6. The Parallel Job Developer's Guide, available at this link, also describes partition parallelism, which is at the heart of all of this discussion.

Also, consider enrolling in the Advanced DataStage training class if you've not already taken it. Perhaps your employer will pay for you to take it.

Regards,

chandra.shekhar@tcs.com · Wed Jul 03, 2013 12:30 am

@James,
Anything regarding my previous comment?
Do I need to take the training too?

jwiles · Wed Jul 03, 2013 8:32 am

Chandra, I didn't see anything that necessitated a comment or correction.

What you stated is correct.

The Advanced DataStage class is beneficial as it is more in-depth than the DataStage Essentials class in how the engine works. Do you need to take it? I don't know that you need to, but I consider it to be helpful for anyone who would like to continue to improve their DataStage knowledge and skill level. I think it would be helpful for xinhuang, which is why I suggested it (but didn't intend to suggest that it is needed) along with the other resources I listed.

Regards,

jpraveen · Thu Jul 04, 2013 12:51 am

Hi Guys,

It would be helpful if someone could provide the partitioning types for all stages, like this:

Stage     Partition     Alternate Partition
-------------------------------------------
RemDup    Hash          Auto
Join      Hash          Auto

and so on...

You may suggest reading the IBM docs or searching the forum,

but my thought is that, by looking at this thread, everyone should be able to simply follow the partitioning method for a particular stage.

Thanks in Advance.

chulett · Thu Jul 04, 2013 7:34 am

I'm not sure how helpful it would be to bury information like this in a long post which will scroll off the first page and into possible oblivion once people stop adding to it. Now, if someone wants to put something together on the subject for a FAQ post, that would be helpful.
