Partitioning

jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

xinhuang:

1) I've personally not heard of this issue from support, but I'll defer to them and engineering. It sounds as if it's something specific to your situation, as I have not encountered it before.

2-4) The requirement for key partitioning DOES NOT mean that the developer has to explicitly specify a key partitioner instead of Auto partitioning in their job design. In case you missed it at the very top of my previous post, I explained what the Auto Partition option does in simple terms.

2) Join and RemDup require that data has been key partitioned in order to work as designed (that is, to produce the results they are designed to produce). If data has not been properly key partitioned, you can miss matches in Join or duplicates in RemDup because rows that should match or are duplicates can end up in different partitions.
Aggregator is a little different, but key partitioning is the best choice 99% of the time, whether you choose it or Auto partitioning chooses it for you. For the other 1%, you would need to choose something else only if you are an advanced DataStage developer and have a clear understanding of how Aggregator works.

3) No one ever said EXPLICIT key partitioning was mandatory...if they did, they are mistaken. I think this is where you are having an issue: You seem to believe that someone has said the developers must EXPLICITLY choose a key partitioning option instead of Auto. That is not the case.

Regarding Chandra's jobs: I did mention in my reply to Chandra that Auto partition was likely choosing Hash partitioning (which is a key partitioner) for his jobs. Did you miss this? Because Auto is choosing a key partitioner for him, his jobs are working without problems. Also, it was not stated that choosing Auto partitioning would automatically cause problems, just that it MAY not always result in the most efficiently-running jobs. In Chandra's case, as in most situations, Auto partitioning works great and jobs are apparently meeting their performance expectations.

4) No, it's not wrong. Why would it be?
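
To illustrate the point in 2) about duplicates ending up in different partitions, here is a minimal Python sketch. It is not DataStage code: the function names, the sample rows, and the use of CRC32 as the hash are all assumptions made for illustration, and the real Hash partitioner may work differently. It only shows why a per-partition RemDup needs all rows with the same key in the same partition.

```python
import zlib

def round_robin_partition(rows, n):
    """Deal rows out to n partitions in turn, ignoring the key (hypothetical helper)."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    """Send every row with the same key value to the same partition.

    CRC32 is only a stand-in hash for this sketch, not DataStage's algorithm.
    """
    parts = [[] for _ in range(n)]
    for row in rows:
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[h % n].append(row)
    return parts

def remove_duplicates_per_partition(parts, key):
    """Keep the first row per key within each partition, as a parallel
    duplicate-removal step would: partitions never see each other's rows."""
    out = []
    for part in parts:
        seen = set()
        for row in part:
            if row[key] not in seen:
                seen.add(row[key])
                out.append(row)
    return out

# Made-up sample data: three distinct cust_id values, six rows.
rows = [
    {"cust_id": 1, "v": "a"},
    {"cust_id": 2, "v": "b"},
    {"cust_id": 1, "v": "c"},
    {"cust_id": 3, "v": "d"},
    {"cust_id": 2, "v": "e"},
    {"cust_id": 1, "v": "f"},
]

rr_result = remove_duplicates_per_partition(round_robin_partition(rows, 2), "cust_id")
hash_result = remove_duplicates_per_partition(hash_partition(rows, 2, "cust_id"), "cust_id")

print(len(rr_result))    # 5: duplicates survive because they sat in different partitions
print(len(hash_result))  # 3: one row per cust_id
```

With the key-based partitioner every cust_id lands in exactly one partition, so the per-partition step sees all of its duplicates. The same reasoning applies to Join: two rows that should match can only be matched if they are in the same partition.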

I don't know your actual experience level with DataStage. My impression is that you don't have a clear enough understanding of how partition parallelism works in DataStage for all of this discussion to make sense to you. It can be a difficult topic to understand, especially when trying to piece it together from forum posts.

This topic in the Information Server documentation, Parallel Processing in Information Server, may be of help to you. I also recommend that you read this RedBook, especially Chapter 6. The Parallel Job Developer's Guide, available at this link, also describes partition parallelism, which is at the heart of all of this discussion.

Also, consider enrolling in the Advanced DataStage training class if you've not already taken it. Perhaps your employer will pay for you to take it.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

@James,
Anything regarding my previous comment?
Do I need to take the training too? :roll:
Thanks and Regards,
ETL User
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Chandra, I didn't see anything that necessitated a comment or correction :) What you stated is correct.

The Advanced DataStage class is beneficial as it is more in-depth than the DataStage Essentials class in how the engine works. Do you need to take it? I don't know that you need to, but I consider it to be helpful for anyone who would like to continue to improve their DataStage knowledge and skill level. I think it would be helpful for xinhuang, which is why I suggested it (but didn't intend to suggest that it is needed) along with the other resources I listed.

Regards,
- james wiles


jpraveen
Participant
Posts: 71
Joined: Sat Jun 06, 2009 7:10 am
Location: HYD

Partitioning of the Stage

Post by jpraveen »

Hi Guys,

It would be helpful if someone could provide the partitioning types for all stages, like this:

Stage     Partition   Alternate Partition
-----------------------------------------
RemDup    Hash        Auto
Join      Hash        Auto

and so on...

You may suggest that I read the IBM docs or search the forum,

but my thought is that, by looking at this thread, everyone could simply follow the partitioning method listed for a particular stage.

Thanks in Advance.
Jaypee
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm not sure how helpful it would be to bury information like this in a long post which will scroll off the first page and into possible oblivion once people stop adding to it. Now, if someone wants to put something together on the subject for a FAQ post, that would be helpful.
-craig

"You can never have too many knives" -- Logan Nine Fingers