Transformer Stage Replacement

mail2krishna79
Charter Member
Posts: 56
Joined: Fri Apr 21, 2006 9:53 am

Transformer Stage Replacement

Post by mail2krishna79 »

Hi,
Someone on my project designed a job that contains two Transformer stages.
The first Transformer uses Hash partitioning and the second uses Same partitioning. Everything else is a straight move (columns copied through unchanged).

The job design is like this:
Ext Src Stg---->Tfm--->Remdup---->OraConnector---->Tfm---->OraConnector

Can anyone suggest how to improve performance in this scenario?

Thanks
hk
soumya5891
Participant
Posts: 152
Joined: Mon Mar 07, 2011 6:16 am

Re: Transformer Stage Replacement

Post by soumya5891 »

If the Transformer stages are only there to handle partitioning, then you don't actually need them at all; partitioning can be set on any other stage.
For example, for the Remove Duplicates stage, place a Sort stage before it with Hash partitioning on the duplicate key, then set the partitioning on the Remove Duplicates stage to Same. (See the sketch below for why the hash partitioning matters.)
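
A minimal sketch in plain Python (not DataStage code; the two-partition count and the column names are only illustrative) of why rows must be hash partitioned on the duplicate key before duplicates are removed within each partition:

from collections import defaultdict

def hash_partition(rows, key, num_partitions):
    # Route each row by the hash of its key so that all rows sharing
    # a key value land in the same partition (what Hash partitioning does).
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions

def dedup_partition(partition, key):
    # Sort on the key and keep the first row per key value, mimicking
    # Sort + Remove Duplicates running independently in each partition.
    seen, out = set(), []
    for row in sorted(partition, key=lambda r: r[key]):
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out

rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]
deduped = []
for part in hash_partition(rows, "id", num_partitions=2).values():
    deduped.extend(dedup_partition(part, "id"))
print(deduped)  # exactly one row per id; with round-robin partitioning,
                # the two id=1 rows could land apart and both survive
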
Soumya
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

FYI: There is typically little need to explicitly specify Same partitioning. When partitioning is set to Auto, the engine will normally not repartition unless it determines a need to do so.

If all the transformers are doing is copying input columns to the output (no null checking, datatype conversion, column generation, etc.), then they are unnecessary and you can do as soumya has suggested.

What volume of data is being processed, and what do the Oracle Connectors in the job do, especially the first one?

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
mail2krishna79
Charter Member
Posts: 56
Joined: Fri Apr 21, 2006 9:53 am

Post by mail2krishna79 »

Hi Soumya/jwiles,
Thank you so much for your responses. The data volume is 10 million rows.
Sorry for the late reply.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

10 million rows through Transformers is not an issue in your case, as long as you are doing as little in them as you have stated. That leaves the Oracle Connectors, Remove Duplicates and the External Source stage. So, another series of questions as we try to narrow down the bottleneck:

1) How long does this job run when processing 10 million records?

2) What is External Source calling to provide the data?

3) (as asked before) What do the Oracle Connectors do in this job?

4) What degree of parallelism is the job being run at?

5) How wide are the records? (Average bytes per record)
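
(A worked example for question 5: at a hypothetical average of 200 bytes per record, 10 million records comes to roughly 2 GB in total, which is the kind of number that determines whether the sort inserted before Remove Duplicates stays in memory or spills to disk.)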

Potential bottlenecks are the External Source stage, the Oracle Connectors and the Remove Duplicates stage.

1) Determine how quickly the External Source stage runs by creating a job with just it dumping to a Copy stage: ExtSrc-->Copy. Compile and run to see how long it takes to provide the 10 million records.

2) The Remove Duplicates probably has a Sort inserted in front of it at runtime. Depending on the total size of the data being processed, it may have a minor impact.

3) The Oracle Connectors: What does each one do (I suspect the first is an Upsert of some sort and the second loads rejects)? Do they load to the same table or separate tables? Do they use custom SQL statements? Do the target tables have indexes, and are those indexes kept up to date? (A sketch of what an upsert costs is below.)
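
For illustration, a minimal Python sketch of what an upsert write mode generally amounts to on the Oracle side: a MERGE applied per batch of rows. The table, columns, and connection details here are made up, and the Connector generates its own SQL unless you supply custom statements; the point is that the ON clause probes the target's index for every incoming row.

import cx_Oracle  # assumes the cx_Oracle driver is installed

# Hypothetical table and columns; the ON clause is why the target's
# index health matters: every incoming row probes the index on t.id.
MERGE_SQL = """
MERGE INTO target_tbl t
USING (SELECT :1 AS id, :2 AS val FROM dual) s
ON (t.id = s.id)
WHEN MATCHED THEN UPDATE SET t.val = s.val
WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val)
"""

rows = [(1, "a"), (2, "b")]  # stand-in for the real 10 million rows

conn = cx_Oracle.connect("user/password@dbhost/service")
cur = conn.cursor()
cur.executemany(MERGE_SQL, rows)  # batched round trips, but still
                                  # one MERGE evaluation per row
conn.commit()
conn.close()
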
- james wiles


All generalizations are false, including this one - Mark Twain.