Hi Gurus,
Is there a performance issue if I design all my jobs in one stage, i.e. I do all my loads from the source (src), then aggregations, transformations, etc., and load it to the target (trg)?
Or should I modularize the job and do it in different stages?
To me, modularizing would make troubleshooting easier.
new learner.
Job Efficiency?
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 85
- Joined: Fri Jun 04, 2004 2:30 am
- Location: Melbourne, Australia
- Contact:
Hi and Welcome aboard!
I think you need to clarify what you are trying to say. If you are suggesting implementing your loading, reference, lookup, staging, mapping and writing phases all in one job, you might have a maintenance nightmare.
Generally smaller components will be easier to manage, although too many small components (over-engineering) will also be a nightmare, so it's a trade-off.
Cheers,
Dave Nemirovsky
DJhigh,
You grabbed exactly what I stated. I plan on not splitting my jobs into too many components.
So would you call it bad practice, and is it performance-effective, if a person does lookups against 12 hash files, transforms the data, looks it up again against another large number of hash files, and then does the aggregation and so on? What exactly are the cons you might have come across?
Thanks for the reply.
You could try creating the hash files in separate jobs; firing them off in parallel could potentially be faster.
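To illustrate the fan-out/fan-in idea (build the independent lookups in parallel, wait for all of them, then run the main transform), here is a minimal sketch in Python. All names here are hypothetical stand-ins; in DataStage itself you would fire off the actual hash-building jobs, not Python functions:

```python
from concurrent.futures import ThreadPoolExecutor

def build_hash(name):
    # Stand-in for a separate job that (re)builds one hashed lookup file.
    return f"{name} built"

lookups = ["Customers", "Products", "Rates"]

# Fire the independent builds in parallel, then wait for all of them
# to finish before the main lookup/transform job would start.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(build_hash, lookups))

print(results)
```

The point is simply that the hash-file builds do not depend on one another, so sequencing them serially wastes wall-clock time.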
I think leaving the lookups and the transforms together in the job should be OK (depending on your row size). Landing to disk is always a last resort.
It seems your data is small enough that it may be hard for you to gauge performance improvements by modularising your jobs. You could, however, dummy up some data and see which modularisations give you better performance.
Cheers,
Dave Nemirovsky