
Job Efficiency?

Posted: Wed Mar 30, 2005 6:57 pm
by DSguru2B
Hi Gurus,

Is there a performance issue if I design everything in one job, i.e. I do all my loads from the source, then the aggregations, transformations, etc., and load it to the target?
Or should I modularize and do it in different stages?

To me, modularizing would make troubleshooting easier.

new learner.

Posted: Wed Mar 30, 2005 7:24 pm
by davidnemirovsky
Hi and Welcome aboard!

I think you need to clarify what you are trying to say. If you are suggesting implementing your loading, reference lookup, staging, mapping and writing phases in one job, you might have a maintenance nightmare.

Generally, smaller components will be easier to manage, although too many small components (over-engineering) will also be a nightmare, so it's a trade-off.

Posted: Wed Mar 30, 2005 7:37 pm
by DSguru2B
DJhigh,

You grasped exactly what I meant. I plan to keep my jobs from having too many components.
So would you call it bad practice, or performance-effective, if a person does lookups against 12 hash files, transforms the data, looks it up against another large number of hash files, and then does the aggregation, all in one job? What exactly are the cons you might have come across?

Thanks for the reply.

Posted: Thu Mar 31, 2005 1:51 am
by DSguru2B

Posted: Thu Mar 31, 2005 5:47 pm
by davidnemirovsky
A few questions for clarification:

Are you creating the Hash files in the same job?

Are your two large bunches of lookups and the transform in the same job?

Are these all separate jobs, and are you landing to disk in between the jobs?

How big is your data?

Posted: Thu Mar 31, 2005 6:32 pm
by DSguru2B
Are you creating the Hash files in the same job? Yes.

Are your two large bunches of lookups and the transform in the same job? Yes.

Are these all separate jobs, and are you landing to disk in between the jobs? No.

How big is your data? On average about 3 million rows.

Posted: Thu Mar 31, 2005 11:03 pm
by davidnemirovsky
You could try creating the hash files in separate jobs; firing them off in parallel could potentially be faster.
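As a rough sketch of that parallel pattern from the shell: the `build_hash` function below is only a stand-in (in a real setup each call would be a `dsjob -run` invocation; the project and job names in the comments are made up), but the `&`/`wait` structure is the point:

```shell
#!/bin/sh
# Illustrative sketch only. build_hash stands in for a real job invocation,
# e.g.: dsjob -run -jobstatus MyProject "BuildHash_$1"
# (MyProject and the BuildHash_* job names are hypothetical.)
build_hash() {
    echo "built hash file for $1"
}

# Fire the three hash-file build jobs concurrently.
build_hash customers &
build_hash products &
build_hash regions &
wait    # block until all three background builds have finished

echo "all hash files ready; the main lookup job can start"
```

The win is that the hash-file builds, which are independent of each other, no longer run one after another inside the main job.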

I think leaving the lookups and the transforms together in the same job should be OK (depending on your row size). Landing to disk is always a last resort.

It seems your data is small enough that it may be hard to gauge performance improvements from modularising your jobs. You could, however, dummy up some data and see which modularisations give you better performance.
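One quick way to dummy up data for that kind of test is a small awk one-liner; the pipe-delimited column layout and file name here are invented purely for illustration:

```shell
#!/bin/sh
# Sketch: generate N rows of pipe-delimited dummy data to size-test a job
# design. Columns (row id, a fake customer key, a numeric amount) and the
# output file name are made-up examples.
ROWS=100000
awk -v n="$ROWS" 'BEGIN {
    for (i = 1; i <= n; i++)
        printf "%d|CUST%06d|%.2f\n", i, i, i * 0.75
}' > dummy_input.txt
```

Bump `ROWS` up toward your real volumes (a few million) and run each candidate job design against the same file to get a comparable timing.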