Server jobs vs parallel jobs

Posted: Thu Dec 18, 2008 2:37 am
by chandra4u
One of our clients wants to compare the performance of server jobs and parallel jobs. They want us to run both types of job and show them the results. Could you please help me with how to do that? I would like to use a text file as input, do some transformation and then load the data into Oracle. Please suggest which stages I can use in server and parallel jobs so that I can show them that parallel jobs are faster than server jobs.

Posted: Thu Dec 18, 2008 5:08 am
by ray.wurlod
Good luck. There are many situations where server jobs are faster (finish sooner) than parallel jobs implementing the same logic.

Parallel jobs will typically win if the volume of data to be processed is huge and can be processed truly in parallel (for example using Data Sets rather than Sequential Files).

Posted: Thu Dec 18, 2008 7:31 am
by chandra4u
Any other suggestions !!!

Posted: Thu Dec 18, 2008 8:02 am
by chulett
Where is your expertise - Server or Parallel? What part of this do you need help with?

Posted: Thu Dec 18, 2008 3:23 pm
by ray.wurlod
chandra4u wrote: Any other suggestions !!!
I find that offensive.

Posted: Sun Dec 21, 2008 12:23 am
by vmcburney
I did some comparisons in a post, "DataStage server v enterprise: some performance stats", which showed that sort and aggregation were way ahead in parallel. My post "DataStage Tip: Extracting database data 250% faster" shows how good the parallel Enterprise stages are, and I reviewed a recent French benchmark that compares server to parallel in "ETL Benchmark Favours DataStage and Talend".

Posted: Sun Dec 21, 2008 7:08 pm
by reachmexyz
chandra4u wrote: Any other suggestions !!!
You can make use of the Aggregator stage in parallel.
Server: pull the records from a flat file into an Aggregator (server) and load them into Oracle.
Parallel: build the same job with the parallel Aggregator, and that will run faster. Beware: the data volume should be huge.

Another option is to run two jobs in which the output of the first is fed to the input of the second. (In server, use a Sequential File as the output of the first job and the input of the second, whereas in parallel use Data Sets.) Again, the volume of data should be high.

Whatever you do, the data volume should be high to show differences between server and parallel.
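
If you do not have a large enough text file to hand, one way to get one is to generate it. The following is only a rough Python sketch and not part of DataStage: the function name `write_test_file`, the pipe delimiter and the three columns (`order_id`, `region`, `amount`) are all made up for illustration, so adjust them to match whatever your jobs actually read.

```python
import csv
import random


def write_test_file(path, rows, seed=42):
    """Write `rows` pipe-delimited data records (plus a header) to `path`.

    The layout is hypothetical: order_id|region|amount.
    Returns the number of data records written.
    """
    rng = random.Random(seed)  # fixed seed so every run produces the same file
    regions = ["NORTH", "SOUTH", "EAST", "WEST"]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="|", lineterminator="\n")
        writer.writerow(["order_id", "region", "amount"])
        for order_id in range(1, rows + 1):
            amount = rng.uniform(1.0, 1000.0)
            writer.writerow([order_id, rng.choice(regions), f"{amount:.2f}"])
    return rows
```

Calling, say, `write_test_file("orders_large.txt", 5_000_000)` would give both the server job and the parallel job the same input, and a `region` column to group on in the Aggregator. Generating a small file as well lets you show the client how the comparison changes with volume.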