RCP job Vs Normal Job (Unix Server performance)

DS_FocusGroup
Premium Member
Posts: 197
Joined: Sun Jul 15, 2007 11:45 pm
Location: Prague

Post by DS_FocusGroup »

Hi,

Can anyone explain the difference in load on a Unix machine (in terms of CPU, memory, and I/O) between a job that has RCP enabled and a job that has its columns explicitly defined? Which of the two will consume more resources? A logical analysis with some concrete analogies and examples would be great.

Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'd be surprised if there's any appreciable difference, but that's just a gut feeling based on nothing but what I had for breakfast. I'd be curious to see what the answer turns out to be...
-craig

"You can never have too many knives" -- Logan Nine Fingers
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Are you asking about comparing, for example, an RCP-disabled job with 20 columns defined vs. an RCP-enabled job with the same 20 columns passed through?
Choose a job you love, and you will never have to work a day in your life. - Confucius
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I assumed so... basically an "apples to apples" comparison.
DS_FocusGroup
Premium Member
Posts: 197
Joined: Sun Jul 15, 2007 11:45 pm
Location: Prague

Post by DS_FocusGroup »

qt_ky wrote: Are you asking about comparing, for example, an RCP-disabled job with 20 columns defined vs. an RCP-enabled job with the same 20 columns passed through?
Yes, for a start.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

I ran a test job just for kicks, so I can learn something too. The design is Dataset with 10M rows -> Copy -> DB2 Connector (insert/replace), constrained to 2 nodes. One job has 13 columns defined and RCP disabled; the other has no columns defined and RCP enabled. I ran each job 3 times. On average each run took about 2.5 minutes, with the non-RCP job averaging 10 seconds faster. It's not a great test because of the short run times.
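For anyone repeating this kind of comparison, here is a minimal Python sketch of the arithmetic: average the run times of each job, then compare the gap between the averages with the run-to-run spread. The function name `compare_runs` and the timings are placeholders made up for illustration, not the measured values from the test described here.

```python
from statistics import mean

def compare_runs(rcp_secs, fixed_secs):
    """Summarise two sets of wall-clock run times, given in seconds.

    Returns (mean_rcp, mean_fixed, gap, gap_pct, spread), where gap is
    mean_rcp - mean_fixed, gap_pct is that gap as a percentage of the
    non-RCP mean, and spread is the larger max-minus-min range of the
    two sets of runs.
    """
    m_rcp = mean(rcp_secs)
    m_fixed = mean(fixed_secs)
    gap = m_rcp - m_fixed
    gap_pct = 100.0 * gap / m_fixed
    spread = max(max(rcp_secs) - min(rcp_secs),
                 max(fixed_secs) - min(fixed_secs))
    return m_rcp, m_fixed, gap, gap_pct, spread

# Placeholder timings (seconds) for three runs of each job.
# These are NOT real measurements; substitute your own.
rcp_runs = [156.0, 154.0, 155.0]
fixed_runs = [146.0, 144.0, 145.0]

m_rcp, m_fixed, gap, gap_pct, spread = compare_runs(rcp_runs, fixed_runs)
print(f"RCP mean {m_rcp:.1f}s, non-RCP mean {m_fixed:.1f}s")
print(f"gap {gap:.1f}s ({gap_pct:.1f}% of non-RCP), run-to-run spread {spread:.1f}s")
```

With only three short runs per job, a gap that is not clearly larger than the spread says very little either way, which is why longer runs and more repetitions would make for a better test.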

I can say the OSH from the RCP job is a lot easier on the eyes. I don't know what the internal difference is or how RCP works at run time, so I will leave it to anyone else who wants to explain the internals or the resource differences.

My guess is that the RCP feature is intended to save developer hours rather than to change resource usage or performance. I would want to see you run a test with a much larger data set to be sure, but until then I would have to agree that the performance difference is insignificant.
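A larger test is easier to repeat with a small timing harness. The sketch below is only one way to do it: `time_command` is a hypothetical helper name, and the command shown is a harmless stand-in so the script runs anywhere. On a real engine tier you would replace it with whatever command starts the job and waits for it to finish (for example the `dsjob` client with `-run -jobstatus`, assuming it is available, with your own project and job names).

```python
import subprocess
import sys
import time

def time_command(cmd, repeats=3):
    """Run cmd `repeats` times; return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=False: a job launcher may use non-zero exit codes for
        # outcomes that are not failures, so interpret the code yourself.
        subprocess.run(cmd, check=False)
        timings.append(time.perf_counter() - start)
    return timings

# Stand-in command (a Python process that does nothing) so the sketch is
# runnable as-is. Replace it with the command that runs your job.
stand_in = [sys.executable, "-c", "pass"]
timings = time_command(stand_in, repeats=3)
print("seconds per run:", [round(t, 3) for t in timings])
```

This measures elapsed time only. CPU, memory and I/O on the server would still have to be watched separately with the usual Unix monitoring tools while the jobs run.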