tuning a massive checksum job

abyss
Premium Member
Posts: 172
Joined: Thu May 22, 2014 12:43 am


Post by abyss »

Hi all
this topic may be about a little more than just DataStage.

I am trying to tune a job that calculates a checksum on one of our tables. The table is massive: it has over 120 million rows and more than 60 fields. The job calls a Checksum stage to calculate a checksum value for each record.
The problem is that this job takes too long, more than 2 hours. Is there a way to tune this job to make it run faster?
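
(For readers unfamiliar with the stage: conceptually it does something like the minimal Python sketch below, one hash per row over the concatenated field values. MD5 is the algorithm the Checksum stage generates; the field handling here is simplified and the sample record is made up.)

Code:

import hashlib

def row_checksum(row):
    # Join the field values with a separator so adjacent fields
    # cannot collide ("ab","c" vs "a","bc"), then hash the result.
    payload = "|".join("" if v is None else str(v) for v in row)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

# One hypothetical 4-field record:
print(row_checksum(("abc", 123, None, "2014-05-22")))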

thanks
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

Can you provide some more details about the job?

What are the source and targets?
Are you reading from the database in parallel?
What else, if anything, is the job doing?
What options do you have selected in the Checksum stage?
How is the job partitioned? (See the sketch after this list for why that matters.)
How large is a row in the table?
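
Not DataStage code, but here is a rough Python illustration of why the partition count matters: the same per-row hash fanned out across N worker processes, which is loosely what the parallel engine does when it runs the Checksum stage across N partitions. The data, the partition count, and the striped split are all made up for the example.

Code:

import hashlib
from multiprocessing import Pool

def checksum_chunk(rows):
    # Each worker hashes one partition's worth of rows.
    return [hashlib.md5("|".join(map(str, r)).encode()).hexdigest()
            for r in rows]

if __name__ == "__main__":
    # Hypothetical data: 100,000 3-field rows, striped into 8 chunks.
    rows = [(i, i * 2, "x" * 10) for i in range(100_000)]
    n = 8
    chunks = [rows[i::n] for i in range(n)]
    with Pool(n) as pool:
        results = pool.map(checksum_chunk, chunks)
    print(sum(len(c) for c in results), "rows hashed across", n, "workers")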

How have you determined that the checksum is the cause of the perceived poor performance?
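
One way to answer that last question outside of DataStage is to benchmark the hash by itself and see whether hashing alone could plausibly account for a 2-hour run. A rough Python sketch, assuming MD5 (which is what the Checksum stage generates); the synthetic 60-field row and any numbers it prints are illustrative only:

Code:

import hashlib
import time

# A synthetic 60-field row, roughly the shape described above.
row = "|".join("field%02d_value" % i for i in range(60)).encode()

n = 1_000_000
start = time.time()
for _ in range(n):
    hashlib.md5(row).hexdigest()
elapsed = time.time() - start
print("%.0f rows/sec for MD5 alone" % (n / elapsed))

If the hash-only rate is orders of magnitude faster than the job's observed throughput, the time is going somewhere else (the read, the write, repartitioning), not in the checksum itself.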
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Also, the basic question of "Why are you doing a checksum?" needs to be asked. My initial guess is some type of CDC detection...

If you describe the underlying need, we might be able to suggest a different way to reach your end goal.
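
For context, the pattern being guessed at here (checksum-based change detection) usually looks something like the sketch below: hash each incoming row, compare it against the checksum stored for the same key, and classify the row. The keys, values, and in-memory map are all hypothetical; in practice the stored checksum would live in the target table.

Code:

import hashlib

def checksum(row):
    return hashlib.md5("|".join(map(str, row)).encode()).hexdigest()

# Yesterday's key -> checksum map (hypothetical).
previous = {1: checksum(("alice", "denver")),
            2: checksum(("bob", "boston"))}

today = {1: ("alice", "denver"),    # unchanged
         2: ("bob", "chicago"),     # changed
         3: ("carol", "seattle")}   # new

for key, row in today.items():
    if key not in previous:
        print(key, "INSERT")
    elif previous[key] != checksum(row):
        print(key, "UPDATE")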
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

In addition to all of the above, I'd also be curious how many "more than 2" hours we are talking about, as that doesn't really sound like all that poor a performance for a "massive" job of that nature. To me. For what it's worth, 120 million rows in 2 hours works out to roughly 16,700 rows per second.
-craig

"You can never have too many knives" -- Logan Nine Fingers