
tuning a massive checksum job

Posted: Thu Nov 16, 2017 6:01 pm
by abyss
Hi all,
this topic may be about a little more than just DataStage.

I am trying to tune a job that calculates a checksum on one of our tables. The table is massive: it has over 120 million rows and more than 60 fields.
The job calls a Checksum stage to calculate a checksum value for each record.
The problem is that this job takes too long, more than 2 hours. Is there a way to tune this job to make it run faster?
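For context, the per-record work amounts to something like the sketch below (just an illustration in Python of hashing the concatenated column values for each row; the actual Checksum stage behaviour depends on which columns and options are configured, so treat the details as assumptions):

[code]
import hashlib

def record_checksum(row, columns):
    """Concatenate the selected column values and return an MD5 digest.

    Illustration only: the real Checksum stage's column selection,
    separator and algorithm depend on how the stage is configured.
    """
    payload = "|".join(str(row[c]) for c in columns)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

# Example: a single row (the real table has 60+ fields per row)
row = {"cust_id": 12345, "name": "ACME", "balance": 987.65}
print(record_checksum(row, ["cust_id", "name", "balance"]))
[/code]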

thanks

Posted: Fri Nov 17, 2017 3:35 am
by thompsonp
Can you provide some more details about the job?

What are the source and targets?
Are you reading from the database in parallel?
What else is the job doing, if anything?
In the Checksum stage, which options do you have selected?
How is the job partitioned?
How large is a row in the table?

How have you determined that the checksum is the cause of the perceived poor performance?

Posted: Fri Nov 17, 2017 8:41 am
by PaulVL
Also, the basic question of "Why are you doing a checksum?" needs to be asked. My initial guess is some type of CDC detection...

By describing the need, we might help with a different way to obtain your end goal.
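For example, if the end goal is change detection, a checksum comparison usually boils down to something like this (a rough Python sketch with hypothetical dictionaries of key -> checksum standing in for the source feed and the previously loaded table; not DataStage-specific):

[code]
def classify_rows(incoming, stored):
    """Compare per-row checksums to split incoming rows into
    inserts, updates and unchanged rows.

    `incoming` and `stored` are hypothetical dicts mapping a
    natural key to that row's checksum value.
    """
    inserts, updates, unchanged = [], [], []
    for key, chksum in incoming.items():
        if key not in stored:
            inserts.append(key)
        elif stored[key] != chksum:
            updates.append(key)
        else:
            unchanged.append(key)
    return inserts, updates, unchanged

# Usage: only the inserts and updates need to flow downstream
ins, upd, same = classify_rows(
    {"k1": "aaa", "k2": "bbb", "k3": "ccc"},
    {"k1": "aaa", "k2": "xxx"},
)
print(ins, upd, same)  # ['k3'] ['k2'] ['k1']
[/code]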

Posted: Fri Nov 17, 2017 9:05 am
by chulett
In addition to all of the above, I'd also be curious how much "more than 2 hours" we are talking about, as that doesn't really sound like all that poor a performance for a "massive" job of that nature. To me.