TOAD vs DataStage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

arpitchopra
Participant
Posts: 35
Joined: Mon Nov 01, 2010 2:48 am
Location: New Delhi

Post by arpitchopra »

Reduced it and checked again. The array size is now 2000. I kept Rows per Transaction at 0 and set Prefetch Memory to 1024.
The job now takes 1 hour to run.
Arpit Chopra
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

'Sweet spot' means you have to try various values, dialing it up and back down again, until you find the appropriate value for this combination. Many things can affect this, with the settings you've mentioned being only part of it: for example, your average record length, indexes on the target, disk subsystem write speed, database settings, how often you commit, network traffic, etc.
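To make the dial-up/dial-down exercise concrete, here is a minimal, hypothetical sweep harness. Nothing here is a DataStage or OCI API: `sweep_array_size` and the caller-supplied `load_batch` function are stand-ins for whatever actually pushes a batch of rows to the target, so the harness itself just measures each candidate array size against the same total row count.

```python
import time

def sweep_array_size(load_batch, total_rows, candidate_sizes):
    """Time a full load at several array sizes and report the fastest.

    load_batch(n) is whatever actually sends `n` records to the target
    (an OCI executemany, a stage run, ...); it is supplied by the caller
    so this harness has no database dependency.
    """
    results = {}
    for size in candidate_sizes:
        start = time.perf_counter()
        sent = 0
        while sent < total_rows:
            batch = min(size, total_rows - sent)
            load_batch(batch)
            sent += batch
        results[size] = time.perf_counter() - start
    best = min(results, key=results.get)
    return best, results

# Demonstration with a do-nothing loader; in practice load_batch would
# perform the real insert and the timings would reflect your network,
# indexes, commit interval, and so on.
calls = []
best, results = sweep_array_size(calls.append, 10_000, [500, 2000, 5000])
```

In a real run you would widen or narrow `candidate_sizes` around whichever value wins, since the curve is usually shallow near the optimum.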

DataStage isn't anything magical; it is just another Oracle client. It leverages the standard OCI interface and TCP/IP transport layer, and tuning loads for a job isn't all that different from going through the same exercise outside of the tool, with the exception of understanding how the DataStage engine plays a role here and the path the data (possibly) takes across your network. For large data volumes with minimal transformations in a single instance, it is sometimes better to fall back on native tools like sqlplus / sqlldr / PL/SQL rather than a tool that is meant to extract and load heterogeneous data from anywhere to anywhere.
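For reference, the sqlldr fallback craig mentions looks something like the sketch below. The table, columns, and file names are hypothetical; `direct=true` requests a direct-path load, which bypasses much of the conventional SQL engine and is typically the fastest way to bulk-load flat-file data.

```
-- load_customers.ctl  (hypothetical table and columns)
LOAD DATA
INFILE 'customers.dat'
APPEND
INTO TABLE customers
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(cust_id, cust_name, created_dt DATE 'YYYY-MM-DD')
```

Invoked, under the same assumptions, as:

```
sqlldr userid=scott control=load_customers.ctl direct=true log=load.log
```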
-craig

"You can never have too many knives" -- Logan Nine Fingers
Abhijeet1980
Participant
Posts: 81
Joined: Tue Aug 15, 2006 8:31 am
Location: Zürich
Contact:

Post by Abhijeet1980 »

Arpit,

Please get in touch with your DBA; he would certainly be in a good position to help you.

He may also help you understand the differences in behavior between your queries (TOAD and DataStage).

Lastly, try posting your query here.

Kind regards,
Abhijit Gaikwad
daignault
Premium Member
Posts: 165
Joined: Tue Mar 30, 2004 2:44 pm
Contact:

Post by daignault »

Hi All,

Why not write custom SQL within the Oracle stage? That way you won't pull 17 million rows of data across the network: DataStage can initiate the SQL through a DataStage job and retrieve only the result.

I'm at a site right now where we are trying to do something similar, but with 2.3 billion rows of data on a multi-node grid. The designer tried to pull the data into DataStage across the network and process it there. By processing the data first on the source system, before pulling the subset of rows into the DataStage environment, we shaved 8 hours off the job execution time.

In larger sites with higher data volumes, the network can become the primary bottleneck. So we removed it.
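The push-down idea above can be shown in miniature. This is a sketch, not DataStage code: sqlite3 stands in for the remote Oracle source, and the `orders` table and its columns are invented for the example. The point is that a `WHERE` clause executed at the source means only the subset of rows ever crosses the wire, whereas fetching everything and filtering on the client moves the full table.

```python
import sqlite3

# Toy "source system" with 1,000 rows, of which only 1 in 10 is wanted.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "EMEA" if i % 10 == 0 else "OTHER", float(i)) for i in range(1000)],
)

# Anti-pattern: pull every row to the client, then filter locally.
pulled = con.execute("SELECT * FROM orders").fetchall()
subset_client = [row for row in pulled if row[1] == "EMEA"]

# Push-down: the source does the filtering; only the subset is transferred.
subset_source = con.execute(
    "SELECT * FROM orders WHERE region = 'EMEA'"
).fetchall()

print(len(pulled), len(subset_source))  # prints: 1000 100
```

Both approaches yield the same rows; they differ only in how many rows travel to the client, which is exactly the bottleneck being removed in the 2.3-billion-row case above.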

Regards
Post Reply