XML stage performance issues

samyamkrishna
Premium Member
Posts: 258
Joined: Tue Jul 04, 2006 10:35 pm
Location: Toronto

Post by samyamkrishna »

Hi,

I have a job that reads from a text file. One of the columns in the file contains XML.
The file is 15 GB and has 40 million records. The job runs for 3 hours.

Is there any way I can improve its performance?

Regards,
Samyam
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Describe your job.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

5 GB/hour isn't too bad on a small configuration. How many nodes are you using, and what kinds of (how powerful) processors?
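The 5 GB/hour figure follows directly from the numbers in the original post (15 GB, 40 million records, 3 hours). A minimal sketch of that arithmetic, where the function name `throughput` is only illustrative:

```python
def throughput(total_gb, hours, records):
    """Return (GB per hour, records per second) for a completed run."""
    gb_per_hour = total_gb / hours
    records_per_second = records / (hours * 3600)
    return gb_per_hour, records_per_second


# Figures taken from the original post: 15 GB, 3 hours, 40 million records.
gb_per_hour, rec_per_sec = throughput(15, 3, 40_000_000)
print(f"{gb_per_hour:.1f} GB/hour, about {rec_per_sec:.0f} records/second")
```

Whether that rate is good or bad still depends on the node count and hardware, which is why those are the next things to ask about.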
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
samyamkrishna
Premium Member
Posts: 258
Joined: Tue Jul 04, 2006 10:35 pm
Location: Toronto

Post by samyamkrishna »

Hi All,

Sorry about the delayed response.
We tried a lot of options, and one of them gave us a good performance improvement.

We split the input file into 4 files of 4 GB each and ran the same job 4 times in parallel, each instance reading a different file.

It came down to 1 hour processing time.
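The post does not say how the file was split. One possible way to do it outside DataStage is sketched below, assuming one record per line (that is, the XML column contains no embedded newlines). The function name `split_round_robin` and the `_partN` output naming are hypothetical, not something from the thread.

```python
from pathlib import Path


def split_round_robin(src, parts, out_dir):
    """Distribute the lines of `src` across `parts` output files, round-robin.

    Assumes one record per line. The file is streamed, so a 15 GB input
    is never held in memory.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: input.txt -> input_part1.txt, input_part2.txt, ...
    paths = [out_dir / f"{src.stem}_part{i + 1}{src.suffix}" for i in range(parts)]
    handles = [p.open("wb") for p in paths]
    try:
        with src.open("rb") as f:
            for n, line in enumerate(f):
                handles[n % parts].write(line)
    finally:
        for h in handles:
            h.close()
    return paths
```

Calling `split_round_robin("input.txt", 4, "split")` would produce four files with nearly equal record counts, each of which can then be passed to a separate run of the job. Round-robin keeps the record counts balanced; the byte sizes are only approximately equal if record lengths vary.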

Regards,
Samyam
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You probably could have done that in one job with four partitions (or eight).
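To illustrate the general idea of partitioning a single file rather than physically splitting it, here is a rough sketch in plain Python. It is an analogy only and does not describe how DataStage partitions data internally. It assumes one record per line, and the names `partition_offsets` and `read_partition` are made up for this example.

```python
import os


def partition_offsets(path, partitions):
    """Return (start, end) byte ranges that each begin on a line boundary."""
    size = os.path.getsize(path)
    cuts = [0]
    with open(path, "rb") as f:
        for i in range(1, partitions):
            target = size * i // partitions
            if target <= cuts[-1]:
                continue  # file too small for this many partitions
            # Step back one byte and read to the end of that line, so the
            # cut lands on the start of the next record.
            f.seek(target - 1)
            f.readline()
            pos = f.tell()
            if cuts[-1] < pos < size:
                cuts.append(pos)
    cuts.append(size)
    return list(zip(cuts[:-1], cuts[1:]))


def read_partition(path, start, end):
    """Read the raw bytes of one partition; each worker would parse its own range."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)
```

Each `(start, end)` range could be handed to a separate worker, so four (or eight) readers process the same file concurrently without an up-front split step.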