I did a search on performance monitoring and still can't seem to find what I am looking for.
We have a Windows server with 32 GB of RAM, a quad-core CPU, and 600 GB of disk space.
During our process, we sometimes receive a resource allocation error that aborts the process.
What kinds of monitoring could I ask the network people for (e.g., memory, CPU)? Are there specific areas they could look at to help determine some of our issues?
I know very little about the network/infrastructure side and would like to ask the proper questions with regard to DataStage 8.1.
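Concretely, one thing to ask the Windows admins for is a Performance Monitor counter log captured while your process runs, so an abort can be lined up against resource usage at that moment. The counters below are standard Windows Performance Monitor counters; the `dsapi_slave` process-name filter is an assumption about how the DataStage engine processes appear on your server, so adjust it to whatever you see in Task Manager during a run:

```shell
# Create a counter log sampled every 15 seconds (run as Administrator),
# then start it before the DataStage process and stop it afterwards.
logman create counter DSPerf -si 15 -o C:\PerfLogs\dsperf -c ^
  "\Memory\Available MBytes" ^
  "\Memory\Pages/sec" ^
  "\Processor(_Total)\% Processor Time" ^
  "\PhysicalDisk(_Total)\Avg. Disk Queue Length" ^
  "\Paging File(_Total)\% Usage" ^
  "\Process(dsapi_slave*)\Private Bytes"

logman start DSPerf
```

Low "Available MBytes" or a climbing "Paging File % Usage" at the time of the abort would point at memory pressure rather than CPU or disk.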
Thanks
Performance Monitoring
Jim Stewart
We have a similarly sized Windows server (dual quad core, actually) and receive intermittent resource errors that are extremely difficult to reproduce/replicate.
IBM support points to job design every time, but I feel that's the easiest answer for them to give.
We run multiple invocations at a time (no more than two or three simultaneously), each using the 2-node configuration file. Still, we have had jobs abort, mostly when "larger design" jobs from separate invocations happen to run at the same time. Most of the time the sequences' processing is staggered, so things work fine: a larger job from one invocation may run alongside a smaller job from another, and so on.
From our experience, it seems to be related to job size rather than to the number of rows being processed; I once had a job fail that was processing fewer than 100 rows. So perhaps it's the overhead of starting up the job and its related processes? There is plenty of available memory and disk, but CPU can occasionally spike very high for short intervals (10-20 seconds). However, I feel that this is normal in server computing and shouldn't "break" the system.
All we can do is stay extremely aware of job design: the number of stages in a job (and the number of system processes created at runtime), and the use of Sort/Remove Duplicates/Aggregator stages, which we feel are more resource-intensive than most.
We're continuing to collect information on the occurrences as they happen. The hardest part is that the abort gives us no indication of WHAT resource is lacking (hardware vs. logical/internal to DataStage).
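Since the aborts seem to coincide with "larger" jobs from separate invocations overlapping, one low-tech way to collect evidence is to export job start/end times from the Director log and flag the windows where more than N jobs ran at once, then check whether the aborts fall inside those windows. A minimal sketch (the run data, threshold, and export step are hypothetical; you would pull the timestamps from your own logs):

```python
def concurrent_jobs(runs, threshold=2):
    """Return intervals during which at least `threshold` jobs ran at once.

    `runs` is a list of (job_name, start, end) tuples whose timestamps
    you would export from the DataStage Director log (any comparable
    type works: datetimes, epoch seconds, etc.)."""
    # Turn each run into a +1 event at start and a -1 event at end,
    # then sweep through the events in time order. Ends sort before
    # starts at the same instant, so back-to-back runs don't overlap.
    events = []
    for _name, start, end in runs:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()

    active = 0            # jobs currently running
    overlap_start = None  # when the current over-threshold window began
    intervals = []
    for ts, delta in events:
        active += delta
        if active >= threshold and overlap_start is None:
            overlap_start = ts
        elif active < threshold and overlap_start is not None:
            intervals.append((overlap_start, ts))
            overlap_start = None
    return intervals
```

If the aborts consistently land inside the returned intervals, that's a stronger case to take back to IBM support than "job design".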
I'll keep monitoring this thread too, as I have a high interest in any information learned/shared.
Thanks,
David Wagner