Delays between Job Sequences / Calling Next Job


bmouton
Participant
Posts: 18
Joined: Thu Nov 06, 2008 8:31 pm

Post by bmouton »

I'm seeing inconsistent run times in DataStage, again and again, for the same volume of data.

Brief Description:
1 Project - Multiple sequences executing SEQUENTIALLY.

The first run completed in 12 minutes (overall data processed 200 MB). The second run in 12 minutes.

Three hours later, less data being processed (50 MB), the Master Job Sequence ran 32 minutes.

Datastage is the only process running on the machine (apart from the OS).

I know this is extremely vague. I have no idea where to start looking.

From one cycle to the next, we can see as much as a two hour difference.

We have bounced DataStage, DB2 9.1 (the repository), and the Linux server multiple times. Sometimes DS flies, and other times it crawls like a snail ...

I have searched IBM's website and DSXchange to see if anyone has encountered this type of issue.

We have been monitoring CPU, Memory, and IO on both the Datastage Server and the DB2 9.5 Server. When the cycle runs fast or slow, the CPU, Memory, and IO are the SAME.

Help!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What do the jobs do? In particular do they access data over a network? Have you checked that the network might be the bottleneck, because everyone's downloading videos on it?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bmouton
Participant
Posts: 18
Joined: Thu Nov 06, 2008 8:31 pm

Post by bmouton »

We are running fiber channel to our SAN ... We have been monitoring activity on the network and the SAN. There are no spikes in either when this inconsistency occurs.

It's not a bandwidth issue. We are 2 GIG-E fiber ... That's not the problem ...

And the network has limited sharing. We have noticed that the problem occurs RANDOMLY. There does not appear to be an issue with the network or the SAN.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

By saying ALL jobs are slower, we can't easily help pinpoint problems. If we could talk about a specific job then we can focus in on exact issues.

For example, if a job runs twice in the same day, processes different volumes, but has longer runtimes for the smaller volumes, maybe we could talk about the profile of the data. Maybe the larger volumes were more inserts than updates and loaded quicker. However, smaller volumes with more updates can take much longer.

If you see an across-the-board degradation and can't explain why, that points to hardware more than data profile. A simple job that reads and writes between sequential files should operate at a consistent pace given excess CPU resources (notice I said pace and not time). If that type of job ran at a different pace then you should investigate your disks - your processes could be starving for data or having issues writing out their data.

I recommend focusing on a few simple jobs and using those to measure your performance differential. A simple job that extracts a table and dumps to a file without much transformation/lookup logic is a great example to measure whether there are network traffic issues dumping out the data. Another example is the seq-->xfm--->seq type job to point out CPU/disk issues. If you have complicated jobs that mix database with transformation and more database loading, you're in a nearly impossible position to troubleshoot without breaking down the jobs.
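
To make the "pace and not time" distinction concrete, here is a minimal Python sketch that turns row counts and elapsed times into rows per second, so runs of different volumes can be compared. The `Run` class, the `flag_slow` helper, the 50% tolerance, and all the numbers are assumptions made up for illustration; none of them are measurements from this thread or part of DataStage.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    label: str      # hypothetical run identifier
    rows: int       # rows processed by the simple benchmark job
    seconds: float  # elapsed wall-clock time of that job


def pace(run: Run) -> float:
    """Rows per second: comparable across runs with different volumes."""
    if run.seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return run.rows / run.seconds


def flag_slow(runs: List[Run], tolerance: float = 0.5) -> List[str]:
    """Return labels of runs whose pace is below tolerance * the best pace."""
    best = max(pace(r) for r in runs)
    return [r.label for r in runs if pace(r) < tolerance * best]


if __name__ == "__main__":
    # Made-up numbers, purely illustrative.
    runs = [
        Run("morning", rows=1_000_000, seconds=100.0),
        Run("afternoon", rows=250_000, seconds=100.0),
    ]
    for r in runs:
        print(f"{r.label}: {pace(r):,.0f} rows/s")
    print("slow runs:", flag_slow(runs))
```

In this example both runs take the same time, but the second moves a quarter of the rows, so only the pace figure shows the slowdown.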
Kenneth Bland

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

My reason for asking about remote data is that the variability might be in the load on database servers.
bmouton
Participant
Posts: 18
Joined: Thu Nov 06, 2008 8:31 pm

Post by bmouton »

OK ... Here are more specifics:

Same data volume. Same exact data set ...

I run the Master Job Sequence that calls a set of job sequences SEQUENTIALLY.

1st Run - 12 minutes
Empty the staging tables, and Data mart tables.

2nd Run - 12 minutes
Empty the staging tables, and Data mart tables.

3rd Run - 12 minutes
Empty the staging tables, and Data mart tables.

15th Run - 35 minutes
Empty the staging tables, and Data mart tables.

16th Run - 35 minutes
Empty the staging tables, and Data mart tables.


22nd Run - 18 minutes
Empty the staging tables, and Data mart tables.

So ... during those times we monitored network traffic, database traffic, datastage traffic, CPU, Memory, and IO for all related boxes.

No apparent issues.

Are there settings in DataStage that default to a "DataStage Adjusted" value (i.e. automatically managed by DS) that are not part of the normal install? With DB2 9.5 running on SLES 10, we found that we could not use automatic memory management. Additionally, we had to manually configure CPU speed (I'd never done that in my career).

So ... It doesn't seem to be a job thing ... It's in Datastage somewhere ...

While going out and creating test jobs sounds like a wonderful thing, I have done that ... And I am unable to duplicate the same problem ...
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Clarify something here. Your subject says "Delays between Job Sequences / Calling Next Job" and yet you never mention anything about that in your posts, just mentioning total run time for the sequence of jobs. So... when the overall sequence goes from 12 minutes to 35 minutes for the exact same data, are the individual jobs taking incrementally longer to process that data? Or, as per your subject, do the individual jobs all run in approximately the same time and the delay is all in-between jobs?
-craig

"You can never have too many knives" -- Logan Nine Fingers
bmouton
Participant
Posts: 18
Joined: Thu Nov 06, 2008 8:31 pm

Post by bmouton »

Fair enough ...

We have been monitoring through Director ...

It is unfortunate that we cannot repeat the same "lag" between jobs / job sequences.

For example, the Master Job Sequence will start, then a minute later the first job in the Sequence will start. It will run for 10 seconds, then the next job may or may not start immediately.

The problem is that sometimes they fire off as soon as the predecessor completes. Other times, it "stalls" for anywhere from one to 10 minutes before starting the next job in the sequence.
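
One way to separate time spent inside jobs from time spent waiting between them is to write down each job's start and finish time and compute the gaps. A minimal Python sketch, assuming the timestamps have been copied by hand from the Director log; the `JobRun` type, the job names, and the times below are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Tuple


class JobRun(NamedTuple):
    name: str
    started: datetime
    finished: datetime


def gaps_between_jobs(runs: List[JobRun]) -> List[Tuple[str, str, timedelta]]:
    """For jobs run one after another, return (previous, next, idle time)."""
    ordered = sorted(runs, key=lambda r: r.started)
    return [
        (prev.name, nxt.name, nxt.started - prev.finished)
        for prev, nxt in zip(ordered, ordered[1:])
    ]


def summarize(runs: List[JobRun]) -> Tuple[timedelta, timedelta]:
    """Split a cycle into time spent inside jobs and time spent between them."""
    in_jobs = sum((r.finished - r.started for r in runs), timedelta())
    idle = sum((gap for _, _, gap in gaps_between_jobs(runs)), timedelta())
    return in_jobs, idle


if __name__ == "__main__":
    # Hypothetical timestamps, for illustration only.
    runs = [
        JobRun("JobA", datetime(2009, 1, 1, 8, 0, 0), datetime(2009, 1, 1, 8, 0, 10)),
        JobRun("JobB", datetime(2009, 1, 1, 8, 3, 10), datetime(2009, 1, 1, 8, 3, 20)),
    ]
    for prev, nxt, gap in gaps_between_jobs(runs):
        print(f"{prev} -> {nxt}: idle {gap}")
    in_jobs, idle = summarize(runs)
    print(f"inside jobs: {in_jobs}, between jobs: {idle}")
```

If the idle total dominates, the jobs themselves are not the problem and the sequence's job-to-job handoff is where to look.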
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

First thing to check would be the number of entries in the Project's &PH& "phantom" directory. Large or out of control numbers there could induce a processing lag. If needed, clear out anything 2+ days old and see if that helps.
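
A minimal Python sketch of that check: count the entries in a directory and list (or, with `dry_run=False`, delete) regular files older than two days. The project path is a placeholder and the function names are made up for this example; this is not a DataStage utility, so run it in dry-run mode first.

```python
import time
from pathlib import Path
from typing import List, Optional


def stale_files(directory: Path, max_age_days: float = 2.0,
                now: Optional[float] = None) -> List[Path]:
    """Return regular files last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def purge(directory: Path, max_age_days: float = 2.0, dry_run: bool = True) -> int:
    """Delete stale files, or only report them when dry_run is True."""
    victims = stale_files(directory, max_age_days)
    for p in victims:
        if dry_run:
            print("would remove", p)
        else:
            p.unlink()
    return len(victims)


if __name__ == "__main__":
    # Placeholder path: substitute your own project's directory.
    ph_dir = Path("/path/to/your/project/&PH&")
    if ph_dir.is_dir():
        print(len(list(ph_dir.iterdir())), "entries in total")
        purge(ph_dir, max_age_days=2, dry_run=True)
```

The age cutoff is there to leave files belonging to currently running jobs untouched.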
bmouton
Participant
Posts: 18
Joined: Thu Nov 06, 2008 8:31 pm

Post by bmouton »

Craig,

Not to sound stupid ... Should I bring down DS, or simply stop running the jobs?


Thanks!! Very Helpful Info!
bmouton
Participant
Posts: 18
Joined: Thu Nov 06, 2008 8:31 pm

Post by bmouton »

Got the command ...

We'll see how it goes now ...

Thanks!
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sorry for the late response. No need to bring down DataStage; the "2+ days old" criterion was to avoid affecting any running jobs.
bmouton
Participant
Posts: 18
Joined: Thu Nov 06, 2008 8:31 pm

Post by bmouton »

That cleaned out the phantom files ...

Unfortunately, no improvement ...

We are thinking about updating the statistics on the xmetadb to see if that would help.

We are also removing our temporary hash files that we use in the process to see if that helps ...
bmouton
Participant
Posts: 18
Joined: Thu Nov 06, 2008 8:31 pm

Post by bmouton »

Help!!!

So ... We have been tracking the times that the jobs run themselves ... The jobs fly ... In and out in seconds ...

The issue appears (though we are not certain) to be in DataStage ... We have approximately 80 jobs that all run sequentially ...

All of the jobs were being checkpointed ... We removed 80% of the checkpoints and no improvements ...

Any ideas on what causes the jobs to pause before calling the next job?

What is even more strange is that we have 5 identical DataStage cycles (job sequences and jobs), each pointing to its own file system and database. The cycle for the smaller database (and smaller volume of data) takes longer than the one for the database that has 10 times more data.

I'm reaching for straws here ... Could it be bufferpool size in the DB2 9.1 Xmetadb repository? SLES 10 not working well with DS?

Please help!!!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

We're with you but also clutching at straws.

What has your official support provider had to say?

Alas this is a scenario that would be difficult to reproduce - it would need comparable hardware but also the long time period over which to degrade the elapsed times.

Have you been generating operational metadata? (This is collected into the XMETA database - maybe the fact that those tables are increasing in size and need to manage their tablespace is part of this problem.) Note that this is only another straw - I cannot say for certain.

I suspect that, whatever the solution is, it will need to be found by someone expert actually poking around in your system. Things can appear different on the other side of the glass.