Projects with many jobs have slow graphical front-end resp.

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Ever since the first versions of DataStage, the number of jobs in a project, and later on within a Category in a project, adversely affected the performance in listing jobs in the Director, Designer and Manager.

Although this has gotten a bit better over time, it still remains an issue at every single larger installation that I see. At some sites where a category never contains more than 10-20 jobs it still takes 2 to 5 minutes to get into the Director or Manager overview even when the system is otherwise not heavily loaded.

Apart from re-hashing the VOC file (which, at its default modulo of 23, will be seriously overflowed after adding just a couple of jobs to any project), what can be done under the covers to improve this performance?
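To put rough numbers on the overflow point, here is a back-of-the-envelope sketch in Python. The record count, average record size and group buffer size are hypothetical figures chosen for illustration, not measurements from any real VOC, and the model assumes records hash perfectly evenly across groups (the best case).

```python
import math

def group_load(record_count, modulo, avg_record_bytes, group_bytes):
    """Estimate the load on each group of a static hashed file.

    Assumes records are spread evenly over the groups.
    Returns (records per group, overflow buffers per group).
    """
    records_per_group = record_count / modulo
    bytes_per_group = records_per_group * avg_record_bytes
    # Every buffer needed beyond the first one is an overflow buffer.
    overflow_buffers = max(0, math.ceil(bytes_per_group / group_bytes) - 1)
    return records_per_group, overflow_buffers

# Hypothetical figures: 5000 entries, 100 bytes each, 2048-byte groups.
for modulo in (23, 1009):
    per_group, overflow = group_load(5000, modulo, 100, 2048)
    print(f"modulo {modulo}: {per_group:.1f} records/group, "
          f"{overflow} overflow buffers/group")
```

Under these assumptions every read at modulo 23 has to walk a chain of overflow buffers, while a larger modulo keeps each group within its primary buffer.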

Which hashed files are actually opened, read and selected from during this process? I know for certain that the VOC and DS_JOBS are used, but what other files are involved? I would really like to speed this up as much as possible in order to make it more usable in a production environment.
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Sure does.

When compiling routines for example, a full usage analysis is performed.

When retrieving jobs, it should be able to retrieve just the currently selected Category (unless, of course, view categories is disabled in Director) using the CATEGORY index on DS_JOBS. For each job it needs to reference DS_AUDIT to get the date/time modified, and at least one other hashed file, for example RT_STATUSnn if you're in status view in Director.

It seems to me, based on the behaviour of the status bar, that the index is not being used. If it were, retrieval time when there are only a few jobs in a category should be much faster than when there are many jobs in a category. This is not the reported experience.

There are only a few records in RT_STATUSnn (provided you haven't generated too many waves), which is not a big hit. Obviously in log view you need to process RT_LOGnn, which isn't indexed, so the entire log has to be processed even though the default filter setting is Last 100.

I doubt there's much you can do under the covers apart from asking "them" whether indexing is being used properly. My guess is that they use "pure" BASIC in the helper subroutines, using the SELECT statement (which does not use indexes) rather than a query to interrogate DS_JOBS.
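The difference between the two approaches can be sketched in Python. This is only a model of the access pattern, not the actual helper subroutine code: the `jobs` dictionary is a hypothetical stand-in for DS_JOBS, and the job and category names are invented.

```python
from collections import defaultdict

# Hypothetical stand-in for DS_JOBS: job name -> category (20 jobs per category).
jobs = {f"Job{i:04d}": f"Cat{i % 200:03d}" for i in range(4000)}

def select_by_scan(jobs, category):
    """Visit every record and test it, as a plain select loop would."""
    visited = 0
    found = []
    for name, cat in jobs.items():
        visited += 1
        if cat == category:
            found.append(name)
    return found, visited

def build_category_index(jobs):
    """Build a category -> job names map, playing the role of the CATEGORY index."""
    index = defaultdict(list)
    for name, cat in jobs.items():
        index[cat].append(name)
    return index

def select_by_index(index, category):
    """Touch only the records listed under the requested category."""
    found = list(index.get(category, []))
    return found, len(found)

index = build_category_index(jobs)
print(select_by_scan(jobs, "Cat007")[1], "records visited by scan")
print(select_by_index(index, "Cat007")[1], "records visited via index")
```

With a scan the work grows with the size of the whole project; with an index it grows with the size of the category, which is the behaviour one would expect to see in the status bar if the index were in use.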

Why not enable server side tracing and see whether you can figure out what's happening? My bet is that the helper subroutines DSR_SELECT and DSR_RECORD will feature hugely.

Could be worth posting this question on ADN, where some kind engineer might be motivated to take a look at the source code.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I was always under the impression that your network infrastructure played a bigger role than just the number of jobs in the project. For example...

Our biggest project here has 3,631 jobs in it right now. Opening the Director into the Category with the most jobs in it (121) takes only 13 seconds. It's only if I turn off categories and pull in all jobs that it takes around 5 minutes. The Manager should be even faster as it doesn't pull all that extra status and scheduling information in.

My timings from home are higher, i.e. take longer. The people that attach to the server from the other side of the States (basically) have even higher times. But here, locally, they are not bad at all.

So I'm a little surprised to see timings of 2 to 5 minutes to display all of 20 jobs. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by ArndW »

I did a couple of tests today. Among other things, I put AKs on field F1 of all 2000 RT_STATUS files, and that did make a difference, but not enough. I'm doing timing tests and measuring file I/O (I converted the dynamic files to static hashed so I could FILE.STATUS them). The system seems to be doing a great many file opens relative to reads and writes, but I am not certain whether they are the result of mfile operations.

Post by ray.wurlod »

It does do a lot of logical opens but, as you say, you would hope that many of those would be handled by the rotating file pool. This is the nature of the beast; they need to guarantee that the file is open, even though coming from what is essentially a stateless client.
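The distinction between logical and physical opens can be illustrated with a small Python sketch. It assumes a fixed-size pool with least-recently-used eviction; that policy, the class name and the pool size are assumptions made for illustration and are not taken from the engine's documentation.

```python
from collections import OrderedDict

class RotatingFilePool:
    """Toy model of a fixed-size pool of open files (LRU eviction assumed)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._open = OrderedDict()   # file name -> placeholder handle
        self.logical_opens = 0
        self.physical_opens = 0

    def open(self, name):
        """Count a logical open; only do a physical open on a pool miss."""
        self.logical_opens += 1
        if name in self._open:
            self._open.move_to_end(name)   # mark as most recently used
            return
        self.physical_opens += 1
        self._open[name] = True
        if len(self._open) > self.capacity:
            self._open.popitem(last=False)  # close the least recently used file

pool = RotatingFilePool(capacity=3)
for name in ["VOC", "DS_JOBS", "VOC", "DS_JOBS", "RT_STATUS1", "VOC"]:
    pool.open(name)
print(pool.logical_opens, "logical opens,", pool.physical_opens, "physical opens")
```

As long as the working set of files fits in the pool, repeated logical opens cost very little; once it does not, each logical open can turn into a physical one.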

How do you propose to index RT_STATUSnn for newly-created jobs?

Post by ArndW »

I plan on firing up a daily job to check for RT_STATUS files without indices. But I am not sure that this is really going to work all that well overall. I think I need to research what other files are used and, as you surmised, not SELECTed with indices. Too bad that we don't have memory files, and that we can't activate public sharing of files such as VOC or DS_JOBS or DS_JOBOBJECTS...
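A minimal sketch of the detection step of such a daily check, in Python. It assumes that the index for a file lives in a directory named with an `I_` prefix in front of the file name; that convention, the function name and the sample entries are assumptions that should be verified against the actual project directory before relying on the result.

```python
import re

# Matches names such as RT_STATUS1 or RT_STATUS2041.
STATUS_NAME = re.compile(r"^RT_STATUS\d+$")

def status_files_without_index(entry_names, index_prefix="I_"):
    """Return RT_STATUSnn names that have no matching index entry.

    entry_names: names found in the project directory.
    index_prefix: ASSUMED naming convention for index directories.
    """
    names = set(entry_names)
    return sorted(
        name for name in names
        if STATUS_NAME.match(name) and index_prefix + name not in names
    )

# Hypothetical directory listing for illustration.
entries = ["VOC", "RT_STATUS1", "I_RT_STATUS1", "RT_STATUS2", "RT_LOG2"]
print(status_files_without_index(entries))
```

In practice the list of entries would come from something like `os.listdir()` on the project directory, and the resulting names would be fed to whatever command builds the index.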

Post by ray.wurlod »

You could read up about the following commands in the Ascential DataStage Hash Stage Disk Caching manual! :wink:
    SET.MODE
    LIST.FILE.CACHE
    CLEAR.FILE.CACHE
    CATALOG.FILE.CACHE
    DECATALOG.FILE.CACHE
    DAEMON.FILE.CACHE

Post by ArndW »

Thanks Ray - I did enable file caching for public sharing at this site, but assumed that it applied only to job runs and not to system files. If the functionality is part of the engine then I think we might have a winner after all.