
Projects with many jobs have slow graphical front-end response

Posted: Tue Sep 20, 2005 12:48 am
by ArndW
Ever since the first versions of DataStage, the number of jobs in a project, and later on within a Category in a project, adversely affected the performance in listing jobs in the Director, Designer and Manager.

Although this has gotten a bit better over time, it still remains an issue at every larger installation that I see. At some sites, even where a category never contains more than 10-20 jobs, it still takes 2 to 5 minutes to get into the Director or Manager overview, even when the system is otherwise not heavily loaded.

Apart from re-hashing the VOC file (which, at a default modulo of 23, will be seriously overflowed after adding just a couple of jobs to any project), what can be done under the covers to speed up this performance?

Which hashed files are actually opened, read and selected from during this process? I know for certain that the VOC and the DS_JOBS are used, but what other files need to be used? I would really like to speed this up as much as possible in order to make it more useable in a production environment.
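For reference, re-hashing an overflowed static hashed file such as VOC on the UniVerse-based engine might look like the sketch below. The modulo of 401 is purely illustrative, not a recommendation; run ANALYZE.FILE first and size from its output, and back up the project before resizing anything live.

```
* From the engine command (TCL) prompt, in the project directory:
ANALYZE.FILE VOC
* ...note the reported type, modulo and overflow, then resize
* (illustrative sizing only: type 2, modulo 401, separation 4)
RESIZE VOC 2 401 4
* ...and re-run ANALYZE.FILE VOC to confirm the overflow is gone
```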

Posted: Tue Sep 20, 2005 1:07 am
by ray.wurlod
Sure does.

When compiling routines, for example, a full usage analysis is performed.

When retrieving jobs, it should be able to retrieve just the currently selected Category (unless, of course, view categories is disabled in Director) using the CATEGORY index on DS_JOBS. For each job it needs to reference DS_AUDIT to get the date/time modified, and at least one other hashed file, for example RT_STATUSnn if you're in status view in Director.
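You can check whether that index actually exists, and exercise it, from the engine command prompt. A sketch, assuming UniVerse TCL; "MyCategory" is a hypothetical category name:

```
* Confirm the secondary indexes defined on DS_JOBS
LIST.INDEX DS_JOBS ALL
* A RetrieVe query on an indexed field should use the index
SELECT DS_JOBS WITH CATEGORY = "MyCategory"
LIST DS_JOBS NAME CATEGORY
```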

It seems to me, based on the behaviour of the status bar, that the index is not being used. If it were, retrieval time when there are only a few jobs in a category should be much faster than when there are many jobs in a category. This is not the reported experience.

There are only a few records in RT_STATUSnn (provided you haven't generated too many waves), which is not a big hit. Obviously in log view you need to process RT_LOGnn, which isn't indexed, so the entire log has to be processed even though the default filter setting is Last 100.

I doubt there's much you can do under the covers apart from ask "them" whether indexing is being used properly. My guess is that they use "pure" BASIC in the helper subroutines, using the SELECT statement (which does not use indexes) rather than a query to interrogate DS_JOBS.
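To illustrate the difference (a sketch in UniVerse BASIC; the category value is hypothetical): a plain BASIC SELECT walks every group in the file, whereas SELECTINDEX reads only the keys filed under one value of a secondary index.

```
OPEN "DS_JOBS" TO F.JOBS ELSE STOP "Cannot open DS_JOBS"

* Full-file scan: touches every group regardless of category
SELECT F.JOBS TO 1

* Index-based selection: only the keys filed under this CATEGORY value
SELECTINDEX "CATEGORY", "MyCategory" FROM F.JOBS TO 2

LOOP
   READNEXT JOB.ID FROM 2 ELSE EXIT
   PRINT JOB.ID
REPEAT
```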

Why not enable server side tracing and see whether you can figure out what's happening? My bet is that the helper subroutines DSR_SELECT and DSR_RECORD will feature hugely.

Could be worth posting this question on ADN, where some kind engineer might be motivated to take a look at the source code.

Posted: Tue Sep 20, 2005 1:14 pm
by chulett
I was always under the impression that your network infrastructure played a bigger role than just the number of jobs in the project. For example...

Our biggest project here has 3,631 jobs in it right now. Opening the Director into the Category with the most jobs in it (121) takes only 13 seconds. It's only if I turn off categories and pull in all jobs that it takes around 5 minutes. The Manager should be even faster, as it doesn't pull in all that extra status and scheduling information.

My timings from home are longer. The people who attach to the server from the other side of the States (basically) see even longer times. But here, locally, they are not bad at all.

So I'm a little surprised to see timings of 2 to 5 minutes to display all of 20 jobs. :?

Posted: Tue Sep 20, 2005 1:52 pm
by ArndW
I did a couple of tests today, among which I put AKs (alternate-key indices) on field F1 of all 2000 RT_STATUS files, and that did make a difference, but not enough. I'm doing timing tests and file I/O analysis (I converted the dynamic files to static hashed files so I could FILE.STATUS them). The system seems to be doing very many file opens relative to reads/writes, but I am not certain whether those aren't the result of mfile operations.

Posted: Tue Sep 20, 2005 5:09 pm
by ray.wurlod
It does do a lot of logical opens but, as you say, you would hope that many of those would be handled by the rotating file pool. This is the nature of the beast; they need to guarantee that the file is open, even though the request comes from what is essentially a stateless client.

How do you propose to index RT_STATUSnn for newly-created jobs?

Posted: Wed Sep 21, 2005 8:27 am
by ArndW
I plan on firing up a daily job to check for RT_STATUS files without indices. But I am not sure that this is really going to work all that well overall. I think I need to research what other files are used and, as you surmised, not SELECTed with indices. Too bad that we don't have memory files, or that we could activate public sharing of files such as VOC or DS_JOBS or DS_JOBOBJECTS...
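A minimal sketch of that daily check, in UniVerse BASIC. It assumes RT_STATUS file names follow the RT_STATUSnn pattern in the VOC, and simply re-issues CREATE.INDEX for every matching file; re-running it against an already-indexed file just produces an error message, which this sketch captures and ignores. Test on a copy of a project first.

```
* Scan the VOC for RT_STATUSnn files and ensure F1 is indexed
OPEN "VOC" TO F.VOC ELSE STOP "Cannot open VOC"
SELECT F.VOC TO 1
LOOP
   READNEXT ID FROM 1 ELSE EXIT
   IF ID MATCHES "'RT_STATUS'1N0N" THEN
      * Errors (e.g. index already exists) are swallowed via CAPTURING
      EXECUTE "CREATE.INDEX " : ID : " F1" CAPTURING OUT
      EXECUTE "BUILD.INDEX " : ID : " F1" CAPTURING OUT
   END
REPEAT
```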

Posted: Wed Sep 21, 2005 4:47 pm
by ray.wurlod
You could read up about the following commands in the Ascential DataStage Hash Stage Disk Caching manual! :wink:
    SET.MODE
    LIST.FILE.CACHE
    CLEAR.FILE.CACHE
    CATALOG.FILE.CACHE
    DECATALOG.FILE.CACHE
    DAEMON.FILE.CACHE

Posted: Thu Sep 22, 2005 12:53 am
by ArndW
Thanks Ray - I did enable file caching for public sharing at this site, but assumed that it applied only to job runs and not to system files. If the functionality is part of the engine then I think we might have a winner after all.