Determine which jobs have failed without using Director

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

jdmiceli
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Determine which jobs have failed without using Director

Post by jdmiceli »

Hi all,

Here's the technical stuff to set the stage for the question(s):
- Running DataStage Server Edition v.8.0.1 on AIX Unix.
- Source and target db's are SQL Server 2005 and 2000 respectively.
- The project is one code base parameterized to handle 22 separate companies.
- The project was converted from 7.5.1a, so no parameter sets are in use.
- All databases involved have the same structure, one company per source/target pair of db's.
- Total job count in project is 1253.
- Average job count per company cycle each night is around 700-800 depending on size and volume.
- Logs are retained for one day only.


Here is the problem: when we have to reset a bunch of jobs after multiple failures for whatever reason (usually error -14, but that's another story and question entirely :evil: ), we currently have to use the DataStage Director and manually go through each directory to check the logs for jobs that either aborted or ended up in some status other than runnable. We then manually reset them and resurrect the batch (we use Ken Bland's job control process).

The response times in the Director are horrid, especially if other companies are still running. I have organized the project to limit the number of jobs in a folder as this helps with the return times on the list, but it still is not fun when debugging at 2 AM when I am half awake.

Is there a way to query for the jobs that are in status 3? I want to say there is also a status in the 90s that requires a reset. I don't want to automatically reset all the jobs each time, because that pushes my processing times beyond my processing window (I only have 4.5 hours to finish everything). A failure usually leaves no more than 100 jobs in a failed condition, so running against every job in the project just to find the handful needing a reset seems inefficient.

Thoughts?

As usual, your expert input is truly appreciated.
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There is no single table that contains this information - each job has its own status in its own table. However you could create a routine or dsjob script that cycles through all job names and reports the status of each, perhaps with a filter of 3 or 96 built in there somewhere.
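For the dsjob route, a minimal sketch in POSIX shell. Hedged heavily: MyProject is a placeholder project name, and the exact "Job Status : ... (n)" line printed by `dsjob -jobinfo` is assumed from memory, so verify it against your release before trusting the parse.

```shell
#!/bin/sh
# Sketch: list jobs whose status needs a reset, without opening Director.
# Assumptions: IBM's dsjob CLI is on PATH; MyProject is a placeholder;
# the "Job Status : ... (n)" output format of `dsjob -jobinfo` is assumed.

PROJECT=${1:-MyProject}

# Codes discussed in this thread: 3 = aborted/failed, 96 = crashed.
needs_reset() {
    case "$1" in
        3|96) return 0 ;;
        *)    return 1 ;;
    esac
}

# Pull the numeric status code out of `dsjob -jobinfo` output.
status_code() {
    sed -n 's/.*Job Status[^(]*(\([0-9][0-9]*\)).*/\1/p'
}

for job in $(dsjob -ljobs "$PROJECT" 2>/dev/null); do
    code=$(dsjob -jobinfo "$PROJECT" "$job" 2>/dev/null | status_code)
    if needs_reset "$code"; then
        echo "$job (status $code)"
    fi
done
```

Run it on the engine host after sourcing dsenv so dsjob resolves; the output is a short list you can feed into whatever reset step you build.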

In theory you should also be able to create a custom report (assuming you are capturing operational metadata) in the Information Server Web Console.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yah, this should be pretty simple to set up and operate; most of us have something like this in our toolkits, I'd wager. :wink:

One word of advice: when you do this, make sure you filter out the name of the job that is doing the actual checking, and don't attempt to interrogate it, as things like attempting to attach to yourself can result in a 'hang'. I also wouldn't hard-code the '3' or '96' but, as a best practice, use the built-in DSJ.JOBFAILED or DSJ.JOBCRASHED values... I don't have access to anything to get the exact names, but that should be pretty close.
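Once the failed jobs are identified, they can be reset from the command line too, skipping the checking job per the advice above. A hedged sketch: MyProject and CheckFailedJobs are placeholder names, and `-run -mode RESET` is dsjob's reset invocation.

```shell
#!/bin/sh
# Sketch: reset failed jobs from the shell rather than through Director.
# Assumptions: dsjob is on PATH; MyProject and CheckFailedJobs are placeholders.

PROJECT=${PROJECT:-MyProject}
SELF=CheckFailedJobs          # the controlling/checking job: never attach to yourself

reset_job() {
    # Build the reset invocation; DRYRUN=1 prints it instead of executing
    # (a convenience for testing, not a dsjob feature).
    cmd="dsjob -run -mode RESET -wait $PROJECT $1"
    if [ "${DRYRUN:-0}" = "1" ]; then echo "$cmd"; else $cmd; fi
}

# Jobs to reset are passed as arguments, e.g. the output of a status scan.
for job in "$@"; do
    [ "$job" = "$SELF" ] && continue   # skip the checker, as advised above
    reset_job "$job"
done
```

With multi-instance jobs you would pass `jobname.invocationid` as the job argument, the same form dsjob expects elsewhere.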
-craig

"You can never have too many knives" -- Logan Nine Fingers
jdmiceli
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

Thanks, gents, for getting back to me with your input. I very much appreciate it. :D

Thanks too for the reminder of the DSJ.JOBFAILED & DSJ.JOBCRASHED values, Craig. I had forgotten about those. For some reason I tend to think at the file and script level rather than in routines. I will do some digging and see what I can figure out. Once I get something put together, I'll post it up here to see if you have any more suggestions to improve it (or errors to correct, since I don't do much DS Basic scripting :oops: ).
arunkumarmm
Participant
Posts: 246
Joined: Mon Jun 30, 2008 3:22 am
Location: New York
Contact:

Post by arunkumarmm »

Doesn't your job use a sequence? Are you not running it through a scheduler?
Arun
jdmiceli
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

arunkumarmm wrote:Doesn't your job use a sequence? Are you not running it through a scheduler?
I do have job sequences that control the workflow for processing each table. The interesting thing about the job sequences is that even though each step that calls a job is set to 'Reset if required, then run', it doesn't always work. I have experimented with it quite a bit in the past and never got any good answers from our support staff. Maybe it is the sheer number of parameters, or the fact that I use invocation IDs everywhere possible to prevent processes from clobbering each other. (Guess I just kinda gave up :oops: )

Scheduling is handled through 'cron'.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Director is slow because it interrogates all the jobs, and all the instances of those jobs, to get their statuses. You need to clear the old instances, either by regularly compiling your jobs or by some other mechanism; it will greatly improve your Director performance. Limiting the number of jobs in a category will also help, because Director only interrogates the current job category.

I thought the KBA utilities wrote out job statuses at the end of each run. Why can't you get the statuses out of that table or hashed file?
Mamu Kim
jdmiceli
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

Hi Kim,

Thanks for your input too. I had to stop using the KBA utilities because the version we have is for 7.5.x, and when we converted to 8.0.1 we started getting very flaky results on certain things. The decision was made by my boss and the admins to stop using them, and I just haven't had time to monkey around with making them work with our current configuration. I will try setting up some recompile jobs for the weekends and see if that helps too.