Auto-purge takes a lot of time

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

chrhagen
Participant
Posts: 4
Joined: Tue Apr 29, 2014 10:32 am
Location: Hannover
Contact:

Auto-purge takes a lot of time

Post by chrhagen »

Hello all,

In one project I can see the symptoms mentioned in this technote:
http://www-01.ibm.com/support/docview.w ... wg21623578
Problem (Abstract)
System is overloaded by DataStage "Phantom DSD.RUN..." processes attached to init.

Symptom
PPID of Phantom DSD.RUN is 1 (attached to init) and still consuming CPU.

They are using some server jobs which just do something like "get a jobid" and so on. They used server jobs because this functionality does not need parallel jobs and their overhead. I thought that was, in general, a good idea.

But now these heavily used, multiple-instance jobs have been generating 100% CPU for quite some time. I can see that in top. In one environment the phantom processes last for 2 minutes; in another I have seen one running for 12 minutes.

To be sure: the job has ended. Everything is fine in Director; the job ran for only a few seconds. It's just a phantom process, sometimes with a PPID of 1.
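For reference, this is how I spot them (just a sketch; the ps column layout may differ on your platform):

Code: Select all

# Phantom DSD.RUN processes re-parented to init: PPID (field 3 in ps -ef) is 1.
ps -ef | grep "[D]SD.RUN" | awk '$3 == 1'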

The technote suggests shortening the auto-purge interval. The original interval was 3 days. We have shortened it to 20 runs and even to 1 run, yet one phantom job still runs at 100%. If we deactivate auto-purge altogether, everything works nicely, but we cannot do that for long.

It happens in multiple environments and in multiple projects. We cannot reproduce it in a newly created project with newly created jobs; maybe it takes some weeks to develop.
Workload does not affect the problem. One test was on a machine that currently had no jobs running: I started one of the server jobs and got 100% CPU for 1 minute. A copy of the same job runs nicely.

They also have some parallel jobs with roughly the same number of instances, but those run nicely. Manually clearing the job log does not help.

Possible quick fixes:
1) disable auto-purge for some days and set it back to 3 days on Friday
2) disable auto-purge and manually purge the logs (e.g. CLEAR.FILE RT_LOGnnn; see the sketch after this list)
3) rewrite the jobs as parallel jobs or maybe shell scripts
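For option 2, a minimal sketch of the manual purge (assuming $DSHOME is set; the project path and job name are placeholders):

Code: Select all

# Source the DataStage environment and change to the project directory.
. $DSHOME/dsenv
cd /path/to/project
# Look up the internal job number (JOBNO) of the offending job.
$DSHOME/bin/dssh "SELECT JOBNO FROM DS_JOBS WHERE NAME = 'MyJobName';"
# Clear its log file, replacing 123 with the JOBNO returned above.
$DSHOME/bin/dssh "CLEAR.FILE RT_LOG123"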

Currently we're trying option 1, but we may go for option 3. Still, I want to find a real solution.

Do you have any idea?

Environment
DataStage 8.7 FP 1
DB2 9.7 for XMETA, but logging in XMETA is disabled

Thanks for your help!
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Logs are logs; switching to Parallel jobs won't help in that regard. They actually log more messages, so that would probably make it worse. How many instances of the MI jobs do you run simultaneously?
-craig

"You can never have too many knives" -- Logan Nine Fingers
chrhagen
Participant
Posts: 4
Joined: Tue Apr 29, 2014 10:32 am
Location: Hannover
Contact:

Post by chrhagen »

Hello chulett,

Yes, that is something I don't understand: (some) server jobs are running wild, while no parallel job has the same problem. So I too thought that a log is a log. But look at the technote, which specifically mentions server jobs.

There are three server jobs that generate a "jobid", which is used in the warehouse jobs to document which job generated the data. A parallel job is used to document that a warehouse job has finished. So for every three server jobs, I have one parallel job. A warehouse job consists of at least one sequence; some of those are multiple-instance too, but not to the same extent.

In one day there are probably 300-400 warehouse jobs which need a jobid, so I have 300-400 invocations of each of these jobs. Maybe no more than 3 of the jobs showing this problem run in parallel at any time.
[edit 10.16.14] I have to correct myself: over time these jobs get called quite a number of times. I can see that the jobs have more than 2000 instances.

As I mentioned: even when the (development) system is running no jobs at all, just running one of these server jobs generates 100% load on one CPU out of four.

--
christian
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Which 8.x version are you on? Did you check whether ORLogging is enabled there?
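A quick way to check from the shell (the path assumes a default install; adjust for your project):

Code: Select all

# RTLogging / ORLogging are per-project flags in the project's DSParams file.
grep -E "RTLogging|ORLogging" /opt/IBM/InformationServer/Server/Projects/YourProject/DSParams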
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
chrhagen
Participant
Posts: 4
Joined: Tue Apr 29, 2014 10:32 am
Location: Hannover
Contact:

Post by chrhagen »

Environment
- DataStage 8.7 FP 1
- DB2 9.7 for XMETA
- Logging in XMETA is disabled (set in Administrator)
I have checked the logging in the web console and in the XMETA tables: no rows.
Christian

"You don't have to be crazy to study Physics, but it helps"
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

chrhagen wrote:Over time these jobs get called quite a number of times. I can see that the jobs have more than 2000 instances.
No wonder you have log issues with them, did you mention that fact to IBM yet? Perhaps you meant 2000 total over all jobs, but let's start with 2000 for one particular job. You do understand that those '2000 instances' all go into the same LOG table, with each Invocation shown in the Director being the equivalent of a view into that single table's data filtered on that Invocation ID?

What we've all said so far is correct - a log is a log and this isn't a Server job versus a Parallel job issue, in spite of that technote. It's a multi-instance log issue specifically, one compounded by a high daily run volume. Allow them to grow 'too large' (greater than 2.2GB) and they will corrupt and become unusable, causing the job to no longer run.
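If you want to see how close to that limit a log is, something like this should do it (a sketch, run from the project directory with dsenv sourced; 123 stands in for the real job number):

Code: Select all

# RT_LOGnnn is a dynamic hashed file -- a directory on disk.
du -sh RT_LOG123
# Record count of that log from the engine shell.
$DSHOME/bin/dssh "COUNT RT_LOG123"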

Sorry I don't have a complete answer for you off the top of my head; I haven't had access to DataStage for four years. Others, however, have posted some good advice on how to handle situations like this, because auto-purge in these situations is a known troublemaker.

If your jobs really don't have an issue with it disabled, then perhaps an answer might be to script something to run on a regular basis to clear the problem children during an idle time. That could be something in cron from the command line that leverages CLEAR.FILE, or perhaps even a recompile, as I seem to remember that will remove all instances from the Director.
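Purely illustrative, the cron side could be as simple as this (script name, path, and schedule are made up; the script itself would just wrap the CLEAR.FILE calls mentioned above):

Code: Select all

# crontab entry: clear the problem logs nightly at 02:30, during idle time.
30 2 * * * /opt/scripts/clear_mi_logs.sh >> /tmp/clear_mi_logs.log 2>&1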

Let's see what others come up with.

If there is no good solution forthcoming for this, perhaps we could discuss the specifics of your current implementation of this JOBID process. Sure seems to me like it could be simplified in a manner that would be much less... resource intensive.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chrhagen
Participant
Posts: 4
Joined: Tue Apr 29, 2014 10:32 am
Location: Hannover
Contact:

Post by chrhagen »

Thank you for your ideas!
chulett wrote:
chrhagen wrote:Over time these jobs get called quite a number of times. I can see that the jobs have more than 2000 instances.
No wonder you have log issues with them, did you mention that fact to IBM yet? Perhaps you meant 2000 total over all jobs, but let's start with 2000 for one particular job. You do understand that those '2000 instances' all go into the same LOG table, with each Invocation shown in the Director being the equivalent of a view into that single table's data filtered on that Invocation ID?
Yes, of course. These huge numbers of instances also cause a problem in Director. But the number of parallel-job instances is not that high!
It's not my solution, so I really had to count the instances to grasp the full extent of it. Maybe we will open a PMR; the solution from the technote is not working, so perhaps we will get an answer from them. But I know how fast IBM is, so we have to look for an immediate answer ourselves.
chulett wrote:What we've all said so far is correct - a log is a log and this isn't a Server job versus a Parallel job issue, in spite of that technote. It's a multi-instance log issue specifically, one compounded by a high daily run volume. Allow them to grow 'too large' (greater than 2.2GB) and they will corrupt and become unusable, causing the job to no longer run.
Yes, I now have that impression too. I tried some sample jobs today and started 1000 instances (max. 5 in parallel), and I could roughly reproduce the same symptoms. Interestingly, though, server jobs have bigger problems, but I cannot prove that right now.
chulett wrote:Sorry I don't have a complete answer for you off the top of my head; I haven't had access to DataStage for four years. Others, however, have posted some good advice on how to handle situations like this, because auto-purge in these situations is a known troublemaker.
OK, I haven't found anything about that in this forum or with Google. But of course it's always a question of the right search terms :-)
chulett wrote:If your jobs really don't have an issue with it disabled, then perhaps an answer might be to script something to run on a regular basis to clear the problem children during an idle time. That could be something in cron from the command line that leverages CLEAR.FILE, or perhaps even a recompile, as I seem to remember that will remove all instances from the Director.
That was my first idea for a quick fix. But now I will try to convert the most offending jobs into shell scripts. These jobs do not work on real data, just on job-control metadata; you don't need DataStage for that.
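Just to sketch the idea (completely hypothetical; the database and sequence names are made up, the real job-control schema looks different):

Code: Select all

#!/bin/sh
# Fetch the next jobid from a DB2 sequence instead of running a server job.
db2 connect to DWH >/dev/null
JOBID=$(db2 -x "VALUES NEXT VALUE FOR ETL_CTRL.JOBID_SEQ")
db2 terminate >/dev/null
echo "$JOBID"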
chulett wrote:If there is no good solution forthcoming for this, perhaps we could discuss the specifics of your current implementation of this JOBID process. Sure seems to me like it could be simplified in a manner that would be much less... resource intensive.
Nope, not my implementation. I would never have done something like that :D
Christian

"You don't have to be crazy to study Physics, but it helps"
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Open a PMR. They should be able to help guide you to a solution.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Or as you said, if some of these steps don't really need an ETL tool to perform them and can be converted to a shell script, that would be well worth your time in my book. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers