Problems with DS Calls

Viswanath
Participant
Posts: 68
Joined: Tue Jul 08, 2003 10:46 pm

Problems with DS Calls

Post by Viswanath »

Hi All,

We are facing a couple of problems while trying to call DataStage jobs or sequencers from Control M using command files.

1.) When we call a job from Control M using a command file, every job is first reset and then run. Sometimes the reset doesn't finish and keeps running. We allow a wait of 12 seconds between the reset and the run. Because the reset is still in progress when the run starts, the job eventually fails with a DSBadState = 2. Any idea why this happens? At this point I have to stop the reset and rerun the job. The next time, the job runs OK.

2.) We faced this as a one-off problem, but I am still trying to figure out the root cause. On one occasion a sequencer that was calling some jobs suddenly crashed with the following error.

Controller problem: Error calling DSRunJob(JobA), code=-14
[Timed out while waiting for an event]

Any idea why this has happened?

Any help would be great.

Cheers,
Amos.Rosmarin
Premium Member
Posts: 385
Joined: Tue Oct 07, 2003 4:55 am

Post by Amos.Rosmarin »

Hi,

Are you using dsjob to execute the jobs?
I suggest you write yourself a little script that does the error handling for you. First use -mode RESET and then run.
Do not execute a job directly; wrap it in a sequencer and use 'Reset if required, then run'.
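
A minimal sketch of such a wrapper, in case it helps. It assumes dsjob is on the PATH; the function name reset_then_run and the DSJOB override variable are made up for illustration. It relies on the -wait option so the script blocks until the reset has finished instead of sleeping for a fixed number of seconds, and on -jobstatus so the exit code reflects how the job finished. The exit code meanings in the comments (1 = finished OK, 2 = finished with warnings) are my understanding of -jobstatus; check them against the documentation for your release.

```shell
#!/bin/sh
# Hypothetical wrapper: reset a job, wait for the reset to finish, then run it.
# DSJOB may be overridden (for example with a stub); it defaults to dsjob on the PATH.
DSJOB="${DSJOB:-dsjob}"

reset_then_run() {
    project="$1"
    job="$2"

    # -wait blocks until the reset has completed, so no fixed sleep is needed.
    if ! "$DSJOB" -run -mode RESET -wait "$project" "$job"; then
        echo "reset of $job failed" >&2
        return 1
    fi

    # -jobstatus waits for the job and derives the exit code from its finishing status.
    "$DSJOB" -run -mode NORMAL -jobstatus "$project" "$job"
    rc=$?
    case "$rc" in
        1|2) return 0 ;;   # assumed: 1 = finished OK, 2 = finished with warnings
        *)   echo "$job ended with dsjob exit code $rc" >&2
             return 1 ;;
    esac
}
```

From the Control M command file this would be called as reset_then_run MyProject JobA, where both names are placeholders.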


HTH,
Amos
Viswanath
Participant
Posts: 68
Joined: Tue Jul 08, 2003 10:46 pm

Post by Viswanath »

Hi,

I am using dsjob to run these jobs, and in all cases I do have a sequencer with the "Reset if required then run" option, except in one case where a job is called directly instead of a sequencer.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

In my experience, resetting a job can take 30 seconds if the system is overloaded. A sequence should handle it for you. I expect that you need an Otherwise link in your sequence. Not all situations are trapped in a sequence: if all you have is an OK (successful) link and an error link, the sequence ends if a job finishes with a warning. You need either an error link plus an Otherwise link, or an OK link with errors handled by the Otherwise link.
Mamu Kim
kiran_kom
Participant
Posts: 29
Joined: Mon Jan 12, 2004 10:51 pm

Post by kiran_kom »

I had the same problem this morning. It might be because you are using the "reset if required" option. I had inadvertently used it in my sequence and ran into the same issue. Try taking it out. It seems to have solved mine, but I'm not 100% sure; my jobs are still running.

Also, are you by any chance making heavy use of hash files in any of those jobs? There is a bug in DS windows that causes it to crash if you are using lots of hash file stages at the same time.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

kiran_kom wrote:There is a bug in DS windows that causes it to crash if you are using lots of hash file stages at the same time.
Can you please elaborate, ideally providing a reference to the support case number? Or is it just that you didn't set the T30FILE tunable large enough?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kiran_kom
Participant
Posts: 29
Joined: Mon Jan 12, 2004 10:51 pm

Post by kiran_kom »

ray.wurlod wrote:
kiran_kom wrote:There is a bug in DS windows that causes it to crash if you are using lots of hash file stages at the same time.
Can you please elaborate, ideally providing a reference to the support case number? Or is it just that you didn't set the T30FILE tunable large enough?
Umm, no... This is a known issue with DS windows (well known only to Ascential folks, I guess). We have jobs that make heavy use of hash files, and there are multiple instances of the same job running.

This sometimes causes DS to crash. The error manifests itself as a "User limit reached" message in the &PH& directory. Ascential is working on a fix for it.

Yesterday when I was having the above mentioned problem, I also ran into this problem with hash files. I was
I don't think the above problem is related to this hash file issue, because just now (5 minutes ago) my jobs failed with the same controller problem (and no, they didn't have "reset if required" turned on). I didn't find any "User limit reached" messages in &PH&, so I guess this is a separate issue.
kiran_kom
Participant
Posts: 29
Joined: Mon Jan 12, 2004 10:51 pm

Post by kiran_kom »

ray.wurlod wrote:
kiran_kom wrote:There is a bug in DS windows that causes it to crash if you are using lots of hash file stages at the same time.
Can you please elaborate, ideally providing a reference to the support case number? Or is it just that you didn't set the T30FILE tunable large enough?
The support case number is 385112*WES
rdy
Participant
Posts: 38
Joined: Wed Nov 05, 2003 2:40 pm

Re: Problems with DS Calls

Post by rdy »

Viswanath wrote:Hi All,


2.) We faced this as a one-off problem, but I am still trying to figure out the root cause. On one occasion a sequencer that was calling some jobs suddenly crashed with the following error.

Controller problem: Error calling DSRunJob(JobA), code=-14
[Timed out while waiting for an event]
Did you ever resolve #2? We have the same problem occasionally and Ascential is pointing us to the shared memory parameters on our Solaris box. If you look on page 3-4 of the installation guide, they list the minimum recommended values.

You can see those values on a Solaris system by running /etc/sysdef and grep out the parm you're looking for. E.g. /etc/sysdef | grep SHMMNI.
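
If you want to look at more than one shared memory parameter at a time, something like the following may help. It is only a sketch: the function name shm_params is made up, the list of parameters (SHMMAX, SHMMNI, SHMSEG) is my assumption about which ones are worth checking, and it simply filters whatever sysdef-style text it receives on standard input.

```shell
#!/bin/sh
# Hypothetical helper: read sysdef-style output on stdin and print the lines
# that mention each shared memory parameter, noting any that are absent.
shm_params() {
    input=$(cat)
    # The parameter list is an assumption; adjust it to the ones your install guide names.
    for p in SHMMAX SHMMNI SHMSEG; do
        printf '%s\n' "$input" | grep "$p" || echo "$p: not found in input"
    done
}

# Usage on the Solaris host:
#   /etc/sysdef | shm_params
```

Reading from standard input keeps the function independent of where sysdef lives, so it can also be run against a saved copy of the output.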

I was told that SHMMNI was probably the culprit on our system. I'll let you know if it helps after we've made the change.
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Re: Problems with DS Calls

Post by ogmios »

A little bit off topic, but my hat off to the guesses of Ascential support. And that's meant ironically. :P

They'd better build some more tracing into DataStage, as is the case in Oracle, for example. When something goes wrong in Oracle, you open an iTar and 99% of the time you get a real fix very soon.

Ogmios
smohamme
Participant
Posts: 9
Joined: Sat Aug 07, 2004 2:48 pm
Location: Dallas

Re: Problems with DS Calls

Post by smohamme »

rdy wrote:
Viswanath wrote:Hi All,


2.) We faced this as a one-off problem, but I am still trying to figure out the root cause. On one occasion a sequencer that was calling some jobs suddenly crashed with the following error.

Controller problem: Error calling DSRunJob(JobA), code=-14
[Timed out while waiting for an event]
Did you ever resolve #2? We have the same problem occasionally and Ascential is pointing us to the shared memory parameters on our Solaris box. If you look on page 3-4 of the installation guide, they list the minimum recommended values.

You can see those values on a Solaris system by running /etc/sysdef and grep out the parm you're looking for. E.g. /etc/sysdef | grep SHMMNI.

I was told that SHMMNI was probably the culprit on our system. I'll let you know if it helps after we've made the change.


Hello

I was wondering whether you fixed issue #2. We have been getting this for a week and Ascential has not been able to solve it. We are using DataStage 6.x running on Solaris. I will try your suggestion too and see what happens. Also, what should SHMMNI be set to?

Thank you!
smohamme
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Re: Problems with DS Calls

Post by ogmios »

At one site where they ran DataStage on Solaris, we fixed this problem by changing the order of some of the shared libraries in the dsenv file on Ascential's recommendation... but only after we sent them a truss file of the job in action.

I no longer remember which shared libraries were involved or what the order was.

Ogmios
smohamme
Participant
Posts: 9
Joined: Sat Aug 07, 2004 2:48 pm
Location: Dallas

Re: Problems with DS Calls

Post by smohamme »

ogmios wrote:At one site where they ran DataStage on Solaris, we fixed this problem by changing the order of some of the shared libraries in the dsenv file on Ascential's recommendation... but only after we sent them a truss file of the job in action.

I no longer remember which shared libraries were involved or what the order was.

Ogmios
Thank you! If you can, please provide more details, such as the shared libraries and their order.
smohamme
smohamme
Participant
Posts: 9
Joined: Sat Aug 07, 2004 2:48 pm
Location: Dallas

Re: Problems with DS Calls

Post by smohamme »

smohamme wrote:
ogmios wrote:At one site where they ran DataStage on Solaris, we fixed this problem by changing the order of some of the shared libraries in the dsenv file on Ascential's recommendation... but only after we sent them a truss file of the job in action.

I no longer remember which shared libraries were involved or what the order was.

Ogmios
smohamme wrote: Thank you! If you can, please provide more details like the shared libraries and their order.
We have added the following library path in the dsenv file:

LD_LIBRARY_PATH=/usr/lib/lwp:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH

We have also changed the DSRunJob.B file, since it was corrupt. This did not fix our "Timed out..." issue. Ascential also informed us that the production ETL box is overutilized. We had been running 20 instances of our jobs at once; we reduced that to 12 instances and, once those were complete, started a second run with the remaining 8. This has worked, although it does not explain why the same 20 instances run fine on our Dev box, which has less processing power and memory. The differences between the two boxes in the uvconfig file are:

Tunable        Production   Development
MFILES         200          50
T30FILE        2000         500
UVSYNC         0            1
64BIT_FILES    1            0
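
For comparing the two boxes more systematically, here is a rough sketch that prints only the tunables whose values differ between two uvconfig files. It assumes each setting is a "NAME value" pair on its own line and that comment lines start with #; the function name diff_tunables and the file names in the usage line are hypothetical. On Solaris you may need nawk rather than awk.

```shell
#!/bin/sh
# Hypothetical helper: list tunables whose values differ between two
# uvconfig-style files (assumed format: one "NAME value" pair per line).
diff_tunables() {
    awk '
        $1 ~ /^#/ || NF < 2 { next }        # skip comment lines and blank or short lines
        FNR == NR { first[$1] = $2; next }  # first file: remember each value
        ($1 in first) && first[$1] != $2 {  # second file: report values that differ
            printf "%-14s %-10s %s\n", $1, first[$1], $2
        }
    ' "$1" "$2"
}

# Usage (file names are placeholders for copies taken from each box):
#   diff_tunables uvconfig.prod uvconfig.dev
```

Each output line shows the tunable name, then the value from the first file, then the value from the second. Tunables present in only one of the files are not reported.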
smohamme