Transform processes randomly hanging.

Archive of postings to DataStageUsers@Oliver.com. This forum is intended only as a reference and cannot be posted to.

Moderators: chulett, rschirm

admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

I have an intermittent problem.

Some time back (pre-Informix days), I spent a fair bit of time looking into this with the assistance of Ardent. We never did resolve it, but it became much less frequent, so I have not pursued it since. (More on this later.)

The symptoms are as follows:

Within a job, one or more active stages will have a status of "Starting". It does not matter how long you leave the job, nothing changes. These jobs will abort if I stop them from Director. There is no consistency as to which jobs will do this or which stage(s) of which jobs. I have personally only seen this problem on an active stage whose primary input is from ODBC. Another developer on our team claims to have had this problem where the primary input was ORAOCI8, but I cannot positively confirm this. I can say that it has happened with different ODBC drivers. I have seen it with the Microsoft Visual FoxPro and the Microsoft SQL Server ODBC drivers.

The problem is not predictably repeatable, although it seems to occur only when our server is very busy.

What we have figured out:

It would appear that the process for the job attempts to start the processes for each active stage but some of these processes either do not start or die suddenly as soon as they have started. These processes manage to create a work file in &PH& but do not write anything to it.
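One way to watch for this symptom, offered only as a minimal sketch: scan the &PH& directory for work files that were created but never written to. The code assumes &PH& is an ordinary directory reachable from the script; the function name `find_empty_work_files` and the ten-minute age threshold are illustrative choices, not anything supplied by DataStage or Ardent.

```python
import os
import time


def find_empty_work_files(ph_dir, min_age_seconds=600, now=None):
    """Return sorted paths of zero-length files in ph_dir that are
    at least min_age_seconds old (hypothetical helper)."""
    now = time.time() if now is None else now
    suspects = []
    for entry in os.scandir(ph_dir):
        if not entry.is_file():
            continue
        info = entry.stat()
        # An empty file that has not been touched for a while matches the
        # symptom: the stage process created its work file but wrote nothing.
        if info.st_size == 0 and now - info.st_mtime >= min_age_seconds:
            suspects.append(entry.path)
    return sorted(suspects)
```

An empty file is only a hint, since a stage that has just started will also have written nothing yet, which is why the age threshold is there.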

I originally had the problem when reading from some dBase IV format files using the MS Visual FoxPro version 5 ODBC driver (note that the usual dBase driver does not handle long file names). Based on suspicions (now why would I think that about MS software) that it was to do with ODBC, I upgraded to version 6 of the Visual FoxPro driver. This dramatically reduced the problem, although it did not go away completely. It has only been happening, say, once per month (which isn't bad, as we run about 300 jobs every night).

Unfortunately the problem has been happening a bit more often lately.

* News Flash * Another job has just been discovered hung as I type this. This job is reading from ORAOCI stages (not ORAOCI8). So much for my theory about only being ODBC.

I'm not really expecting anyone to solve this problem, and as Ardent and I have been over this in detail, I'd be surprised (but thankful) if anyone came up with something new that we haven't already looked at.

What I really want to know is HAS THIS HAPPENED TO ANYONE ELSE??? Or am I alone with this problem?

David Barham
Information Technology Consultant
CoalMIS Project
Shell Coal Pty Ltd
Brisbane, Australia



Post by admin »

We have faced the same problem with the ORAOCI8 stage (DataStage version 3.6). We also concluded that it may be due to too many processes running on the machine. After reporting the problem to Ardent, we got a patch for it. At the same time, we reduced the number of processes triggered at one time, and the frequency of hanging has now reduced, though it has not been eliminated.

Check with Informix now.
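Limiting how many jobs start at once, as described above, could be sketched roughly like this. It is only an illustration: `run_job` is a hypothetical callable standing in for however a site actually launches a job, and the default limit of four is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def run_throttled(job_names, run_job, max_concurrent=4):
    """Call run_job(name) for every job name, never running more than
    max_concurrent at the same time. Returns {job name: result}."""
    job_names = list(job_names)
    # The pool size is the throttle: remaining jobs wait until a worker
    # becomes free instead of all starting together.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = pool.map(run_job, job_names)
        return dict(zip(job_names, results))
```

A fixed-size pool keeps the load on the server bounded without having to hand-schedule each job's start time.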

Muthusamy Karthikeyan
Consultant
Deutsche Bank, Singapore

Post by admin »

This problem has also happened to us in the past and still happens, but less frequently now. We have had it happen with both ODBC (connecting to Oracle and SQL Server) and Unidata stages as the input, so there does not seem to be any consistency there. Also, it is not repeatable; it happens on random jobs.

I have also contacted support about this issue, but we have found no real resolution. However, it is happening less frequently now.

Carey Opitz
Business System Designer
CIGNA Behavioral Health
carey.opitz@cignabehavioral.com

-----Original Message-----
From: David Barham [mailto:David.Barham@Brisbane.Shell-Coal.com.au]
Sent: Thursday, July 27, 2000 12:23 AM
To: informix-datastage@oliver.com
Subject: Transform processes randomly hanging.



Post by admin »

Hi David,

Nope. You're not alone. Usually we have the problem with OCI8 input jobs hanging on completion. It's gone ahead and flushed the various rows out of heap memory and written them all to a hashed file, but the job just won't complete. Sometimes we've had start problems on jobs that do no external processing at all. They're just reading/writing with sequential or hashed stages. Our resolution has been to run CleanupJob via Telnet on the guilty party, and that usually does the trick. If not, we end up rebooting the server. FYI, we're running on NT with DS 3.1.1r3.

Sometimes it appears that there are guilty locks or phantoms "out there" which confuse the job. We were told a while back by Ardent that it's a good idea to do a "CLEAR.FILE DATA &PH&" once in a while. Even though jobs have completed fine, there are many entries in the file. Clearing it seems to improve performance.

Muthusamy, you mention a fix you received for this in release 3.6. Could you provide more information on that? We'll be going to 3.6 this year and we use OCI8 stages a lot. Thanks.

Brad Vincent
Compuware Corporation
c/o The Detroit Medical Center
Data Warehousing with a "health"-y spin
(313) 966-2176