Fifo \pipe error

taylor.hermann
Premium Member
Posts: 32
Joined: Wed Aug 20, 2014 11:17 am

Post by taylor.hermann »

Hi,

I've been dealing with an issue recently that I can't find many documented causes for. I have a bunch of delta sequential jobs that each run about 6 other jobs, all with different invocation IDs.

I've been load testing these delta sequential jobs, running only around 8 of them at the same time, whereas in production there will be upwards of 30. However, about 90% of the time, 1 of these 8 jobs fails because one of the jobs within the sequence fails. Which sequence fails, and which job within it, is completely random. I've also run into this error when running a single sequence. The logs give me the following line:

OPENSEQ '\\.\pipe\Application-RT_SC274-App_Splunk_Message.CUSTOMER_NATL' called: 10:39:36 02 OCT 2014

It repeats this OPENSEQ message about every second, usually for exactly 2 minutes. "Application" is our project name, and "App_Splunk_Message.CUSTOMER_NATL" is the job that failed. I'm not sure what the rest of the name means.

Then afterwards it gives me the following error:

Error setting up internal communications (fifo \\.\pipe\Application-RT_SC274-App_Splunk_Message.CUSTOMER_NATL) STATUS() 2


The only real resource I found online about this issue is here:
http://www-01.ibm.com/support/docview.w ... wg21445893

Our admin has confirmed it's not virus scans, and there is plenty of disk space available while these jobs are running. So any more input or ideas are much appreciated!

Thanks,
Taylor
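
The repeated OPENSEQ lines quoted in the post above can be summarized to confirm how long the retries last before the error appears. The following is a minimal Python sketch, assuming the job log has been exported as text and that every OPENSEQ line has exactly the format of the one line quoted in this thread. The function name `summarize_openseq` and the three sample lines are illustrative only and are not part of DataStage.

```python
import re
from datetime import datetime

# Pattern derived only from the single log line quoted in this thread (an assumption).
OPENSEQ_RE = re.compile(
    r"OPENSEQ '(?P<pipe>[^']+)' called: "
    r"(?P<ts>\d{2}:\d{2}:\d{2} \d{2} [A-Za-z]{3} \d{4})"
)

def summarize_openseq(lines):
    """Return {pipe_name: (attempts, seconds between first and last attempt)}."""
    seen = {}
    for line in lines:
        match = OPENSEQ_RE.search(line)
        if not match:
            continue
        # %b expects English month abbreviations, so this assumes an English/C locale.
        stamp = datetime.strptime(match.group("ts"), "%H:%M:%S %d %b %Y")
        seen.setdefault(match.group("pipe"), []).append(stamp)
    return {
        pipe: (len(stamps), (max(stamps) - min(stamps)).total_seconds())
        for pipe, stamps in seen.items()
    }

# Hypothetical sample: the first line is from the thread, the other two are made up.
PIPE = r"\\.\pipe\Application-RT_SC274-App_Splunk_Message.CUSTOMER_NATL"
sample = [
    f"OPENSEQ '{PIPE}' called: 10:39:36 02 OCT 2014",
    f"OPENSEQ '{PIPE}' called: 10:39:37 02 OCT 2014",
    f"OPENSEQ '{PIPE}' called: 10:41:36 02 OCT 2014",
]

for pipe, (attempts, seconds) in summarize_openseq(sample).items():
    print(pipe, attempts, seconds)
```

A span close to 120 seconds for every failing pipe would be consistent with a fixed timeout rather than a random delay, though the thread does not confirm what the timeout setting is.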
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Could it be that the multiple instances are all (or some of them) trying to access file \\.\pipe\Application-RT_SC274-App_Splunk_Message.CUSTOMER_NATL at the same time? The operating system only allows one writer at a time.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
taylor.hermann
Premium Member
Posts: 32
Joined: Wed Aug 20, 2014 11:17 am

Post by taylor.hermann »

Although I can't see all the content, I can guess what you're getting at. But this error has also happened when running a single sequential job, so I don't think it's a limitation on accessing the file. The error is just a lot less likely when running one job.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Interesting. I was assuming, much like Ray, that this happened when multiple instances were stepping on each other. But if it can happen while the job runs in isolation that's a whole 'nuther kettle of fish.

I believe that STATUS of 2 means "file not found". If you were on a UNIX server I'd suggest making sure your open files limit was high enough but no clue what the equivalent would be for Windows. I'd involve your official support provider on this one.
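
The "file not found" reading of STATUS 2 matches the usual operating-system numbering: POSIX errno 2 is ENOENT, and Windows system error code 2 is ERROR_FILE_NOT_FOUND. The short Python sketch below looks the number up. It assumes the STATUS() value in the DataStage message is passed through from the OS, which this thread does not confirm.

```python
import errno
import os

# Assumption: the STATUS() value reported by DataStage mirrors the OS error number.
status = 2

# errno.errorcode maps an error number to its symbolic name.
print(errno.errorcode[status])

# os.strerror returns the platform's message for that number (wording varies by OS).
print(os.strerror(status))

# ENOENT ("no such file or directory") has the value 2.
print(status == errno.ENOENT)
```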
-craig

"You can never have too many knives" -- Logan Nine Fingers
taylor.hermann
Premium Member
Posts: 32
Joined: Wed Aug 20, 2014 11:17 am

Post by taylor.hermann »

Yeah, we are currently working on that now too. There's been some talk that our environment may not be set up properly.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Try going through this Technote too:

http://www-01.ibm.com/support/docview.w ... wg21460111
Choose a job you love, and you will never have to work a day in your life. - Confucius
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Was going to link to that one as well but it is so UNIX-centric that I mostly decided to stick with this from the wrap-up paragraph:

"If the above tests do not isolate the cause of file system i/o problem, then it may be necessary to contact Information Server support for assistance in performing a system trace (truss or strace) of the dsapi process launching the failing jobs to track down the actual OS operations which are failing."
taylor.hermann
Premium Member
Posts: 32
Joined: Wed Aug 20, 2014 11:17 am

Post by taylor.hermann »

We worked with an experienced consultant today, and he narrowed it down to a process on our server that was causing this issue. Something called "sh.exe" is randomly breaking and causing this error. We have yet to determine why it's happening.

**As a side note, we worked with IBM before to fix another timeout issue, and their solution was to set "APT_PM_USE_STANDALONE_EXE = 1". This was supposed to avoid the shell, and it resolved the immediate issue.**

So we assumed that sh.exe would not be getting called anymore, but it's still getting called somehow.
My question now is: does anyone know a way to completely avoid calling this "sh.exe" process? Or know why, when it's being called, it randomly breaks jobs?
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

The best way to avoid it is to run DataStage on UNIX. :lol:

Seriously though, I do not know.
taylor.hermann
Premium Member
Posts: 32
Joined: Wed Aug 20, 2014 11:17 am

Post by taylor.hermann »

Well, I appreciate the time everyone has taken to try and help!
taylor.hermann
Premium Member
Posts: 32
Joined: Wed Aug 20, 2014 11:17 am

Post by taylor.hermann »

For purposes of updating this post with the solution:

We found that an MKS Toolkit file (mkstk.dll) in system32 was showing up as unregistered by Windows. Now that we have registered this .dll, these errors seem to have vanished. IBM told us this was probably because our servers were not connected to the internet (and still aren't) when DataStage was installed, so this file never got registered.
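
For anyone checking a similar installation, a first step is to confirm that the file named in the solution is present at all. The Python sketch below only tests whether mkstk.dll exists under the System32 directory. The helper names are hypothetical, the fallback root `C:\Windows` is an assumption, and a file being present does not show that it is registered. The thread does not say how the registration status was checked or which command was used to register the file.

```python
import os
from pathlib import Path

def system32_dll_path(dll_name, system_root=None):
    """Build the expected path of a DLL under the System32 directory."""
    # SystemRoot is the standard Windows variable; the fallback is an assumption.
    root = system_root or os.environ.get("SystemRoot", r"C:\Windows")
    return Path(root) / "System32" / dll_name

def describe_dll(dll_name, system_root=None):
    """Report whether the DLL file exists (presence only, not registration)."""
    path = system32_dll_path(dll_name, system_root)
    state = "present" if path.is_file() else "missing"
    return f"{path}: {state}"

# File name taken from the post above.
print(describe_dll("mkstk.dll"))
```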