We have a parallel job that includes a Basic Transformer. During one run, the Basic Transformer failed because of a resource error on the UNIX server. When we restarted the job, it kept failing because the Basic Transformer was left in an invalid state, even though we reset the job at the start of every run.
Is there a problem using Basic Transformers in a Parallel Job?
If so then is there a workaround or patch that we can introduce?
Basic Transformer within Parallel Job
There are only 10 kinds of people in the world, those that understand binary and those that don't.
The initial error on the Basic Transformer was:
1885 FATAL Thu May 14 22:45:56 2009
BASIC_Transformer_18,0: dspipe_wait(1597): Writer timed out waiting for Reader to connect.
On trying to re-run the job after this failure (job is reset) we got the following on every other attempt:
Event Id: 1995
Time : Fri May 15 03:21:26 2009
Type : FATAL
User : dsadm
Message :
BASIC_Transformer_18,0: Unable to run job - -2.
Event Id: 1996
Time : Fri May 15 03:21:26 2009
Type : FATAL
User : dsadm
Message :
BASIC_Transformer_18,0: the runLocally() of operator [DSJobRun in BASIC_Transformer_18], partition 0 of 2, processID 1096 on node1 failed.
The only way to get around this was to rebuild the code from source, resetting the status of all jobs.
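For anyone wanting to pull these entries out of a saved log for comparison, here is a minimal Python sketch. It assumes the log has been exported as plain text in the same "Event Id / Time / Type / User / Message" layout shown above; the function name `parse_events` and the sample text are illustrative only and are not part of DataStage. It does not handle the one-line summary format of the first error (event 1885).

```python
import re

# Matches the labelled fields seen in the detailed log view above.
FIELD_RE = re.compile(r"^(Event Id|Time|Type|User|Message)\s*:\s*(.*)$")

def parse_events(text):
    """Split an exported log into one dict per event (hypothetical helper)."""
    events = []
    current = None
    in_message = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = FIELD_RE.match(line)
        if match and match.group(1) == "Event Id":
            # A new "Event Id" line always starts a new record.
            current = {"Event Id": match.group(2), "Message": ""}
            events.append(current)
            in_message = False
        elif current is None:
            # Ignore anything before the first event header.
            continue
        elif match and not in_message:
            if match.group(1) == "Message":
                in_message = True
                current["Message"] = match.group(2)
            else:
                current[match.group(1)] = match.group(2)
        else:
            # Everything after "Message :" belongs to the message body.
            current["Message"] = (current["Message"] + " " + line).strip()
    return events

SAMPLE = """Event Id: 1995
Time : Fri May 15 03:21:26 2009
Type : FATAL
User : dsadm
Message :
BASIC_Transformer_18,0: Unable to run job - -2.
Event Id: 1996
Time : Fri May 15 03:21:26 2009
Type : FATAL
User : dsadm
Message :
BASIC_Transformer_18,0: the runLocally() of operator [DSJobRun in BASIC_Transformer_18], partition 0 of 2, processID 1096 on node1 failed.
"""

if __name__ == "__main__":
    for event in parse_events(SAMPLE):
        print(event["Event Id"], event["Type"], event["Message"])
```

Filtering the result on `event["Type"] == "FATAL"` would then give just the failures for a given stage.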
priyadarshikunal wrote: What's the value of the environment variable DSIPC_OPEN_TIMEOUT? If it is the default of 30, try increasing it to 600.
Also check DS_TDM_PIPE_OPEN_TIMEOUT; by default it should be 720.
Our DSIPC_OPEN_TIMEOUT is set to 30 - this is something that our Admin team will have to tinker with.
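As a quick way to compare the current settings against the values suggested in the quote, something like the following Python sketch could be used. It is only a sketch under assumptions: it reads the variables from the environment of the process that runs it, which may not match what the job actually sees if the values are defined elsewhere (for example at project level). The helper name `check_timeouts` and the status strings are made up for illustration; the target numbers (600 and 720) are simply the ones quoted above.

```python
import os

# Values taken from the suggestion quoted above, not from official documentation.
SUGGESTED = {
    "DSIPC_OPEN_TIMEOUT": 600,
    "DS_TDM_PIPE_OPEN_TIMEOUT": 720,
}

def check_timeouts(env=None):
    """Return {variable: status} comparing each setting to the suggested value.

    `env` defaults to the current process environment; pass a dict to test
    other values. This is a hypothetical helper, not a DataStage API.
    """
    env = os.environ if env is None else env
    report = {}
    for name, suggested in SUGGESTED.items():
        raw = env.get(name)
        if raw is None:
            report[name] = "not set"
            continue
        try:
            value = int(raw)
        except ValueError:
            report[name] = "not an integer"
            continue
        if value < suggested:
            report[name] = "below suggested"
        else:
            report[name] = "ok"
    return report

if __name__ == "__main__":
    for name, status in check_timeouts().items():
        print(f"{name}: {status} (suggested {SUGGESTED[name]})")
```

With DSIPC_OPEN_TIMEOUT at 30, as in our case, the first variable would be reported as below the suggested value.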