Basic Transformer within Parallel Job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

palmeal
Participant
Posts: 122
Joined: Thu Oct 14, 2004 7:56 am
Location: Edinburgh, Scotland

Post by palmeal »

We have a parallel job that incorporates a Basic Transformer. During one run the job failed at the Basic Transformer because of a resource error on the UNIX server. When we restarted the job it kept failing, because the Basic Transformer had been left in an invalid state. This happened even though we reset the job at the start of every run.
Is there a problem with using Basic Transformers in a parallel job?
If so, is there a workaround or patch that we can apply?
There are only 10 kinds of people in the world, those that understand binary and those that don't.
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

When we used a Basic Transformer in a parallel job, it caused a mutation error while processing a large volume of data.
Nag
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Perhaps if you posted the actual errors you are seeing it would help. There's no fundamental problem, per se, other than all the normal reasons to not use one.
-craig

"You can never have too many knives" -- Logan Nine Fingers
palmeal
Participant
Posts: 122
Joined: Thu Oct 14, 2004 7:56 am
Location: Edinburgh, Scotland

Post by palmeal »

The initial error on the Basic Transformer was:

1885 FATAL Thu May 14 22:45:56 2009
BASIC_Transformer_18,0: dspipe_wait(1597): Writer timed out waiting for Reader to connect.

On trying to re-run the job after this failure (job is reset) we got the following on every other attempt:

Event Id: 1995
Time : Fri May 15 03:21:26 2009
Type : FATAL
User : dsadm
Message :
BASIC_Transformer_18,0: Unable to run job - -2.
Event Id: 1996
Time : Fri May 15 03:21:26 2009
Type : FATAL
User : dsadm
Message :
BASIC_Transformer_18,0: the runLocally() of operator [DSJobRun in BASIC_Transformer_18], partition 0 of 2, processID 1096 on node1 failed.

The only way to get around this was to rebuild the code from source, resetting the status of all jobs.
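
For anyone who wants to pull the FATAL entries out of an exported log, here is a minimal Python sketch that parses the "Event Id / Time / Type / User / Message" layout quoted above. The field layout is taken only from the excerpt in this post; the function name parse_events and the dictionary keys are my own for illustration, and a real log export may differ.

```python
# Sample text copied from the log excerpt quoted in this post.
SAMPLE_LOG = """\
Event Id: 1995
Time : Fri May 15 03:21:26 2009
Type : FATAL
User : dsadm
Message :
BASIC_Transformer_18,0: Unable to run job - -2.
Event Id: 1996
Time : Fri May 15 03:21:26 2009
Type : FATAL
User : dsadm
Message :
BASIC_Transformer_18,0: the runLocally() of operator [DSJobRun in BASIC_Transformer_18], partition 0 of 2, processID 1096 on node1 failed.
"""


def parse_events(text):
    """Split log text into one dict per event (hypothetical helper)."""
    events = []
    current = None
    in_message = False
    for line in text.splitlines():
        if line.startswith("Event Id:"):
            # A new event starts; everything until the next one belongs to it.
            current = {"Event Id": line.split(":", 1)[1].strip(), "Message": []}
            events.append(current)
            in_message = False
        elif current is None:
            continue  # ignore anything before the first event
        elif in_message:
            # Message bodies may contain colons, so keep them verbatim.
            current["Message"].append(line)
        elif line.startswith("Message"):
            in_message = True
        else:
            # Header lines look like "Key : value"; split on the first colon.
            key, sep, value = line.partition(":")
            if sep:
                current[key.strip()] = value.strip()
    for event in events:
        event["Message"] = "\n".join(event["Message"]).strip()
    return events


if __name__ == "__main__":
    for event in parse_events(SAMPLE_LOG):
        if event.get("Type") == "FATAL":
            print(event["Event Id"], event["Message"])
```

Filtering on the Type field like this makes it easier to line up the first failure with the errors seen on later restarts.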
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

What is the value of the environment variable DSIPC_OPEN_TIMEOUT? If it is at the default of 30, try increasing it to 600.

Also check DS_TDM_PIPE_OPEN_TIMEOUT; by default it should be 720.
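
A quick way to check the value is a small script like the one below. It is only a sketch: it assumes the variable is visible in the environment of the process running the script (for example a shell that has already picked up the DataStage environment), and it will not see a value set only at project or job level. The function name check_timeout is hypothetical, and 600 is simply the figure suggested above.

```python
import os

SUGGESTED_TIMEOUT = 600  # value suggested in this thread, not an official recommendation


def check_timeout(env, name="DSIPC_OPEN_TIMEOUT", suggested=SUGGESTED_TIMEOUT):
    """Report how the variable compares with the suggested value (hypothetical helper)."""
    raw = env.get(name)
    if raw is None:
        return f"{name} is not set in this environment"
    try:
        value = int(raw)
    except ValueError:
        return f"{name} has a non-numeric value: {raw!r}"
    if value < suggested:
        return f"{name}={value} is below the suggested {suggested}"
    return f"{name}={value} meets the suggested {suggested}"


if __name__ == "__main__":
    # os.environ only reflects the current process, see the assumption above.
    print(check_timeout(os.environ))
```

Passing the environment in as a mapping keeps the check easy to try against sample values without touching the real settings.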
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Is node1 a different machine from the conductor node?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
palmeal
Participant
Posts: 122
Joined: Thu Oct 14, 2004 7:56 am
Location: Edinburgh, Scotland

Post by palmeal »

priyadarshikunal wrote: What is the value of the environment variable DSIPC_OPEN_TIMEOUT? If it is at the default of 30, try increasing it to 600.

Also check DS_TDM_PIPE_OPEN_TIMEOUT; by default it should be 720.

Our DSIPC_OPEN_TIMEOUT is set to 30. Changing it is something our Admin team will have to do.
palmeal
Participant
Posts: 122
Joined: Thu Oct 14, 2004 7:56 am
Location: Edinburgh, Scotland

Post by palmeal »

ray.wurlod wrote: Is node1 a different machine from the conductor node? ...
We only have one server available to us.