
Write to dataset failed: Broken pipe

Posted: Mon Feb 19, 2007 7:02 pm
by m8rix
Hello,

I have a sequential file with 2,000,000+ records. For testing purposes I made a file that only had 600 records. When I run my parallel job on the shortened file it runs fine; however, when I point the sequential file stage back to the original file (2 million records) I get the following error:

I have already overcome error after error, and this has been going on for days. Now it is time to call in the DS gurus!


EXPIDENTIAL_CALC,2: Failure during execution of operator logic. [api/operator_rep.C:333]
EXPIDENTIAL_CALC,2: Input 0 consumed 13822 records.
EXPIDENTIAL_CALC,2: Output 0 produced 13821 records.
BPBB_CABLE_RAW_CLEAN_csv,0: Write to dataset failed: Broken pipe The error occurred on Orchestrate node Conductor (hostname lxapp0019) [iomgr/iomgr.C:1623]
BPBB_CABLE_RAW_CLEAN_csv,0: Block write failure. Partition: 2 [datamgr/partition.C:1273]
APT_CombinedOperatorController,2: Fatal Error: APT_Decimal::asInteger: the decimal value is out of range for the integer result. [decimal/decimal.f.C:1331]
node_node3a: Player 2 terminated unexpectedly. [processmgr/player.C:138]
main_program: Unexpected exit status 1 [processmgr/slprocess.C:420]
BPBB_CABLE_RAW_CLEAN_csv,0: Failure during execution of operator logic. [api/operator_rep.C:333]
BPBB_CABLE_RAW_CLEAN_csv,0: Output 0 produced 55334 records.
BPBB_CABLE_RAW_CLEAN_csv,0: Fatal Error: Virtual data set.; output of "BPBB_CABLE_RAW_CLEAN_csv": DM getOutputRecord error. [api/dataset_rep1.C:3207]
node_Conductor: Player 1 terminated unexpectedly. [processmgr/player.C:138]
main_program: Unexpected exit status 1 [processmgr/slprocess.C:420]
main_program: Step execution finished with status = FAILED. [sc/sc_api.C:252]
main_program: Startup time, 0:12; production run time, 1:05.
Job RD2_00_Cable_Throughput_ALL aborted.

Posted: Mon Feb 19, 2007 7:13 pm
by ray.wurlod
Welcome aboard. :D

The broken pipe can be between the DataStage processes and the job monitor. To test this theory, add the APT_NO_JOBMON environment variable as a job parameter and run the job with it set to True.

Search the forum for an exact match on your error text; you will find that others have had this problem in the past.

Posted: Mon Feb 19, 2007 7:39 pm
by kumar_s
What is the design of the job? Are you doing some kind of Pivot or something?
Check the memory usage and the temp space usage during the job run.
Are you performing any Decimal to Integer conversion?
Try restricting the flow to the first 13822 records and check whether the job completes successfully.

Posted: Mon Feb 19, 2007 8:42 pm
by m8rix
ray.wurlod wrote:Welcome aboard. :D

The broken pipe can be between the DataStage processes and the job monitor. To test this theory, add APT_NO_JOBMON environment variable as a job parameter and run the job with ...

I can't see your full post due to not having "Premium" access... Thanks for trying

kumar_s wrote:What is the design of the job?
3 stages [sequential file] => [transformer] => [Oracle DB]

Are you doing some kind of Pivot or something?
No

Check the memory usage and the temp space usage during the job run.
How?

Are you performing any Decimal to Integer conversion?
Sort of... In the sequential file there are figures such as 1.762E8. I had to overcome this by creating a derivation formula:

StringToDecimal(Field(Link.FieldName :"E","E",1)) * StringToDecimal(PadString("1","0",StringToDecimal(Field(Link.FieldName :"E0E","E",2))))
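For reference, the logic that derivation is trying to express (mantissa times 10 to the exponent, with plain numbers passed through unchanged) can be sketched in Python. This is only an illustration of the conversion, not the DataStage code itself; the sample value is an assumption based on the 1.762E8 example above:

```python
from decimal import Decimal

def expand_scientific(value: str) -> Decimal:
    """Expand a string like '1.762E8' to its full decimal value:
    mantissa * 10^exponent, mirroring the transformer derivation."""
    # Split on 'E'; a plain number has an empty exponent part.
    mantissa, _, exponent = value.upper().partition("E")
    if exponent == "":
        return Decimal(mantissa)
    return Decimal(mantissa) * (Decimal(10) ** int(exponent))

print(expand_scientific("1.762E8"))  # 176200000.000
```

Note that the result here is an exact decimal; the out-of-range error in the log suggests the real problem may be that some expanded values exceed the precision of the target decimal/integer column.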

Posted: Mon Feb 19, 2007 8:48 pm
by kumar_s
Your log shows that the input stage EXPIDENTIAL_CALC consumed 13822 records, but the BPBB_CABLE_RAW_CLEAN_csv output produced 55334 records. Though it's a virtual dataset, are you writing it out to a test sequential file or something?

Since the error came up after the 13822nd record, check whether the next incoming records have any issues with decimal conversion, which could result in "APT_Decimal::asInteger: the decimal value is out of range for the integer result. [decimal/decimal.f.C:1331]".
Or it may be due to high resource contention; after this amount of data the job might have failed.
If you rerun, does it fail at exactly the same record?
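One way to hunt for the offending record outside DataStage is a quick offline scan of the source file for values whose expanded magnitude would overflow the integer target. This is a hypothetical sketch: the 32-bit limit and the sample values are assumptions, so substitute the actual target type's range and the real input field:

```python
# Scan values for the first one that would overflow a 32-bit
# integer target, which would trigger APT_Decimal::asInteger's
# "out of range" error at that record.
INT32_MAX = 2**31 - 1

def first_overflow(values, limit=INT32_MAX):
    """Return the (1-based) index and raw value of the first entry
    whose magnitude exceeds the integer limit, or None if all fit."""
    for i, raw in enumerate(values, start=1):
        # float() accepts scientific notation such as '3.1E12'.
        if abs(float(raw)) > limit:
            return i, raw
    return None

print(first_overflow(["1.762E8", "3.1E12", "500"]))  # (2, '3.1E12')
```

If the same record index turns up on every rerun, the problem is the data rather than resource contention.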