Hierarchical Stage XML transformation aborts

betterthanever · Post by **betterthanever** » Tue Jan 03, 2017 9:53 am

We have a job that creates a JSON file using the Hierarchical stage for XML transformation. The stage is fed by three Sequential File stages, and the XML stage assembly does Restructure and HJoin steps. The job fails after reading 13.5 million records. Is there a setting or environment variable for the stage that we need to tweak to process larger volumes? The job needs to process 188 million records. The log is not much help (even after rerunning the job with operator combination disabled); it just fails after a partial read of the big sequential file.

I appreciate any input on this.

Failure during execution of operator logic.
Output 0 produced 13573742 records.
node_node2: Player 1 terminated unexpectedly.
main_program: APT_PMsectionLeader(2, node2), player 1 - Unexpected termination by Unix signal 9 (SIGKILL).

SequentialStage---------|

SequentialStage---->XMLTransformation---->OutputFile

SequentialStage---------|

Thanks.

[Note: topic title changed to correctly reflect hierarchical stage usage - Andy]

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Tue Jan 03, 2017 11:14 am

Do any of the later error messages contain Java errors that refer to stack or heap size issues? If so, you need to increase those settings for the stage (each defaults to 256 MB).

Also, assuming that you don't need all 10+ million records at once, you can use the "Split data into batches" option to chop the data into smaller sections for processing. Make sure the data is sorted on a key (even if it's an artificial one you had to construct), and you can then specify that key as the batch split key. This is a sort of "wave equivalent" that tells the stage to process the data in smaller chunks.
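
For anyone unfamiliar with the idea, here is a rough Python sketch of what key-based batching buys you. It is not how the Hierarchical stage implements the option internally; it only illustrates why sorted input plus a split key lets a process hold one key's records at a time. The record layout `(batch_key, payload)` and the names `iter_batches` and `process_in_batches` are made up for the illustration.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (batch_key, payload)
Record = Tuple[str, str]


def iter_batches(sorted_records: Iterable[Record]) -> Iterator[Tuple[str, List[Record]]]:
    """Yield (key, records) for one key at a time.

    Assumes the input is already sorted on the batch key.
    """
    for key, group in groupby(sorted_records, key=itemgetter(0)):
        # Only the records belonging to the current key are held in memory.
        yield key, list(group)


def process_in_batches(
    sorted_records: Iterable[Record],
    handle_batch: Callable[[str, List[Record]], None],
) -> int:
    """Run handle_batch once per key and return the largest batch size seen."""
    peak = 0
    for key, batch in iter_batches(sorted_records):
        peak = max(peak, len(batch))
        handle_batch(key, batch)
    return peak
```

With input sorted on the key, peak memory tracks the largest single batch rather than the whole file. If the input were not sorted, `groupby` would emit the same key more than once, which is why the sort matters.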

betterthanever · Post by **betterthanever** » Tue Jan 03, 2017 12:03 pm

Thanks for the response.
1. The data coming in for restructuring is pre-sorted.
2. The heap size has been set to 2.5 GB.

Speaking of splitting the data, is there a way for me to find out whether we are running out of heap, or whether some other setting is causing the job to blow up?

Thanks.

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Tue Jan 03, 2017 2:06 pm

You would have to look at the later error messages in the log. This is one of the few cases in DataStage where the first message does not tell you the actual cause of the abort.

Look further down in the log and see if any of the messages contain Java errors (like "Java runtime exception occurred: java.lang.OutOfMemoryError").

If they do, post the contents here and we'll look at them.
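
If the log has been exported to a text file, a small script can do the scanning. This is a minimal sketch that assumes one message per line and that the Java errors look like the example quoted above (a `java.lang.` class name ending in `Error` or `Exception`); the function name `find_java_errors` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches names such as java.lang.OutOfMemoryError or java.lang.StackOverflowError.
JAVA_ERROR = re.compile(r"java\.lang\.\w+(?:Error|Exception)")


def find_java_errors(log_lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, error_class) for every line that names a Java error."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = JAVA_ERROR.search(line)
        if match:
            hits.append((number, match.group(0)))
    return hits
```

An empty result does not prove memory is fine; it only means no message of that shape made it into the exported log.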

betterthanever · Post by **betterthanever** » Tue Jan 03, 2017 4:58 pm

To make the job fail, I took the files to a lower environment with a 256 MB heap value, and it failed with a clear error, as you expected. But when I increased the heap and reran it, it failed with no specific error, just like before.

betterthanever · Post by **betterthanever** » Wed Jan 04, 2017 7:23 am

I looked further down the log, and these are the additional messages I see.

node_node2: Player 1 terminated unexpectedly.
seq_pfsComp,0: Failure during execution of operator logic.
seq_pfsComp,0: Output 0 produced 9581094 records.
seq_pfsComp,0: Fatal Error: Unable to allocate communication resources
main_program: APT_PMsectionLeader(2, node2), player 1 - Unexpected termination by Unix signal 9(SIGKILL).
seq_JSON_Extract,0: Failure during execution of operator logic.
seq_JSON_Extract,0: Input 0 consumed 0 records.
seq_JSON_Extract,0: Fatal Error: waitForWriteSignal(): Premature EOF on node apsrd3247 Socket operation on non-socket
node_node1: Player 1 terminated unexpectedly.
main_program: APT_PMsectionLeader(1, node1), player 1 - Unexpected exit status 1.
APT_PMsectionLeader(2, node2), player 2 - Unexpected exit status 1.
main_program: Step execution finished with status = FAILED.
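
A log like the one above is easier to compare across runs once it is boiled down to fatal errors, signals, and record counts per operator. The sketch below does that in Python. The regular expressions are inferred only from the lines in this excerpt, so treat them as assumptions; other log layouts may not match, and `summarize` is a hypothetical name.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Patterns inferred from the excerpt above; they are assumptions, not a log spec.
FATAL = re.compile(r"^(?P<op>[^:]+): Fatal Error: (?P<msg>.*)$")
SIGNAL = re.compile(r"Unix signal (?P<num>\d+)\s*\((?P<name>\w+)\)")
RECORDS = re.compile(
    r"^(?P<op>[^:]+): (?:Input|Output) \d+ (?:consumed|produced) (?P<count>\d+) records\."
)


def summarize(lines: Iterable[str]) -> Dict[str, object]:
    """Collect fatal errors, Unix signals, and record counts from log lines."""
    fatal: List[Tuple[str, str]] = []
    signals: List[Tuple[int, str]] = []
    records: Dict[str, int] = {}
    for raw in lines:
        line = raw.strip()
        m = FATAL.match(line)
        if m:
            fatal.append((m.group("op"), m.group("msg")))
        m = SIGNAL.search(line)
        if m:
            signals.append((int(m.group("num")), m.group("name")))
        m = RECORDS.match(line)
        if m:
            # Keep the last count reported for each operator instance.
            records[m.group("op")] = int(m.group("count"))
    return {"fatal": fatal, "signals": signals, "records": records}
```

Signal 9 cannot be caught by the player process, so the player never gets a chance to log why it was killed; the summary can show that it happened but not who sent it.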

JRodriguez · Post by **JRodriguez** » Wed Jan 04, 2017 9:38 am

Does the job finish OK if you process small files? Make sure the process works with smaller files, then increase the size up to the point where it fails before starting the tuning exercise.

I took a similar approach while parsing a huge XML file and couldn't tune the settings enough to digest the file...

Could part of the restructure be done outside of the Hierarchical stage? Could you join outside too? That should reduce the workload on that part of your process.
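
One way to organise the "grow the input until it fails" exercise is a bisection over the record count. The Python sketch below is only an outline: `job_succeeds` is a hypothetical callable you would have to write yourself (for example, cut the input to `n` records, run the job, and return whether it finished), and the search assumes that if the job fails at some size it also fails at every larger size.

```python
from typing import Callable


def largest_passing_size(
    job_succeeds: Callable[[int], bool],
    known_good: int,
    known_bad: int,
) -> int:
    """Return the largest record count for which job_succeeds(n) is True.

    Assumes known_good passes, known_bad fails, and failures are monotonic
    in the record count.
    """
    low, high = known_good, known_bad
    while high - low > 1:
        mid = (low + high) // 2
        if job_succeeds(mid):
            low = mid   # mid still works, so the limit is at or above it
        else:
            high = mid  # mid fails, so the limit is below it
    return low
```

Starting from a size that is known to pass and the full 188 million, the loop needs fewer than 30 runs to pin the limit down to a single record, though in practice you would probably stop much earlier.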

Regards

betterthanever · Post by **betterthanever** » Wed Jan 04, 2017 12:09 pm

Thanks for your input. Yes, the job runs fine up to 13.5 million records and fails after that. Bumping up the heap didn't help. We are raising a ticket with IBM to see if they can help.

As for doing the restructure and HJoins outside of that stage, we are taking an alternate approach that avoids the stage, to see if we can build the logic in a Transformer.

johnboy3 · Post by **johnboy3** » Wed Jan 04, 2017 1:35 pm

If you give it a primary key, does that make it no longer a "heap"?
john3

UCDI · Post by **UCDI** » Wed Jan 04, 2017 3:26 pm

I wouldn't think so. Heap is an older term for the memory a process has. It would only make a difference if adding a key allowed a more efficient memory map, which can happen but "it depends". I would not expect anything to change, but you can try it if you want.

betterthanever · Post by **betterthanever** » Thu Jan 05, 2017 2:55 pm

I tried to implement the same logic within a Transformer, and it looks like I am running into the same issue. The restructuring I am grouping on has 25k iterations to go through, and it is failing with the error below.
waitForWriteSignal(): Premature EOF on node "nodename" Socket operation on non-socket

JRodriguez · Post by **JRodriguez** » Thu Jan 05, 2017 6:00 pm

See if this technote helps:
http://www-01.ibm.com/support/docview.w ... wg21503212

wpkalsow · Post by **wpkalsow** » Fri Jan 27, 2017 3:15 pm

When processing several million line items, I also hit a limit.

I eventually found a set of optional Java arguments that worked.

The final configuration for the hierarchical stage was:

Usage/Java/Heap Size (MB): 1024
Usage/Java/Stack Size (KB): 2048
Usage/Java/Optional Arguments: -Xjit:dontInline={com/ibm/xml/xlxp/api/util/SimplePositionHelper.getCurrentPosition10*,com/ibm/xml/xlxp/api/util/DataBufferHelper.computeCoords10*},{com/ibm/xml/xlxp/api/util/SimplePositionHelper.getCurrentPosition10*}(disableGLU),{com/ibm/xml/xlxp/api/util/DataBufferHelper.computeCoords10*}(disableGLU)
Usage/Scratch Disk: Yes