The saga continues. After failing to reach any conclusions with IBM Support as to what might be causing 7.5.2 to misbehave on this server, and in an effort to isolate the issue, we rolled back to 7.5.1A - a version we use on three other servers with no issues.
Took a full export from another machine and rebuilt the only project it houses. Preliminary tests looked good, but longer runs went back to their old tricks of core dumping and generally just falling over dead for no apparent reason until we reset the job. Some of our old friends came back. First visitor:
Message:
From previous run
DataStage Job 127 Phantom 21476
Program "DSD.StageRun": Line 657, COMMON size mismatch in subroutine "DSP.Close".
Program "DSD.StageRun": Line 657, Unable to load file "DSP.Close".
Program "DSD.StageRun": Line 657, Unable to load subroutine.
Attempting to Cleanup after ABORT raised in stage DODSLDHFullStg..DPC_LDH
Program "DSD.OnAbort": Line 164, COMMON size mismatch in subroutine "DSP.Close".
Program "DSD.OnAbort": Line 164, Unable to load file "DSP.Close".
Program "DSD.OnAbort": Line 164, Unable to load subroutine.
Two differences from last time. One is that nothing preceded this in the log (no memory errors), only the 'abby normal termination' warning from the active stage, then it dropped dead. The second is that the passive stage involved is an OCI stage, while before it was an XML Output stage. For whatever that is worth.
Rather than stick with the imported executables from another machine, decided to recompile all of the jobs on this server. Next run of this job yielded a slightly different error:
Message:
From previous run
DataStage Job 127 Phantom 18747
jobnotify: Unknown error
DataStage Phantom Finished.
[18765] DSD.StageRun DODSLDHFullStg. DODSLDHFullStg.xform_1 1 0/5/1 - core dumped.
Message:
From previous run
DataStage Job 127 Phantom 18765
Abnormal termination of DataStage.
Fault type is 10. Layer type is BASIC run machine.
Fault occurred in BASIC program DSD.Update at address 1d2
This came, again, after the job just fell over dead mid-run with no other messages in the log.
I then realized that with this new / clean install I hadn't made our 'standard' tweaks to the uvconfig file. While I didn't think it was related to our issue, figured we needed to get that out of the way. Dropped the server, tweaked, regen'd and brought it back up. Next run, right back where we started:
Message:
From previous run
DataStage Job 127 Phantom 2287
Program "DSD.StageRun": Line 657, COMMON size mismatch in subroutine "DSP.Close".
Program "DSD.StageRun": Line 657, Unable to load file "DSP.Close".
Program "DSD.StageRun": Line 657, Unable to load subroutine.
Attempting to Cleanup after ABORT raised in stage DODSLDHFullStg..DPC_LDH
Program "DSD.OnAbort": Line 164, COMMON size mismatch in subroutine "DSP.Close".
Program "DSD.OnAbort": Line 164, Unable to load file "DSP.Close".
Program "DSD.OnAbort": Line 164, Unable to load subroutine.
I'm really at a loss as to what to do next, other than report this back to IBM, who still have our original 7.5.2 case open, and try to get our SAs and HP involved. And I'm not really sure why I'm posting all this here; perhaps it's just a vain hope for a miracle cure from the group, or perhaps I'm seeking a little guru technobabble...
"The positronic field generated by the interdimensional transformer reflex system is causing a pulsary overload in our aggregator's warp transducer, captain."
Or maybe it's just that misery loves company.
