Job takes 30 minutes to finish

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

boxtoby
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK


Post by boxtoby »

Hi Guys,

I have had a couple of jobs which have finished processing their data but won't actually finish.

The jobs are quite simple: there are no lookups, joins, sorts or aggregations.

The jobs write a large number of records (58m) to Oracle and SQL Server tables, and I can see that the records have been committed to the database: the same number of rows appears in each table. I can confirm that there is no Oracle processing running and, as far as I can tell, there are no SQL Server processes running either.

What is happening is that in the job log the last "Number of rows inserted" message occurs 15 minutes after the previous message, and 15 minutes after that I get an Oracle disconnection message:

28/11/2014 14:20:11 ora_out_ftress_audit_log,3: Number of rows inserted on the current node: 7308136.
28/11/2014 14:20:11 ora_out_ftress_audit_log,4: Number of rows inserted on the current node: 7308136.
28/11/2014 14:35:38 ora_out_ftress_audit_log,6: Number of rows inserted on the current node: 7308136.

28/11/2014 14:51:08 ora_in_fterss_audit_log: The OCI function OCISessionEnd returned status -1. Error code: 3,113, Error message: ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 403 Serial number: 14417.

28/11/2014 14:51:08 ora_out_ftress_audit_log: The OCI function OCISessionEnd returned status -1. Error code: 3,113, Error message: ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 98 Serial number: 41799.

Does anyone know why the job is taking 30 minutes to finish?

Is this normal for this number of rows?

Many thanks,
Bob.
Bob Oxtoby
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

I am not sure how normal those things might be.

What are your versions of DataStage, Oracle server, and Oracle client (if using)?

What stage type are you using to connect to Oracle?

Does the source or target stage have any before-SQL or after-SQL statements?

If you select count on the target table while the job runs, at what point in time does the count show all the new records were inserted?
Choose a job you love, and you will never have to work a day in your life. - Confucius
boxtoby
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

Hi Eric,

We are on DataStage 9.2 and Oracle 11g.

We are connecting to Oracle with the Oracle Connector stage.

There are no before-SQL or after-SQL statements.

We were watching the job for some time and could see from the database console, very early in the 30-minute window, that there were no Oracle tasks running.
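For anyone wanting to double-check the same thing from SQL rather than the database console, a query along these lines against v$session shows whether any sessions for the job's schema are still doing work (the schema name here is a placeholder; substitute the user the connector logs in as):

```sql
-- 'ETL_USER' is a placeholder for the schema the DataStage job connects as.
SELECT sid, serial#, status, last_call_et, sql_id
FROM   v$session
WHERE  username = 'ETL_USER'
AND    status   = 'ACTIVE';
```

If this returns no rows while the job is still "finishing", the wait is on the DataStage side rather than in the database.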

Thanks,
Bob.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Those are supported versions.

Is the same job writing to Oracle and SQL Server at the same time or are they separate jobs?

Are there any other targets in the job design that could be causing a wait?

Just wondering what more could be done to narrow it down, such as cutting the job in half or eliminating all other stages in a test copy of the job...

Here are a few links. The first suggests not to put too much trust in the timestamps on the job log entries.

DataStage Director job log does not correctly report the time when the Oracle connector generates some messages

Performance monitoring
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Timestamps in log files are misleading.

The timestamp you see for the "Number of rows inserted" message is the time the log entry was written, not the time the rows were inserted. Think of it as a print buffer: there was still information that had yet to be printed, and the next event coming in caused the buffered message to be printed. That is why you see a 15-minute delay on your ora_out_ftress_audit_log statement.

Now as to why your job took 20 more minutes after that to finish: do you have an after-job routine coded into it?
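Since the timestamp only reflects when an entry was flushed to the log, it can be more useful to look at the gaps between consecutive entries than at any single timestamp. A minimal sketch in plain Python, using the log lines quoted earlier in the thread:

```python
from datetime import datetime

# Log lines as quoted earlier in the thread (timestamp then message).
log_lines = [
    "28/11/2014 14:20:11 ora_out_ftress_audit_log,3: Number of rows inserted on the current node: 7308136.",
    "28/11/2014 14:20:11 ora_out_ftress_audit_log,4: Number of rows inserted on the current node: 7308136.",
    "28/11/2014 14:35:38 ora_out_ftress_audit_log,6: Number of rows inserted on the current node: 7308136.",
    "28/11/2014 14:51:08 ora_out_ftress_audit_log: The OCI function OCISessionEnd returned status -1.",
]

def parse_ts(line):
    # The first 19 characters hold the DD/MM/YYYY HH:MM:SS timestamp.
    return datetime.strptime(line[:19], "%d/%m/%Y %H:%M:%S")

timestamps = [parse_ts(line) for line in log_lines]
# Gap in minutes between each pair of consecutive log entries.
gaps = [(b - a).total_seconds() / 60 for a, b in zip(timestamps, timestamps[1:])]
print(gaps)
```

Run against the excerpt above, this shows roughly 15.5-minute gaps between the last insert messages and the disconnect, which is the flushing delay being described here rather than 15 minutes of actual insert work.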

Do you have an after sql statement coded into the connector?
rkashyap
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

On which flavor of Unix is DataStage installed at your shop? What is the "Write mode" in the Oracle Connector stage?

This is one of the issues we are facing on DataStage 9.1.2 (Solaris) while writing to Oracle 11.2.0.3 on a three-node RAC (APT config = 8 nodes).

Another related issue is that about once a week, at random, one of the jobs writing in "Update/Insert" mode rejects records with an "ORA-00001: unique constraint violated" error. The job runs fine on restarting.

Our support provider could not replicate either of these issues in their environment and believes they are Oracle database issues. There was also some discussion about Oracle bug 10178982 (which, per the documentation, is fixed in Oracle 11.2.0.3).

The suggested workaround for these issues is to either run on a one-node configuration or set the target Oracle Connector stage to execute sequentially.
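For reference, the first workaround means pointing APT_CONFIG_FILE at a one-node configuration file. A minimal example of the format (the hostname and resource paths below are placeholders; adjust them to your installation):

```
{
  node "node1"
  {
    fastname "dshost"
    pools ""
    resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
    resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
  }
}
```

This forces the whole job to run on a single node, which avoids the multi-node write contention the workaround is aimed at, at the cost of parallelism.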

Please note that most of the jobs with these issues have write mode "Update/Insert" or "Insert/Update".
boxtoby
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

Hi All,

Thank you for all your various contributions.

I now have the same problem on a different job, except that error ORA-03113 caused the job to abort. The job is very simple: read an Oracle table and write the data into a SQL Server table. No transformations, lookups, or before/after SQL anywhere.

The job is processing about 200m rows and has the following settings:

Oracle source stage:
record count = 200000
array size = 200000

Sql Server stage:
BULK LOAD turned on
BulkLoadBatchSize=8192
record count = 400000
array size = 400000

As with the previous example, I believe the job had finished writing to SQL Server (it certainly took a while to drop and create the table!), and while the job was "finishing", Oracle timed out.

Having said all that, I was wondering: would setting the "Manage Application Failover = Y" option keep the Oracle connection going while the job finishes, or have I misunderstood the purpose of that option?

Many thanks,
Bob.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

The Oracle Connector stage properties are fairly well documented in the Connectivity Guide for Oracle Databases (i46decor.pdf).

If the database connection is enabled for transparent application failover (TAF), the application connected to the database (i.e. your job) is transparently reconnected to an alternative database instance if the original connection fails.

There may be other, more relevant settings, such as Reconnect, Disconnect, and Inactivity period.

I would be curious what happens if you were to split the job into 1) source stage --> data set and 2) data set --> target stage.

If that doesn't narrow down or resolve the problem, open a Support case to help get to the bottom of it.
boxtoby
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

Hi Eric,

"Manage Application Failover = Y" worked! Or at least the job didn't timeout.

Reconnect and disconnect are only available when the stage is used as a target.

Didn't try splitting the job, but an useful option.

Thanks,
Bob.