The job suspends after inserting 150 to 250 thousand rows. By "suspend" I mean the stage stops inserting new rows, and it doesn't return any error or warning messages either. It doesn't happen every day, only once or twice a week.
The properties of the Oracle stage are:
I talked to the SA and the DBAs to confirm there are no network problems between the servers, and the DBAs couldn't see a problem either. Can anyone help?
Yep, it is parallel (auto partition). I was thinking parallelism could be the problem.
You mean change the whole job to a server job? Then the Dataset wouldn't work, so change the Dataset to a Sequential File?
No, he means set the stage property so it runs sequentially in your parallel job, i.e. one instance only. That, or run the job on a single node.
IMHO a better solution would be to understand why it is 'suspended' - deadlocks? Blocking locks? And then change from 'auto' to a method where you control the partitioning - perhaps hash or modulus on the PK value(s) involved, or even round robin.
I don't recall what 'auto' equates to here, but I'm more of a fan of explicit control over it. And I'm sure Ray will be along to enlighten us.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Yes chulett, I talked to the DBA this morning. He can't find any deadlocks, blocking locks or anything of the sort on the database, which is why I feel a bit helpless. The only thing he could find was that there were two insert SQL processes running against this table, which may be causing the problem - and that matches SURA's opinion.
My solution for now is to:
1. change the config file so the job runs on a single node
2. change the partition strategy to hash partitioning (and maybe switch to the Oracle Connector stage type?)
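For step 1, a single-node parallel configuration file looks something like the sketch below (the fastname and disk paths are placeholders - use your own server name and directories):

```
{
  node "node1"
  {
    fastname "etl_server"
    pools ""
    resource disk "/data/ds/datasets" {pools ""}
    resource scratchdisk "/data/ds/scratch" {pools ""}
  }
}
```

Point `$APT_CONFIG_FILE` at this file in the job's parameters so only this one job picks up the single-node layout.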
OK, I created a new config file and made the first node an ORACLE-dedicated node, so the Oracle stage should only use that node for the job. Let's see what happens......
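For anyone following along, a dedicated node pool in the config file looks roughly like this (names and paths are placeholders); the Oracle stage is then constrained to the "ORACLE" pool via the stage's Advanced > Node pool and resource constraints settings:

```
{
  node "node1"
  {
    fastname "etl_server"
    pools "" "ORACLE"
    resource disk "/data/ds/datasets" {pools ""}
    resource scratchdisk "/data/ds/scratch" {pools ""}
  }
  node "node2"
  {
    fastname "etl_server"
    pools ""
    resource disk "/data/ds/datasets" {pools ""}
    resource scratchdisk "/data/ds/scratch" {pools ""}
  }
}
```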
Grrr, the problem happened again.
Now the log shows "oracle connector will run in parallel on 1 processing nodes", but the database side still shows 2 connections......
Best to have that conversation with your DBA. Basically, a record is locked by another process, 'blocking' your process's ability to lock it before the update. I've seen it happen with something as simple as someone browsing the record in Toad and clicking on it in the Schema Browser data window, which locks it. The person exits Toad and bam, your suspended process completes.
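One thing the DBA can run while the job is hung is a blocking-session check. A minimal sketch against `v$session` (assuming Oracle 10g or later, where the `blocking_session` column exists - adjust for your version):

```
-- Sessions currently blocked, and which session is blocking them
SELECT sid, serial#, blocking_session, event, seconds_in_wait
FROM   v$session
WHERE  blocking_session IS NOT NULL;
```

If this returns rows while your job is "suspended", the `blocking_session` value identifies the lock holder - which could be the second insert process, or even someone's Toad session.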
-craig
"You can never have too many knives" -- Logan Nine Fingers
If the job runs OK with the Oracle upsert statement set to sequential, then most likely you have duplicate records causing the deadlock.
Resolving the problem
Ensure all duplicate records land in the same partition, so that the multiple Oracle update processes do not try to access (lock) the same block at the same time. Use hash partitioning, hashing on the columns used in the Oracle WHERE clause.
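The principle can be sketched outside DataStage. This is an illustrative toy (not the engine's actual hash function): hashing the key columns and taking the result modulo the partition count guarantees that every row with the same key value goes to the same partition, so two copies of a key can never be held by different writer processes at once.

```python
def partition_for(key_columns, num_partitions):
    """Pick a partition by hashing the key columns (the WHERE-clause columns)."""
    return hash(tuple(key_columns)) % num_partitions

rows = [
    {"id": 101, "val": "a"},
    {"id": 202, "val": "b"},
    {"id": 101, "val": "c"},  # duplicate key 101
]

# Distribute rows across 4 partitions by hashing the key column "id".
partitions = [[] for _ in range(4)]
for row in rows:
    partitions[partition_for((row["id"],), 4)].append(row)

# Both rows with id=101 land in the same partition, so a single writer
# process handles them one after the other instead of two writers
# deadlocking on the same block.
```

With auto or round-robin partitioning, by contrast, the two id=101 rows can end up on different nodes and collide inside Oracle.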