Peek stage and OCI stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

dsscholar
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Peek stage and OCI stage

Post by dsscholar »

Hi all,

1) Source ---> Transformer ---> Peek stage. If 1000 records come out of the Transformer, will all 1000 records be shown in the job log (Output option = Job Log)? And if I set the Peek stage to 10 records per partition, will only the first 40 records be visible when the job runs on 4 nodes?

2) What is the significance of the "Insert array size" property in the OCI stage? Is it related to the commit interval?

3) If I use an OCI stage as the target and a Peek stage to collect rejected records, and 100 records fail a unique constraint, will the job abort only after all hundred records have been checked? Or will it abort once the first 40 records have been checked, so that only 40 records are visible in the job log?



Thanks in advance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

1) Correct.
2) Array size has nothing to do with commit - it's simply the number of rows transmitted at a time. Commit rowcount should be a whole multiple of array size for best performance.
3) Correct.
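Ray's rule of thumb (commit rowcount a whole multiple of array size) can be illustrated with a small arithmetic sketch. This is plain Python with hypothetical names, not the OCI stage itself: it only shows that an aligned commit interval always lands on an array boundary, while a misaligned one forces commits mid-array.

```python
# Hypothetical illustration: rows are shipped to the database in arrays
# of `array_size` rows; a commit is issued every `commit_interval` rows.
# When commit_interval is a whole multiple of array_size, every commit
# falls exactly on an array boundary; otherwise some arrays are split.

def commit_points(total_rows, commit_interval):
    """Row counts at which commits occur."""
    return [n for n in range(commit_interval, total_rows + 1, commit_interval)]

def array_boundaries(total_rows, array_size):
    """Row counts at which an array transfer completes."""
    return {n for n in range(array_size, total_rows + 1, array_size)}

rows = 1000
boundaries = array_boundaries(rows, array_size=10)

# Commit interval 50 (a multiple of 10): every commit is on a boundary.
print(all(c in boundaries for c in commit_points(rows, 50)))   # True

# Commit interval 55 (not a multiple of 10): some commits split an array.
print(all(c in boundaries for c in commit_points(rows, 55)))   # False
```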
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

3) I did not follow you on that one. If 100 records fail to insert, and peek stage is set to 10 with 4 nodes, I think you would get 4 peek entries in the job log with 10 rows in each, and you would not see the other 60 rejected rows. If each insert failure is logged as a warning and the job aborts after 50 warnings, then I'm not sure what you end up with.

I have seen where if I set the Peek value to 100 rows then in the job log for a given node, I may get two Peek entries, one giving say 30 rows and the other entry giving 70.
Choose a job you love, and you will never have to work a day in your life. - Confucius
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I didn't claim they're all in the same Peek entry. Information is logged as it arrives, and multiple nodes' information tends to collide. But if you look at all the Peek entries for node 0, for example, you'll find a total of 10 rows (or whatever you set it to). Ditto for node 1, and so on.
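The per-node accounting described above can be sketched in plain Python (round-robin partitioning is assumed here purely for illustration; the actual partitioning method may differ). Each node may spread its peeked rows over several log entries, but the per-node total is still the configured row count.

```python
# Simulate a 4-node run with a Peek stage set to 10 rows per partition.
# Rows are dealt round-robin to partitions (one possible partitioning);
# each node logs only its first PEEK_ROWS rows, so with 1000 input rows
# the log shows 10 rows per node, 40 rows in total.

NODES = 4
PEEK_ROWS = 10

rows = list(range(1000))                                 # rows out of the transformer
partitions = {n: rows[n::NODES] for n in range(NODES)}   # round-robin split

peeked = {n: partitions[n][:PEEK_ROWS] for n in range(NODES)}

per_node_counts = [len(peeked[n]) for n in range(NODES)]
print(per_node_counts)        # [10, 10, 10, 10]
print(sum(per_node_counts))   # 40 rows visible in the job log in total
```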
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dsscholar
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Post by dsscholar »

Does performance improve if I set the array size to the total number of records from the source, since that would transfer all the records to the target at once? Or is my assumption wrong? Will it execute in parallel or in bulk mode when I select the array size option?


Thanks in advance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Setting array size to 0 logically transfers all records at once, but the network still breaks that mass of data into packets so any gain would be insignificant.

Array size has nothing at all to do with execution mode or read mode.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dsscholar
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Post by dsscholar »

U mean to say, there is no benefit in using array size option?

What's the difference between setting it to 10 rows at a time and all rows at a time, then?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

dsscholar wrote:U mean to say, there is no benefit in using array size option?
U didn't say any such thing, nor did Ray. Keep in mind the fact that "U" is one of our posters.

As noted, it strictly controls how many rows are sent across the network 'in an array', nothing more and nothing less. There are times when you'll want it to be set to 1 and others when it should be larger. There is always a sweet spot where increasing the size starts to decrease the throughput and performance starts to drop. There are many factors at work there as noted - network packet size and ARL being two of the most important. There's a lot more to it than simply sending everything at once.
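The diminishing returns mentioned above can be shown with simple round-trip arithmetic (pure Python, illustrative numbers only, not a measurement of any real network): the number of transfers falls as 1/array_size, so the incremental saving shrinks quickly while each larger array costs more memory on both sides.

```python
import math

def round_trips(total_rows, array_size):
    """Number of array transfers needed to ship total_rows."""
    return math.ceil(total_rows / array_size)

total = 100_000
for size in (1, 10, 100, 1_000, 10_000, 100_000):
    print(size, round_trips(total, size))

# Going from array size 1 to 100 removes 99% of the round trips;
# going from 1_000 to 100_000 removes only 99 more trips in absolute
# terms, while each array now ties up far more client/server memory.
```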

Note this has nothing to do with DataStage but with how Oracle works. A quick Google for "Oracle array size" should help enlighten you on the topic.
-craig

"You can never have too many knives" -- Logan Nine Fingers