Peek stage and OCI stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

dsscholar
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Peek stage and OCI stage

Post by dsscholar »

Hi all,

1) Source ---> Transformer ---> Peek stage. If 1000 records come out of the Transformer, will all 1000 records be shown in the job log (Output option = Job Log)? And if I set the Peek stage to 10 records per partition, will only the first 40 records be visible when the job runs on 4 nodes?

2) What is the significance of the "Insert array size" property in the OCI stage? Is it related to the commit interval?

3) If I use an OCI stage as the target and a Peek stage to collect rejected records, and 100 records fail a unique constraint, will the job abort only after all hundred records have been checked? Or will it abort once the first 40 records have been checked, so that only 40 records are visible in the job log?



Thanks in advance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

1) Correct.
2) Array size has nothing to do with commit - it's simply the number of rows transmitted at a time. Commit rowcount should be a whole multiple of array size for best performance.
3) Correct.
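Ray's rule of thumb (commit rowcount a whole multiple of array size) can be illustrated with a small arithmetic sketch. This is plain Python with hypothetical names, not the OCI stage itself: it only shows that an aligned commit interval always lands on an array boundary, while a misaligned one forces commits mid-array.

```python
# Hypothetical illustration: rows are shipped to the database in arrays
# of `array_size` rows; a commit is issued every `commit_interval` rows.
# When commit_interval is a whole multiple of array_size, every commit
# falls exactly on an array boundary; otherwise some arrays are split.

def commit_points(total_rows, commit_interval):
    """Row counts at which commits occur."""
    return [n for n in range(commit_interval, total_rows + 1, commit_interval)]

def array_boundaries(total_rows, array_size):
    """Row counts at which an array transfer completes."""
    return {n for n in range(array_size, total_rows + 1, array_size)}

rows = 1000
boundaries = array_boundaries(rows, array_size=10)

# Commit interval 50 (a multiple of 10): every commit is on a boundary.
print(all(c in boundaries for c in commit_points(rows, 50)))   # True

# Commit interval 55 (not a multiple of 10): some commits split an array.
print(all(c in boundaries for c in commit_points(rows, 55)))   # False
```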
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

3) I did not follow you on that one. If 100 records fail to insert, and peek stage is set to 10 with 4 nodes, I think you would get 4 peek entries in the job log with 10 rows in each, and you would not see the other 60 rejected rows. If each insert failure is logged as a warning and the job aborts after 50 warnings, then I'm not sure what you end up with.

I have seen where if I set the Peek value to 100 rows then in the job log for a given node, I may get two Peek entries, one giving say 30 rows and the other entry giving 70.
Choose a job you love, and you will never have to work a day in your life. - Confucius
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I didn't claim they're all in the same Peek entry. Information is logged as it arrives, and multiple nodes' information tends to collide. But if you look at all the Peek entries for node 0, for example, you'll find a total of 10 rows (or whatever you set it to). Ditto for node 1, and so on.
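The per-node accounting described above can be sketched in plain Python (round-robin partitioning is assumed here purely for illustration; the actual partitioning method may differ). Each node may spread its peeked rows over several log entries, but the per-node total is still the configured row count.

```python
# Simulate a 4-node run with a Peek stage set to 10 rows per partition.
# Rows are dealt round-robin to partitions (one possible partitioning);
# each node logs only its first PEEK_ROWS rows, so with 1000 input rows
# the log shows 10 rows per node, 40 rows in total.

NODES = 4
PEEK_ROWS = 10

rows = list(range(1000))                                 # rows out of the transformer
partitions = {n: rows[n::NODES] for n in range(NODES)}   # round-robin split

peeked = {n: partitions[n][:PEEK_ROWS] for n in range(NODES)}

per_node_counts = [len(peeked[n]) for n in range(NODES)]
print(per_node_counts)        # [10, 10, 10, 10]
print(sum(per_node_counts))   # 40 rows visible in the job log in total
```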
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dsscholar
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Post by dsscholar »

Does performance improve if I set the array size to the total number of records from the source, since that would transfer all the records to the target at once? Or is my assumption wrong? Will it execute in parallel or in bulk mode when I select the array size option?


Thanks in advance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Setting array size to 0 logically transfers all records at once, but the network still breaks that mass of data into packets so any gain would be insignificant.

Array size has nothing at all to do with execution mode or read mode.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dsscholar
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Post by dsscholar »

U mean to say, there is no benefit in using array size option?

What's the difference between setting it to 10 rows at a time and all rows at a time, then?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

dsscholar wrote:U mean to say, there is no benefit in using array size option?
U didn't say any such thing, nor did Ray. Keep in mind the fact that "U" is one of our posters.

As noted, it strictly controls how many rows are sent across the network 'in an array', nothing more and nothing less. There are times when you'll want it to be set to 1 and others when it should be larger. There is always a sweet spot where increasing the size starts to decrease the throughput and performance starts to drop. There are many factors at work there as noted - network packet size and ARL being two of the most important. There's a lot more to it than simply sending everything at once.
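The diminishing returns mentioned above can be shown with simple round-trip arithmetic (pure Python, illustrative numbers only, not a measurement of any real network): the number of transfers falls as 1/array_size, so the incremental saving shrinks quickly while each larger array costs more memory on both sides.

```python
import math

def round_trips(total_rows, array_size):
    """Number of array transfers needed to ship total_rows."""
    return math.ceil(total_rows / array_size)

total = 100_000
for size in (1, 10, 100, 1_000, 10_000, 100_000):
    print(size, round_trips(total, size))

# Going from array size 1 to 100 removes 99% of the round trips;
# going from 1_000 to 100_000 removes only 99 more trips in absolute
# terms, while each array now ties up far more client/server memory.
```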

Note this has nothing to do with DataStage but with how Oracle works. A quick Google for "Oracle array size" should help enlighten you on the topic.
-craig

"You can never have too many knives" -- Logan Nine Fingers