Surrogate key flat file issue in transformer stage.

arvind_ds · Post by **arvind_ds** » Mon Jan 14, 2013 10:36 am

Hi Experts,

Our DataStage architecture consists of one conductor node and one compute node. The DataStage version is 8.5 FP 1.

I am using a flat-file surrogate key in the Transformer stage of my parallel job, generating the values by calling NextSurrogateKey() in the derivation field.

When I run this job with 3 million records, it takes 14 hours to finish. The surrogate key flat file is located on an NFS-mounted file system that is visible from both nodes (the conductor node as well as the compute node).

If I instead place a copy of the surrogate key file in the /tmp folder on both nodes, the same job finishes in 1 minute for 3 million records.

My question is: why does the same job have such different run times in these two scenarios?

jwiles · Post by **jwiles** » Mon Jan 14, 2013 1:07 pm

Does your job (and hence the transformer) run only on the compute node, or does it also use the conductor node as a compute node?

What is the block size you are using with the surrogate key?

Regards,

arvind_ds · Post by **arvind_ds** » Tue Jan 15, 2013 9:12 am

I have defined 5 nodes in the configuration file: one conductor node and 4 compute nodes, so I believe the job runs on all of the nodes (conductor as well as compute nodes).

I have not defined a block size in the surrogate key tab of the Transformer stage properties; only the "System selected block size" radio button is enabled.

jwiles · Post by **jwiles** » Tue Jan 15, 2013 3:21 pm

As your state file is on NAS/NFS, I would recommend specifying a block size rather than using the system-selected option. You could try a value of 10000 or 20000 and adjust higher or lower from there. The goal is to reduce the number of I/O requests to the file (open/close/read/write) over the NFS connection; this is primarily what is slowing down the job. NFS server load can also be a factor if the same server is being used for source/target files and datasets.
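To give a rough feel for why a larger block size helps, here is a minimal Python sketch of the arithmetic. It is a simplified model and not DataStage's actual implementation: it assumes rows are spread evenly across partitions and that each partition makes one request to the state file per block of keys it reserves. The function name `state_file_requests` is hypothetical.

```python
import math


def state_file_requests(total_rows: int, partitions: int, block_size: int) -> int:
    """Estimate state-file requests under a simplified model.

    Assumptions (not taken from DataStage documentation):
    - rows are distributed evenly across partitions
    - each partition makes one request per block of keys it reserves
    """
    rows_per_partition = math.ceil(total_rows / partitions)
    requests_per_partition = math.ceil(rows_per_partition / block_size)
    return requests_per_partition * partitions


# 3 million rows on 5 partitions, as in this thread
for block_size in (1, 10_000, 20_000):
    print(block_size, state_file_requests(3_000_000, 5, block_size))
```

Under these assumptions, a block size of 10000 needs about 300 requests for the whole run and 20000 about 150, compared with one request per row when only a single key is reserved at a time. Each request over NFS carries network latency, so the request count matters far more than it does on a local disk such as /tmp.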

Regards,

arvind_ds · Post by **arvind_ds** » Tue Mar 12, 2013 11:08 am

This is resolved. We replaced the NextSurrogateKey() function in the transformer derivation with the expression below, and the job now finishes in 10 minutes.

@INROWNUM * @NUMPARTITIONS + @PARTITIONNUM +1 - @NUMPARTITIONS + ps_nxtval

Note that ps_nxtval is the variable that holds the initial value of the flat-file surrogate key.
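As a sanity check on the expression, here is a small Python sketch that evaluates it for every row in every partition. It assumes @INROWNUM starts at 1 within each partition and @PARTITIONNUM starts at 0; the function names `surrogate_key` and `keys_for_job` are hypothetical and exist only for this illustration.

```python
def surrogate_key(inrownum: int, numpartitions: int, partitionnum: int, ps_nxtval: int) -> int:
    """Same arithmetic as the transformer derivation above."""
    return inrownum * numpartitions + partitionnum + 1 - numpartitions + ps_nxtval


def keys_for_job(rows_per_partition: list[int], ps_nxtval: int) -> list[int]:
    """Generate the keys each partition would produce.

    rows_per_partition[p] is the number of rows processed by partition p.
    Assumes @INROWNUM counts from 1 and @PARTITIONNUM counts from 0.
    """
    numpartitions = len(rows_per_partition)
    keys = []
    for partitionnum, row_count in enumerate(rows_per_partition):
        for inrownum in range(1, row_count + 1):
            keys.append(surrogate_key(inrownum, numpartitions, partitionnum, ps_nxtval))
    return keys


# Evenly loaded partitions: keys are unique and contiguous
even = keys_for_job([3, 3, 3, 3], ps_nxtval=100)
print(sorted(even))

# Unevenly loaded partitions: keys are still unique, but gaps appear
uneven = keys_for_job([3, 1], ps_nxtval=0)
print(sorted(uneven))
```

The expression simplifies to (@INROWNUM - 1) * @NUMPARTITIONS + @PARTITIONNUM + 1 + ps_nxtval, so each partition works through its own interleaved sequence and no two partitions can produce the same key. Under the assumptions above, the first key generated is ps_nxtval + 1, and the keys are contiguous only when the rows are spread evenly across partitions.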

chulett · Post by **chulett** » Tue Mar 12, 2013 11:31 am

How are you then handling the update of the flat file?