Surrogate key flat file issue in transformer stage.

arvind_ds
Participant
Posts: 428
Joined: Thu Aug 16, 2007 11:38 pm
Location: Manali

Post by arvind_ds »

Hi Experts,

Our DataStage architecture comprises one conductor node and one compute node. The DataStage version is 8.5 FP 1.

I am using a flat-file surrogate key in the Transformer stage of my parallel job, generating the values with NextSurrogateKey() in the derivation field.

When I run this job with 3 million records, it takes 14 hours to finish. The surrogate key flat file is located on an NFS-mounted file system that is visible from both nodes (the conductor node as well as the compute node).

If I instead place a copy of the surrogate key file in the /tmp folder on both nodes, the same job finishes in 1 minute for 3 million records.

My question is: why does the same job have such different run times in these two scenarios?
Arvind
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Does your job (and hence the transformer) run only on the compute node, or does it also use the conductor node as a compute node?

What is the block size you are using with the surrogate key?

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
arvind_ds
Participant
Posts: 428
Joined: Thu Aug 16, 2007 11:38 pm
Location: Manali

Post by arvind_ds »

I have defined 5 nodes in the configuration file, one conductor node and 4 compute nodes, so I believe the job runs on all the nodes (the conductor as well as the compute nodes).

I have not defined a block size in the surrogate key tab of the Transformer stage properties; only the "System selected block size" radio button is enabled.
Arvind
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

As your state file is on NAS/NFS, I would recommend specifying a block size rather than using the system-selected option. You could try a value of 10000 or 20000 and adjust higher or lower from there. The goal is to reduce the number of I/O requests to the file (open/close/read/write) over the NFS connection, which is primarily what is slowing down the job. NFS server load can also be a factor if the same server is being used for sources, targets, and datasets.
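
To illustrate the arithmetic behind this suggestion, here is a minimal Python sketch. It assumes a deliberately simplified model in which every block of keys costs exactly one request to the state file; the function name `state_file_requests` and the block size of 1 used for comparison are hypothetical, and the sketch does not reflect what the system-selected option actually picks or how DataStage accesses the file internally.

```python
import math


def state_file_requests(total_keys: int, block_size: int) -> int:
    """Requests needed if each request reserves `block_size` keys.

    Simplifying assumption: one block == one round trip to the state file.
    """
    return math.ceil(total_keys / block_size)


rows = 3_000_000  # record count mentioned in the thread

# Block sizes 10000 and 20000 are the starting points suggested above;
# 1 and 1000 are hypothetical values included only for comparison.
for block_size in (1, 1_000, 10_000, 20_000):
    requests = state_file_requests(rows, block_size)
    print(f"block size {block_size:>6}: {requests:>9} requests")
```

Under this model, larger blocks cut the number of round trips over NFS roughly in proportion to the block size, at the cost of potentially leaving unused keys (gaps) at the end of each block.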

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
arvind_ds
Participant
Posts: 428
Joined: Thu Aug 16, 2007 11:38 pm
Location: Manali

Post by arvind_ds »

This is resolved. We replaced the NextSurrogateKey() function in the Transformer derivation with the expression below, and the job finished in 10 minutes.

@INROWNUM * @NUMPARTITIONS + @PARTITIONNUM + 1 - @NUMPARTITIONS + ps_nxtval

Please note that ps_nxtval is the variable that holds the initial value of the flat file surrogate key.
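
As a rough check of how this expression behaves, the Python sketch below evaluates it for a small hypothetical run. It assumes @INROWNUM starts at 1 within each partition and @PARTITIONNUM starts at 0; the partition count, row counts, and starting value are made up for illustration.

```python
def surrogate_key(inrownum: int, partitionnum: int,
                  numpartitions: int, ps_nxtval: int) -> int:
    """Same arithmetic as the Transformer expression above."""
    return (inrownum * numpartitions + partitionnum + 1
            - numpartitions + ps_nxtval)


num_partitions = 4                 # hypothetical partition count
ps_nxtval = 1000                   # hypothetical value read from the key file
rows_per_partition = [3, 3, 2, 1]  # hypothetical, deliberately uneven

keys = [
    surrogate_key(row, part, num_partitions, ps_nxtval)
    for part, row_count in enumerate(rows_per_partition)
    for row in range(1, row_count + 1)   # row numbers are 1-based here
]

# Each partition walks its own arithmetic sequence, so no key repeats.
assert len(keys) == len(set(keys))
print(sorted(keys))
print("highest key issued:", max(keys))
```

In this sketch the keys are unique across partitions and the first one issued is ps_nxtval + 1, but uneven row counts leave gaps in the sequence. The expression also never touches the key file, so the highest key issued would have to be written back by some other means.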
Arvind
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

How are you then handling the update of the flat file?
-craig

"You can never have too many knives" -- Logan Nine Fingers