performance issue with reading from hash file via routine

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

morris.zhang
Participant
Posts: 3
Joined: Sun Jun 15, 2008 5:31 pm

performance issue with reading from hash file via routine

Post by morris.zhang »

Hi All,

I created a routine which reads from a hash file with the hash key, but seems that the performance is bad.

The code I am using is as follows (Hash_Key is concatenated into the query so the actual key value is selected):

vCommand = "SELECT Hash_File_Name WITH @ID = " : Hash_Key

Print " Start Time : " : Time()

Execute vCommand Capturing Output

Print " End Time : " : Time()

It took around 0.012 seconds to execute the command. I also created a simple job to test the routine, and the statistics showed a throughput of 30 rows/sec, which is very bad. How can I improve the performance?

Thanks,
Morris
morris
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard. How do you define "performance"? What's wrong with 0.012 seconds? How many rows were selected?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well... first off, welcome. :D

Secondly, I'd be curious what your 'simple job to test' this looked like. I've got a routine like this, but it's only for doing a singleton select, typically in the Initial Value area of a stage variable. One and Done, as they say.

Unless you've gone out of your way to code for this, each call to the routine needs to open the hashed file. Over and over. Open, select, close. So, again I'm curious - what problem is this meant to solve and are you sure this is the best approach to be taking?
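One standard way to avoid the repeated open is to hold the file handle in a COMMON block so the hashed file is opened only on the first call of the routine. A minimal DS BASIC sketch, assuming a routine with one key argument (the names Hash_File_Name, LookupRoutine, and Arg1 are placeholders, not from the original post):

```
      Common /HashLookup/ Initialized, HashFile

      * Open the hashed file only on the first call; the handle
      * persists in COMMON across subsequent calls in the same job run.
      If NOT(Initialized) Then
         Open "Hash_File_Name" To HashFile Else Call DSLogFatal("Cannot open Hash_File_Name", "LookupRoutine")
         Initialized = @TRUE
      End

      * A keyed Read is a direct hashed lookup - no SELECT statement,
      * no command execution overhead per row.
      Read Rec From HashFile, Arg1 Then Ans = Rec Else Ans = ""
```

The keyed Read also sidesteps Execute entirely, which is far cheaper per row than spawning a SELECT for every call.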
-craig

"You can never have too many knives" -- Logan Nine Fingers
morris.zhang
Participant
Posts: 3
Joined: Sun Jun 15, 2008 5:31 pm

Post by morris.zhang »

Thanks Ray and Craig.

The hashed file has 10 rows only.

The simple test job has only two stages: a transformer that generates 10,000 records, calling the routine for each one, and a sequential file that receives the output. As Craig said, each call to the routine currently opens the hashed file, over and over.

Actually, the problem I meant to solve is this: look up a hashed file by hashed_key and, if the lookup fails, insert the hashed_key into the hashed file, so that when the next record comes in the lookup will not fail.
morris
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ok. Do you realize you don't need any kind of special routine to do this? Simply put two hashed file stages on the canvas, attached to the same transformer, referencing the same hashed filename. One is a reference link, the other an output link. When the lookup fails, write the record to the output link.

The lookup won't fail the next time as long as you've done one of two things:

1) Do not cache the reference lookup.
2) Use the 'Cached, locked for update' option.

I prefer the former, for whatever that is worth.
-craig

"You can never have too many knives" -- Logan Nine Fingers
morris.zhang
Participant
Posts: 3
Joined: Sun Jun 15, 2008 5:31 pm

Post by morris.zhang »

I've done as suggested and it's now working fine, great!

Thanks man.
morris
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No problem. Please mark the thread as Resolved when you get a chance.
-craig

"You can never have too many knives" -- Logan Nine Fingers