Page 1 of 1
performance issue with reading from hash file via routine
Posted: Wed Jun 18, 2008 6:23 pm
by morris.zhang
Hi All,
I created a routine which reads from a hash file with the hash key, but seems that the performance is bad.
The codes I am using is as following:
vCommand = "SELECT Hash_File_Name WITH @ID = Hash_Key"
Print " Start Time : " : Time()
Execute vCommand capturing output
Print " End Time : " : Time()
It took around 0.012 Seconds to execute the command. I also created a simple job to test the routine and the statistics showed that the performance is 30 rows/sec which is very bad, How can I improve the performance?
Thanks,
Morris
Posted: Wed Jun 18, 2008 6:29 pm
by ray.wurlod
Welcome aboard. How do you define "performance"? What's wrong with 0.012 seconds? How many rows were selected?
Posted: Wed Jun 18, 2008 6:32 pm
by chulett
Well... first off, welcome. :D
Secondly, I'd be curious what your 'simple job to test' this looked like. I've got a routine like this, but it's only for doing a singleton select, typically in the Initial Value area of a stage variable. One and Done, as they say.
Unless you've gone out of your way to code for this, each call to the routine needs to open the hashed file. Over and over. Open, select, close. So, again I'm curious - what problem is this meant to solve and are you sure this is the best approach to be taking?
Posted: Wed Jun 18, 2008 6:46 pm
by morris.zhang
Thanks Ray and Craig.
The hashed file has 10 rows only.
The simple test job has two stages only, one transformer creates 10,000 records and keeps calling the routine and writes the output to a sequential file. As Craig said, currently each call to the routine needs to open the hashed file, over and over.
Actually the problem I meant to solve is: lookup a hashed file with hashed_key and insert the hashed_key into the hashed file if lookup fails. so when the next record comes in, it will not fail to lookup.
Posted: Wed Jun 18, 2008 7:02 pm
by chulett
Ok. Do you realize you don't need any kind of special routine to do this? Simply put two hashed file stages on the canvas, attached to the same transformer, referencing the same hashed filename. One is a reference link, the other an output link. When the lookup fails, write the record to the output link.
The lookup won't fail the next time as long as you've done one of two things:
1) Do not cache the reference lookup.
2) Or used the 'Cached, locked for update' option.
I prefer the former, for whatever that is worth.
Posted: Wed Jun 18, 2008 9:24 pm
by morris.zhang
I've done the same as suggested, it now is working fine, great!
Thanks man.
Posted: Wed Jun 18, 2008 9:44 pm
by chulett
No problem. Please mark the thread as Resolved when you get a chance.