System level public hashed file sharing question

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

A while ago at a customer site I tested the system level public hashed file disk caching mechanism, with less than stellar results. The functionality as described in the Ascential DataStage Hash Stage Disk Caching guide (sorry Ray, that's the title...) did work: multiple users shared common file images and the write-deferred mechanism allowed speedy file I/O. It seemed to be a great addition to DS functionality, but ...

I managed, using simple DataStage jobs, to get the files into such a state that they "hung" any process that even tried to open them. None of the documented commands could restore those files so that they could be used again. In fact, all calls to SET.MODE, LIST.FILE.CACHE and CLEAR.FILE.CACHE hung. I couldn't find what locks they were waiting on (certainly nothing visible with LIST.READU or even by looking into the shared memory segments), and the only solution was to bring down the DataStage engine. After the third time this happened I decided that this technology wasn't ready for a big 24x7 production environment, so I stopped testing and didn't pursue that tuning avenue any further.
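
For anyone repeating this kind of test, one way to stop a hung command from taking the whole test session with it is to launch each command as a child process with a timeout. The following is only a minimal Python sketch, assuming the command can be started from the operating-system shell at all; `run_with_timeout` is a hypothetical helper name, and the argument list you pass in is whatever invocation works at your site, not a documented one.

```python
import subprocess


def run_with_timeout(argv, timeout_seconds):
    """Run argv as a child process and classify the outcome.

    Returns a (status, output) tuple where status is one of:
      "ok"     - the command finished with exit code 0
      "failed" - the command finished with a non-zero exit code
      "hung"   - the command did not finish within timeout_seconds
    """
    try:
        result = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so the caller
        # gets control back instead of waiting forever.
        return ("hung", "")
    status = "ok" if result.returncode == 0 else "failed"
    return (status, result.stdout)
```

A test harness could call `run_with_timeout(argv, 60)` for each diagnostic command and log every "hung" result with a timestamp, which at least records when the files went bad. This does nothing to release whatever the command is blocked on, so the engine may still need a restart.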

Now I would very much like to try this mechanism again. I'd love to hear from anyone or everyone who has used public file caching with DS in a production environment.

- Does it work as expected?
- Would you recommend implementing it?
- Any major pitfalls or caveats to watch out for?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

When it came out in 5.0 I was sooo excited. I never got it to work right and also experienced the hangs. I have shunned it as well and have not revisited it since.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm in the same boat, haven't touched it for years. Curious what comes of all this.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW

Post by ArndW »

Thanks for confirming that my issues weren't just due to me being stupid. I've just managed to get another whole LPAR to play with and am starting to experiment with public sharing. Unfortunately the limit is 512 MB and this machine has 60 GB of memory for me to use... We have gotten hashed files of just under 2 GB to load to memory for normal reference lookups, but we need to improve write/read performance.
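
To put those numbers side by side, here is a quick back-of-the-envelope calculation in Python. It assumes the 512 MB limit applies to the public cache as a whole and takes 2 GB as an upper bound for the largest hashed file; both are my reading of the figures above, not documented values.

```python
MB = 1024 ** 2
GB = 1024 ** 3

cache_limit = 512 * MB      # public cache limit (assumed to be the total)
machine_memory = 60 * GB    # memory available on the LPAR
largest_file = 2 * GB       # upper bound for "just under 2 GB"

# Share of the machine's memory that the cache limit allows us to use.
cache_share_percent = cache_limit / machine_memory * 100

# How many times larger the biggest lookup file is than the cache limit.
file_to_cache_ratio = largest_file / cache_limit

print(f"Cache limit as share of memory: {cache_share_percent:.2f}%")
print(f"Largest file vs. cache limit:   {file_to_cache_ratio:.1f}x")
```

In other words, the limit lets the cache use less than 1 percent of the memory on this machine, and the largest lookup file is roughly four times the size of the limit.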

I'll keep this thread posted if anyone is interested.
chulett

Post by chulett »

Yes please! :D
-craig
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You might get some more joy from SEMAPHORE.STATUS, but no guarantees.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW

Post by ArndW »

I've just finished some work on using public cache on a large P595 AIX machine with 12 CPUs with an attached EMC storage array; this system was idle while I tested so the results are more of a single-user test than a real multi-user one with hundreds of users.
Since I am working today and not playing, I need to get permission to post the results here but there are a couple of interesting observations.

In a job that writes to a hashed file, then reads from that file as a reference lookup and also updates the same file, the fastest result comes from using stage write-cache together with loading the reference file to memory; this is faster than public caching by a factor of 2. The main reason is that publicly cached files are not loaded to memory for reference lookups.

Basically, the only time that public caching surpasses DataStage job level tuning is when multiple users access the same file for reads and writes. The performance gain from using a publicly cached file for writes and non-memory reads is only about 10% in speed, and it costs more than that in additional system resources.

I am going to try loading the VOC, DS_JOBOBJECTS and DS_JOBS as memory files and see if the response times in the Director get any better for very large projects (~3000 jobs and 50+ categories), since these files are usually open and in use by several processes most of the time.