Lookup from hashed file created with Use Account Name

chuarkai
Participant
Posts: 10
Joined: Mon Apr 28, 2003 6:43 am

Post by chuarkai »

Hi Experts,

I have two questions:

1. I would like to understand why a lookup against a hashed file created with the Use Account Name option is much faster than one against a hashed file created with the Directory Path option.

2. I have 8 CPUs on AIX. I don't want to use Link Partition because the data must be processed sequentially, in the order in which it is physically sorted in the file. I am using Inter Process (IPC), but I found that only 3-4 CPUs are utilized, and those are only about 50% busy. The job looks up each row of a sequential file against a hashed file, with 10M records in the sequential file and 1M in the hashed file, and I get 2-300 rows per sec.

Thank you and Regards,
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
1. The main difference is the actual location of the file:
Account hashed files reside in the project directory of the account whose name you supplied;
Pathed files reside in the directory path you supplied.
So ask your sysadmins what differs between the two locations; that may explain the performance difference.

2. Inter Process (IPC) breaks a flow up for pipeline parallelism: it splits your single process into several smaller pieces, each performing part of the overall logic in sequence, and these pieces interact via a memory buffer (usually up to 1024KB; the default is 128KB).
Low CPU usage throughout the entire flow usually means your bottleneck is not DS but some other resource, such as I/O (DB/disks/network/...).
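
To illustrate point 2, here is a minimal sketch of pipeline parallelism over bounded buffers. It is not DataStage code: it uses Python threads and `queue.Queue` as a stand-in for IPC processes and their memory buffers, and the names `reader`, `lookup_stage`, `run_pipeline` and `buffer_rows` are hypothetical. It shows that rows still leave the pipeline in their original order, and that each stage blocks when its buffer is full or empty, so the slowest stage sets the pace for the whole flow.

```python
import queue
import threading

SENTINEL = object()  # marks end of stream


def reader(rows, out_q):
    """Stage 1: feed rows downstream in their original order."""
    for row in rows:
        out_q.put(row)  # blocks while the downstream buffer is full
    out_q.put(SENTINEL)


def lookup_stage(in_q, out_q, reference):
    """Stage 2: enrich each row with a keyed lookup (a dict stands in for the hashed file)."""
    while True:
        row = in_q.get()  # blocks while the upstream buffer is empty
        if row is SENTINEL:
            out_q.put(SENTINEL)
            return
        key, value = row
        out_q.put((key, value, reference.get(key)))


def run_pipeline(rows, reference, buffer_rows=4):
    """Run reader -> lookup -> writer concurrently over bounded buffers."""
    q1 = queue.Queue(maxsize=buffer_rows)
    q2 = queue.Queue(maxsize=buffer_rows)
    results = []

    def writer():
        # Stage 3: collect rows; FIFO queues keep the input order intact.
        while True:
            item = q2.get()
            if item is SENTINEL:
                return
            results.append(item)

    threads = [
        threading.Thread(target=reader, args=(rows, q1)),
        threading.Thread(target=lookup_stage, args=(q1, q2, reference)),
        threading.Thread(target=writer),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With one worker per stage, a pipeline like this can keep at most as many CPUs busy as it has stages, and any stage that waits on I/O leaves its CPU idle in the meantime.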

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Lookup from hashed file created with Use Account Name

Post by chulett »

chuarkai wrote: I would like to obtain rationale why looking from hashed created use Use Account Name is much faster than use Directory Path option.
Where did you get this little nugget? :? AFAIK, the only difference is the physical location of the hash (even though you could 'path' one into your project if you really wanted) and the fact that there is no VOC entry in the Project for a pathed hash file. I've never heard anything about one being 'faster' to access than the other, simply due to this. I suppose an argument could be made about the initial access being 'quicker' with a VOC record but can't imagine this is significant or anything to worry about.

Is this documented somewhere?
-craig

"You can never have too many knives" -- Logan Nine Fingers
chuarkai
Participant
Posts: 10
Joined: Mon Apr 28, 2003 6:43 am

Post by chuarkai »

Thanks for your suggestions.

I think there may be some difference between the physical locations of the hashed files.

For the second question, consider this scenario: a 10M-row input file with 50 fields and a 1M-row hashed lookup file (2 key fields, 5 fields in total). I am running on an 8-way IBM p670 with 12GB of RAM, and the files reside on a SAN that is physically very close to the p670 server. What percentage utilization should I normally expect from these 8 CPUs?

I have also checked that disk wait is very small, and no other processes are running on this server.