Hi Experts,
I have 2 questions:
1. I would like to understand why a lookup against a hashed file created with the Use Account Name option is much faster than one created with the Use Directory Path option.
2. I have 8 CPUs on AIX. I don't want to use the Link Partitioner because the data must be processed sequentially, in the order it is physically sorted in the file. I am using Inter Process stages but found that only 3-4 CPUs are utilized, and those are only 50% busy. The job does a lookup from a sequential file against a hashed file, with 10M records in the sequential file and 1M in the hashed file, and I get 2-300 rows per sec.
Thank you and Regards,
Lookup from hashed file created with Use Account Name
Hi,
1. The main difference is the actual location of the file:
Account hashed files reside in the path of the project you supplied as the account name;
Pathed files reside in the path you supplied.
So ask your sysadmins what the difference is between the two locations to explain the performance difference.
2. Inter Process (IPC) breaks a flow up for pipeline parallelism, meaning it splits your one process into several smaller pieces, each performing part of the entire logic in sequence and interacting via a memory buffer (usually up to 1024KB; the default is 128KB).
Low CPU usage throughout the entire flow usually means your bottleneck is not DS but some other resource such as I/O (DB/disks/network/...).
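To illustrate the pipeline idea in point 2, here is a minimal Python sketch, not DataStage code: each stage runs in its own thread and hands rows to the next through a small bounded buffer. The names run_pipeline, parse, lookup and lookup_table are made up for the example, and Python threads only show the structure; they do not give the real multi-CPU behaviour of separate DataStage processes.

```python
import queue
import threading

_DONE = object()  # end-of-data marker passed down the pipeline


def run_pipeline(rows, stages, buffer_slots=8):
    """Run each stage function in its own thread, linked by bounded FIFO queues.

    Every stage is a single worker, so rows leave in the order they entered.
    """
    links = [queue.Queue(maxsize=buffer_slots) for _ in range(len(stages) + 1)]

    def worker(func, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)
                return
            outbox.put(func(item))  # blocks while the downstream buffer is full

    def collect(inbox, out):
        while True:
            item = inbox.get()
            if item is _DONE:
                return
            out.append(item)

    results = []
    threads = [
        threading.Thread(target=worker, args=(f, links[i], links[i + 1]))
        for i, f in enumerate(stages)
    ]
    threads.append(threading.Thread(target=collect, args=(links[-1], results)))
    for t in threads:
        t.start()
    for row in rows:
        links[0].put(row)  # blocks while the first buffer is full
    links[0].put(_DONE)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for the job's stages: parse a line, then look it up.
lookup_table = {"k1": "alpha", "k2": "beta"}


def parse(line):
    return line.strip().split(",")


def lookup(fields):
    return fields + [lookup_table.get(fields[0], "")]


output = run_pipeline(["k1,10\n", "k2,20\n", "k9,30\n"], [parse, lookup])
```

Because each stage has a single worker and the buffers are FIFO, row order is preserved, which is the property the original poster needs. The flip side is that a pipeline of N stages can keep at most N processes busy, and every stage ends up waiting on the slowest one.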
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Re: Lookup from hashed file created with Use Account Name
chuarkai wrote: I would like to obtain rationale why looking from hashed created use Use Account Name is much faster than use Directory Path option.
Where did you get this little nugget? AFAIK, the only difference is the physical location of the hashed file (even though you could 'path' one into your project if you really wanted) and the fact that there is no VOC entry in the Project for a pathed hashed file. I've never heard anything about one being 'faster' to access than the other simply because of this. I suppose an argument could be made that the initial access is 'quicker' with a VOC record, but I can't imagine this is significant or anything to worry about.
Is this documented somewhere?
-craig
"You can never have too many knives" -- Logan Nine Fingers
Thanks for your suggestion.
I think maybe there is some difference between the physical locations of the hashed files.
For the 2nd question, here is the scenario: a 10M-row input file with 50 fields and a 1M-row hashed lookup file (2 keys, 5 fields in total). I am running on an 8-way IBM p670 with 12GB RAM; the files reside on a SAN that is physically very close to the p670 server. What % utilization should I normally expect out of these 8 CPUs?
I have checked that disk wait is very small too, and there is no other process running on this server.
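As a rough way to reason about the utilization question, here is a back-of-the-envelope Python sketch. It assumes each pipeline stage is one single-threaded process and that you can estimate a cost in seconds per row for each stage; the function name pipeline_utilization and the cost figures are invented for illustration and are not measurements from this job.

```python
def pipeline_utilization(stage_costs, n_cpus):
    """Estimate throughput and CPU use for a pipeline of single-process stages.

    stage_costs: assumed seconds of work per row for each stage.
    n_cpus: number of CPUs on the server.
    Returns (rows_per_sec, per_stage_busy_fraction, overall_cpu_fraction).
    """
    bottleneck = max(stage_costs)
    rows_per_sec = 1.0 / bottleneck  # the slowest stage sets the pace
    busy = [cost / bottleneck for cost in stage_costs]  # faster stages sit idle part of the time
    overall = sum(busy) / n_cpus  # stages can never occupy more CPUs than there are stages
    return rows_per_sec, busy, overall


# Made-up costs for three stages (read, lookup, write) on an 8-CPU box.
rows_per_sec, busy, overall = pipeline_utilization([0.001, 0.004, 0.002], 8)
```

With these made-up costs the slowest stage caps throughput at about 250 rows per second, and the three processes together keep less than a quarter of the 8 CPUs busy. Under this model a job split into 3-4 processes cannot use more than 3-4 CPUs however idle the rest are, so full use of all 8 should not be expected without partitioning.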