Need help creating optimal Static Hashed File

gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

I need to create a Static Hashed File with the following row count and columns:

Total number of rows = 10,000,000 (10 million)
SomeDateAndTime KEY, TIMESTAMP(19) e.g. 2010-05-12 09:30:45
FirstAmount Non-Key, INTEGER(7) e.g. 1234567
SecondAmount Non-Key, INTEGER(7) e.g. 1234567

I am using the HFC calculator to generate the mkdbfile command for the hashed file creation. My parameters are as follows:

Static Hashed File Type 2
Row size = 19 bytes (key col)+ 7 bytes (first amount) + 7 bytes (second amount)+ 2 bytes (for two delimiters) + 2 bytes (for CRLF) + 14 bytes (for overhead) = 51 bytes
Total number of rows = 10,000,000


Based on the above, HFC.exe spits out the following command:

mkdbfile testStaticHFile 2 1250003 1 -32BIT

I see that the file thus created is about 640 MB. However, once I populate the file, its size is about 1.1 GB.
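
For anyone wanting to check the arithmetic, here is a rough sketch in Python. It assumes one separation unit is 512 bytes and ignores any file header; the function names are made up for illustration and are not part of HFC or DataStage.

```python
# Sketch only. Assumption: one separation unit is 512 bytes, and the
# file header is ignored. Function names are hypothetical.
SEPARATION_UNIT_BYTES = 512


def estimated_row_bytes(key=19, first_amount=7, second_amount=7,
                        delimiters=2, line_end=2, overhead=14):
    """Sum the per-row byte counts listed in the post."""
    return key + first_amount + second_amount + delimiters + line_end + overhead


def allocated_bytes(modulus, separation):
    """Space taken by the primary groups of an empty static hashed file."""
    return modulus * separation * SEPARATION_UNIT_BYTES


def average_group_fill(rows, row_bytes, modulus, separation):
    """Fraction of each group used if the rows hashed perfectly evenly."""
    return (rows * row_bytes) / allocated_bytes(modulus, separation)


if __name__ == "__main__":
    rows, modulus, separation = 10_000_000, 1_250_003, 1
    row_bytes = estimated_row_bytes()
    print("row bytes:", row_bytes)
    print("empty file bytes:", allocated_bytes(modulus, separation))
    print("average fill:",
          round(average_group_fill(rows, row_bytes, modulus, separation), 3))
```

Under those assumptions the primary groups alone come to roughly the 640 MB seen for the empty file, and a perfectly even spread of 51-byte rows would fill each group to about 80%. The sketch says nothing about how evenly the keys actually hash, so it cannot by itself account for the 1.1 GB populated size.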

1. Am I computing my row size incorrectly?
2. Should I be using some other Type (and not Type 2)?
3. A similar static hashed file with about half the number of rows gave decent lookup performance. However, with the new file, lookups have been painfully slow. Any ideas?


Thanks
gateleys

Re: Need help creating optimal Static Hashed File

Post by gateleys »

BTW, I have left the Separation to default of 1.
gateleys
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Type 2 isn't really appropriate for a timestamp key. Try Type 18, or perhaps Type 5.

At 51 bytes per record, separation 1 is fine, but separation 2 or 4 may be more efficient because it matches the physical page size used by your hardware.

Finally, try choosing a prime number as the modulo. The hfc.exe program has this as an option.
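
If you want to sanity-check a modulo outside of hfc.exe, a minimal sketch in Python of "smallest prime at or above a target" could look like the following. The helper names are hypothetical, and this is plain trial division, not whatever hfc.exe does internally.

```python
# Sketch only: trial division is fine for moduli in the low millions.
# Helper names are hypothetical and unrelated to hfc.exe.
def is_prime(n):
    """Return True if n is a prime number."""
    if n < 2:
        return False
    if n < 4:          # 2 and 3 are prime
        return True
    if n % 2 == 0:
        return False
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def next_prime(n):
    """Return the smallest prime greater than or equal to n."""
    candidate = max(n, 2)
    while not is_prime(candidate):
        candidate += 1
    return candidate


if __name__ == "__main__":
    # Example: check the modulo from the mkdbfile command in the question.
    print(is_prime(1_250_003), next_prime(1_250_003))
```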
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.