HFC hashed file calculations

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

HFC hashed file calculations

Post by clarcombe »

I am performing some optimisation on some jobs and noticed that by far the best performance comes from creating the hashed file (type 2) before running the job.

Using the parameters generated by HFC.exe, in development I know how many lines will be in each hashed file. When this runs in production I won't know, so I have to "guess" the best settings for the hashed files or make the files much larger than necessary to accommodate any growth.

For example an average row size of 40 for 8.4m lines gives me
2 229693 1 32BIT

But when I run in production, this could be more or less.

Question
What is the relationship between the average row size (40), the number of rows (8.4m), and the value 229693? Is there any way I can write a routine to calculate this?

Thanks
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Generally, the most significant performance impact comes from the number of groups built as you create the file.
The number of groups required depends on the number of records you are about to process, divided by the number of records that fit in one group (the unit the hashed file is built from). The size of each group can be 2K or 4K (group size 1 or 2).
Having done that calculation, you can set the group count properly.
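The calculation Roy describes can be sketched as a routine. This is a rough Python sketch, not HFC's actual algorithm: the 2K group size, the 12-byte per-record header overhead, and the 80% target fill factor are assumptions, so the result will only approximate the modulo HFC reports.

```python
# Rough estimate of a static hashed file's modulo (group count):
# records to store divided by the records that fit in one group,
# rounded up to the next prime. The constants below are assumptions,
# not HFC's exact values.

def next_prime(n: int) -> int:
    """Smallest prime >= n (modulo values are conventionally prime)."""
    def is_prime(k: int) -> bool:
        if k < 2:
            return False
        if k % 2 == 0:
            return k == 2
        i = 3
        while i * i <= k:
            if k % i == 0:
                return False
            i += 2
        return True
    while not is_prime(n):
        n += 1
    return n

def estimate_modulo(rows: int, avg_row_bytes: int,
                    group_bytes: int = 2048,   # assumed 2K groups
                    header_bytes: int = 12,    # assumed per-record overhead
                    fill: float = 0.8) -> int: # assumed target fill factor
    """Estimate group count so each group is ~80% full on average."""
    per_group = int((group_bytes * fill) // (avg_row_bytes + header_bytes))
    groups = -(-rows // per_group)             # ceiling division
    return next_prime(groups)

# Example: 8.4 million rows averaging 40 bytes each.
print(estimate_modulo(8_400_000, 40))
```

With different overhead and fill-factor constants the figure moves around, which is why it will not match HFC's 229693 exactly; the point is that the modulo scales linearly with rows × bytes-per-record, so a routine like this can resize the file for production volumes.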

In my experience, once you use a disk storage machine instead of local disks, there is no real benefit to using static hashed files over dynamic ones, but maybe others have a different experience.

You can get a realistic starting point from an existing working process, or make an estimate that you monitor and adjust if need be.

IHTH (I Hope This Helps),
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

Hi Roy,

I am trying to understand what you mean by groups.

The three fields returned by the HFC are
FileType
Modulo
Separation

Is the separation what you mean by a group?

As the separation and file type will remain static, what I need to (roughly) calculate is the modulo.

How can I achieve this? What is it a function of?

As for the disk storage, we are not that advanced here (yet!); we still use local Windows disks. If you have any recommendations for alternative disk storage, I am all ears.

Using static hashed files instead of dynamic ones, I am doubling the throughput.

Thanks
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Colin - This post by our friend Ken Bland may help.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I can't recall whether the Help in HFC provides the algorithm for modulo. Replying to Colin by PM.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

It did, Craig, in as much as I saw that Ray wrote the original HFC program. So I sent him a mail asking how the calculations are arrived at. :)

I am almost there!
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer