Where to place variation on DYNAMIC hash

datastage · Post by **datastage** » Wed Aug 04, 2004 7:00 am

I want to create a dynamic hashed file, probably with the SEQ.NUM algorithm. I have four columns for the key: GEO_COUNTRY, GEO_STATE, GEO_COUNTY, GEO_CITY. GEO_STATE is character based; the other three are integers. The only values for GEO_COUNTRY are 1 and 2. Obviously you can guess the likely proportions of variance for GEO_COUNTY and GEO_CITY.

First, is it safe to assume the best read/write performance would come from placing GEO_STATE as the last column in the key and using the SEQ.NUM algorithm rather than the GENERAL algorithm?

Second, with a dynamic hashed file, is it better to have less variance at the beginning of the key (GEO_COUNTRY, then GEO_COUNTY, then GEO_CITY) or the reverse, so that the grouping has more choices? I would assume that with only 2 values for GEO_COUNTRY it will have to look at the next field every time. Or, for a dynamic hashed file, does the placement of the variance have no impact at all?

I guess it won't be too hard to make these changes and test myself, but I'm curious as to what people think 'theory' should dictate.

Thanks

ray.wurlod · Post by **ray.wurlod** » Wed Aug 04, 2004 9:49 pm

You would think that leftmost variance would suit GENERAL best while rightmost variance (like an odometer) would suit SEQ.NUM best. It turns out that the hashing algorithm for dynamic hashed files performs a bit-rotate operation after each character processed, to achieve better "randomness", so it doesn't really make a great deal of difference - in dynamic hashed files - where the most variation lies.
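To illustrate the bit-rotate idea, here is a minimal Python sketch of a generic rotate-and-XOR hash. It is not the UniVerse/DataStage algorithm (which is not documented here); the function names, the 32-bit width, the 5-bit rotation and the sample key layout are all assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF  # assumption: work in 32 bits


def rotl32(value, bits=1):
    """Rotate a 32-bit value left by `bits` positions."""
    bits %= 32
    return ((value << bits) | (value >> (32 - bits))) & MASK32


def rotating_hash(key):
    """Toy hash: XOR in each byte, then rotate, so every position
    in the key ends up influencing many bits of the result."""
    h = 0
    for byte in key.encode("utf-8"):
        h = rotl32(h ^ byte, 5)  # 5 is an arbitrary illustrative choice
    return h


def group_spread(keys, modulus):
    """Count how many distinct groups the keys land in."""
    return len({rotating_hash(k) % modulus for k in keys})


# Hypothetical keys: same values, variation on the left vs. on the right.
left_varying = [f"{i:04d}|NY|1" for i in range(1000)]
right_varying = [f"1|NY|{i:04d}" for i in range(1000)]
```

Calling `group_spread(left_varying, 97)` and `group_spread(right_varying, 97)` lets you compare how evenly each layout spreads across 97 groups. Because the rotation mixes each character into the running value, neither layout is structurally disadvantaged in this sketch.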

Of course, it's a completely different story with static hashed files; the discussion of which in the manuals is improving as releases go by. The main gotcha with static hashed files' hashing algorithms is - for types 2 through 12 - the very small number of characters of the key that are considered - only four or five in some cases! But, hey, it works as documented (and as it has since the mid 1980s), so you have to live with it.
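A toy Python sketch of why that matters: if a hash looks only at the last few characters of the key, any variation outside that window is invisible to it. The function below is purely illustrative (a character-sum over a four-character tail) and is not any of the actual static file type algorithms; the name `truncated_hash` and the sample keys are made up for the example.

```python
def truncated_hash(key, modulus, considered=4):
    """Toy hash that only considers the last `considered` characters."""
    tail = key[-considered:]
    return sum(ord(ch) for ch in tail) % modulus


# Variation only on the left: the considered tail is always "CITY".
left_varying = [f"{i:04d}-CITY" for i in range(100)]
# Variation on the right: the considered tail changes with every key.
right_varying = [f"CITY-{i:04d}" for i in range(100)]

groups_left = {truncated_hash(k, 97) for k in left_varying}
groups_right = {truncated_hash(k, 97) for k in right_varying}
```

Here every key in `left_varying` hashes to the same group, because the part that varies is never examined, while `right_varying` spreads over several groups.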

(The hashing algorithms used to be taught in the UniVerse Internals class, until IBM took ownership and decided that that information was too sensitive!)

mleroux · Post by **mleroux** » Thu Aug 05, 2004 5:36 am

I have a table from Ascential that defines what types of static hash files one can use, depending on where in the key the most variation occurs. Even though it probably won't be too useful in this specific case, it's quite interesting:


+------------+-----------------------------------+
| char type  |  most variation in key occurs in: |
+------------+--------+--------+--------+--------+
|            | right  | middle |  left  |  any   |
+------------+--------+--------+--------+--------+
| wholly     |   2    |   6    |   10   |   14   |
| numeric    |        |        |        |        |
+------------+--------+--------+--------+--------+
| numeric &  |   3    |   7    |   11   |   15   |
| separators |        |        |        |        |
+------------+--------+--------+--------+--------+
| ASCII      |   4    |   8    |   12   |   16   |
| chars      |        |        |        |        |
+------------+--------+--------+--------+--------+
| Any        |   5    |   9    |   13   |   17   |
| type       |        |        |        |        |
+------------+--------+--------+--------+--------+
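For anyone who wants to use the table programmatically, it can be transcribed as a small Python lookup. This simply mirrors the table as posted above; it has not been checked against the manuals, and the labels and the function name `suggest_static_type` are invented for the example.

```python
# Row and column labels, in the same order as the table above.
CHAR_TYPES = ["numeric", "numeric+separators", "ascii", "any"]
VARIATION = ["right", "middle", "left", "any"]

# In the posted table the types run 2..17: each column to the right
# adds 4, each row down adds 1.
FILE_TYPE = {
    (char_type, where): 2 + row + 4 * col
    for row, char_type in enumerate(CHAR_TYPES)
    for col, where in enumerate(VARIATION)
}


def suggest_static_type(char_type, variation):
    """Return the static file type the table lists for this combination."""
    try:
        return FILE_TYPE[(char_type, variation)]
    except KeyError:
        raise ValueError(
            f"unknown combination: {char_type!r}, {variation!r}"
        ) from None
```

For example, `suggest_static_type("numeric", "right")` returns 2 and `suggest_static_type("any", "any")` returns 17, matching the corners of the table.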

Wow! Haven't done a table with text characters in a looong time. I would actually like to see how these hashing algorithms work... does anybody have any info?

datastage · Post by **datastage** » Thu Aug 05, 2004 6:19 am

Actually, Fitzgerald and Long have some good information on their site, http://www.fitzlong.com. These two must love hashed files more than anyone else. It's focused more on static hashed files and users of UniVerse, but click the support button and then the technical papers link for some pretty intense documentation on hashed files.

ray.wurlod · Post by **ray.wurlod** » Thu Aug 05, 2004 9:49 pm

There is, to my mind, a better table at the end of Chapter 2 in the Server Job Developer's Guide (version 6.x and later), in which the number of key characters considered is also given.

DSXchange
