Where to place variation on DYNAMIC hash

datastage
Participant
Posts: 229
Joined: Wed Oct 23, 2002 10:10 am
Location: Omaha

Where to place variation on DYNAMIC hash

Post by datastage »

I want to create a dynamic hashed file, probably with the SEQ.NUM algorithm. I have four columns for the key: GEO_COUNTRY, GEO_STATE, GEO_COUNTY, GEO_CITY. GEO_STATE is character based; the other three are integers. The only values for GEO_COUNTRY are 1 and 2. Obviously you can guess the likely proportions of variance for GEO_COUNTY and GEO_CITY.

First, is it safe to assume the best read/write performance would come from placing GEO_STATE as the last column in the key and using the SEQ.NUM algorithm rather than the GENERAL algorithm?

Second, with a dynamic hashed file, is it better to have less variance at the beginning of the key (GEO_COUNTRY, then GEO_COUNTY, then GEO_CITY), or vice versa so the grouping has more choices? I would assume that with only 2 values for GEO_COUNTRY it will have to look at the next field every time. Or, for a dynamic hashed file, does the placement of the variance have no impact at all?

I guess it won't be too hard to make these changes and test myself, but I'm curious as to what people think 'theory' should dictate.

Thanks
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.

"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You would think that leftmost variance would suit GENERAL best while rightmost variance (like an odometer) would suit SEQ.NUM best. It turns out that the hashing algorithm for dynamic hashed files performs a bit-rotate operation after each character processed, to achieve better "randomness", so it doesn't really make a great deal of difference - in dynamic hashed files - where the most variation lies.
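To illustrate why a rotate step makes the position of the variation matter less, here is a toy sketch in Python. It is not the UniVerse algorithm, which this thread does not describe in detail. The names `rotl32`, `toy_rotate_hash` and `group_of`, the 5-bit rotation, the `|` separator in the sample keys, and the modulus of 97 are all assumptions chosen for illustration.

```python
def rotl32(value: int, shift: int) -> int:
    """Rotate a 32-bit value left by `shift` bits."""
    value &= 0xFFFFFFFF
    shift %= 32
    return ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF


def toy_rotate_hash(key: str) -> int:
    """Fold each byte into the hash, then rotate (hypothetical scheme)."""
    h = 0
    for byte in key.encode("utf-8"):
        h ^= byte          # mix the current character in
        h = rotl32(h, 5)   # rotate so the next character lands on other bits
    return h


def group_of(key: str, modulus: int = 97) -> int:
    """Map a key to a zero-based group number (modulus is an assumption)."""
    return toy_rotate_hash(key) % modulus


# Sample keys: variation only in the leftmost or only in the rightmost character.
left_varying = [f"{d}|NE|55|1234" for d in range(10)]
right_varying = [f"1|NE|55|123{d}" for d in range(10)]
```

Because every byte is moved to a different bit position before the next one is folded in, a change in the first character and a change in the last character both alter the 32-bit result. In this toy version, equal-length keys that differ in a single character always get different hash values, although two of them can still land in the same group after the modulus is applied.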

Of course, it's a completely different story with static hashed files, the discussion of which in the manuals is improving as releases go by. The main gotcha with static hashed files' hashing algorithms is, for types 2 through 12, the very small number of characters of the key that are considered: only four or five in some cases! But, hey, it works as documented (and as it has since the mid-1980s), so you have to live with it.
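The gotcha can be sketched with a toy hash that looks at only the last few characters of the key. This is an illustration only: the function name `truncated_key_group`, the choice of the rightmost four characters, the multiplier 31 and the modulus are assumptions, not the documented behaviour of any particular file type.

```python
def truncated_key_group(key: str, modulus: int, chars_considered: int = 4) -> int:
    """Pick a group using only the last `chars_considered` characters (hypothetical)."""
    tail = key[-chars_considered:]
    total = 0
    for ch in tail:
        total = total * 31 + ord(ch)  # arbitrary mixing step for the sketch
    return total % modulus


# Two keys that differ everywhere except in their last four characters.
key_a = "1NE0551234"
key_b = "2IA0991234"
same_group = truncated_key_group(key_a, 11) == truncated_key_group(key_b, 11)
```

Here `same_group` is true: whatever variation the keys carry outside the characters the algorithm looks at contributes nothing to the spread across groups, which is why picking a file type that matches where the key varies matters for static files.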

(The hashing algorithms used to be taught in the UniVerse Internals class, until IBM took ownership and decided that that information was too sensitive!)
Last edited by ray.wurlod on Thu Aug 05, 2004 9:43 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mleroux
Participant
Posts: 81
Joined: Wed Jul 14, 2004 3:18 am
Location: Johannesburg, South Africa

Post by mleroux »

I have a table from Ascential that defines what types of static hash files one can use, depending on where in the key the most variation occurs. Even though it probably won't be too useful in this specific case, it's quite interesting:

+------------+-----------------------------------+
| char type  |  most variation in key occurs in: |
+------------+--------+--------+--------+--------+
|            | right  | middle |  left  |  any   |
+------------+--------+--------+--------+--------+
| wholly     |   2    |   6    |   10   |   14   |
| numeric    |        |        |        |        |
+------------+--------+--------+--------+--------+
| numeric &  |   3    |   7    |   11   |   15   |
| separators |        |        |        |        |
+------------+--------+--------+--------+--------+
| ASCII      |   4    |   8    |   12   |   16   |
| chars      |        |        |        |        |
+------------+--------+--------+--------+--------+
| Any        |   5    |   9    |   13   |   17   |
| type       |        |        |        |        |
+------------+--------+--------+--------+--------+
Wow! Haven't done a table with text characters in a looong time. I would actually like to see how these hashing algorithms work... does anybody have any info?
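The table above follows a regular pattern (each step down adds 1, each step right adds 4), so it can be written as a small lookup. This is just a convenience sketch in Python; the names `CHAR_TYPES`, `VARIATION` and `static_file_type` are made up for the example and are not part of DataStage or UniVerse.

```python
# Row and column labels taken from the table; order matters.
CHAR_TYPES = ("wholly numeric", "numeric & separators", "ASCII chars", "any type")
VARIATION = ("right", "middle", "left", "any")


def static_file_type(char_type: str, variation: str) -> int:
    """Return the static hashed file type (2-17) for a table cell."""
    row = CHAR_TYPES.index(char_type)   # raises ValueError on an unknown label
    col = VARIATION.index(variation)
    return 2 + row + 4 * col
```

For example, `static_file_type("numeric & separators", "left")` gives 11, matching the table.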
Morney le Roux

There are only 10 kinds of people: Those who understand binary and those who don't.

Post by datastage »

Actually, Fitzgerald and Long have some good information on their site, http://www.fitzlong.com. These two must love hashed files more than anyone else. It's focused more on static hashed files and users of UniVerse, but click the Support button and then the Technical Papers link for some pretty intense documentation on hashed files.
Byron Paul

Static hashed file hashing algorithms (= file type)

Post by ray.wurlod »

There is, to my mind, a better table at the end of Chapter 2 in the Server Job Developer's Guide (version 6.x and later), in which the number of key characters considered is also given.