On Monday 01 March 2004 19:04, Chris Green wrote:
On Mon, Mar 01, 2004 at 06:11:20PM +0000, Richard Kettlewell wrote:
Chris Green <chris@areti.co.uk> writes:
The existing code uses a rather messy hash look-up and one of the reasons for the re-write is to make it simpler and more reliable.
A linear search of the table probably isn't a good idea as it would mean that there would be significant differences in look-up time for entries at the beginning and end of the table.
I'm leaning towards a binary-chop search but wondered if anyone had any other ideas. Binary-chop is easy in principle but it's often quite difficult to implement in practice because one has to deal with messy boundary conditions. It's also a bit more difficult to create the necessary sorted table automatically, and relying on the creator of the table to sort it might be a bit dodgy.
The table looks up ATM addresses (16-bit values) given an IP address (a 32-bit value); the maximum size is of the order of 400 entries.
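(For reference: the boundary conditions become much less messy if the binary chop keeps a half-open [lo, hi) range. Below is a minimal C sketch of the lookup described above, assuming the table is kept sorted by IP address; the struct and function names are invented for illustration.)

#include <stdint.h>
#include <stddef.h>

struct map_entry {
    uint32_t ip;    /* 32-bit IP address, the search key        */
    uint16_t atm;   /* 16-bit ATM address, the looked-up value  */
};

/* Return a pointer to the matching entry, or NULL if the IP is absent.
 * 'n' is the number of entries; the table must be sorted by 'ip'. */
static const struct map_entry *
lookup_atm(const struct map_entry *table, size_t n, uint32_t ip)
{
    size_t lo = 0, hi = n;              /* half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].ip < ip)
            lo = mid + 1;               /* answer lies above mid    */
        else
            hi = mid;                   /* answer is mid or below   */
    }
    return (lo < n && table[lo].ip == ip) ? &table[lo] : NULL;
}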
I'm a little puzzled about the hash being 'messy'. Before throwing out the bathwater, make sure the baby isn't in it. There are several ways of implementing hash tables, some of which tend towards memory efficiency and others towards flat-out speed. You don't say how performance-critical your application is, so maybe it's simply using an inappropriate technique.
The approach I favour - and have used on 8-bit machines in assembler - is to use a small fixed-size table of pointers (typically 64) and a simple, arbitrary hashing algorithm that distributes keys in a near-random manner: for example, adding up all the characters of the key, multiplying by some prime to spread out the results, then taking the result modulo the table size to index into the table. True randomness is achieved only at a significant cost and mostly ain't worth it.
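A sketch of that hashing scheme in C, adapted here for a 32-bit IP key rather than a character string (the table size of 64 and the prime 31 are just example choices):

#include <stdint.h>

#define HASH_SIZE 64    /* small fixed-size table of pointers */

/* Add up the bytes of the key, multiply by a prime to spread out the
 * results, then take the result modulo the table size. */
static unsigned hash_ip(uint32_t ip)
{
    unsigned sum = (ip & 0xff) + ((ip >> 8) & 0xff)
                 + ((ip >> 16) & 0xff) + ((ip >> 24) & 0xff);
    return (sum * 31u) % HASH_SIZE;
}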
The table contains pointers to linked lists of entries (held outside the table) all having the same hashcode. The first few key hashes will find empty cells in the table, but after a while these get used up. So you just set up a link from the previous entry to the next, with a null pointer to signify the end of the chain. With a 64-entry table the performance will be getting on for 64 times that of a linear search; quite adequate for moderate amounts of data. I used it for a symbol table in assemblers and compilers and it was never the limiting factor on performance, often with far more than 400 symbols. Because entries never get removed from a compiler symbol table they can all go into a linear memory area, with no need to worry about efficient memory reclaim.
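A minimal sketch of that chained table in C, reusing the hash function above; the names are invented, and it assumes entries are only ever added (as with the compiler symbol table), so they all come from a fixed linear pool:

#include <stdint.h>
#include <stddef.h>

#define HASH_SIZE   64
#define MAX_ENTRIES 400

struct chain_entry {
    uint32_t ip;
    uint16_t atm;
    struct chain_entry *next;   /* next entry with the same hashcode;
                                   NULL marks the end of the chain   */
};

static struct chain_entry *buckets[HASH_SIZE];  /* table of pointers  */
static struct chain_entry pool[MAX_ENTRIES];    /* linear memory area */
static size_t pool_used;

static unsigned hash_ip(uint32_t ip)
{
    unsigned sum = (ip & 0xff) + ((ip >> 8) & 0xff)
                 + ((ip >> 16) & 0xff) + ((ip >> 24) & 0xff);
    return (sum * 31u) % HASH_SIZE;
}

/* Entries are never removed, so each new one is taken from the pool
 * and linked in at the head of its chain. Returns -1 when full. */
static int add_mapping(uint32_t ip, uint16_t atm)
{
    if (pool_used >= MAX_ENTRIES)
        return -1;
    struct chain_entry *e = &pool[pool_used++];
    unsigned h = hash_ip(ip);
    e->ip = ip;
    e->atm = atm;
    e->next = buckets[h];
    buckets[h] = e;
    return 0;
}

/* Walk the chain for this bucket; with 64 buckets and ~400 entries the
 * chains average six or seven entries each. Fills *atm on success. */
static int lookup_mapping(uint32_t ip, uint16_t *atm)
{
    for (struct chain_entry *e = buckets[hash_ip(ip)]; e != NULL; e = e->next) {
        if (e->ip == ip) {
            *atm = e->atm;
            return 0;
        }
    }
    return -1;
}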
-- GT