On Monday 01 March 2004 19:04, Chris Green wrote:
On Mon, Mar 01, 2004 at 06:11:20PM +0000, Richard Kettlewell wrote:
Chris Green <chris@areti.co.uk> writes:
The existing code uses a rather messy hash look-up and one of the reasons for the re-write is to make it simpler and more reliable.
A linear search of the table probably isn't a good idea as it would mean that there would be significant differences in look-up time for entries at the beginning and end of the table.
I'm leaning towards a binary-chop search but wondered if anyone had any other ideas. Binary-chop is easy in principle but it's often quite difficult to implement in practice because one has to deal with messy boundary conditions. It's also a bit more difficult to create the necessary sorted table automatically, and relying on the creator of the table to sort it might be a bit dodgy.
The table looks up ATM addresses (16-bit values) given an IP address (a 32-bit value); the maximum size is of the order of 400 entries.
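(For reference: the boundary conditions become much less messy if the binary chop keeps a half-open [lo, hi) range. Below is a minimal C sketch of the lookup described above, assuming the table is kept sorted by IP address; the struct and function names are invented for illustration.)

#include <stdint.h>
#include <stddef.h>

struct map_entry {
    uint32_t ip;    /* 32-bit IP address, the search key        */
    uint16_t atm;   /* 16-bit ATM address, the looked-up value  */
};

/* Return a pointer to the matching entry, or NULL if the IP is absent.
 * 'n' is the number of entries; the table must be sorted by 'ip'. */
static const struct map_entry *
lookup_atm(const struct map_entry *table, size_t n, uint32_t ip)
{
    size_t lo = 0, hi = n;              /* half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].ip < ip)
            lo = mid + 1;               /* answer lies above mid    */
        else
            hi = mid;                   /* answer is mid or below   */
    }
    return (lo < n && table[lo].ip == ip) ? &table[lo] : NULL;
}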
I'm a little puzzled about the hash being 'messy'. Before throwing out the bathwater, make sure the baby isn't in it. There are several ways of implementing hash tables, some of which tend towards memory efficiency and others towards flat-out speed. You don't say how performance-critical your application is, so maybe it's simply using an inappropriate technique.
The approach I favour - and have used on 8-bit machines in assembler - is to use a small fixed-size table of pointers (typically 64) and a simple, arbitrary hashing algorithm that distributes keys in a near-random manner: for example, adding up all the characters of the key, multiplying by some prime to spread out the results, then taking the result modulo the table size to index into the table. True randomness is achieved only at a significant cost and mostly ain't worth it.
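A sketch of that hashing scheme in C, adapted here for a 32-bit IP key rather than a character string (the table size of 64 and the prime 31 are just example choices):

#include <stdint.h>

#define HASH_SIZE 64    /* small fixed-size table of pointers */

/* Add up the bytes of the key, multiply by a prime to spread out the
 * results, then take the result modulo the table size. */
static unsigned hash_ip(uint32_t ip)
{
    unsigned sum = (ip & 0xff) + ((ip >> 8) & 0xff)
                 + ((ip >> 16) & 0xff) + ((ip >> 24) & 0xff);
    return (sum * 31u) % HASH_SIZE;
}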
The table contains pointers to linked lists of entries (held outside the table) all having the same hashcode. The first few key hashes will find empty cells in the table, but after a while these get used up. So you just set up a link from the previous entry to the next, with a null pointer to signify the end of the chain. With a 64-entry table the performance will be getting on for 64 times that of a linear search; quite adequate for moderate amounts of data. I used it for a symbol table in assemblers and compilers and it was never the limiting factor on performance, often with far more than 400 symbols. Because entries never get removed from a compiler symbol table they can all go into a linear memory area, with no need to worry about efficient memory reclaim.
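A minimal sketch of that chained table in C, reusing the hash function above; the names are invented, and it assumes entries are only ever added (as with the compiler symbol table), so they all come from a fixed linear pool:

#include <stdint.h>
#include <stddef.h>

#define HASH_SIZE   64
#define MAX_ENTRIES 400

struct chain_entry {
    uint32_t ip;
    uint16_t atm;
    struct chain_entry *next;   /* next entry with the same hashcode;
                                   NULL marks the end of the chain   */
};

static struct chain_entry *buckets[HASH_SIZE];  /* table of pointers  */
static struct chain_entry pool[MAX_ENTRIES];    /* linear memory area */
static size_t pool_used;

static unsigned hash_ip(uint32_t ip)
{
    unsigned sum = (ip & 0xff) + ((ip >> 8) & 0xff)
                 + ((ip >> 16) & 0xff) + ((ip >> 24) & 0xff);
    return (sum * 31u) % HASH_SIZE;
}

/* Entries are never removed, so each new one is taken from the pool
 * and linked in at the head of its chain. Returns -1 when full. */
static int add_mapping(uint32_t ip, uint16_t atm)
{
    if (pool_used >= MAX_ENTRIES)
        return -1;
    struct chain_entry *e = &pool[pool_used++];
    unsigned h = hash_ip(ip);
    e->ip = ip;
    e->atm = atm;
    e->next = buckets[h];
    buckets[h] = e;
    return 0;
}

/* Walk the chain for this bucket; with 64 buckets and ~400 entries the
 * chains average six or seven entries each. Fills *atm on success. */
static int lookup_mapping(uint32_t ip, uint16_t *atm)
{
    for (struct chain_entry *e = buckets[hash_ip(ip)]; e != NULL; e = e->next) {
        if (e->ip == ip) {
            *atm = e->atm;
            return 0;
        }
    }
    return -1;
}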
-- GT