Friday, December 17, 2010

How good is your hashing?

As promised in my last post, I'm working on an embedded lisp-based OS. Yeah, another one. Because what the world really needs is yet another geek's idea of what an OS should look like.

Well, fuck the world, I'm scratching an itch.

Now, if I were being sensible, I'd target something usable, like my Wits A81 tablet device. A little lispy tablet could be really neat (and, indeed, it may well end up being so). However, I decided to start a bit smaller.

A lot smaller, actually.

Yep, a lisp-based OS on a microcontroller. Not, I might add, the st8ms this blog originally started with (and is titled as), but it's bigger brother, the stm32. It's an ARM Cortex-M3 processor, and the one I have in hand has 128K of flash and 8K of RAM, which is quite nifty.

Now, anything that's gonna fit into that little space is gonna have to be tight. Sure, I can push as much code as possible into flash and execute from there, but that 8K is a really tight limit for something like a Lisp, even a cut down one. And so, I started thinking about what takes up lots of space in lisp.

Firstly, there's objects. A naive implementation of boxed objects in Lisp means that even the humblest character takes up a significant amount of space, with 24 bytes not being unheard-of. Well, bollocks to that. I'm being as tight as I can.

Once you've squeezed stuff like characters and numbers and so on down, though, you start looking at what you can get shot of. Symbols, for example. They are used as unique identifiers, and ar generally stored as a hash value and a literal string. Now, the hash value is what's used most of the time, and the string is only really used when "exploding" the symbol out into an array of characters. So if I'm willing not to do that, I can get away with *just* holding the hash. That's a significant saving.

So. What hash to use?

I need something that can be pushed into a single word of space (4 bytes), and which will allow me to detect collisions. By which I mean that if "a" and "b" resolve to the same hash value, I need to be able to tell. The first bit is easy enough, but the second is hard as nails

A google for hashing algorithms will generally end up pointing you to Dan Bernstein's venerable djb2. Unfortunately, it's not really very good, and it certainly doesn't handle the second case. For that, what I need is an algorithm that pushes out *two* hash values, a primary and a secondary.

Enter Bob Jenkins' 'lookup3.c', particularly 'hashlittle2()'. This is a rather nice hashing algorithm that throws out a pair of hash values, has very good characteristics, and runs fast.
But, oh noes! Compiling it for the STM32 results in over 1K of code. Surely we can do better than that? Yes, we can.

.global hash
.type hash, %function
.align 2
@ Hash Function
@ Thumb-2 implementation of 'hashlittle2' by Bob Jenkins.
@ In : R0 -> pointer to string
@ R1 -> length of string
@ R2 -> Initial value for 'hash c'
@ R3 -> Initial value for 'hash b'
@ Out : R0 -> 'hash c', the main hash value
@ R1 -> 'hash b', the secondary hash value
@ Clobbers R2, R3, flags
hash: push {r4-r7,lr}
@ Within the function, we use registers as follows:
@ r1 : length
@ r2-r4 : temporary for character loading
@ r5-r7 : a, b, c

@ initial setup
adr r7, hash_constant @ a = b = c = 0xdeadbeef + length + hash c
ldr r7, [r7]
adds r7, r1
adds r7, r2
mov r5, r7
mov r6, r7
adds r7, r3 @ c += hash b

hash_loop:
cmp r1, #0x0c @ Is r4 <= 12?
it le
ble hash_tail @ If so, go do the tail part

ldmia r0!, {r2, r3, r4} @ Load values
adds r5, r2
adds r6, r3
adds r7, r4
subs r1, #0x0c @ Subtract 12 from length

@ Mix
subs r5, r7 @ a -= c
eor.W r5, r5, r7, ror #28 @ a ^= rot(c,4)
adds r7, r6 @ c += b

subs r6, r5 @ b -= a
eor.W r6, r6, r5, ror #26 @ b ^= rot(a,6)
adds r5, r7 @ a += c;

subs r7, r6 @ c -= b;
eor.W r7, r7, r6, ror #24 @ c ^= rot(b, 8);
adds r6, r5 @ b += a;

subs r5, r7 @ a -= c
eor.W r5, r5, r7, ror #16 @ a ^= rot(c,16)
adds r7, r6 @ c += b

subs r6, r5 @ b -= a
eor.W r6, r6, r5, ror #13 @ b ^= rot(a,19)
adds r5, r7 @ a += c;

subs r7, r6 @ c -= b;
eor.W r7, r7, r6, ror #28 @ c ^= rot(b, 4);
adds r6, r5 @ b += a;

b hash_loop
hash_tail:
cbz r1, hash_done @ length 0 requires no extra work

ldmia r0!, {r2, r3, r4} @ Load values
adr r0, do_hash_mask
add r0, r0, r1, lsl #2

do_hash_mask:
mov.W pc, r0 @ doubles for count 0 entry in masking table, *must* be 4 bytes hence .W
@ Here we mask off the bits we don't want according to
@ what data we have
bic r2, #0x0000ff00 @ count is 1, mask off all but least significant byte
bic r2, #0x00ff0000 @ etc etc
bic r2, #0xff000000
bic r3, #0x000000ff
bic r3, #0x0000ff00
bic r3, #0x00ff0000
bic r3, #0xff000000
bic r4, #0x000000ff
bic r4, #0x0000ff00
bic r4, #0x00ff0000
bic r4, #0xff000000
bic r4, #0x00000000

adds r5, r2 @ Add masked vales before final mix
adds r6, r3
adds r7, r4

@ Final mix
eors r7, r6 @ c ^= b
sub r7, r7, r6, ror #18 @ c -= rot(b,14)
eors r5, r7 @ a ^= c
sub r5, r5, r7, ror #21 @ c -= rot(c,11)
eors r6, r5 @ b ^= a
sub r6, r6, r5, ror #7 @ b -= rot(a,25)
eors r7, r6 @ c ^= b
sub r7, r7, r6, ror #16 @ c -= rot(b,16)
eors r5, r7 @ a ^= c
sub r5, r5, r7, ror #28 @ c -= rot(c,4)
eors r6, r5 @ b ^= a
sub r6, r6, r5, ror #18 @ b -= rot(a,14)
eors r7, r6 @ c ^= b
sub r7, r7, r6, ror #8 @ c -= rot(b,24)

hash_done:
mov r0, r7
mov r1, r6
pop {r4-r7, pc}

hash_constant:
.word 0xdeadbeef

.ltorg

There you go. 206 bytes of hash function. There's probably a few bytes to shave off here and there, but it works pretty well.

As it happens, I'm only using 30 bits of the main hash, with the lowest 2 bits being used to indicate type, in order that symbols fit completely into a single machine word.

No comments:

Post a Comment

Followers