William Stafford Parsons

WSP-Hash-32: The Fastest Portable 32-Bit Hashing Algorithm With Aligned Multi-Byte Memory Reading and Good Quality

WSP-Hash-32 is a 32-bit hashing algorithm as a substantial improvement to CityHash32 v1.1, Lookup3, 32-bit MurmurHash3 and XXHash32.

Library

Source

wsp_hash_32.c
wsp_hash_32.h

Reference

wsp_hash_32() is the all-at-once hashing function that accepts the 2 following arguments in left-to-right order.

1: input_count is the unsigned long count of elements in the input array.

2: input is the const uint8_t array to hash.

The return value data type is uint32_t.

It returns the 32-bit unsigned integer hash digest result.

wsp_hash_32_initialize() is the initialization function that accepts the following argument.

1: s is the struct wsp_hash_32_s pointer.

wsp_hash_32_transform() is the core hashing loop function that accepts the 4 following arguments in left-to-right order.

1: i is the unsigned long starting index position of elements in the input array.

2: input_count is the unsigned long count of elements in the input array. When hashing in split segments, the value must be a multiple of 32, with the exception of the end segment.

3: input is the const uint8_t array to hash.

4: s is a struct wsp_hash_32_s pointer.

wsp_hash_32_finalize() is the finalization function that accepts the following argument.

1: s is a struct wsp_hash_32_s pointer. s.mix contains the finalized hash digest result.

The return value data type is void.

Example

#include <stdio.h> #include "wsp_hash_32.h" int main(void) { struct wsp_hash_32_s s; uint8_t input[8] = {'m', 'e', 's', 's', 'a', 'g', 'e', 0}; unsigned char i = 0; while (i != 10) { i++; printf("All-at-once result %u is 0x%08x.\n", i, wsp_hash_32(8, input)); input[7]++; } input[7] = 0; i = 0; while (i != 10) { i++; wsp_hash_32_initialize(&s); wsp_hash_32_transform(0, 8, input, &s); wsp_hash_32_finalize(&s); input[7]++; printf("Segmented result %u is 0x%08x.\n", i, s.mix); } return 0; }

Requirements

It adheres to the C99 standard draft (ISO/IEC 9899:1999), although it's convertible to other programming languages and standards.

Explanation

This 32-bit non-cryptographic hashing algorithm is designed to hash keys of all sizes as quickly as possible with minimal collisions across all truncated bit sizes.

It's the fastest portable 32-bit hashing algorithm with good statistical quality relevant to non-cryptographic hashing.

It's portable for both 32-bit and 64-bit systems.

It meets strict compliance, portability and code security requirements.

Multi-byte memory reading is designed primarily for systems with little-endian byte order, although big-endian memory reading is functional with similar quality and speed results.

In rare instances when hash table states must persist after program termination and re-used in multiple systems with varying endianness, keys should be re-hashed during initialization instead of slowing down runtime hashing with byte order alignment.

In rare instances when byte order is expected to change during runtime, using a one-at-a-time hashing algorithm is a better alternative.

It doesn't use modulus, multiplication or division arithmetic operations.

It supports unlimited input length by splitting input bits into descending segments of 256, 128, 64 and the remaining 8–63 bits. All-at-once hashing is the faster option, but it isn't required and the digest results are consistent when hashing in partial segments.

Single-threaded, instruction-level parallelism with low-cost addition and bitwise instructions work well on a wide range of CPU queue loads and devices.

Without considering bit distribution calculations, it passes SMHasher collision tests using both big endian and little endian byte orders.

Seed tests in SMHasher are omitted to both discourage using hashing algorithms as PRNGs and prevent collision vulnerabilities from 2³² different initialized states.

There aren't any bit distribution calculation percentages exceeding 10% in the worst instances of the aforementioned tests, which suggests there aren't any critical distribution vulnerabilities relevant to non-cryptographic hashing.

Avalanche tests in SMHasher are omitted as strict avalanche criterion is only relevant to the analysis of cryptographic hash function output predictability.

Furthermore, the following code tests collision counts for truncated digests against each of the 8 bits flipped within a single input byte ranging from 1 to 255 at all positions for all input_count values ranging from 1 to 256.

#include <stdio.h> #include "wsp_hash_32.h" int main(void) { uint8_t input[256]; uint32_t result = 0; uint32_t result_flipped = 1; unsigned long bit_collisions_counts[32]; unsigned short sparse_byte = 0; unsigned short i = 0; unsigned short j = 0; unsigned short k = 0; unsigned short l = 0; unsigned short m = 0; while (i != 32) { bit_collisions_counts[i] = 0; i++; } i = 1; while (sparse_byte != 16) { while (i != 256) { j = 1; while (j != 256) { k = 0; while (k != j) { while (l != j) { input[l] = sparse_byte; l++; } if (sparse_byte == i) { k = j; l = 0; continue; } input[k] = i; result = wsp_hash_32(j, input); l = 0; while (l != 8) { input[k] = input[k] ^ (1 << l); result_flipped = wsp_hash_32(j, input); input[k] = input[k] ^ (1 << l); m = 1; while (m != 32) { if ( (result & ((2 << m) - 1)) == (result_flipped & ((2 << m) - 1)) ) { bit_collisions_counts[m - 1]++; } m++; } l++; m = 0; } k++; l = 0; } j++; k = 0; } i++; } sparse_byte++; i = 1; } i = 2; j = 0; while (i != 33) { printf("%2u-Bit Segmented Collisions: %9lu\n", i, bit_collisions_counts[i - 2]); i++; j++; } return 0; }

The following collision results demonstrate a sufficient collision-based avalanche effect in the worst instance with non-cryptographic, universal hashing relevance.

Avalanche Collision Results Among 1 Billion 1-Bit Sparse Keys at Varying Lengths 2-Bit Segmented Collisions: 267930841 3-Bit Segmented Collisions: 135013310 4-Bit Segmented Collisions: 67964843 5-Bit Segmented Collisions: 34162986 6-Bit Segmented Collisions: 17250096 7-Bit Segmented Collisions: 8699038 8-Bit Segmented Collisions: 4354013 9-Bit Segmented Collisions: 2208362 10-Bit Segmented Collisions: 1131627 11-Bit Segmented Collisions: 586244 12-Bit Segmented Collisions: 299872 13-Bit Segmented Collisions: 154559 14-Bit Segmented Collisions: 83441 15-Bit Segmented Collisions: 47000 16-Bit Segmented Collisions: 29634 17-Bit Segmented Collisions: 20410 18-Bit Segmented Collisions: 4062 19-Bit Segmented Collisions: 2080 20-Bit Segmented Collisions: 1015 21-Bit Segmented Collisions: 493 22-Bit Segmented Collisions: 237 23-Bit Segmented Collisions: 132 24-Bit Segmented Collisions: 60 25-Bit Segmented Collisions: 31 26-Bit Segmented Collisions: 19 27-Bit Segmented Collisions: 7 28-Bit Segmented Collisions: 4 29-Bit Segmented Collisions: 4 30-Bit Segmented Collisions: 2 31-Bit Segmented Collisions: 2 32-Bit Segmented Collisions: 2

The 2 32-bit collisions out of 125 million groups of 8 single-bit flip tests proves the worst instances are rare enough to be considered random occurrences in practical implementations.

The semi-linear collision increments after each truncation are within an acceptable range based on the probability of collisions at each bit size. For example, 200-300 million collisions in the lower 2 bits are expected out of 1 billion total keys, but only up to a few are expected in the full 32 bits.

Compared to CityHash32 v1.1, the speed's approximately 68% faster on average for small keys and 71% faster on average for large keys.

Compared to Lookup3, the speed's approximately 20% faster on average for small keys and 300% faster on average for large keys.

Furthermore, Lookup3 relies on direct pointer type conversions and unaligned memory reading.

Compared to 32-bit Murmur3, the speed's approximately 45% faster on average for small keys and 300% faster on average for large keys.

Compared to XXHash32, the speed's at least 46% faster on average for small keys and 30% faster for large keys.

All speed tests are performed locally on a Pixelbook Go M3 using Debian.

Games

Contrivity

Contrivity

Spawn into the hostile quantum laboratory and destroy oscillations.