WSP-Hash-64: The Fastest Portable 64-Bit Hashing Algorithm With Aligned Multi-Byte Memory Reading and Good Quality
WSP-Hash-64 is a 64-bit hashing algorithm as a substantial improvement to CityHash64, FarmHash64, 64-bit MurmurHash3 x64 and x86, 64-bit SpookyHashV2 and XXHash64.
Library
Source
Reference
wsp_hash_64() is the all-at-once hashing function that accepts the 2 following arguments in left-to-right order.
1: input_count is the unsigned long count of elements in the input array.
2: input is the const uint8_t array to hash.
The return value data type is uint64_t.
It returns the 64-bit unsigned integer hash digest result.
wsp_hash_64_initialize() is the initialization function that accepts the following argument.
1: s is the struct wsp_hash_64_s pointer.
wsp_hash_64_transform() is the core hashing loop that accepts the 4 following arguments in left-to-right order.
1: i is the unsigned long starting index position of elements in the input array.
2: input_count is the unsigned long count of elements in the input array. When hashing in split segments, the value must be a multiple of 32, with the exception of the end segment.
3: input is the const uint8_t array to hash.
4: s is the struct wsp_hash_64_s pointer.
wsp_hash_64_finalize() is the finalization function that accepts the following argument.
1: s is the struct wsp_hash_64_s pointer. s.state contains the finalized hash digest result.
The return value data type is void.
Example
#include <stdio.h>
#include "wsp_hash_64.h"
int main(void) {
struct wsp_hash_64_s s;
uint8_t input[8] = {'m', 'e', 's', 's', 'a', 'g', 'e', 0};
unsigned char i = 0;
while (i != 10) {
i++;
printf("All-at-once result %u is 0x%016lx.\n", i, wsp_hash_64(8, input));
input[7]++;
}
input[7] = 0;
i = 0;
while (i != 10) {
i++;
wsp_hash_64_initialize(&s);
wsp_hash_64_transform(0, 8, input, &s);
wsp_hash_64_finalize(&s);
input[7]++;
printf("Segmented result %u is 0x%016lx.\n", i, s.mix);
}
return 0;
}
Requirements
It adheres to the C99 standard draft (ISO/IEC 9899:1999), although it's convertible to other programming languages and standards.
Explanation
WSP-Hash-64 is designed to hash keys of all sizes as quickly as possible with minimal collisions across all truncated bit sizes.
It's the fastest portable 64-bit hashing algorithm with good statistical quality relevant to non-cryptographic hashing.
It's portable for 64-bit systems. There's an alternative 32-bit hashing algorithm for 32-bit systems.
It meets strict compliance, portability and code security requirements.
Multi-byte memory reading is designed primarily for systems with little-endian byte order, although big-endian memory reading is functional with similar quality and speed results.
In rare instances when hash table states must be saved after program termination and re-used in multiple systems with varying endianness, keys should be re-hashed during initialization instead of slowing down runtime hashing with byte order alignment.
In rare instances when byte order is expected to change during runtime, using a one-at-a-time hashing algorithm is a better alternative.
It doesn't use modulus, multiplication or division arithmetic operations.
It supports unlimited input length by splitting input bytes into descending segments of 256, 128, 64 and the remaining 8–63 bits. All-at-once hashing is the faster option, but it isn't required and the digest results are consistent when hashing in partial segments.
Single-threaded, instruction-level parallelism with low-cost addition and bitwise instructions work well on a wide range of CPU queue loads and devices.
Without considering bit distribution calculations, it passes SMHasher collision tests with both 64-bit output and low 32-bit output using both big endian and little endian byte orders.
Seed tests in SMHasher are omitted to both discourage using hashing algorithms as PRNGs and prevent collision vulnerabilities from 2³² different initialized states.
There aren't any bit distribution calculation percentages exceeding 5% in the worst instances of the aforementioned tests, which suggests there aren't any critical distribution vulnerabilities relevant to non-cryptographic hashing.
Avalanche tests in SMHasher are omitted as strict avalanche criterion is only relevant to the analysis of cryptographic hash function output predictability.
Furthermore, the following code tests collision counts for truncated digests against each of the 8 bits flipped within a single input byte ranging from 1 to 255 at all positions for all input_count values ranging from 1 to 256.
#include <stdio.h>
#include "wsp_hash_64.h"
int main(void) {
uint8_t input[256];
uint64_t result = 0;
uint64_t result_flipped = 1;
unsigned long bit_collisions_counts[64];
unsigned short sparse_byte = 0;
unsigned short i = 0;
unsigned short j = 0;
unsigned short k = 0;
unsigned short l = 0;
unsigned short m = 0;
while (i != 64) {
bit_collisions_counts[i] = 0;
i++;
}
i = 1;
while (sparse_byte != 16) {
while (i != 256) {
j = 1;
while (j != 256) {
k = 0;
while (k != j) {
while (l != j) {
input[l] = sparse_byte;
l++;
}
if (sparse_byte == i) {
k = j;
l = 0;
continue;
}
input[k] = i;
result = wsp_hash_64(j, input);
l = 0;
while (l != 8) {
input[k] = input[k] ^ (1 << l);
result_flipped = wsp_hash_64(j, input);
input[k] = input[k] ^ (1 << l);
m = 1;
while (m != 64) {
if (
(result & ((2UL << m) - 1UL)) == (result_flipped & ((2UL << m)
- 1UL))
) {
bit_collisions_counts[m - 1]++;
}
m++;
}
l++;
m = 0;
}
k++;
l = 0;
}
j++;
k = 0;
}
i++;
}
sparse_byte++;
i = 1;
}
i = 2;
j = 0;
while (i != 65) {
printf("%2u-Bit Segmented Collisions: %9lu\n",
i,
bit_collisions_counts[i - 2]);
i++;
j++;
}
return 0;
}
The following collision results demonstrate a sufficient collision-based avalanche effect in the worst instance with non-cryptographic, universal hashing relevance.
Avalanche Collision Results Among 1 Billion 1-Bit Sparse Keys at Varying Lengths
2-Bit Segmented Collisions: 273414977
3-Bit Segmented Collisions: 138972645
4-Bit Segmented Collisions: 71405334
5-Bit Segmented Collisions: 36949894
6-Bit Segmented Collisions: 19201445
7-Bit Segmented Collisions: 10078202
8-Bit Segmented Collisions: 4920417
9-Bit Segmented Collisions: 2552460
10-Bit Segmented Collisions: 1394152
11-Bit Segmented Collisions: 824942
12-Bit Segmented Collisions: 382725
13-Bit Segmented Collisions: 195935
14-Bit Segmented Collisions: 111663
15-Bit Segmented Collisions: 69770
16-Bit Segmented Collisions: 33368
17-Bit Segmented Collisions: 21773
18-Bit Segmented Collisions: 17170
19-Bit Segmented Collisions: 14915
20-Bit Segmented Collisions: 3837
21-Bit Segmented Collisions: 3295
22-Bit Segmented Collisions: 3001
23-Bit Segmented Collisions: 2882
24-Bit Segmented Collisions: 87
25-Bit Segmented Collisions: 38
26-Bit Segmented Collisions: 26
27-Bit Segmented Collisions: 10
28-Bit Segmented Collisions: 2
29-Bit Segmented Collisions: 2
30-Bit Segmented Collisions: 0
31-Bit Segmented Collisions: 0
32-Bit Segmented Collisions: 0
33-Bit Segmented Collisions: 0
34-Bit Segmented Collisions: 0
35-Bit Segmented Collisions: 0
36-Bit Segmented Collisions: 0
37-Bit Segmented Collisions: 0
38-Bit Segmented Collisions: 0
39-Bit Segmented Collisions: 0
40-Bit Segmented Collisions: 0
41-Bit Segmented Collisions: 0
42-Bit Segmented Collisions: 0
43-Bit Segmented Collisions: 0
44-Bit Segmented Collisions: 0
45-Bit Segmented Collisions: 0
46-Bit Segmented Collisions: 0
47-Bit Segmented Collisions: 0
48-Bit Segmented Collisions: 0
49-Bit Segmented Collisions: 0
50-Bit Segmented Collisions: 0
51-Bit Segmented Collisions: 0
52-Bit Segmented Collisions: 0
53-Bit Segmented Collisions: 0
54-Bit Segmented Collisions: 0
55-Bit Segmented Collisions: 0
56-Bit Segmented Collisions: 0
57-Bit Segmented Collisions: 0
58-Bit Segmented Collisions: 0
59-Bit Segmented Collisions: 0
60-Bit Segmented Collisions: 0
61-Bit Segmented Collisions: 0
62-Bit Segmented Collisions: 0
63-Bit Segmented Collisions: 0
64-Bit Segmented Collisions: 0
0 64-bit collisions out of 125 million groups of 8 single-bit flip tests proves the worst instances are rare enough to be considered random occurrences in practical implementations.
The semi-linear collision increments after each truncation are within an acceptable range based on the probability of collisions at each bit size. For example, 200-300 million collisions in the lower 2 bits are expected out of 1 billion total keys, but only up to a few are expected in the range of 32 to 64 bits.
It's compared to the fastest portable 64-bit hashing algorithms that aren't limited by specific programming languages, platform optimizations or portabillity macros. For example, it's 75% to 300% faster than RapidHash without platform-specific instructions. Furthermore, the expressions in the core mixing loop are structured conveniently for compatiblity with SIMD and other intrinsics.
Compared to CityHash64, the speed's approximately 21% faster on average for small keys and 19% faster on average for large keys.
Compared to FarmHash64, the speed's approximately 16% faster on average for small keys and 18% faster on average for large keys.
Compared to 64-bit MurmurHash3 x64 and x86, the speed's approximately 25% faster on average for small keys and 145% faster on average for large keys.
Compared to 64-bit SpookyHashV2, the speed's approximately 40% faster on average for small keys and 22% faster on average for large keys.
Compared to XXHash64, the speed's at least 26% faster on average for small keys and 21% faster for large keys.
All speed tests are performed locally on a Pixelbook Go M3 using Debian.