WSP-Vortex: The Fastest Mersenne Twister Improvement With Faster Speed Than Xorshift32 and Flexible State Sizes
William Stafford Parsons developed a 32-bit pseudorandom number generator algorithm as a substantial improvement to MRG32k3a, 32-bit Mersenne Twister, WELL and other long-period PRNGs.
Library
Source
#include <stdint.h>
struct wsp_vortex_s {
uint32_t blocks[1024];
uint32_t blocks_selector;
uint32_t increment;
uint32_t increment_offset;
};
void wsp_vortex_initialize(uint32_t seed, struct wsp_vortex_s *s) {
unsigned short i = 1;
s->blocks[0] = seed + 1111111111;
while (i != 1024) {
s->blocks[i] = s->blocks[i - 1] + 1;
i++;
}
s->blocks_selector = seed;
s->increment = s->blocks_selector + seed;
s->increment_offset = s->increment + seed;
}
uint32_t wsp_vortex_randomize(struct wsp_vortex_s *s) {
uint32_t block = s->blocks[s->blocks_selector & 1023];
uint32_t increment_offset_capture = s->increment_offset ^ s->increment;
s->blocks[s->blocks_selector & 1023] += increment_offset_capture;
s->increment_offset = ((s->increment_offset << 17)
| (s->increment_offset >> 15)) + s->increment;
s->increment += 1111111111;
s->blocks_selector++;
block += s->increment + increment_offset_capture;
s->blocks[block & 1023] += s->blocks_selector + block;
return block;
}
Reference
wsp_vortex_initialize() is the initialization function that accepts the following 2 arguments.
seed is a 32-bit unsigned integer with any value that's used to initialize the state.
s is a struct wsp_vortex_s pointer.
The return value data type is void.
wsp_vortex_randomize() is the randomization function that accepts the following argument.
s is a struct wsp_vortex_s pointer with 1 array of 1024 32-bit unsigned integers in s->blocks and 3 32-bit unsigned integers in s->blocks_selector, s->increment and s->increment_offset. These can be initialized manually with any seed values or automatically with wsp_vortex_initialize().
The return value data type is uint32_t.
It returns the 32-bit unsigned integer pseudorandom number result.
Requirements
C compiler with C99 (ISO/IEC 9899:1999) standard compatibility.
CPU with single-threaded, instruction-level parallelism support.
Explanation
WSP-Vortex is designed to pass statistical tests with efficient resource usage, fast speed and a huge period.
It has faster speed with more flexibility, simplicity and consistent statistical quality than the optimized SIMD version of Mersenne Twister.
It's portable for both 32-bit and 64-bit systems.
It meets strict compliance, portability and security requirements, although it's a PRNG that isn't suitable for cryptographic purposes.
It doesn't use modulus, multiplication or division arithmetic operations.
32-bit deterministic randomness is generated with a 32864-bit auxiliary state.
The period is adjustable to any halved amount by adjusting the 1024 state blocks, meaning the additional possible state sizes are 512, 256, 128, 64, 32, 16, 8, 4 and 2. If the blocks count is adjusted, the & 1023 mask instances must be adjusted accordingly. These varying state sizes may result in better or worse statistical quality by a small margin.
The estimated full cycle is 2³²⁷⁶⁸ numbers at almost double the period of Mersenne Twister, so re-seeding the initial state isn't required in most applications.
Furthermore, when all state bits are 0, the next pseudorandom number escapes zeroland quickly, making it impossible to have a broken state with any combination of numbers.
A round-robin number selector is incremented before assigning an additive XOR state to the selected s->blocks number, ensuring the closest thing to equal distribution across all possible cycles.
32 bits of state are summed with 1111111111 after each random number generation result and mixed in with the bit shuffling sequence. All state variable assignments are additive to emulate the properties of minimal rolling hash functions.
After 4294967295 random numbers are generated, the aforementioned 32 bits of state in s->increment completes a full cycle and the s->blocks bits avoid a full cycle, as demonstrated in the first 10 instances with the following code example.
#include <stdio.h>
#include "wsp_vortex.h"
int main(void) {
struct wsp_vortex_s s;
uint32_t increment_capture;
unsigned long i = 0;
wsp_vortex_initialize(0, &s);
while (i != 10) {
wsp_vortex_randomize(&s);
increment_capture = s.increment;
wsp_vortex_randomize(&s);
while (s.increment != increment_capture) {
wsp_vortex_randomize(&s);
}
printf("%10lu %10lu %10lu %10lu %10lu %10lu %10lu %10lu\n",
s.blocks[0], s.blocks[1], s.blocks[2], s.blocks[3],
s.blocks[4], s.blocks[5], s.blocks[6], s.blocks[7]);
i++;
}
}
The following output demonstrates a visual confirmation of good distribution among the first 10 blocks in s->blocks after each full cycle of s->increment.
1931670853 967474844 3204854229 2297737264 958724792 508251158 1917101692 1258244746
2259690551 4122786801 2410051447 3906545033 3573745922 3108943746 1201682761 3865852611
153394324 4157848476 202758678 2677176753 1789614556 2873019368 2245855390 3544256983
223943109 3022691432 4014387472 1018169394 1860617740 799206424 1592935145 1859279649
1724810327 3246408868 4121707259 3864350218 4260793370 142047299 1995115525 2047468367
3502436365 2317565887 2867908436 4088128258 2488419653 3415013884 3996268131 830607531
1464330330 1450551438 62390379 35092001 1868424005 1506189108 484545187 3479225524
912727297 4182834878 2458234510 1595581299 1071517920 2725654470 4084524253 4197250444
500651350 3220839128 4039281244 2017724102 3187710418 634166192 4087932823 2474619014
3408077749 1837105558 295384354 4244450864 1796861749 550264166 751506001 1282052450
A wrapping additive constant added to each state block isn't enough to ensure a 2³²⁷⁶⁸ period, so each previous state block value is used as an offset to assign a combined pseudorandom value to a secondary state block.
The following code example demonstrates good distribution across s->blocks instead of brute-forcing millions of statistical tests.
#include <stdint.h>
#include <stdio.h>
struct wsp_vortex_s {
uint32_t blocks[1024];
uint32_t blocks_offset_assignment_counts[1024];
uint32_t blocks_selector;
uint32_t increment;
uint32_t increment_offset;
};
uint32_t wsp_vortex_blocks_distribution_test(struct wsp_vortex_s *s) {
uint32_t block = s->blocks[s->blocks_selector & 1023];
uint32_t increment_offset_capture = s->increment_offset ^ s->increment;
s->blocks[s->blocks_selector & 1023] += increment_offset_capture;
s->increment_offset = ((s->increment_offset << 17)
| (s->increment_offset >> 15)) + s->increment;
s->increment += 1111111111;
s->blocks_selector++;
block += s->increment + increment_offset_capture;
s->blocks[block & 1023] += s->blocks_selector + block;
s->blocks_offset_assignment_counts[block & 1023]++;
return block;
}
int main(void) {
struct wsp_vortex_s s;
unsigned char is_blocks_offset_assignment_duplicates[1024];
unsigned long blocks_offset_assignment_duplicates_counts = 0;
unsigned long i = 0;
unsigned short j = 0;
while (i != 1024) {
s.blocks[i] = 0;
s.blocks_offset_assignment_counts[i] = 0;
is_blocks_offset_assignment_duplicates[i] = 0;
i++;
}
s.blocks_selector = 0;
s.increment = 0;
s.increment_offset = 0;
while (i != 1111111111) {
wsp_vortex_blocks_distribution_test(&s);
i++;
}
i = 0;
while (i != 1024) {
j = 0;
while (j != 1024) {
if (
i != j &&
s.blocks_offset_assignment_counts[i]
== s.blocks_offset_assignment_counts[j] &&
is_blocks_offset_assignment_duplicates[i] != 1
) {
is_blocks_offset_assignment_duplicates[j] = 1;
blocks_offset_assignment_duplicates_counts++;
}
j++;
}
i++;
}
i = 0;
while (i != 1024) {
printf("Block %lu was assigned by an offset value %lu times.\n",
i + 1,
s.blocks_offset_assignment_counts[i]);
i++;
}
printf("\nThere were %lu duplicate offset block assignment counts.\n",
blocks_offset_assignment_duplicates_counts);
return 0;
}
The following first 20 offset assignment counts demonstrate the semi-linear, yet semi-randomized offset assignments distributed throughout the state in s->blocks.
Block 1 was assigned by an offset value 1083842 times.
Block 2 was assigned by an offset value 1083792 times.
Block 3 was assigned by an offset value 1086149 times.
Block 4 was assigned by an offset value 1085322 times.
Block 5 was assigned by an offset value 1085457 times.
Block 6 was assigned by an offset value 1086610 times.
Block 7 was assigned by an offset value 1084757 times.
Block 8 was assigned by an offset value 1086702 times.
Block 9 was assigned by an offset value 1085031 times.
Block 10 was assigned by an offset value 1085576 times.
Block 11 was assigned by an offset value 1083659 times.
Block 12 was assigned by an offset value 1085423 times.
Block 13 was assigned by an offset value 1086734 times.
Block 14 was assigned by an offset value 1084253 times.
Block 15 was assigned by an offset value 1084380 times.
Block 16 was assigned by an offset value 1083285 times.
Block 17 was assigned by an offset value 1085566 times.
Block 18 was assigned by an offset value 1085682 times.
Block 19 was assigned by an offset value 1083464 times.
Block 20 was assigned by an offset value 1085869 times.
Furthermore, there are 145 colliding offset block increments in the aforementioned instance out of 1024 total blocks.
As the count of generated numbers increases, the number of blocks with the same amount of offset increments decreases.
There were 93 collisions after generating 2222222222 numbers, then 78 collisions after generating 3333333333 numbers.
The following test results were performed with all state bits initialized to 0.
It passes all TestU01 BigCrush battery tests as the fastest long-period Mersenne Twister alternative that passes BigCrush.
It was tested with seed initialized to 0 both with and without wsp_vortex_initialize().
It can be seeded with any values and yield similar results due to the proven distribution properties.
When initializing using 1024 random numbers from WSP-TRNG manually assigned to s->blocks, the results were similar with between 98% to 100% of BigCrush tests passed consistently with varying anomalies due to randomness.
Furthermore, Mersenne Twister is known to exhibit multiple consistent 32-bit test failures with small and large random number tests.
Compared to MRG32k3a, the average speed is 1370% faster.
Compared to 32-bit Mersenne Twister, the average speed is at least 40% faster than the unoptimized variant and 25% faster than the SIMD-optimized variant.
Compared to WELL, the average speed is 81% faster.
Compared to all Xoroshiro and Xorshift algorithms, it's at least 3% faster.
Furthermore, most of these compared PRNGs are vulnerable to several confirmed broken cycles when seeded with specific values, including 0.
If efficient memory usage and faster speed are required for 32-bit PRNG usage within a large number of processes, WSP-PRNG-32 is the fastest 32-bit one-at-a-time PRNG that passes BigCrush tests with a small period.
All speed tests were performed locally by generating 1 billion numbers on a Pixelbook Go M3 using Debian.