WSP-Search-Sorted: The Fastest Search Algorithm Derived From Binary Search

William Stafford Parsons developed an optimal search algorithm as a substantial improvement to Binary Search, Exponential Search, Fibonacci Search and Interpolation Search.

Library

Source

unsigned char wsp_search_sorted_ascending(unsigned long low, unsigned long high, const unsigned long *haystack, const unsigned long needle, unsigned long *position) { unsigned long gap; if (haystack[high] == needle) { *position = high; return 1; } if ( haystack[low] < needle && haystack[high] > needle ) { high--; gap = high - low; while (haystack[high] != needle) { while (haystack[high] > needle) { high -= gap >> 1; gap = (gap + 1) >> 1; } while (haystack[high] < needle) { high += gap >> 1; gap = (gap + 1) >> 1; } } low = high; } if (haystack[low] == needle) { *position = low; return 1; } return 0; } unsigned char wsp_search_sorted_descending(unsigned long low, unsigned long high, const unsigned long *haystack, const unsigned long needle, unsigned long *position) { unsigned long gap; if (haystack[high] == needle) { *position = high; return 1; } if ( haystack[low] > needle && haystack[high] < needle ) { high--; gap = high - low; while (haystack[high] != needle) { while (haystack[high] < needle) { high -= gap >> 1; gap = (gap + 1) >> 1; } while (haystack[high] > needle) { high += gap >> 1; gap = (gap + 1) >> 1; } } low = high; } if (haystack[low] == needle) { *position = low; return 1; } return 0; }

Reference

wsp_search_sorted_ascending() is the searching function for a list of elements already sorted in ascending order that accepts the following 5 arguments.

low is the lowest index bound to search in haystack.

high is the highest index bound to search in haystack.

The low value must be less than or equal to the high value and both must be valid indices in haystack.

haystack is the unsigned long array of elements to search. The default unsigned long data type is interchangeable with any integral data type after modification to the relevant function parameters.

needle is the unsigned long element to search for in haystack. The data type should match the haystack data type.

position is the unsigned long pointer containing the index of the searched element.

The return value data type is unsigned char.

When the return value is 1, position contains the index of the found needle.

When the return value is 0, a new index value isn't assigned to position.

wsp_search_sorted_descending() is the searching function for a list of elements already sorted in descending order that accepts the following 5 arguments.

low is the lowest index bound to search in haystack.

high is the highest index bound to search in haystack.

The low value must be less than or equal to the high value and both must be valid indices in haystack.

haystack is the unsigned long array of elements to search. The default unsigned long data type is interchangeable with any integral data type after modification to the relevant function parameters.

needle is the unsigned long element to search for in haystack. The data type should match the haystack data type.

position is the unsigned long pointer containing the index of the searched element.

The return value data type is unsigned char.

When the return value is 1, position contains the index of the found needle.

When the return value is 0, a new index value isn't assigned to position.

Requirements

C compiler with C89 (ISO/IEC 9899:1990) standard compatibility.

Explanation

WSP-Search-Sorted is designed as a practical optimization for all sorted-list search algorithm implementations.

It's portable for both 32-bit and 64-bit systems.

It meets strict compliance, portability and code security requirements.

It doesn't use modulus, multiplication or division arithmetic operations.

In the rare instance where the sort order is unknown, the following wsp_search_sorted() function searches in either ascending or descending order as a convenience with a marginal performance tradeoff.

unsigned char wsp_search_sorted(unsigned long low, unsigned long high, const unsigned long *haystack, const unsigned long needle, unsigned long *position) { unsigned long gap; if (haystack[high] == needle) { *position = high; return 1; } if ( ( haystack[low] < needle && haystack[high] > needle ) || ( haystack[low] > needle && haystack[high] < needle ) ) { high--; gap = high - low; if (haystack[low] < needle) { while (haystack[high] != needle) { while (haystack[high] > needle) { high -= gap >> 1; gap = (gap + 1) >> 1; } while (haystack[high] < needle) { high += gap >> 1; gap = (gap + 1) >> 1; } } } else { while (haystack[high] != needle) { while (haystack[high] < needle) { high -= gap >> 1; gap = (gap + 1) >> 1; } while (haystack[high] > needle) { high += gap >> 1; gap = (gap + 1) >> 1; } } } low = high; } if (haystack[low] == needle) { *position = low; return 1; } return 0; }

Beyond the common optimization of verifying the existence of needle within haystack before searching, it modifies the high bound instead of initializing with a median pivot value.

Subtraction is always the first operation on the high bound, so it decreases repeatedly in the same direction without either wasting CPU cycles on unnecessary conditional statements or exceeding the low bounds that were already verified to contain needle. This drastically decreases the number of gap calculations and comparisons in the average instance.

Furthermore, the positioning of conditional statements with gap calculations offset by + 1 prevents unnecessary array accesses and equality operations after each modification of the high bound. This drastically increases speed on average, especially for large arrays.

This process repeats in each iteration with alternating directions whenever the high bound almost exceeds the position of the needle element.

Compared to Binary Search, it's close to a lossless optimization with up to 890% faster speed on average in the best instances and 15% faster speed on average in most instances.

A specific best instance against Binary Search is demonstrated with the following code example.

#include <stdio.h> #include "wsp_search_sorted.h" int main(void) { unsigned short input[1111111]; unsigned long position; unsigned long verification = 0; unsigned long i = 0; unsigned long j = 0; unsigned long k; while (i != 1111111) { input[i] = i; i++; } while (j != 200) { i = 1111111 - j; k = 111111 - j; while (k != 1000) { i--; if (wsp_search_sorted_ascending(0, k, (const unsigned short *) input, (const unsigned short) input[i], &position) != 0) { verification++; } k--; } j++; } printf("The verification value is %lu.\n", verification); return 0; }

Compared to Interpolation Search, it doesn't rely on evenly-distributed haystack values for heuristic calculations and it's 1000% faster than the worst case.

Compared to other sorted-list search algorithms for sorted arrays, it's at least 10% faster on average across a range of data types and data, both randomized and uniform.

It's optimized for instances where low is greater than 0 and where needle is either found or not found in haystack.

Average speed for uniform data was tested with the following code example.

#include <stdio.h> #include "wsp_search_sorted.h" int main(void) { unsigned short input[1111111]; unsigned long position; unsigned long verification = 0; unsigned long i = 0; unsigned long j = 0; unsigned long k; while (i != 1111111) { input[i] = i | 3; i++; } while (j != 200) { i = 111110 - j; k = 1111101 - j; while (i != 10000) { i--; if (wsp_search_sorted_ascending(0, k, (const unsigned short *) input, (const unsigned short) input[i], &position) != 0) { verification++; } k--; } j++; } printf("The verification value is %lu.\n", verification); return 0; }

Furthermore, the average speed was faster with randomized distribution using the aforementioned code, but with sorted numbers generated from various seeded states in WSP-PRNG-32.

Different degrees of distribution were tested with bitwise OR operands 3, 7, 15 and so on.

The following code example demonstrates the aforementioned worst instance with Interpolation Search. It searches for 1 in an array with all values defined as 2 except for the first value defined as 0 and the second value defined as 1.

#include <stdio.h> #include "interpolation_search.h" int main(void) { unsigned short input[1111111]; unsigned long position; unsigned long verification = 0; unsigned long i = 0; unsigned long j = 0; unsigned long k; while (i != 1111111) { input[i] = 2; i++; } input[0] = 0; input[1] = 1; input[2] = 2; while (j != 200) { i = 1111100 - j; k = 1111101 - j; while (i != 10000) { i--; if (interpolation_search_ascending(0, k, (const unsigned short *) input, 1, &position) != 0) { verification++; } k--; } j++; } printf("The verification value is %lu.\n", verification); return 0; }

The aforementioned instance with WSP-Search-Sorted is faster than the same instance with Binary Search.

All speed tests were performed locally on a Pixelbook Go M3 using Debian.

Games

ContrivityContrivity Icon

Contrivity

Spawn into the hostile quantum laboratory and destroy oscillations.