WSP-Sort-Unstable: The Fastest Shell Sort Gap Sequence Calculation and Sorting Algorithm Without Auxiliary Partition Space
William Stafford Parsons developed a bitwise calculation for an optimal sequence of unstable Shell Sort gaps as a substantial improvement to Ciura, Knuth, Sedgwick and Tokuda sequences.
Library
Source
void wsp_sort_unstable_ascending(unsigned long input_count,
unsigned long *input) {
unsigned long _input;
unsigned long gap = (input_count >> 5) + (input_count >> 3) + 1;
unsigned long i;
unsigned long j;
while (gap > 0) {
i = gap;
while (i < input_count) {
_input = input[i];
j = i;
while (
j >= gap &&
input[j - gap] > _input
) {
input[j] = input[j - gap];
j -= gap;
}
input[j] = _input;
i++;
}
if (
gap > 3 ||
gap == 1
) {
gap = (gap >> 5) + (gap >> 2);
} else {
gap = 1;
}
}
}
void wsp_sort_unstable_descending(unsigned long input_count,
unsigned long *input) {
unsigned long _input;
unsigned long gap = (input_count >> 5) + (input_count >> 3) + 1;
unsigned long i;
unsigned long j;
while (gap > 0) {
i = gap;
while (i < input_count) {
_input = input[i];
j = i;
while (
j >= gap &&
input[j - gap] < _input
) {
input[j] = input[j - gap];
j -= gap;
}
input[j] = _input;
i++;
}
if (
gap > 3 ||
gap == 1
) {
gap = (gap >> 5) + (gap >> 2);
} else {
gap = 1;
}
}
}
Reference
wsp_sort_unstable_ascending() is the sorting function in ascending order that accepts the following 2 arguments.
input_count is the count of elements in the input array.
input is the unsigned long array to sort. The default unsigned long data type is interchangeable with any integral data type. The integral data type of the input pointer must match the integral data type of _input.
The return value data type is void.
wsp_sort_unstable_descending() is the sorting function in descending order that accepts the following 2 arguments.
input_count is the count of elements in the input array.
input is the unsigned long array to sort. The default unsigned long data type is interchangeable with any integral data type. The integral data type of the input array must match the integral data type of _input.
The return value data type is void.
Requirements
C compiler with C89 (ISO/IEC 9899:1990) standard compatibility.
Explanation
WSP-Sort-Unstable is designed to both increase the speed and decrease resource usage for all Shell Sort implementations.
It's the fastest in-place sorting algorithm on average that doesn't allocate additional memory for input partitions. This applies across all input data types, distributions and sizes.
It's portable for both 32-bit and 64-bit systems.
It meets strict compliance, portability and code security requirements.
It doesn't use modulus, multiplication or division arithmetic operations.
Before sorting, it doesn't pre-calculate the upper limit with a loop starting from 0. Instead, the gap numbers in each sorting instance are dynamically-calculated with inconsistencies among different input_count values.
Elements are always guaranteed to be sorted within bounds using conditional statements that guarantee the final pass is always a regular insertion sort with a gap value of 1.
After each pass, gap > 3 prevents any result of gap calculation from jumping from either 3 or 2 to 0 instead of 1 based on the following calculation output table.
Gap Calculation
Result
0 0
1 0
2 0
3 0
4 1
5 1
6 1
7 1
8 2
9 2
10 2
The following speed test results were performed with 1 million randomized elements.
Compared to optimized Shell Sort, it's 31% faster.
Compared to optimized Knuth's Sequence, it's 24% faster without requiring either division operations or pre-calculated gap increments.
Furthermore, it's still faster when the gap increments are hard-coded.
Compared to optimized Ciura, Sedgewick and Tokuda Sequences, it's marginally-faster without a hard-coded upper limit of elements in auxiliary memory.
The speed is faster with a variety of input_count values and input data types as shown in the following benchmarking table.
10k 2-Byte Inputs
Ciura 0.005s Tied
Knuth 0.005s Tied
SedgeWick 0.005s Tied
Tokuda 0.005s Tied
Shell 0.006s ----
WSP-Sort-Unstable 0.005s Tied
100k 2-Byte Inputs
Ciura 0.040s ---
Knuth 0.040s ---
SedgeWick 0.039s ---
Tokuda 0.040s ---
Shell 0.047s ---
WSP-Sort-Unstable 0.037s Won
250k 2-Byte Inputs
Ciura 0.103s ---
Knuth 0.110s ---
SedgeWick 0.101s ---
Tokuda 0.102s ---
Shell 0.126s ---
WSP-Sort-Unstable 0.091s Won
500k 2-Byte Inputs
Ciura 0.211s ---
Knuth 0.243s ---
SedgeWick 0.212s ---
Tokuda 0.209s ---
Shell 0.291s ---
WSP-Sort-Unstable 0.191s Won
1m 1-Byte Inputs
Ciura 0.302s ---
Knuth 0.349s ---
SedgeWick 0.291s ---
Tokuda 0.304s ---
Shell 0.380s ---
WSP-Sort-Unstable 0.266s Won
1m 2-Byte Inputs
Ciura 0.461s ---
Knuth 0.570s ---
SedgeWick 0.453s ---
Tokuda 0.451s ---
Shell 0.634s ---
WSP-Sort-Unstable 0.423s Won
It's faster than every other sequence as well, including Hibbard, Papernov & Stasevich and Pratt.
All speed tests were performed locally on a Pixelbook Go M3 using Debian.