New Motoko sort package

We are pleased to announce a new package, sort, available on mops. It bundles various sorting algorithms in one package and is optimised for performance. The initial version contains bucket sort, radix sort, and merge sort for keys of type Nat32.

All algorithms work on VarArray input and are in-place sorts.

The implementations outperform all previously available sort algorithms in Motoko in terms of instructions and memory (see benchmarks further below).

As counting-based algorithms, bucket sort and radix sort require keys with a fixed-width bit representation (not just a compare function).
The package currently requires keys to be of type Nat32. Versions for other key types such as Nat64, Int32/64 or Blob are planned for the future based on demand. Please request what you need.

Merge sort is comparison based, but at this point it is also only implemented for key type Nat32 in this package. For merge sort with arbitrary keys, requiring only a compare function, see the motoko-core PR "Optimize sort" by AStepanov25 (caffeinelabs/motoko-core#453) instead. That PR replaces VarArray.sort in motoko-core with the merge sort of this package, rewritten for a generic compare function. It is about twice as fast and uses only half the memory of core's old VarArray.sort.
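To make the compare-function idea concrete, here is a hedged sketch, in Python for illustration only (not the Motoko implementation), of a stable merge sort driven by nothing but a compare function, the same shape of API the PR gives VarArray.sort:

```python
def merge_sort(a, cmp):
    # Top-down, stable merge sort driven by an explicit compare function
    # (cmp(x, y) < 0 means x orders before y).
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left = merge_sort(a[:mid], cmp)
    right = merge_sort(a[mid:], cmp)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if cmp(right[j], left[i]) < 0:  # right strictly smaller: it goes first
            out.append(right[j]); j += 1
        else:                           # ties keep the left element: stable
            out.append(left[i]); i += 1
    return out + left[i:] + right[j:]
```

Stability follows from taking the left element on ties, which matters when sorting records by a single field.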

Which algorithm should you use? Radix sort has no bad worst case (its worst case equals its average case), but bucket sort is faster in the average case. Merge sort is slower but uses the least memory. Both bucket sort and radix sort internally fall back on merge sort for small inputs, so there is no need to do that manually.
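For intuition on why radix sort's cost does not depend on the key distribution: a radix-256 LSD radix sort makes exactly four counting passes over 32-bit keys, whatever the input looks like. A minimal Python sketch, for illustration only (the package's in-place Motoko implementation differs in detail):

```python
def radix_sort_u32(a):
    # LSD radix sort for 32-bit keys: one stable counting pass per byte,
    # so always four passes regardless of the key distribution.
    for shift in (0, 8, 16, 24):
        counts = [0] * 256
        for x in a:
            counts[(x >> shift) & 0xFF] += 1
        # exclusive prefix sums give each digit bucket's start offset
        total = 0
        for d in range(256):
            counts[d], total = total, total + counts[d]
        out = [0] * len(a)
        for x in a:
            d = (x >> shift) & 0xFF
            out[counts[d]] = x
            counts[d] += 1
        a = out
    return a
```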

Some recommendations:

  • Use radix sort when the distribution of keys in the input is unknown or known to be highly non-uniform. Its speed is independent of the distribution.
  • Use radix sort if tight control of the worst case performance is required. For example, if the key distribution in the input can be controlled by an adversary.
  • Use bucket sort if the distribution is known to be close to uniform.
  • Use bucket sort for best performance in the average case.

Optional: for bucket sort and radix sort, if the full key space isn’t used, you can speed up the sort by passing in an upper bound for the keys. In some cases it may be worth computing the maximum key with one extra pass over the data before calling sort.
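The reason a bound helps: with a byte-at-a-time radix, only the byte positions that can be nonzero under the bound need a counting pass. A small Python illustration of that arithmetic (the actual name and form of the package's bound parameter are not shown here and may differ):

```python
def radix_passes_needed(max_key):
    # Number of radix-256 counting passes needed when all keys are <= max_key:
    # only ceil(bit_length / 8) byte positions can ever be nonzero.
    return max(1, (max_key.bit_length() + 7) // 8)
```

So if an extra pass over the data shows all keys fit in 16 bits, a radix-256 sort needs 2 passes instead of 4.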

The bucket sort implementation was inspired by motoko-sort (also see this forum thread). We improved instruction count a little and memory use by a lot. However, motoko-sort supports many key types where we currently only support Nat32.

For further development of this package we would like to know how sorting is being used in canisters. Please let us know below. For example:

  • Are you sorting call input (arguments) or data that is already in state before the call?
  • Are you sorting output (preparing return values) or data kept in state in sorted order?
  • What sizes of input are you sorting?
  • What are your key types?
  • What is your desired input/output type (VarArray, Array, List, etc.)?

The algorithms are described further in the README.

Benchmarks:
Instructions:

| n | 100 | 1000 | 10000 | 12000 | 100000 | 1000000 |
| --- | --- | --- | --- | --- | --- | --- |
| bucketSort | 43_946 | 0.42 M | 4.17 M | 4.96 M | 41.3 M | 412 M |
| bucketSortWorstCase | 0.21 M | 1.15 M | 9.41 M | 11.1 M | 91.9 M | 553 M |
| radixSort | 66_439 | 1.04 M | 8.82 M | 10.5 M | 63.9 M | 527 M |
| Zhus | 63_420 | 0.62 M | 6.3 M | 7.53 M | 63 M | 628 M |
| mergeSort | 66_620 | 1.04 M | 15.4 M | 18.5 M | 193 M | 2_319 M |
| VarArray | 0.21 M | 2.68 M | 35.8 M | 43.1 M | 442 M | 5_047 M |

Memory:

| n | 100 | 1000 | 10000 | 12000 | 100000 | 1000000 |
| --- | --- | --- | --- | --- | --- | --- |
| bucketSort | 872 B | 5.24 KiB | 55.4 KiB | 63.21 KiB | 518.96 KiB | 4.82 MiB |
| bucketSortWorstCase | 1.54 KiB | 8.27 KiB | 72.41 KiB | 80.23 KiB | 646.99 KiB | 4.88 MiB |
| radixSort | 536 B | 2.28 KiB | 47.44 KiB | 55.25 KiB | 647 KiB | 4.07 MiB |
| Zhus | 1.52 KiB | 12.06 KiB | 117.53 KiB | 140.97 KiB | 1.14 MiB | 11.44 MiB |
| mergeSort | 536 B | 2.28 KiB | 19.86 KiB | 23.77 KiB | 195.64 KiB | 1.91 MiB |
| VarArray | 736 B | 4.23 KiB | 39.39 KiB | 47.2 KiB | 390.95 KiB | 3.82 MiB |

Do you have a comparison of how much performance gain bucketSort has from switching to Insertion sort for (n < 8) buckets and Merge sort for (8 < n < 16) buckets?

For my algorithm I’ve found that each next bucket size has significantly less performance gain from switching to a more appropriate algorithm. I had a ~11% improvement from using a sorting network for 4-item buckets and only a ~3% improvement from a sorting network for 5-item buckets, which was already too low a gain to justify the additional code. Also, when I tried using a generic insertion sort for 5-item buckets it was actually slower than the main algorithm (maybe I didn’t optimize it well, but still).
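For readers following along, the small-bucket fallback under discussion is plain insertion sort; here is a generic, non-unrolled Python sketch of sorting one bucket in place (the unrolled Motoko versions are what the numbers in this thread refer to):

```python
def insertion_sort(a, lo, hi):
    # Sort the slice a[lo:hi] in place; cheap for the tiny buckets
    # (n below roughly 8-16) discussed above.
    for i in range(lo + 1, hi):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]  # shift larger elements one slot right
            j -= 1
        a[j + 1] = x
```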

Did you try switching at different bucket size breakpoints?

Also, I think I can improve memory usage by 1/3; my algorithm was never updated for the 64-bit address space. But that only applies to arrays shorter than 2^32 elements. I don’t know how practical it is to support sorting arrays longer than that.

How much gain relative to what, exactly? Relative to not doing anything special for small buckets at all? Then I would think the difference to radix sort gives a ballpark number, because radix sort does essentially the same work as bucket sort but without the small-bucket optimization.

Or do you mean relative to doing it only for n <= 4? Then also I don’t know exactly. But I do remember that extending it from 8 to n <= 10, and respectively n <= 20 for merge sort, didn’t give that much. Maybe only 1%. We stopped optimising things when they gave less than 1% gain.

To be absolutely sure what the gain is I think you can easily turn it off or change the n <= 8 if-clause to another number and re-run mops bench.

We also had the experience that using generic code here didn’t give any gain.

What do you mean? Do you mean whether we tried different values instead of 8 in the n <= 8 condition? Yes, we saw an incremental gain with each higher value; it just dropped to around 1% or lower when going from 8 to 9.

What do you mean by address space? Do you mean wasm32/wasm64? Or how long the array can be? I forgot to say in the announcement that the input array size is limited to 2^32 - 1. I guess it is not practical to sort anything larger than 100M with these types of algorithms (within the DTS instruction limit). For anything larger, an incremental sort would be required.


The thing is, the distribution of bucket sizes (i.e. the counts of buckets of a given size in a large array) is different from yours, and the expected number of elements per bucket is different. We have larger buckets because of the way we choose the radix: we switched from a larger radix to one half the size, and thus bigger buckets, because it showed performance gains, since an unrolled insertion sort is faster than any other algorithm for n approximately equal to 16.
