New Motoko array sorting library

Hi everyone!

Today I would like to announce my new library. This time it’s a brand new, in-house built array sorting algorithm implemented in Motoko language. And, as usual, it was made with one simple concept in mind - performance :slight_smile: Here I need to note that there is a ton-pile of various array sorting methods out there and claiming that something is brand new can be incorrect, as such algorithm as a whole, or partially could be implemented before and we just didn’t know about it. So instead of saying that it is brand new, it would probably be more correct to say that I haven’t seen such one before.

The version I am releasing today is primarily made for sorting numeric arrays and currently the next types are supported (Nat8, Nat16, Nat32, Nat64, Int8, Int16, Int32, Int64).

This, however, doesn’t mean you won’t be able to sort more complex types. An array of Users with name and age properties User { name: Text; age: Nat32 } sorted by age - no problem. Just pass a property accessor to a sorting function sort(array, func(user) = user.age).

More types will come later, including those with non-fixed size (like Nat, Int, Text, Blob etc.) so stay tuned :slight_smile: Those will use an updated version of the algorithm that will be very similar in nature, although having additional complexity and a shift from near-linear performance, unavoidable for types with arbitrary size. Speaking of near-linear performance - yep, that is what my measurements for the algorithm are (you can see them below) - increasing array size by 10 times yields an increase in cycles consumption by roughly 10 times as well. In regards to space complexity, it uses 4 additional arrays to accomplish the goal - 1 twin-array and 3 Nat32 arrays.

The algorithm is stable, so you can subsequently sort your array by different item-properties to achieve the desired behavior that suits your advanced sorting needs.

Here is the performance comparison with the existing Array.sortInPlace method using arrays of 10 to 10_000_000 random Nat32 items (array size stepping factor is 10). Cycles cost factor describes how other array sorting algorithms compare to mine for the same amount of items. Linearity factor describes how performance deviates from the previous entry in the table for the same method compared to a perfectly linear algorithm.

Sorting method Cycles cost Cycles cost factor Linearity factor
sortNat32 10 items 8_603 1.000 first entry
sortNat32 100 items 81_013 1.000 0.942
sortNat32 1_000 items 812_502 1.000 1.003
sortNat32 10_000 items 8_189_344 1.000 1.008
sortNat32 100_000 items 82_150_281 1.000 1.003
sortNat32 1_000_000 items 819_529_942 1.000 0.998
sortNat32 10_000_000 items 8_193_990_495 1.000 1.000
Array.sortInPlace 10 items 21_563 2.506 first entry
Array.sortInPlace 100 items 334_682 4.131 1.552
Array.sortInPlace 1_000 items 4_583_323 5.641 1.369
Array.sortInPlace 10_000 items 61_582_218 7.520 1.344
Array.sortInPlace 100_000 items 742_488_054 9.038 1.206
Array.sortInPlace 1_000_000 items 8_667_628_545 10.576 1.167
Array.sortInPlace 10_000_000 items hits cycles limit hits cycles limit hits cycles limit

Here is the playground where you can check out the library yourself.

The work on this array sorting library was funded by ORIGYN Foundation.

P.S. I am desperately waiting for those numeric data type conversions to avoid going through Nat :slight_smile: Also, being able to index arrays with Nat32 and other numeric types would be very helpful.

11 Likes

Excuse me while I go sort 10 million items like it’s nothing.

coin sorting GIF

Nice @ZhenyaUsenko.

Can it cross over to work with New Vector data structure in Motoko?

2 Likes

It will be difficult to provide a fully generic interface to support arbitrary array-like structures for sorting (Buffer or Vector you’ve mentioned above, which by the way looks interesting). So for now I think the way to go would be to convert such structures to arrays and sort after. We can consider a better solution later, even creating a separate sorting library for a given data structure if the demand is high would be possible.

Is there a high-level description of the algorithm? Is this similar to radix sort or bucket sort?

It would be fascinating to compare the cycle consumption with typical Rust implementations: the standard slice::sort_unstable,slice::sort, and maybe some version of radix sort that I assume you have implemented.

2 Likes

Since this is better™ than the one in base, can it simply be made the algorithm provided by base?

Ah, it’s specialized to the element type, so not a drop-in replacement? Ok, then probably better as a separate library. Thanks, great work!

1 Like

I’ll work on the algorithm description soon, maybe even capture a video-description.
It has some similarities with MSD radix sort, doesn’t use lexicographic sorting though.

I’d be very curious to know how much of the win over the base library is due to being type-specialised and hence avoiding the overhead of calling a comparison function, and how much is due to algorithmic factors. If you turned yours into a generic version, what would the numbers be?

All of the currently supported types (Nat8 , Nat16 , Nat32 , Nat64 , Int8 , Int16 , Int32 , Int64 ) use practically the same algorithm, just do different type conversions under the hood. Current algorithm can not be turned into fully generic. You won’t be able to sort text with it. As I mentioned before, the version of the algorithm that works with non-fixed size data types is also planned. You can potentially sort Int and Nat with the current implementation, but I am not satisfied with the performance. I will be able to achieve noticeably better results for Int and Nat with the updated (additional) version that will specialize on sorting non-fixed size data types. If you can tolerate truncating your Nat/Int to 64bits, it is easy to sort them now sortNat64(array, func(nat) = natToNat64(nat))

On purely theoretical grounds, the algorithm cannot be based on a comparison function because the achieved complexity is O(n).

I’ve made a video description of the algorithm. Sorry for it being so clunky and long, I don’t have much experience in video capturing. You can watch it at 1.5x speed if you want.

(seems like 1080p doesn’t want to show up in the player, you can download the file to watch it in the original resolution)

I don’t have much experience with Rust development. I will do this comparison eventually but it may take a while… Maybe there are some volunteers?)

Here is a simple description of the algorithm for those who don’t want to watch the video

  1. initialize the sorting range to [0, length - 1]
  2. iterate through the range and find min and max values
    • fast return if no cases with (prev value > value) found
  3. set all items from the range in the counts array to 0
  4. for each item from the range in the input array
    • calculate a new index as rangeFrom + (item - min) / ((max - min) / (rangeTo - rangeFrom))
    • increment counts array at this index
  5. initialize totalCount to rangeFrom
  6. for each count from the range in the counts array which is not 0
    • set shifterCounts array at index totalCount to count
    • replace count in the counts array with totalCount
    • add count to totalCount
  7. for each item from the range in the input array
    • calculate a new index as rangeFrom + (item - min) / ((max - min) / (rangeTo - rangeFrom))
    • get count from counts array at this index
    • set twinArray at index count to current item
    • increment counts array at index
  8. update all items from the range in the input array with values from twinArray
  9. for each index and count from the range in the shifterCounts array where count is greater than 1 run steps 2-9 with the range set to [index, index + count - 1]

@ZhenyaUsenko, thanks for the summary. So this is an example of trading time complexity for space complexity, since this algorithm appears to take O(n) extra space, whereas traditional in-place sorting algorithms are O(1) (or O(log N) if you count stack space).

Given that Motoko is a GC-ed language, an accurate measurement of time complexity would probably have to account for the extra GC time this induces, but in the end that should still only amount to constant factors.

In terms of space complexity, this algorithm takes 12 bytes of additional space for each item to be exact. In my measurements Array.sortInPlace takes 4 bytes of additional space for each item, which is not O(1). In both cases, when sorting arrays of complex structures, which should be a more common real world usage than just arrays of plain numbers, this extra space, in my opinion, can be regarded as negligible.

Oh, indeed, sortInPlace uses a scratch array of size N. I don’t know why that particular algorithm was chosen.

If the algorithm produces O(n) garbage but completes in one message, before GC runs, then doesn’t that add no GC time at all? I mean because GC time is a function of the live heap not the garbage.

@timo, that depends on the GC strategy. E.g., with Motoko’s incremental GC, the trade-off may shift somewhat (others would know better than I). In general, short-lived garbage should indeed be cheap, but not entirely free. For starters, creating a lot of garbage causes the GC to run more often.

Just released a new minor version with further performance optimizations. On average, it has 28% better performance than the previous version. The updated version utilizes the newly added adjacent numeric type conversions. They contribute to the performance improvements a little bit. The main advantage though is the ability to remove a hack I used for sortNat32 to avoid going through Bigint and just use nat32toNat64 instead. Thanks @kentosugama for that.

Here is the updated performance comparison table.

Sorting method Cycle cost Cycle cost factor Linearity factor
sortNat32 10 items 5_574 1.000 first entry
sortNat32 100 items 51_852 1.000 0.930
sortNat32 1_000 items 518_850 1.000 1.000
sortNat32 10_000 items 5_241_389 1.000 1.010
sortNat32 100_000 items 52_557_449 1.000 1.003
sortNat32 1_000_000 items 524_319_238 1.000 0.998
sortNat32 10_000_000 items 5_244_394_674 1.000 1.000
Array.sortInPlace 10 items 19_684 3.531 first entry
Array.sortInPlace 100 items 302_719 5.838 1.538
Array.sortInPlace 1_000 items 4_128_550 7.957 1.364
Array.sortInPlace 10_000 items 55_342_852 10.559 1.340
Array.sortInPlace 100_000 items 666_151_264 12.675 1.204
Array.sortInPlace 1_000_000 items 7_768_693_694 14.817 1.166
Array.sortInPlace 10_000_000 items hits cycle limit hits cycle limit hits cycle limit

I am going to replace the table in the original post soon.

4 Likes