Hi everyone!
Today I would like to announce my new library. This time it’s a brand new, in-house built array sorting algorithm implemented in Motoko language. And, as usual, it was made with one simple concept in mind - performance Here I need to note that there is a ton-pile of various array sorting methods out there and claiming that something is brand new can be incorrect, as such algorithm as a whole, or partially could be implemented before and we just didn’t know about it. So instead of saying that it is brand new, it would probably be more correct to say that I haven’t seen such one before.
The version I am releasing today is primarily made for sorting numeric arrays and currently the next types are supported (Nat8, Nat16, Nat32, Nat64, Int8, Int16, Int32, Int64).
This, however, doesn’t mean you won’t be able to sort more complex types. An array of Users with name and age properties User { name: Text; age: Nat32 }
sorted by age - no problem. Just pass a property accessor to a sorting function sort(array, func(user) = user.age)
.
More types will come later, including those with non-fixed size (like Nat, Int, Text, Blob etc.) so stay tuned Those will use an updated version of the algorithm that will be very similar in nature, although having additional complexity and a shift from near-linear performance, unavoidable for types with arbitrary size. Speaking of near-linear performance - yep, that is what my measurements for the algorithm are (you can see them below) - increasing array size by 10 times yields an increase in cycles consumption by roughly 10 times as well. In regards to space complexity, it uses 4 additional arrays to accomplish the goal - 1 twin-array and 3 Nat32 arrays.
The algorithm is stable, so you can subsequently sort your array by different item-properties to achieve the desired behavior that suits your advanced sorting needs.
Here is the performance comparison with the existing Array.sortInPlace
method using arrays of 10 to 10_000_000 random Nat32 items (array size stepping factor is 10). Cycles cost factor describes how other array sorting algorithms compare to mine for the same amount of items. Linearity factor describes how performance deviates from the previous entry in the table for the same method compared to a perfectly linear algorithm.
Sorting method | Cycles cost | Cycles cost factor | Linearity factor |
---|---|---|---|
sortNat32 10 items |
8_603 | 1.000 | first entry |
sortNat32 100 items |
81_013 | 1.000 | 0.942 |
sortNat32 1_000 items |
812_502 | 1.000 | 1.003 |
sortNat32 10_000 items |
8_189_344 | 1.000 | 1.008 |
sortNat32 100_000 items |
82_150_281 | 1.000 | 1.003 |
sortNat32 1_000_000 items |
819_529_942 | 1.000 | 0.998 |
sortNat32 10_000_000 items |
8_193_990_495 | 1.000 | 1.000 |
Array.sortInPlace 10 items |
21_563 | 2.506 | first entry |
Array.sortInPlace 100 items |
334_682 | 4.131 | 1.552 |
Array.sortInPlace 1_000 items |
4_583_323 | 5.641 | 1.369 |
Array.sortInPlace 10_000 items |
61_582_218 | 7.520 | 1.344 |
Array.sortInPlace 100_000 items |
742_488_054 | 9.038 | 1.206 |
Array.sortInPlace 1_000_000 items |
8_667_628_545 | 10.576 | 1.167 |
Array.sortInPlace 10_000_000 items |
hits cycles limit | hits cycles limit | hits cycles limit |
Here is the playground where you can check out the library yourself.
The work on this array sorting library was funded by ORIGYN Foundation.
P.S. I am desperately waiting for those numeric data type conversions to avoid going through Nat Also, being able to index arrays with Nat32 and other numeric types would be very helpful.