We are pleased to announce a new Motoko package that provides an efficient vector data structure with O(sqrt(n)) memory waste and O(sqrt(n)) worst-case append cost. It is published on mops and GitHub.
The go-to data structure for a variable-length array in Motoko is Buffer, but its memory waste and worst-case append cost are both O(n). Buffer internally uses a one-dimensional Array; when that is full, the whole array gets copied into a newly allocated, larger one. Vector instead uses a two-dimensional Array and never copies data blocks. The trade-off is an increased random-access cost, because two Array accesses happen under the hood. However, the improved worst-case cycle cost makes Vector more predictable to use when cycle limits are critical.
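To illustrate the idea, here is a minimal sketch of the two-level layout, assuming a fixed block size for simplicity (the actual package grows block sizes following the paper's scheme, which is what brings both waste and worst-case append down to O(sqrt(n))):

```motoko
import Array "mo:base/Array";

// Illustration only: a fixed block size. The real package grows block
// sizes per the paper so that waste and worst-case append are O(sqrt(n)).
let blockSize = 4;

// The "2D array": a small index array pointing to data blocks.
var blocks : [var [var Nat]] = [var];
var size = 0;

func add(x : Nat) {
  if (size == blocks.size() * blockSize) {
    // All blocks full: allocate one new data block. Only the index
    // array is copied; the data blocks themselves are never moved.
    blocks := Array.tabulateVar<[var Nat]>(
      blocks.size() + 1,
      func i = if (i < blocks.size()) blocks[i]
               else Array.init<Nat>(blockSize, 0)
    );
  };
  blocks[size / blockSize][size % blockSize] := x;
  size += 1;
};

func get(i : Nat) : Nat {
  // Random access costs two Array reads: find the block, then the offset.
  blocks[i / blockSize][i % blockSize];
};
```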
The implementation is highly cycle-optimized. We have extensively benchmarked Vector against Buffer and Array. The results can be seen in the README.
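For reference, here is what basic usage looks like, assuming the import path `mo:vector` and the `new`/`add`/`get`/`size` functions shown in the README:

```motoko
import Vector "mo:vector";
import Iter "mo:base/Iter";
import Debug "mo:base/Debug";

// Create an empty vector and append to it; appends never copy
// existing data blocks, unlike Buffer's occasional full reallocation.
let v = Vector.new<Nat>();
for (i in Iter.range(0, 9)) {
  Vector.add(v, i * i);
};

// Random access performs two Array reads under the hood.
Debug.print(debug_show (Vector.get(v, 3))); // 9
Debug.print(debug_show (Vector.size(v)));   // 10
```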
Hi Timo, interesting. I haven’t seen a 2D array used for this before, only ropes (trees of arrays), where most operations are O(log n), I believe. Is there a pointer comparing these representations?
I haven’t seen any external comparison between a 2D array and a tree-based approach. The paper on which our 2D array is based (https://sedgewick.io/wp-content/themes/sedgewick/papers/1999Optimal.pdf) does not go into that. Our primary goal was to reduce the worst-case append cost. Naturally, we considered a tree as well, because that could bring it down even further. But our secondary goal was to maintain O(1) access cost. We considered O(sqrt(n)) worst-case append good enough for our application, hence we made this trade-off.
Interesting. I think the main benefit of the sqrt(n)-sized array is that the space complexity is optimal. A rope would need O(n) extra space. For time complexity, a rope’s access and append are both O(log n), whereas the array’s access and append are O(1) and O(sqrt(n)) respectively. Given that our memory is only 4 GB, I’m not sure how much the difference between O(1) and O(log n) matters.
I think the sqrt(n) trick was first introduced in Knuth’s book. Because the size of each array block is O(sqrt(n)), it automatically makes any quadratic algorithm linear time. Knuth also mentioned that this data structure can be used to simulate the Josephus problem. I rarely see this data structure implemented in practice, probably because memory is cheap?