New ordered map/set for Motoko's base library

Project highlights

We are excited to share that we (Serokell Company) have finished a grant focused on improving Motoko’s base library. We have added new OrderedMap and OrderedSet data structures, which are improved version of the old RBTree module.

The new data structures preserve the type stability of its type parameters, making canister upgrades much easier, safer, and faster. They also provide a much richer API, and are more optimized, and come with comprehensive documentation with examples.

A standard library is crucial for any programming language that is significantly influencing the developer’s experience. We are happy we could help make Motoko better. Please see the example below, and we are looking forward to your feedback.

Usage example

Let’s imagine that we want to create a small canister KeyHolder which should help us store a collection of keys e. g. for key leasing service. We would want to support time-related queries (e. g. get recent keys, remove old ones), so keys should be ordered by creation time. We will try to implement it via the new data structures avoiding corner cases for the sake of simplicity and will tell about its features along the way.

Type stability

Old RBTree encountered an obstacle in that the comparison function should be captured and has a non-stable type which prevents storing the map into a stable variable and forces manual data migration during the upgrade. The new collection preserves the stability property since it separates data and functions over it into different types:

import Map "mo:base/OrderedMap";
import Time   "mo:base/Time";

actor KeyHolder {
  // Map operations that captures `compare` and key type `Int`
  let intMap : Map.Operations<Time.Time> = Map.Make<Int>(Int.compare); 
  // The map stored in a stable field
  stable var keyStorage : Map.Map<Time.Time, Text> = intMap.empty<Text>();
}

Functional API

The new collections offer a functional API instead of an imperative one, which can be more convenient and lead to cleaner code. To demonstrate how this paradigm looks in practice, let’s implement key insertion:

public func addKey(key : Text) : async () {
  keyStorage := intMap.put(keyStorage, Time.now(), key);
};

Disclaimer: This implementation cannot look up a key by id so cannot check if it is already presented. A real canister has to have a second map Map<Key, Time>. See the full snippet here.

Rich API

The new collections provide a comprehensive set of functionality inspired by the OCaml standard library. For example, we can implement removing old keys from the collection via one method:

public func removeOldKeys(beforeStamp : Time.Time) : async () {
  keyStorage := intMap.mapFilter(keyStorage, 
                  func (st, key) { if (st >= beforeStamp) { ?key } else { null } );
}; 

or get the oldest key:

public func getOldestKey() : async ?(Time.Time, Text) {  
  return intMap.minEntry(keyStorage);
} 

The full API can be found on the official website after the next release here. Meanwhile, you can see it here in HTML previewer: OrderedMap.html, OrderedSet.html.

Optimizations

We carefully optimized the new data structures through canister-profiling benchmarks. Among various implementations, we identified the best ones in terms of performance and size. As a result, our KeyHolder canister can handle high contention with reasonable costs. For instance, with RBTree it would require 20% more memory and be 10% slower. Additionally, the size() operation of the new collection takes only O(1) time, which is quite convenient.

You can find the final benchmark results here.

Sources:

Future Plans

The next logical step to improve Motoko’s base library would be also add an imperative version of the OrderedMap/OrderedSet.

10 Likes

This is awesome! Amazing work.

I would ask that as a general nod to the community that has produced a number of other map libraries(some that preserve insertion order, some that focus on providing scanning, etc) that these be added to mops and not base.

In general the “base” perpetuates a not-made-here perception that has been difficult in a number of ways.

In general I’d encourage a movement from “base” to “basic” or “simple” for the initial base libraries with internal notations that more robust libraries can be found in the community repositories.

Some robust documentation of which libraries perform which purpose best with some metrics would also be incredibly helpful.

2 Likes

There certainly are other good libraries that deserve to be mentioned in the documentation, and we’d be happy to field suggestions. To be honest, I think our Motoko users rather than compiler devs are better placed to report what works well for them in the wild.

This project is mostly a strict improvement over what we already had in base, replacing RBTree.mo with something a bit more efficient that avoids the need for preupgrade hooks and or passing extra args with every operation.

Ordered sets don’t seem to be available on mops but are closely related so we took the liberty of adding them for completeness.

I think we want base to provide at least some basic functionality for new users before they decide to explore mops.

1 Like

Great work wit OrderedMap.

The data structure would benefit from a toArray function. Currently, one has to go through an Iter by using entries() and then Iter.toArray. This is unnecessary because the size of the map is known in advance, so we can use Array.tabulate directly. Iter.toArray copies everything to a Buffer first because it does not know the size in advance. That copying is unnecessary because we know the size in advance.

1 Like