Motoko B-Tree, for efficient ordered key-value data

I’ve been talking with @icme about the use case for an imperative data structure for ordered key-value data. As many recognize, the existing RBTree module from base is deficient for this use case: its delete operation does not actually remove the key or reclaim its space. Further, the RBTree is always a binary tree, spending a lot of space on pointers relative to the data being stored.

By moving to a B-Tree with a tunable degree parameter, the pointer-to-data ratio becomes tunable, and potentially, efficiency improves too. Further, the delete operation for a B-Tree is comparatively simpler than that of either a functional or imperative RBTree, and removes the key completely. As a result, the B-Tree (but not the existing, functional RBTree) will be space-efficient for practical workloads with lots of deletion.
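For concreteness, here is one shape the representation might take in Motoko. This is a hypothetical sketch, not a committed design; all names are illustrative:

```motoko
// One node layout for an imperative B-Tree. A single node holds up to
// `order - 1` entries, so pointer overhead is amortized over many entries
// (unlike a binary tree's pointers-per-entry).
type Data<K, V> = {
  entries : [var ?(K, V)]; // sorted entries; `null` marks unused slots
  var count : Nat          // number of slots currently in use
};

type Node<K, V> = {
  #leaf : Data<K, V>;
  #internal : {
    data : Data<K, V>;
    children : [var ?Node<K, V>] // up to `order` children
  }
};

type BTree<K, V> = {
  var root : Node<K, V>;
  order : Nat // the tunable degree: max children per internal node
};
```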

We were both interested in opening the discussion of this data structure to the wider dev community, and @icme suggested posting here.

Questions:

  1. Has this been done in Motoko? We don’t think so, but please correct us if that’s mistaken. There does seem to exist a Rust solution: ic/btreemap.rs at master · dfinity/ic · GitHub

We’d be aiming to do something comparable, but in Motoko, hopefully for inclusion in base.

In any case, it would also be a stable data structure for Motoko, in that stable variables could (in some form) hold its values.

  2. What operations are most essential?

Our list thus far (a rough interface sketch follows the lists below):

  • Insertion
  • Bulk insertion
  • Validation check (is it a valid B-Tree?)
  • Search
  • Pure iteration: iterate through a key range.
  • Map Iteration: Map (and update) values in a key range.
  • Deletion
  3. What performance tests?
  • Speed of each operation (plot for increasing data sizes)
  • Throughput of each operation (i.e., how many operations fit within one consensus round and its cycle limits) (plot for increasing data sizes)
  • Space used (plot for increasing data sizes)
  • Cycle usage (plot for increasing data sizes)
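To make that concrete, here is a hypothetical Motoko interface sketch for the operations listed above; the names, argument orders, and the placeholder BTree type are all illustrative:

```motoko
import Order "mo:base/Order";
import Iter "mo:base/Iter";

type BTree<K, V> = { var root : ?(K, V) }; // placeholder representation
type Compare<K> = (K, K) -> Order.Order;   // caller-supplied key ordering

type Operations<K, V> = {
  insert : (BTree<K, V>, Compare<K>, K, V) -> ?V;                  // returns any replaced value
  insertMany : (BTree<K, V>, Compare<K>, Iter.Iter<(K, V)>) -> (); // bulk insertion
  isValid : (BTree<K, V>, Compare<K>) -> Bool;                     // validation check
  get : (BTree<K, V>, Compare<K>, K) -> ?V;                        // search
  range : (BTree<K, V>, Compare<K>, K, K) -> Iter.Iter<(K, V)>;    // pure iteration through a key range
  mapRange : (BTree<K, V>, Compare<K>, K, K, (K, V) -> V) -> ();   // map (and update) values in a key range
  delete : (BTree<K, V>, Compare<K>, K) -> ?V                      // removes the key entirely
};
```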

I hadn’t seen this lib before, but take a look at this one:


Map Iteration: Map (and update) values in a key range.

This would be awesome! Really important for pagination and not currently part of RBTree.

It would also be great if you could talk with @dieter.sommer about the stable Rust data structure that his team is building for the Bitcoin integration.

I believe it directly stores bytes into stable memory. A Motoko equivalent using the ExperimentalStableMemory library would be very useful (as opposed to just declaring a data structure as a stable variable, which runs into the 4 GB heap limit).

One of the drawbacks of storing this in stable memory (as opposed to using the stable keyword) would be additional performance overhead for various operations.

One of the goals of this library is to improve upon the performance of the Red-Black tree in terms of operational performance (speed, throughput, space occupied, and cost).

Hopefully we’ll be at a place in a few months with multi-canister scaling such that developers don’t need to worry as much about the 4 GB individual canister size limit and are optimizing for compute, not storage.

Other operations that would be nice:

  • split() - split the tree in half, with the lower half on the left and the higher half on the right
  • slice(n) - like split, but take n elements instead of half

It would also be very helpful to have a parallel implementation that is crypto-aware:
  • witness(id) - calculate the witness of a leaf
  • root() - get the root hash of the tree.
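If it helps the discussion, here are hypothetical Motoko signatures for these four operations (the BTree and Witness types are placeholders, not real APIs):

```motoko
type BTree<K, V> = { var root : ?(K, V) }; // placeholder representation
type Witness = Blob;                       // opaque placeholder for a hash-path witness

type Extras<K, V> = {
  split : BTree<K, V> -> (BTree<K, V>, BTree<K, V>);        // (lower half, upper half)
  slice : (BTree<K, V>, Nat) -> (BTree<K, V>, BTree<K, V>); // (first n elements, the rest)
  witness : (BTree<K, V>, K) -> ?Witness;                   // witness for the leaf holding the key
  root : BTree<K, V> -> Blob                                // root hash of the Merkleized tree
};
```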

Did you make the library?


It seems that it uses mo:base/Trie behind the scenes?

No, the bucket lib is a library for storing and getting stable data.

Thanks!

Yes, I was also thinking we’d start by writing a data structure to be stored in stable vars.

But I also plan to keep in mind that we eventually want a version of a B-Tree in Motoko that uses stable memory more directly, either through the existing library API (quite slow), or more likely, with additional system and language support that improves the currently slow performance of using the system API (which that library wraps under the hood). I believe such support is in the works for all Wasm canisters, not just Motoko-based ones.
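Concretely, the first milestone would let you write something like the following; the package path and BTree.init are hypothetical names for the library under discussion:

```motoko
import BTree "mo:btree/BTree"; // hypothetical package path

actor {
  // The whole tree lives in a stable var, so it survives upgrades. This
  // works only if the node types are stable (no function values inside).
  stable var users : BTree.BTree<Nat, Text> = BTree.init<Nat, Text>(32);
};
```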

Great to know, thanks!

I wonder how it compares to this one? (also linked above)

In any case, I am interested to do that. Thanks again!

Makes sense!

Quick check: As this would be imperative, the split operations would only make sense if the original B-Tree is mutated (“destroyed”) in the process, right? (I just want to confirm that we aren’t imagining a functional version where the pre- and post-split versions co-exist.)

Yes, good point. This aspect is important too. (WDYT @icme?)

Is the basic (non-Merkle) version still useful to you @skilesare?

To manage the work, I would do the basic operations and tests first, and move on to “authenticated” use cases after that, where query operations produce witnesses related to the root hash, and update operations update it.

And for performance, I wonder how many extra cycles the Merklized hashing will require? I assume the overhead will be sizable, and some folks may prefer to skip it entirely. But experiments will confirm the overhead, and they seem important for answering other questions (e.g., how best to design the authenticated-ness as an optional feature).

Quick check: As this would be imperative, the split operations would only make sense if the original B-Tree is mutated (“destroyed”) in the process, right? (I just want to confirm that we aren’t imagining a functional version where the pre- and post-split versions co-exist.)

You just went over my head. :grimacing: If it is impossible, then it is impossible. (Confession: I don’t know what imperative means :joy:) My functional programming is a bit sparse.

Yes! But less useful for returning certified data to HTTP queries.

Don’t mean to derail, but curious about using StableBTreeMap vs the approach taken here:

Wouldn’t the overhead only be during upgrades?

Oops, sorry – let me try to be more concrete.

Suppose that t is a B-Tree that holds some key-value data.

Suppose that you define let (t1, t2) = t.splitAt(splitKey), so that t1 and t2 hold the left and right portions of the data around splitKey (defined somehow).

There are two ways to implement this regarding how t1 and t2 relate to t:

  • Functional: t is still valid after the operation, and t1.append(t2) == t
  • Imperative: t is not valid after the operation; the nodes it uses have been mutated to represent t1 and t2

I think both are possible, and was wondering which you expected?
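In code, assuming t : BTree<K, V> and splitKey : K are in scope, and splitAt and append are the hypothetical operations in question:

```motoko
let (t1, t2) = t.splitAt(splitKey);
// Functional reading:  t is still valid afterwards, and t1.append(t2)
//                      is structurally equal to t.
// Imperative reading:  t's nodes were reused (mutated) to build t1 and t2,
//                      so any further use of t is a bug.
```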

I don’t see much difference for me. I’d say do the one that is the most efficient for cycle purposes.


The imperative version will most likely be more efficient in terms of cycles and memory usage. The drawback is that, as a programmer, you then have to be aware that the operation directly mutates the data structure passed into it, so you may get side effects that you did not originally intend.

For the purposes of permanent data storage, the imperative version is fine.

However, if you are splitting the data structure just to perform an operation on both halves, the split will mutate the underlying data structure. In this case, a map/updateIter/scan type of operation between two bounds would be best (sketched after the list below). Something like a combination of the following functions:

  • getMidpoint(bt, compareTo)
  • mapIter/Scan(bt, compareTo, lowerBound, upperBound, mapFunc) // midpoint as lowerbound or upperbound
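As a sketch of that last point, assuming bt : BTree<Nat, Text>, a midpoint-derived lowerBound, and an upperBound are in scope (mapRange is an assumed name for the hypothetical library operation, not an existing API):

```motoko
import Nat "mo:base/Nat";

// Update every value in [lowerBound, upperBound] in place, leaving the
// tree whole, instead of splitting it just to touch one half.
BTree.mapRange<Nat, Text>(
  bt,
  Nat.compare,
  lowerBound,
  upperBound,
  func (k : Nat, v : Text) : Text { v # " (visited)" } // rewrite each value in range
);
```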

I don’t have a strong opinion on crypto awareness when it comes to data structures.

Unless it fundamentally changes how the data is stored, it might be nice to build the data structure out first, and then build wrappers around this base data structure that include that added functionality.

I’m not too familiar with merkle trees and the witness() and root() functionalities, but I’d prefer to keep this BTree data structure library as general purpose and lean as possible.

@matthewhammer Are you familiar with the witness() and root() functions and what that would take?

Perhaps someone who has touched that code can comment more (@chenyan?), but my first guess is that they do indeed look similar in purpose (the asset one uses text keys, as opposed to byte strings, but that seems minor). The non-asset-specific one also seems to be engineered differently, in terms of its expression as Rust code. Without comparing them in a lot more detail, it’s hard to say more.

In my understanding*, the overhead of using the ExperimentalStableMemory module comes from the way the system API is exposed to the canister: it imposes per-call overhead for every read and write performed through that library.

By contrast, using the stable keyword affects upgrade behavior (persisting the data in these vars using stable memory, while incurring much less overhead during upgrades, thanks to the bulk operations used then). Outside of upgrades, stable data structures in Motoko use ordinary reads and writes in ordinary Wasm memory, and do not incur any system-call overhead.

(* @claudio @ggreif please correct me if that’s inaccurate!)
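To make the contrast concrete, a minimal sketch using mo:base as it stands today:

```motoko
import StableMemory "mo:base/ExperimentalStableMemory";

actor {
  // Via the library: every load/store crosses the system API, paying
  // per-access overhead at runtime.
  public func viaLibrary() : async Nat32 {
    if (StableMemory.size() == 0) {
      ignore StableMemory.grow(1); // reserve one 64 KiB page
    };
    StableMemory.storeNat32(0, 42); // one system call
    StableMemory.loadNat32(0)       // another system call
  };

  // Via the `stable` keyword: ordinary Wasm memory accesses at runtime;
  // stable memory is only involved, in bulk, during upgrades.
  stable var counter : Nat32 = 0;
  public func viaStableVar() : async Nat32 {
    counter += 1;
    counter
  };
};
```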

Yes, a layered approach, with nice factoring, would be ideal. That would be my initial design goal, when the time comes for it.

The need for this comes from relating computations done outside of consensus to those done as part of consensus. For everything within an update message, consensus means that the IC itself ensures a level of authenticity (“tamper proof”!).

For query messages, however, only a single replica replies and in principle, that one machine could be compromised to be deceptive or faulty in some way. If we could generally certify these quickly (without paying for full consensus), then the technical need for per-application certification would be eased, or eliminated, AFAIK.

Until that time, the role of witnesses and root hashes guards against this possibility of deception for such query responses, since the caller can check this response against a hash that has been processed by consensus.

To get a better sense of how this looks in Motoko, there’s an initial example for a single counter value. Ideally, some B-Tree (like the one we are discussing now) would generalize this example, certifying arbitrary sets of key-value data, not just a single 32-bit value (as the example now shows).
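For reference, here is a minimal sketch of what that single-counter pattern looks like with mo:base/CertifiedData; the byte encoding is ad hoc, and the real linked example may differ in detail:

```motoko
import CertifiedData "mo:base/CertifiedData";
import Blob "mo:base/Blob";
import Nat8 "mo:base/Nat8";
import Nat32 "mo:base/Nat32";

actor {
  stable var counter : Nat32 = 0;

  // Encode the counter as a 4-byte little-endian blob (certified data
  // must be at most 32 bytes).
  func encode(n : Nat32) : Blob {
    Blob.fromArray([
      Nat8.fromNat(Nat32.toNat(n & 0xff)),
      Nat8.fromNat(Nat32.toNat((n >> 8) & 0xff)),
      Nat8.fromNat(Nat32.toNat((n >> 16) & 0xff)),
      Nat8.fromNat(Nat32.toNat((n >> 24) & 0xff))
    ])
  };

  public func inc() : async () {
    counter += 1;
    CertifiedData.set(encode(counter)); // only legal in update calls
  };

  // A query reply is signed by a single replica; the certificate lets the
  // caller check the value against a hash that went through consensus.
  public query func get() : async (Nat32, ?Blob) {
    (counter, CertifiedData.getCertificate())
  };
};
```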


PS: For anyone interested in diving into that code example, I recommend reading this comment that explains the steps briefly.

Here is an old example I did that (I think) is correct, and a way to do a real tree: Motoko Playground - DFINITY. I’m pretty sure I borrowed liberally from someone, but I don’t remember who now.


What’s strange is that @paulyoung’s example uses the ic_cdk storage API to read/write to stable memory, but StableBTreeMap doesn’t.

Can canisters even use that data structure then? It appears to me that it relies on replica internals, which makes it replica-specific and not canister-general. I’d like to be wrong though.

By contrast, using the stable keyword affects upgrade behavior (persisting the data in these vars using stable memory, while incurring much less overhead during upgrades, thanks to the bulk operations used then). Outside of upgrades, stable data structures in Motoko use ordinary reads and writes in ordinary Wasm memory, and do not incur any system-call overhead.

This sounds correct to me, but there is one huge drawback to Motoko stable variables: they are limited by the size of ordinary Wasm memory, which is 4 GB (and in practice much less, due to GC).

Compare this to a stable data structure in the mold of StableBTreeMap, which can access the entire 8 GB (and soon to be increased) stable memory address space.

I think @icme’s point about multi-canister scaling is valid, but I think any data structure developed from here onwards should be “multi-canister-aware”. Would an eventual (or even current) BigMap replace this? Developers don’t want to think about storage restrictions; they just want to read and write data to a data structure that abstracts all of that away. In 2022, I think any newly developed data structure that aims to be scalable should try to be as forward-compatible as possible, in light of all the changes going on.


With respect to Merkle trees, witnesses, and root hashes, I think it’s a great idea to make this library at least optionally “crypto-aware”. Certified variables are an incredibly powerful feature of the IC, and right now the biggest barrier to more widespread adoption of certified variables is a lack of simple libraries. Perhaps @nomeata could chime in on whether this would be the right place to do it.

Yep, I recognize that code. It’s from here:

And while it’s a perfectly reasonable starting place for a merklized KV store, it will not work in general: an attacker who can choose the keys being put can pick them to create arbitrary tree depths, and thereby arbitrarily bad future put/get performance.

OTOH, if the user who chooses the put keys is trusted, or if the keys are always hashed before being put into the tree, these concerns about balance are not as salient or security-critical.

But if we care about security enough to use certification at all, we should also care about the possibility of the attacker controlling put keys, as it will be severely problematic for data structures that do not treat these choices as potentially adversarial (DoS one path in the tree, and future requests there create canister paralysis, basically).

The extra complexity of a B-Tree (or other self-balancing tree) buys the guarantee that these problematic tree shapes can never happen, thanks to rebalancing steps that the simpler approach totally lacks.

In a B-Tree (or RBTree, or other rebalancing tree), the attacker still chooses the put keys, whose order influences the tree shape, but the tree responds by rebalancing when a subtree becomes too deep.

The Trie from base solves the problem differently, and does not use ordered keys, but rather their hashes (not controlled by the attacker at all, unless the hash function is broken).
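For illustration, with mo:base/Trie an entry’s position is derived from the key’s hash, so a caller’s choice of key order does not dictate the tree shape (these are the real base APIs):

```motoko
import Trie "mo:base/Trie";
import Text "mo:base/Text";

// Build a Trie key: the hash, not the text itself, steers the path
// taken through the tree.
func key(t : Text) : Trie.Key<Text> = { hash = Text.hash(t); key = t };

var trie : Trie.Trie<Text, Nat> = Trie.empty();
trie := Trie.put(trie, key("alice"), Text.equal, 1).0;
```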