Motoko B-Tree, for efficient ordered key-value data

Yes, but this is not fundamental, it’s specific to the current GC approach, and there is more than one at the moment.

Deterministic time slicing for the IC (a feature on the roadmap now) will positively impact this concern by permitting a non-copying GC to run as long as it needs to do a global collection. By eschewing a copying collector in favor of one without from/to space distinctions, we can get closer to the “real” platform limits.

I think I understand why one would want this – I use Nat because I do not want to think about my numbers “overflowing”. I would want my data structures to be the same on the IC, by using more canisters.

However, canisters are so complex to manage as a dynamic pool (cycle budgets, async interactions, upgrades, etc) that it seems like there is still some role for traditional, uni-canister data structures, at least until we have better language tech integrated into Motoko for managing such dynamic pools that underlie dynamic data structures. That’s my feeling now, at least.

I had not considered this…good to know. It seems like maybe hashing is always a good(if expensive) way to deal with things like this.

What kind of trade off are there to using something like a Merkel-patricia tree where the virtual space/structure is all pre allocated according to the key and reorgs aren’t needed. I’d imagine reorgs with certified data would be a nightmare of calculations depending on how wholistic the reorg is.

I share that concern. It’s certainly not very “incremental” to rebalance the tree in stable memory frequently. Being incremental there seems ideal, if possible.

The need for balancing goes with the need for order-based keys, chosen by external users. I suspect that this use case is what @icme envisions, and why someone would use a RBTree or a BTree (balanced trees, with order-based keys).

The Trie in base does not accept order based keys and it may be closer to what you are asking about. It uses a simplistic, binary variant of a patricia tree, and the binary digits determine the tree shape, not some balancing algorithm. But if the attacker chooses these binary strings, the same DoS potential exists, where they fill up narrow ranges that they choose and then query them (up to some limit on each key’s bit length, if any).

OTOH, if the canister hashes every key choice and uses these hashes as the binary strings, then no re-balancing is ever needed, and no DoS attack is possible.

But the issue there is that @icme wants to do range-based queries, where keys in a range are actually located near each other in the tree structure. That way, putting and getting a range of 10, 100 or 1000 keys is more localized, and more efficient. The BTree fits that need.

2 Likes