Champ Map for Motoko - A Faster/Cheaper/Scalable Map

Introducing ChampMap — a persistent hash map for Motoko that scales past the 4M wall

Hi everyone,

We just published champ-map to mops — a new persistent / functional hash map for Motoko, built on the CHAMP (Compressed Hash-Array Mapped Prefix-tree) data structure. It is a drop-in option whenever you want hash-based O(1) lookups in stable canister state without giving up cheap snapshots or running into the rehash cliff that bites large maps today.

The library was funded by ICDevs.org (a 501(c)(3) non-profit). If you find it useful in production, please consider chipping in at https://icdevs.org/donation.html or minting at https://g53ex-oqaaa-aaaab-ae5ua-cai.icp0.io/#/mint.

mops add champ-map

Repo / docs: https://github.com/icdevsorg/champ-map.mo


Why we built it

Most Motoko canisters that need a hash map reach for ZhenyaUsenko/motoko-hash-map, and for years it has been the right answer. But v9 — the current line — tops out around 4 million records and then can no longer grow within the IC’s per-message instruction budget. The next insert after that wall triggers a global rehash/resize step whose cost scales with the whole map, blows the budget, and traps the canister. There is no graceful recovery beyond offloading entries to another canister. (Sorry, Bob :grimacing:)

Need to go beyond 4M records? CHAMP does exactly that. It is a hash trie that consumes 5 bits of hash per level, so it is at most 7 levels deep regardless of size, and every insert/delete path-copies only the nodes from the root to one leaf. There is no resize event — per-message cost stays flat as the map grows from thousands to tens of millions of entries.
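To see why the depth is bounded, here is a small illustrative sketch (not library internals) of slicing a Nat32 hash into the 5-bit fragments that pick a branch at each level:

```motoko
import Array "mo:base/Array";
import Nat32 "mo:base/Nat32";

// 7 fragments × 5 bits ≥ 32 bits, so a 32-bit hash is fully consumed
// after at most 7 levels — the trie can never be deeper than that.
func fragments(hash : Nat32) : [Nat32] {
  Array.tabulate<Nat32>(7, func(level : Nat) : Nat32 {
    (hash >> Nat32.fromNat(level * 5)) & 0x1F  // 0x1F = 0b11111
  })
};
```

The last fragment only has 2 meaningful bits (32 = 6 × 5 + 2), which is where the collision handling described further down takes over.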

There is one trade-off worth flagging up front: ChampMap does not preserve insertion order. mo:map threads a doubly-linked list through its buckets and can be used as a FIFO queue; ChampMap cannot. Iteration order is determined by the trie shape and is stable for a given key set, but it is not insertion order and not key order. If you need either, keep using mo:map (insertion) or mo:core/pure/Map (key order), or maintain a separate index alongside ChampMap.
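If you do need insertion order next to a ChampMap, one hedged pattern is a separate key log. `empty`/`put` and `nhash` are from the Quick start below; `has` is in the library's API list, and we assume it shares `get`'s argument order. The `putOrdered` helper itself is our own illustration, not library API:

```motoko
import CM "mo:champ-map";
import List "mo:base/List";

var m = CM.empty<Nat, Text>();
var insertionOrder : List.List<Nat> = List.nil();

// Record each key the first time it is inserted, newest first.
func putOrdered(k : Nat, v : Text) {
  if (not CM.has(m, CM.nhash, k)) {
    insertionOrder := List.push(k, insertionOrder);
  };
  m := CM.put(m, CM.nhash, k, v);
};
```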

Why it’s faster than mo:core/pure/Map

mo:core/pure/Map is a balanced red-black tree, so every lookup walks O(log n) levels — already 17 hops at 100K entries and 20 at 1M. ChampMap’s trie is fixed-depth, so every lookup touches at most ~7 nodes no matter how big the map gets. There is no rebalancing, the per-node layout is two compressed arrays indexed by a popcount bitmap (cache-friendly), and maps with ≤ 16 entries skip hashing entirely and use a flat array.
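The popcount trick is simple enough to show in a few lines. A hedged sketch of the shape (not the library's actual node code): each branch node keeps a bitmap with one bit per possible 5-bit fragment, and a child's position in the compressed array is the count of set bits below its bit.

```motoko
import Nat32 "mo:base/Nat32";

func compressedIndex(bitmap : Nat32, frag : Nat32) : ?Nat {
  let bit : Nat32 = 1 << frag;             // the slot this 5-bit fragment selects
  if ((bitmap & bit) == 0) { return null } // bit clear: no child, key absent
    else {
      // Index in the compressed array = number of set bits below `bit`.
      ?Nat32.toNat(Nat32.bitcountNonZero(bitmap & (bit - 1)))
    }
};
```

Because absent slots take no space, a sparse node with 3 children stores a 3-element array plus one Nat32, not a 32-slot table.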

Concretely, measured with mops bench --replica pocket-ic:

Op | ChampMap vs mo:core/pure/Map | Result (100K entries unless noted)
--- | --- | ---
get | hash lookup vs tree walk | ~46% fewer instructions
iterate | flat arrays vs tree traversal | ~2× faster across all sizes
clone | structural identity copy | O(1) — free snapshots
build / replace | path-copy with array splices | uses fewer GC bytes; loses on raw instructions to RB-tree rotations (classic space-vs-time trade)

So: read-heavy workloads, snapshot-heavy workloads, and workloads that need to scale past the rehash cliff are the sweet spot. Write-heavy small maps are basically a wash.

How it handles hash collisions

Hashes are Nat32, so true 32-bit collisions are possible. Once the trie has consumed all 32 bits and two distinct keys still collide, ChampMap drops in a #collision(hash, entries) node — a flat bucket scanned linearly using the user-supplied areEqual. The bucket collapses back to an inline trie entry as soon as it shrinks to one element. For well-distributed hashes (the built-in HashUtils for Nat, Text, Blob, Principal, etc. all qualify), collision buckets are vanishingly rare and never grow beyond a couple of entries. They exist only to make correctness independent of hash quality.
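The bucket scan is an ordinary linear search. A hedged sketch of the shape (inferred from the #collision(hash, entries) description above; not the library's code):

```motoko
// Scan a flat collision bucket with the user-supplied equality check.
func findInBucket<K, V>(
  entries : [(K, V)],
  key : K,
  areEqual : (K, K) -> Bool
) : ?V {
  for ((k, v) in entries.vals()) {
    if (areEqual(k, key)) { return ?v };
  };
  null
};
```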

Quick start

import CM "mo:champ-map";

let { nhash } = CM;            // HashUtils<Nat>

var m = CM.empty<Nat, Text>();
m := CM.put(m, nhash, 1, "hello");
m := CM.put(m, nhash, 2, "world");

assert CM.get(m, nhash, 1) == ?"hello";
assert CM.size(m) == 2;

// Clone is O(1) — structural sharing
let snapshot = CM.clone(m);
m := CM.remove(m, nhash, 1);
assert CM.size(m) == 1;
assert CM.size(snapshot) == 2; // unchanged

Built-in HashUtils are provided for Nat, Nat8 through Nat64, Int, Int8 through Int64, Text, Blob, Principal, and Bool, plus a combineHash combinator for composite keys. The full API mirrors what you’d expect from a modern Motoko collection: get / has / find, put / swap / replace / update, remove / delete, entries / keys / vals / forEach, map / filter / mapFilter, foldLeft / foldRight / all / any, fromIter / toArray / equal. There is also a collectBatch helper for safely chunking iteration when you might brush up against a per-message instruction limit.
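A hedged example of a few of the listed functions. fromIter, has, and toArray are in the API list above, but the exact argument order shown here is an assumption, by analogy with put/get in the Quick start:

```motoko
import CM "mo:champ-map";

// Build from an iterator of pairs, then query and flatten back out.
let m = CM.fromIter<Nat, Text>([(1, "a"), (2, "b"), (3, "c")].vals(), CM.nhash);
assert CM.has(m, CM.nhash, 2);
let pairs : [(Nat, Text)] = CM.toArray(m); // trie order, not insertion order
```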

Security considerations (please read this section)

ChampMap is not safe by default against adversarial keys. If your keys come from untrusted callers — principals, user-supplied text, opaque blobs from a public method, composite keys involving any of the above — an attacker who knows the hash function can pre-compute keys that all collide into one bucket. Lookups against a polluted map degrade from O(1) to O(N), and a single fat bucket can blow the per-message budget. This applies to every hash map, not just this one — but we’d rather call it out loudly than have it bite somebody.

Two mitigations that ship in the library:

  1. withSeed<K>(seed, hashUtils) — wraps any HashUtils with a per-instance seed (canister-local randomness, a secret nonce, etc.) so an attacker cannot pre-compute collisions. Honest workloads pay nothing. Note: unless you are on a TEE subnet, a node provider could read your seed, so be mindful, and populate it from IC randomness on canister install.
    let seeded = CM.withSeed<Principal>(perCanisterSeed, CM.phash);
    var balances = CM.empty<Principal, Nat>();
    balances := CM.put(balances, seeded, caller, amount);
    
  2. validate<K, V>(map, hashUtils) : {#ok; #err : Text} — verifies all structural invariants of a Map<K, V> value. Because Map<K, V> is a public structural type, a candid caller could in principle synthesise a malformed #arrayMap with > 16 entries or a #branch whose bitmaps disagree with its arrays. Never accept Map<K, V> directly across a trust boundary — round-trip through [(K, V)] + fromIter, or call validate first and reject #err.
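A hedged sketch of the validate-first path, using the signature given above. `untrusted` (a Map<Nat, Text> value received across a boundary) and `m` (our own trusted map variable) are hypothetical names:

```motoko
import CM "mo:champ-map";
import Debug "mo:base/Debug";

switch (CM.validate<Nat, Text>(untrusted, CM.nhash)) {
  case (#ok) { m := untrusted };                            // invariants hold: accept
  case (#err msg) { Debug.trap("malformed map: " # msg) }; // reject
};
```

When in doubt, prefer the [(K, V)] + fromIter round-trip: it rebuilds the trie from scratch, so there is no structure to forge.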

combineHash was also rewritten to use a real xxHash-style mixer, so (a, b) and (b, a) no longer trivially share a hash.
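For composite keys, something like the following should work. Only combineHash's existence is stated in the post; the call shape (combining two HashUtils into one for a tuple key) and the `owner`/`spender` names are assumptions:

```motoko
import CM "mo:champ-map";

// Hypothetical (owner, spender) allowance key built from two Principal hashers.
let pairHash = CM.combineHash<Principal, Principal>(CM.phash, CM.phash);
var allowances = CM.empty<(Principal, Principal), Nat>();
allowances := CM.put(allowances, pairHash, (owner, spender), 100);
```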

What’s next

Right now champ-map is mops add champ-map away. If you try it and have feedback — API gaps, benchmarks on your workload, security concerns we haven’t thought of — please open an issue on the repo or reply here. We’d especially like to hear from teams that are sitting near the 4M wall today and would consider migrating.

We are also eyeing an indexed map that would allow for iteration along user-defined dimensions.

Thanks to @ferMartz for adding a skills file: champ-map.mo/skills/champ-map/SKILL.md

Yeah, I did some testing with champ-map, and it’s fast and flexible enough for most key-value projects. Then I built the skill to port the core map over to champ-map using Claude Code for an IC project I’ve been working on recently. It should be general enough for any agent to build key-value–based structures on top of champ-map.

If you notice anything that could be improved, feel free to open a PR. On my end it worked flawlessly.

Very nice! Love to see one of my favourite data structures.

I have a couple questions/suggestions:

I did not know about HAMT! I added benchmarks against HAMT to the bench and readme.

I also added some language around safety and how to use the library to keep it functional. Ideally, we’d make access to the structures more opaque, but I’ll have to leave that for another day, and I’m not sure how the wrappers would affect performance (likely not very much).

Persistence/immutability guarantees apply when the map is used through ChampMap’s public functions such as empty, put, get, remove, swap, entries, and clone. The exported Map<K, V> representation is currently structural and contains mutable arrays internally, so callers should treat it as an opaque value and should not construct, destructure, or mutate raw map values directly.

I don’t expect people to be mucking around with the core structures much, but I’ve added explicit language that you should NOT DO THAT. :grimacing:

I had a use case where there were likely to be lots of small maps and then a few instances with some very big maps. This library cheats for up-to-16-entry maps, where the hashing and trie structure are just overhead: it uses a flat array and doesn’t start the trie until you get over 16 items. So flat arrays show up at the beginning, for small maps, and again at the end if you get a bunch of hash collisions, which is more likely with our compressed bits and 32-bit hashes but with proper key control still really, really unlikely.

The HAMT is likely a great choice for most applications and avoids the progressive cycle-eating of core:map’s comparisons as the trie grows. ChampMap is faster and uses fewer cycles across the board (except against core:map for heavy rewrites, but then core:map has a ton of garbage collection), though that comes at the cost of complexity and potential bugs/weirdness that we’ll only uncover through usage.

Nat keys: instructions

ChampMap
Op | 10 | 16 | 100 | 1_000 | 10_000 | 100_000
--- | --- | --- | --- | --- | --- | ---
build | 20_141 | 34_452 | 305_582 | 4_894_162 | 70_280_392 | 895_007_469
get | 15_185 | 22_578 | 70_010 | 729_195 | 8_194_105 | 94_538_295
replace | 25_008 | 46_123 | 375_543 | 5_198_307 | 77_183_573 | 986_745_810
delete | 19_066 | 30_194 | 325_552 | 5_279_579 | 75_056_403 | 955_608_135
clone | 9_387 | 9_175 | 9_084 | 9_741 | 8_455 | 6_923
iterate | 11_091 | 11_683 | 44_437 | 337_843 | 3_013_868 | 32_654_710

HAMT
Op | 10 | 16 | 100 | 1_000 | 10_000 | 100_000
--- | --- | --- | --- | --- | --- | ---
build | 35_526 | 54_502 | 551_466 | 8_706_450 | 125_208_922 | 1_569_449_661
get | 25_129 | 32_737 | 161_081 | 1_726_014 | 19_974_409 | 213_567_590
replace | 42_992 | 69_322 | 776_179 | 10_069_447 | 147_874_471 | 1_718_983_405
delete | 39_288 | 59_747 | 564_434 | 9_160_049 | 130_545_278 | 1_628_971_192
clone | 13_691 | 13_479 | 12_095 | 11_552 | 9_854 | 7_889
iterate | 25_387 | 32_618 | 153_538 | 1_263_021 | 14_311_550 | 127_609_045

Core
Op | 10 | 16 | 100 | 1_000 | 10_000 | 100_000
--- | --- | --- | --- | --- | --- | ---
build | 25_727 | 36_845 | 253_224 | 3_669_920 | 48_897_464 | 614_445_464
get | 17_396 | 20_352 | 84_064 | 1_098_877 | 13_960_809 | 171_068_100
replace | 25_356 | 33_704 | 199_785 | 2_728_970 | 34_557_125 | 421_503_259
delete | 24_394 | 33_155 | 211_264 | 2_953_196 | 39_215_991 | 498_707_709
clone | 13_307 | 13_095 | 11_711 | 11_168 | 9_470 | 7_505
iterate | 21_192 | 25_533 | 87_043 | 761_423 | 7_510_393 | 75_014_689

Nat keys: garbage collection

ChampMap
Op | 10 | 16 | 100 | 1_000 | 10_000 | 100_000
--- | --- | --- | --- | --- | --- | ---
build | 2.18 KiB | 2.57 KiB | 22.45 KiB | 319.89 KiB | 4.22 MiB | 52.7 MiB
get | 1.61 KiB | 1.47 KiB | 2.44 KiB | 11.36 KiB | 96.06 KiB | 984.48 KiB
replace | 2.35 KiB | 3.04 KiB | 26 KiB | 332.32 KiB | 4.52 MiB | 56.72 MiB
delete | 2.12 KiB | 2.48 KiB | 23.1 KiB | 336.67 KiB | 4.52 MiB | 56.58 MiB
clone | 1.61 KiB | 1.47 KiB | 1.47 KiB | 1.47 KiB | 1.19 KiB | 924 B
iterate | 1.66 KiB | 1.52 KiB | 3.97 KiB | 18.04 KiB | 158.38 KiB | 1.53 MiB

HAMT
Op | 10 | 16 | 100 | 1_000 | 10_000 | 100_000
--- | --- | --- | --- | --- | --- | ---
build | 4.88 KiB | 6.61 KiB | 43.75 KiB | 610.26 KiB | 7.85 MiB | 95.04 MiB
get | 2.96 KiB | 3.34 KiB | 11.87 KiB | 117.34 KiB | 1.32 MiB | 14.14 MiB
replace | 5.38 KiB | 7.67 KiB | 58.68 KiB | 705.27 KiB | 9.28 MiB | 105.18 MiB
delete | 4.77 KiB | 6.47 KiB | 43.17 KiB | 633.54 KiB | 8.21 MiB | 100.69 MiB
clone | 2.18 KiB | 2.04 KiB | 1.76 KiB | 1.47 KiB | 1.19 KiB | 924 B
iterate | 2.98 KiB | 3.26 KiB | 9.43 KiB | 71.63 KiB | 760.94 KiB | 6.92 MiB

Core
Op | 10 | 16 | 100 | 1_000 | 10_000 | 100_000
--- | --- | --- | --- | --- | --- | ---
build | 5.47 KiB | 8 KiB | 55.42 KiB | 763.66 KiB | 9.65 MiB | 118.76 MiB
get | 2.32 KiB | 2.18 KiB | 1.89 KiB | 1.61 KiB | 1.32 KiB | 1.04 KiB
replace | 4.96 KiB | 6.8 KiB | 42.21 KiB | 573.75 KiB | 7.26 MiB | 88.29 MiB
delete | 5.08 KiB | 7.21 KiB | 54.7 KiB | 799.8 KiB | 10.52 MiB | 134.95 MiB
clone | 2.32 KiB | 2.18 KiB | 1.89 KiB | 1.61 KiB | 1.32 KiB | 1.04 KiB
iterate | 3.36 KiB | 3.81 KiB | 11.73 KiB | 99.34 KiB | 977.96 KiB | 9.54 MiB