Is it ok to use to_candid to check the size of data that might be stored in the canister's stable heap?

for example

persistent actor Test {
  public type Value = {
    #Text : Text;
    #Nat : Nat; 
  };
  public query func check(_k: Text, _v: Value) : async { k: Nat; v: Nat; total: Nat } {
    let e_k : Blob = to_candid(_k);
    let e_v : Blob = to_candid(_v);
    let k = e_k.size();
    let v = e_v.size();
    let total = k + v; 
    { k; v; total };
  };
};

example inputs & outputs:

check("", variant {Nat=0}) -> (record {k=8; v=41; total=49})
check("hello", variant {Text="world"}) -> (record {k=13; v=46; total=59})

or is there a better way?

Short answer: no, not really. to_candid is the wrong ruler for stable-heap size, and it’s also more expensive than you’d think.

Why it’s the wrong ruler:

  • You’re using persistent actor, which means enhanced orthogonal persistence: the heap is kept in place across upgrades, no Candid serialization happens. The number to_candid returns has essentially no relationship to what the value costs in stable memory.
  • Even under classical (non-persistent) stable variables, Motoko uses an extended Candid format on upgrade, not plain Candid. So to_candid is only a ballpark, not exact.
  • Each to_candid call also re-emits the 4-byte DIDL magic and a fresh type table, so summing k + v double-counts that overhead. to_candid((_k, _v)).size() is closer to “encoded together.”
  • Standalone size != marginal cost in context: type-table reuse and sharing mean adding a value to a larger structure is usually cheaper than its isolated encoding.
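The double counting is easy to see directly. A sketch reusing the Value type from the question (hypothetical actor, just for comparison):

```motoko
persistent actor CompareDemo {
  public type Value = {
    #Text : Text;
    #Nat : Nat;
  };

  public query func compare(k : Text, v : Value) : async { separate : Nat; together : Nat } {
    // Two standalone encodings: each repeats the DIDL magic and its own type table.
    let e_k = to_candid(k);
    let e_v = to_candid(v);
    // One encoding of the pair: magic and type table are shared.
    let both = to_candid((k, v));
    { separate = e_k.size() + e_v.size(); together = both.size() };
  };
};
```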

And it’s not free!

to_candid lowers to the runtime serializer and actually allocates the full Blob just so .size() can read its length. For a query that only wants a number, you’re paying a full encode + copy.

Better options:

  • If you want real storage accounting, measure the canister directly: Prim.rts_memory_size() / Prim.rts_heap_size() before and after the insert tells you the truth, including GC and sharing effects.
  • If your Value shape is fixed, compute the size analytically: Text/Blob are leb128(len) + len bytes of payload, Nat is LEB128, variants add a tag byte. No allocation, exact for the payload.
  • If you only need a quota/limit check (not exact bytes), Text.size / Blob.size on the inputs is usually enough and avoids the encoder entirely.
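For the first option, a minimal sketch of measure-don’t-compute (the store is a toy list here, and `putMeasured` is a made-up name; treat the delta as an estimate, since GC can run between the two samples):

```motoko
import Prim "mo:⛔";
import List "mo:base/List";

persistent actor MeasureDemo {
  var store : List.List<(Text, Text)> = null;

  // Marginal heap cost of one insert, in bytes.
  // Int, not Nat: if GC runs between samples the delta can be negative.
  public func putMeasured(k : Text, v : Text) : async Int {
    let before : Int = Prim.rts_heap_size();
    store := List.push((k, v), store);
    let after : Int = Prim.rts_heap_size();
    after - before
  };
};
```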

Use to_candid().size() only if you specifically need “what would this look like on the wire as a Candid argument”, which is a different question from “what does this cost in my stable heap.”
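If you do go the analytic route, the LEB128 length is a one-liner to hand-roll (sketch; `leb128Size` is not a base-library function):

```motoko
// Bytes needed to LEB128-encode a Nat: 7 payload bits per byte.
func leb128Size(n : Nat) : Nat {
  var x = n;
  var bytes = 1;
  while (x >= 128) { x := x / 128; bytes += 1 };
  bytes
};

// The Candid payload of a Text is then roughly
// leb128Size(utf8Len) + utf8Len, where utf8Len = Text.encodeUtf8(t).size().
```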

Hope this helps!


hi thanks for the helpful reply.
i see so i just have to calculate the size manually.
may i ask if what i have below is correct?

func sizeOf(v: Value) : Nat {
  switch v {
    case (#Blob(b))  { 8 + b.size() + 3 };
    case (#Text(t))  { 8 + Text.encodeUtf8(t).size() + 3 };

    case (#Nat n)   { LEB128.toUnsignedBytes(n).size() };
    case (#Int i)   { LEB128.toSignedBytes(i).size() };
    
    case (#Char(_))  { 8 };

    case (#Nat8(_))    { 8 }; 
    case (#Nat16(_))   { 8 };
    case (#Nat32(_))   { 8 };
    case (#Nat64(_))   { 8 };
    
    case (#Int8(_))    { ? }; 
    case (#Int16(_))   { ? }; 
    case (#Int32(_))   { ? }; 
    case (#Int64(_))   { 12 };
    
    case (#Float(_))   { 8 }; 
    case (#Bool(_))    { ? }; 
    case (#Principal p) { 8 + Principal.toBlob(p).size() + 3 };
  }
};

and finally i just do sizeOf(value) + 1 to add the variant tag?

Before going further, can I ask what you’re actually trying to accomplish with this number? “Size of a value” can mean a few different things and the right answer depends a lot on which one you want:

  • “Am I about to blow my canister’s heap budget?”: a real storage/capacity check.
  • “I want to charge users a quota based on what they store”: a billing/limit check, where the exact bytes matter less than being fair and predictable.
  • “I want to know what this will cost on the wire when someone calls my canister”: a Candid message-size question.
  • “I’m debugging / curious how big things get”: exploratory.

I’m asking because the sizeOf you’re drafting is trying to model the in-heap layout of Motoko values, and that’s the one option I’d actively steer you away from, regardless of which goal you had in mind.


Why hand-rolling heap sizes might not be a good idea:

  • Motoko’s in-heap layout isn’t a stable public contract. It varies by word size (EOP is 64-bit, so headers and pointers are 8 bytes), by whether a value is boxed or unboxed in a given context, and by compiler version. A future release can change any of these without notice.
  • Text in particular is not “length + bytes” on the heap, it’s a “rope”. Concatenations share subtrees, so the same logical string can occupy very different amounts of memory depending on how it was built.
  • Nat / Int are arbitrary-precision bignums: header + limbs, not LEB128. LEB128.toUnsignedBytes(n).size() is the Candid wire encoding, which is a completely different thing from the heap cost. Mixing it into a heap-size formula gives you a number, but not a meaningful one.
  • Small scalars (Nat8..Nat64, Int8..Int64, Char, Bool) aren’t all the same size, and some are unboxed into the containing object’s slot rather than allocated separately. So 8 isn’t a safe universal answer, and your ? marks are the right instinct: there isn’t a clean number to fill in.
  • Variant tags aren’t “+ 1” either. A tag takes a whole word and gets folded into the variant’s object header, so adding 1 at the end isn’t modelling anything real.

Even if you pin down today’s numbers by experiment, you’re signing up to re-derive them on every compiler bump, and you’ll still be wrong about Text sharing.

What I’d actually reach for, depending on the goal:

  1. Real storage accounting: don’t compute it, measure it. Prim.rts_heap_size() (or rts_memory_size()) before and after the insert is the ground truth, and it’s the only thing that correctly accounts for sharing, alignment, boxing, and GC. One subtraction, no table to maintain.
  2. User-facing quota: don’t try to mirror the heap at all. Charge the logical payload and add a fixed per-entry overhead you define:
func sizeOf(v : Value) : Nat = switch v {
  case (#Text t)       { Text.encodeUtf8(t).size() };
  case (#Blob b)       { b.size() };
  case (#Principal p)  { Principal.toBlob(p).size() };
  case (#Nat _)        { 16 };  // pick a conservative flat cost
  case (#Int _)        { 16 };
  case _               { 8 };   // fixed-width scalars
};

Then sizeOf(v) + C where C is your own bookkeeping constant per entry. The numbers are arbitrary but they’re yours: stable across compiler versions, easy to document, and easy for users hitting the limit to reason about.
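To make the bookkeeping concrete, one way the check might look (the constants and the `charge` helper are illustrative, not from any library):

```motoko
// Illustrative numbers: they should match your pricing model, not the heap.
let ENTRY_OVERHEAD : Nat = 64;     // your fixed per-entry constant C
let USER_QUOTA : Nat = 1_000_000;  // e.g. ~1 MB of logical bytes per user

// Returns the user's new usage total, or null if the write would exceed quota.
func charge(used : Nat, v : Value) : ?Nat {
  let cost = sizeOf(v) + ENTRY_OVERHEAD;
  if (used + cost > USER_QUOTA) { null } else { ?(used + cost) };
};
```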

  3. Candid wire size: that’s the one case where to_candid((k, v)).size() actually answers the question you’re asking, with the caveat from my earlier reply that it isn’t free.

The thing to avoid is the middle path: a formula that looks like it’s computing the heap cost but is really a mix of Candid encoding, guessed object headers, and wire-format LEB128. That gives you a number that feels authoritative and isn’t.


hi yes im actually building a shared storage for users so i want to charge them based on what they’re storing

then i guess i will go with your user-facing quota strategy

thanks a lot :grinning_cat:
