Is it ok to use to_candid to check the size of data that might be stored in the canister's stable heap?

for example

persistent actor Test {
  public type Value = {
    #Text : Text;
    #Nat : Nat; 
  };
  public query func check(_k: Text, _v: Value) : async { k: Nat; v: Nat; total: Nat } {
    let e_k : Blob = to_candid(_k);
    let e_v : Blob = to_candid(_v);
    let k = e_k.size();
    let v = e_v.size();
    let total = k + v; 
    { k; v; total };
  };
};

example inputs & outputs:

check("", variant {Nat=0}) -> (record {k=8; v=41; total=49})
check("hello", variant {Text="world"}) -> (record {k=13; v=46; total=59})

or is there a better way?

Short answer: no, not really. to_candid is the wrong ruler for stable-heap size, and it’s also more expensive than you’d think.

Why it’s the wrong ruler:

  • You’re using persistent actor, which means enhanced orthogonal persistence: the heap is kept in place across upgrades, no Candid serialization happens. The number to_candid returns has essentially no relationship to what the value costs in stable memory.
  • Even under classical (non-persistent) stable variables, Motoko uses an extended Candid format on upgrade, not plain Candid. So to_candid is only a ballpark, not exact.
  • Each to_candid call also re-emits the 4-byte DIDL magic and a fresh type table, so summing k + v double-counts that overhead. to_candid((_k, _v)).size() is closer to “encoded together.”
  • Standalone size != marginal cost in context: type-table reuse and sharing mean adding a value to a larger structure is usually cheaper than its isolated encoding.
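The double counting is easy to see directly. A sketch reusing the Value type from the question (hypothetical actor, just for comparison):

```motoko
persistent actor CompareDemo {
  public type Value = {
    #Text : Text;
    #Nat : Nat;
  };

  public query func compare(k : Text, v : Value) : async { separate : Nat; together : Nat } {
    // Two standalone encodings: each repeats the DIDL magic and its own type table.
    let e_k = to_candid(k);
    let e_v = to_candid(v);
    // One encoding of the pair: magic and type table are shared.
    let both = to_candid((k, v));
    { separate = e_k.size() + e_v.size(); together = both.size() };
  };
};
```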

And it’s not free!

to_candid lowers to the runtime serializer and actually allocates the full Blob just so .size() can read its length. For a query that only wants a number, you’re paying a full encode + copy.

Better options:

  • If you want real storage accounting, measure the canister directly: Prim.rts_memory_size() / Prim.rts_heap_size() before and after the insert tells you the truth, including GC and sharing effects.
  • If your Value shape is fixed, compute the size analytically: Text/Blob are leb128(len) + len bytes of payload, Nat is LEB128, variants add a tag byte. No allocation, exact for the payload.
  • If you only need a quota/limit check (not exact bytes), Text.size / Blob.size on the inputs is usually enough and avoids the encoder entirely.
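For the first option, a minimal sketch of measure-don’t-compute (the store is a toy list here, and `putMeasured` is a made-up name; treat the delta as an estimate, since GC can run between the two samples):

```motoko
import Prim "mo:⛔";
import List "mo:base/List";

persistent actor MeasureDemo {
  var store : List.List<(Text, Text)> = null;

  // Marginal heap cost of one insert, in bytes.
  // Int, not Nat: if GC runs between samples the delta can be negative.
  public func putMeasured(k : Text, v : Text) : async Int {
    let before : Int = Prim.rts_heap_size();
    store := List.push((k, v), store);
    let after : Int = Prim.rts_heap_size();
    after - before
  };
};
```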

Use to_candid().size() only if you specifically need “what would this look like on the wire as a Candid argument”, which is a different question from “what does this cost in my stable heap.”
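If you do go the analytic route, the LEB128 length is a one-liner to hand-roll (sketch; `leb128Size` is not a base-library function):

```motoko
// Bytes needed to LEB128-encode a Nat: 7 payload bits per byte.
func leb128Size(n : Nat) : Nat {
  var x = n;
  var bytes = 1;
  while (x >= 128) { x := x / 128; bytes += 1 };
  bytes
};

// The Candid payload of a Text is then roughly
// leb128Size(utf8Len) + utf8Len, where utf8Len = Text.encodeUtf8(t).size().
```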

Hope this helps!


hi thanks for the helpful reply.
i see so i just have to calculate the size manually.
may i ask if what i have below is correct?

func sizeOf(v: Value) : Nat {
  switch v {
    case (#Blob(b))  { 8 + b.size() + 3 };
    case (#Text(t))  { 8 + Text.encodeUtf8(t).size() + 3 };

    case (#Nat n)   { LEB128.toUnsignedBytes(n).size() };
    case (#Int i)   { LEB128.toSignedBytes(i).size() };
    
    case (#Char(_))  { 8 };

    case (#Nat8(_))    { 8 }; 
    case (#Nat16(_))   { 8 };
    case (#Nat32(_))   { 8 };
    case (#Nat64(_))   { 8 };
    
    case (#Int8(_))    { ? }; 
    case (#Int16(_))   { ? }; 
    case (#Int32(_))   { ? }; 
    case (#Int64(_))   { 12 };
    
    case (#Float(_))   { 8 }; 
    case (#Bool(_))    { ? }; 
    case (#Principal p) { 8 + Principal.toBlob(p).size() + 3 };
  }
};

and finally i just do sizeOf(value) + 1 to add the variant tag?

Before going further, can I ask what you’re actually trying to accomplish with this number? “Size of a value” can mean a few different things and the right answer depends a lot on which one you want:

  • “Am I about to blow my canister’s heap budget?”: a real storage/capacity check.
  • “I want to charge users a quota based on what they store”: a billing/limit check, where the exact bytes matter less than being fair and predictable.
  • “I want to know what this will cost on the wire when someone calls my canister”: a Candid message-size question.
  • “I’m debugging / curious how big things get”: exploratory.

I’m asking because the sizeOf you’re drafting is trying to model the in-heap layout of Motoko values, and that’s the one option I’d actively steer you away from, regardless of which goal you had in mind.


Why hand-rolling heap sizes might not be a good idea:

  • Motoko’s in-heap layout isn’t a stable public contract. It varies by word size (EOP is 64-bit, so headers and pointers are 8 bytes), by whether a value is boxed or unboxed in a given context, and by compiler version. A future release can change any of these without notice.
  • Text in particular is not “length + bytes” on the heap, it’s a “rope”. Concatenations share subtrees, so the same logical string can occupy very different amounts of memory depending on how it was built.
  • Nat / Int are arbitrary-precision bignums: header + limbs, not LEB128. LEB128.toUnsignedBytes(n).size() is the Candid wire encoding, which is a completely different thing from the heap cost. Mixing it into a heap-size formula gives you a number, but not a meaningful one.
  • Small scalars (Nat8..Nat64, Int8..Int64, Char, Bool) aren’t all the same size, and some are unboxed into the containing object’s slot rather than allocated separately. So 8 isn’t a safe universal answer, and your ? marks are the right instinct: there isn’t a clean number to fill in.
  • Variant tags aren’t “+ 1” either. A tag takes a whole word and gets folded into the variant’s object header, so adding 1 at the end isn’t modelling anything real.

Even if you pin down today’s numbers by experiment, you’re signing up to re-derive them on every compiler bump, and you’ll still be wrong about Text sharing.

What I’d actually reach for, depending on the goal:

  1. Real storage accounting: don’t compute it, measure it. Prim.rts_heap_size() (or rts_memory_size()) before and after the insert is the ground truth, and it’s the only thing that correctly accounts for sharing, alignment, boxing, and GC. One subtraction, no table to maintain.
  2. User-facing quota: don’t try to mirror the heap at all. Charge the logical payload and add a fixed per-entry overhead you define:
func sizeOf(v : Value) : Nat = switch v {
  case (#Text t)       { Text.encodeUtf8(t).size() };
  case (#Blob b)       { b.size() };
  case (#Principal p)  { Principal.toBlob(p).size() };
  case (#Nat _)        { 16 };  // pick a conservative flat cost
  case (#Int _)        { 16 };
  case _               { 8 };   // fixed-width scalars
};

Then sizeOf(v) + C where C is your own bookkeeping constant per entry. The numbers are arbitrary but they’re yours: stable across compiler versions, easy to document, and easy for users hitting the limit to reason about.
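To make the bookkeeping concrete, one way the check might look (the constants and the `charge` helper are illustrative, not from any library):

```motoko
// Illustrative numbers: they should match your pricing model, not the heap.
let ENTRY_OVERHEAD : Nat = 64;     // your fixed per-entry constant C
let USER_QUOTA : Nat = 1_000_000;  // e.g. ~1 MB of logical bytes per user

// Returns the user's new usage total, or null if the write would exceed quota.
func charge(used : Nat, v : Value) : ?Nat {
  let cost = sizeOf(v) + ENTRY_OVERHEAD;
  if (used + cost > USER_QUOTA) { null } else { ?(used + cost) };
};
```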

  3. Candid wire size: that’s the one case where to_candid((k, v)).size() actually answers the question you’re asking, with the caveat from my earlier reply that it isn’t free.

The thing to avoid is the middle path: a formula that looks like it’s computing the heap cost but is really a mix of Candid encoding, guessed object headers, and wire-format LEB128. That gives you a number that feels authoritative and isn’t.


hi yes im actually building a shared storage for users so i want to charge them based on what they’re storing

then i guess i will go with your user-facing quota strategy

thanks a lot :grinning_cat:
