persistent actor Test {
  public type Value = {
    #Text : Text;
    #Nat : Nat;
  };

  // Candid-encode the key and value separately and report the blob sizes.
  public query func check(key : Text, value : Value) : async { k : Nat; v : Nat; total : Nat } {
    let e_k : Blob = to_candid (key);
    let e_v : Blob = to_candid (value);
    let k = e_k.size();
    let v = e_v.size();
    let total = k + v;
    { k; v; total };
  };
};
Short answer: no, not really. to_candid is the wrong ruler for stable-heap size, and it’s also more expensive than you’d think.
Why it’s the wrong ruler:
You’re using persistent actor, which means enhanced orthogonal persistence: the heap is kept in place across upgrades, and no Candid serialization happens at all. The number to_candid returns has essentially no relationship to what the value costs in stable memory.
Even under classical (non-persistent) stable variables, Motoko uses an extended Candid format on upgrade, not plain Candid. So to_candid is only a ballpark, not exact.
Each to_candid call also re-emits the 4-byte DIDL magic and a fresh type table, so summing k + v double-counts that overhead. to_candid((_k, _v)).size() is closer to “encoded together.”
Standalone size != marginal cost in context: type-table reuse and sharing mean adding a value to a larger structure is usually cheaper than its isolated encoding.
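To make the double-counting concrete, here is a sketch (a standalone Motoko fragment assuming the small two-case Value from your actor; the literals are arbitrary):

```motoko
type Value = { #Text : Text; #Nat : Nat };

let v : Value = #Nat 42;
// Two encodings: two DIDL magics, two type tables.
let separate = (to_candid ("hi")).size() + (to_candid (v)).size();
// One encoding of the argument pair: header and type table emitted once.
let together = (to_candid ("hi", v)).size();
// `together` comes out smaller than `separate` for exactly that reason.
```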
And it’s not free!
to_candid lowers to the runtime serializer and actually allocates the full Blob just so .size() can read its length. For a query that only wants a number, you’re paying a full encode + copy.
Better options:
If you want real storage accounting, measure the canister directly: Prim.rts_memory_size() / Prim.rts_heap_size() before and after the insert tells you the truth, including GC and sharing effects.
If your Value shape is fixed, compute the size analytically: Text/Blob are leb128(len) + len bytes of payload, Nat is LEB128, variants add a tag byte. No allocation, exact for the payload.
If you only need a quota/limit check (not exact bytes), Text.size / Blob.size on the inputs is usually enough and avoids the encoder entirely.
Use to_candid().size() only if you specifically need “what would this look like on the wire as a Candid argument”, which is a different question from “what does this cost in my stable heap.”
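If the shapes really are fixed, the analytic route is only a few lines. This is a sketch for the two-case Value above: `leb128Len` and `payloadSize` are helper names I’m introducing here, the 1-byte variant tag assumes a variant with at most 127 cases, and it counts only the Candid value section (no DIDL magic, no type table):

```motoko
import Text "mo:base/Text";

// Bytes needed to LEB128-encode n (7 payload bits per byte).
func leb128Len(n : Nat) : Nat {
  var x = n;
  var bytes = 1;
  while (x >= 128) { x /= 128; bytes += 1 };
  bytes
};

// Analytic Candid payload size for the Value above, no allocation.
func payloadSize(v : { #Text : Text; #Nat : Nat }) : Nat {
  switch v {
    case (#Text t) {
      let len = Text.encodeUtf8(t).size();
      1 + leb128Len(len) + len  // variant index + length prefix + UTF-8 bytes
    };
    case (#Nat n) { 1 + leb128Len(n) };  // variant index + LEB128 value
  }
};
```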
Before going further, can I ask what you’re actually trying to accomplish with this number? “Size of a value” can mean a few different things and the right answer depends a lot on which one you want:
“Am I about to blow my canister’s heap budget?”: a real storage/capacity check.
“I want to charge users a quota based on what they store”: a billing/limit check, where the exact bytes matter less than being fair and predictable.
“I want to know what this will cost on the wire when someone calls my canister”: a Candid message-size question.
“I’m debugging / curious how big things get”: exploratory.
I’m asking because the sizeOf you’re drafting is trying to model the in-heap layout of Motoko values, and that’s the one option I’d actively steer you away from, regardless of which goal you had in mind.
Why hand-rolling heap sizes is a bad idea:
Motoko’s in-heap layout isn’t a stable public contract. It varies by word size (EOP is 64-bit, so headers and pointers are 8 bytes), by whether a value is boxed or unboxed in a given context, and by compiler version. A future release can change any of these without notice.
Text in particular is not “length + bytes” on the heap; it’s a rope. Concatenations share subtrees, so the same logical string can occupy very different amounts of memory depending on how it was built.
Nat / Int are arbitrary-precision bignums: header + limbs, not LEB128. LEB128.toUnsignedBytes(n).size() is the Candid wire encoding, which is a completely different thing from the heap cost. Mixing it into a heap-size formula gives you a number, but not a meaningful one.
Small scalars (Nat8..Nat64, Int8..Int64, Char, Bool) aren’t all the same size, and some are unboxed into the containing object’s slot rather than allocated separately. So 8 isn’t a safe universal answer, and your ? marks are the right instinct: there isn’t a clean number to fill in.
Variant tags aren’t “+ 1” either. A tag takes a whole word and gets folded into the variant’s object header, so adding 1 at the end isn’t modelling anything real.
Even if you pin down today’s numbers by experiment, you’re signing up to re-derive them on every compiler bump, and you’ll still be wrong about Text sharing.
What I’d actually reach for, depending on the goal:
Real storage accounting: don’t compute it, measure it. Prim.rts_heap_size() (or rts_memory_size()) before and after the insert is the ground truth, and it’s the only thing that correctly accounts for sharing, alignment, boxing, and GC. One subtraction, no table to maintain.
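A minimal sketch of the measure-don’t-compute approach, assuming a trivial array-backed store (`Meter`, `store`, and `insertMeasured` are illustrative names, not an API):

```motoko
import Prim "mo:⛔";
import Array "mo:base/Array";

persistent actor Meter {
  var store : [(Text, Nat)] = [];

  // Ground-truth accounting: heap delta around the insert.
  public func insertMeasured(k : Text, v : Nat) : async Nat {
    let before = Prim.rts_heap_size();
    store := Array.tabulate<(Text, Nat)>(
      store.size() + 1,
      func i { if (i < store.size()) store[i] else (k, v) }
    );
    let after = Prim.rts_heap_size();
    // If GC ran in between, `after` can dip below `before`; report 0 then.
    if (after > before) { after - before } else { 0 }
  };
};
```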
User-facing quota: don’t try to mirror the heap at all. Charge the logical payload and add a fixed per-entry overhead you define:
import Text "mo:base/Text";
import Principal "mo:base/Principal";

// Logical payload only: nothing here claims to mirror the heap.
func sizeOf(v : Value) : Nat = switch v {
  case (#Text t) { Text.encodeUtf8(t).size() };      // UTF-8 byte count
  case (#Blob b) { b.size() };
  case (#Principal p) { Principal.toBlob(p).size() };
  case (#Nat _) { 16 };  // pick a conservative flat cost
  case (#Int _) { 16 };
  case _ { 8 };          // fixed-width scalars
};
Then sizeOf(v) + C where C is your own bookkeeping constant per entry. The numbers are arbitrary but they’re yours: stable across compiler versions, easy to document, and easy for users hitting the limit to reason about.
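In use, that looks like this (a sketch: `ENTRY_OVERHEAD` and `withinQuota` are hypothetical names, and the constant is policy you choose, not a measured heap fact):

```motoko
let ENTRY_OVERHEAD : Nat = 64;  // your bookkeeping constant C

func withinQuota(usedBytes : Nat, quotaBytes : Nat, v : Value) : Bool {
  usedBytes + sizeOf(v) + ENTRY_OVERHEAD <= quotaBytes
};
```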
Candid wire size: that’s the one case where to_candid((k, v)).size() actually answers the question you’re asking, with the caveat from my earlier reply that it isn’t free.
The thing to avoid is the middle path: a formula that looks like it’s computing the heap cost but is really a mix of Candid encoding, guessed object headers, and wire-format LEB128. That gives you a number that feels authoritative and isn’t.