Proposal: support native Nat64 indexing in Motoko
I wanted to share some benchmark results and make a case for supporting native Nat64 indexing in Motoko.
The practical issue is straightforward: today, indexed access goes through Nat, so when code naturally operates in Nat64, you end up paying a conversion cost just to perform the lookup. In many workloads that is fine, but in highly optimized paths it becomes noticeable, and it seems unnecessary given that the loop/index variable is already a machine-sized integer.
What pushed us to look at this is the fact that, in practice, no array anywhere near the size of max Nat64 could be allocated in a single max-DTS round anyway. With the current per-round instruction limits, this is not really about wanting impossibly large arrays. It is about wanting a native 64-bit counter/index for tight loops and large logical address spaces without having to convert back to Nat just to satisfy the type of [].
We tested three things:
- normal
Natindexing into a var array Nat64loop counters withNat64.toNat()for array indexing- blob-backed alternatives, where data is addressed through offsets and reconstructed from bytes
The blob-backed path was something we explored only as a possible alternative object layout. We were not especially interested in blob-based recovery itself. The idea was simply to see whether loading data from a blob via Nat64 offsets might be cheap enough to justify an alternate representation. It wasnβt.
Benchmark summary
For array indexing:
nat_index_vararray: 59,791 instructions for 10 reads, 587,293 for 100 readsnat64_toNat_index_vararray: 64,096 instructions for 10 reads, 626,968 for 100 reads
So converting Nat64 -> Nat on each indexed access is measurably more expensive than native Nat indexing.
For looping alone:
nat_loop_counter: 62,133 / 605,475nat64_loop_counter: 23,861 / 220,163
This is the most interesting part to me. Iterating with Nat64 is dramatically cheaper than iterating with Nat.
And when isolating the conversion cost:
nat64_toNat_once_per_iter: 50,015 / 476,717
That strongly suggests the current indexing model leaves performance on the table in cases where the natural representation is already Nat64.
The blob alternatives were much worse:
blob_backed_nat8_get: 575,743 / 5,744,365blob_backed_nat64_get: 521,051 / 5,193,268
So while random access to blobs is possible, it is not an attractive substitute here. Also, blob random access still goes through Nat, not Nat64, so it does not solve the underlying ergonomics/performance issue anyway.
Why native Nat64 indexing would help
This seems like a good case for allowing indexing APIs to work natively with Nat64 (or more generally fixed-width integer indices where appropriate):
Nat64loops are clearly cheaper- converting to
Natpurely for indexing adds overhead - blob-backed workarounds are much more expensive
- the usual objection that
Nat64implies absurdly large allocatable arrays does not really apply in practice, because allocation is already constrained far below that by execution limits
In other words, this is not really a request for βarrays of size 2^64β. It is a request to let optimized code use the most efficient counter/index type end-to-end, without paying a conversion tax at the final array access.
What I think would be useful
At minimum, it would be nice if Motoko supported native Nat64 indexing for arrays/blob-like structures where bounds checks can still be enforced normally.
Even if this were introduced only for certain collection types, or only at the compiler/runtime level where it can be lowered efficiently, it would remove the current mismatch:
- iteration is cheap with
Nat64- It looks like 3x faster than Nat if we could have a pure Nat64 index. - access requires
Nat - therefore optimized code has to bounce between the two
That feels like an unnecessary penalty.
Again, to be clear: we do not care much about the blob-based recovery angle here. We only tried it to see whether an object design based on loading from a blob via Nat64 offsets could be cheap enough to matter. The answer appears to be no. The more important takeaway is that Nat64 iteration is cheap, and it would be useful if indexing could accept Nat64 natively so we do not have to convert to Nat just to make the reference work.
Curious whether others have run into the same issue, or whether native fixed-width indexing has already been discussed as a compiler/runtime optimization opportunity.
Bench code:
import Bench "mo:bench";
import Nat64 "mo:core/Nat64";
import Prim "mo:β";
module {
public func init() : Bench.Bench {
let bench = Bench.Bench();
bench.name("Nat64 Index Cost: toNat vs Blob-backed");
bench.description("Measure cost of Nat64.toNat() for array indexing vs alternatives");
bench.cols(["10_reads", "100_reads"]);
bench.rows([
"nat_index_vararray",
"nat64_toNat_index_vararray",
"blob_backed_nat8_get",
"blob_backed_nat64_get",
"nat_loop_counter",
"nat64_loop_counter",
"nat64_toNat_once_per_iter",
]);
// Setup: a 64-element VarArray of Nat64
let arrSize = 64;
let arr : [var Nat64] = Prim.Array_init<Nat64>(arrSize, 0);
var k = 0;
while (k < arrSize) {
arr[k] := Prim.natToNat64(k * 0x12345678);
k += 1;
};
// Setup: Blob-backed storage (simulate 64 Nat64s = 512 bytes)
let blobSize = arrSize * 8;
let blobArr : [var Nat8] = Prim.Array_init<Nat8>(blobSize, 0 : Nat8);
var m = 0;
while (m < arrSize) {
let v = arr[m];
let base = m * 8;
let (b0, b1, b2, b3, b4, b5, b6, b7) = Nat64.explode(v);
blobArr[base] := b0;
blobArr[base + 1] := b1;
blobArr[base + 2] := b2;
blobArr[base + 3] := b3;
blobArr[base + 4] := b4;
blobArr[base + 5] := b5;
blobArr[base + 6] := b6;
blobArr[base + 7] := b7;
m += 1;
};
// Helper: read Nat64 from blob-backed array at index (Nat)
let n8to64 = func(v: Nat8) : Nat64 {
Prim.nat32ToNat64(Prim.nat16ToNat32(Prim.nat8ToNat16(v)));
};
let blobGetNat64 = func(idx: Nat) : Nat64 {
let base = idx * 8;
n8to64(blobArr[base]) |
n8to64(blobArr[base + 1]) << 8 |
n8to64(blobArr[base + 2]) << 16 |
n8to64(blobArr[base + 3]) << 24 |
n8to64(blobArr[base + 4]) << 32 |
n8to64(blobArr[base + 5]) << 40 |
n8to64(blobArr[base + 6]) << 48 |
n8to64(blobArr[base + 7]) << 56;
};
// Helper: read single Nat8 from blob array
let blobGetNat8 = func(idx: Nat) : Nat8 {
blobArr[idx];
};
var sink : Nat64 = 0;
var sinkNat : Nat = 0;
bench.runner(func(row, col) {
let iters : Nat = switch(col) {
case "10_reads" 10;
case "100_reads" 100;
case _ 10;
};
switch(row) {
// Baseline: Nat loop counter, index VarArray directly
case "nat_index_vararray" {
var acc : Nat64 = 0;
var n = 0;
var iter = 0;
while (iter < iters) {
n := 0;
while (n < arrSize) {
acc +%= arr[n];
n += 1;
};
iter += 1;
};
sink := acc;
};
// Key test: Nat64 counter β Nat64.toNat() each iteration for index
case "nat64_toNat_index_vararray" {
var acc : Nat64 = 0;
var n64 : Nat64 = 0;
var iter = 0;
while (iter < iters) {
n64 := 0;
while (Nat64.toNat(n64) < arrSize) {
acc +%= arr[Nat64.toNat(n64)];
n64 += 1;
};
iter += 1;
};
sink := acc;
};
// Blob-backed: read single Nat8 values (simulate byte-level access)
case "blob_backed_nat8_get" {
var acc : Nat64 = 0;
var n = 0;
var iter = 0;
while (iter < iters) {
n := 0;
while (n < blobSize) {
acc +%= n8to64(blobArr[n]);
n += 1;
};
iter += 1;
};
sink := acc;
};
// Blob-backed: reconstruct Nat64 from 8 bytes
case "blob_backed_nat64_get" {
var acc : Nat64 = 0;
var n = 0;
var iter = 0;
while (iter < iters) {
n := 0;
while (n < arrSize) {
acc +%= blobGetNat64(n);
n += 1;
};
iter += 1;
};
sink := acc;
};
// Nat loop counter baseline (just the counter cost)
case "nat_loop_counter" {
var acc : Nat = 0;
var iter = 0;
while (iter < iters) {
var n = 0;
while (n < 64) {
acc += n;
n += 1;
};
iter += 1;
};
sinkNat := acc;
};
// Nat64 loop counter (avoid toNat, compare as Nat64)
case "nat64_loop_counter" {
var acc : Nat64 = 0;
var iter = 0;
while (iter < iters) {
var n : Nat64 = 0;
while (n < 64) {
acc +%= n;
n += 1;
};
iter += 1;
};
sink := acc;
};
// Nat64 counter but call toNat once per iteration (measure overhead)
case "nat64_toNat_once_per_iter" {
var acc : Nat = 0;
var iter = 0;
while (iter < iters) {
var n : Nat64 = 0;
while (n < 64) {
acc += Nat64.toNat(n);
n += 1;
};
iter += 1;
};
sinkNat := acc;
};
case _ {};
};
});
bench;
};
};