Should motoko arrays be indexed by Nat64? Or could we get Nat64 indexed Arrays?

skilesare · March 20, 2026, 10:51pm

Proposal: support native Nat64 indexing in Motoko

I wanted to share some benchmark results and make a case for supporting native Nat64 indexing in Motoko.

The practical issue is straightforward: today, indexed access goes through Nat, so when code naturally operates in Nat64, you end up paying a conversion cost just to perform the lookup. In many workloads that is fine, but in highly optimized paths it becomes noticeable, and it seems unnecessary given that the loop/index variable is already a machine-sized integer.

What pushed us to look at this is the fact that, in practice, no array anywhere near the size of max Nat64 could be allocated in a single max-DTS round anyway. With the current per-round instruction limits, this is not really about wanting impossibly large arrays. It is about wanting a native 64-bit counter/index for tight loops and large logical address spaces without having to convert back to Nat just to satisfy the type of [].

We tested three things:

normal Nat indexing into a var array
Nat64 loop counters with Nat64.toNat() for array indexing
blob-backed alternatives, where data is addressed through offsets and reconstructed from bytes

The blob-backed path was something we explored only as a possible alternative object layout. We were not especially interested in blob-based recovery itself. The idea was simply to see whether loading data from a blob via Nat64 offsets might be cheap enough to justify an alternate representation. It wasn’t.

Benchmark summary

For array indexing:

nat_index_vararray: 59,791 instructions for 10 reads, 587,293 for 100 reads
nat64_toNat_index_vararray: 64,096 instructions for 10 reads, 626,968 for 100 reads

So converting Nat64 -> Nat on each indexed access is measurably more expensive than native Nat indexing.

For looping alone:

nat_loop_counter: 62,133 / 605,475
nat64_loop_counter: 23,861 / 220,163

This is the most interesting part to me. Iterating with Nat64 is dramatically cheaper than iterating with Nat.

And when isolating the conversion cost:

nat64_toNat_once_per_iter: 50,015 / 476,717

That strongly suggests the current indexing model leaves performance on the table in cases where the natural representation is already Nat64.

The blob alternatives were much worse:

blob_backed_nat8_get: 575,743 / 5,744,365
blob_backed_nat64_get: 521,051 / 5,193,268

So while random access to blobs is possible, it is not an attractive substitute here. Also, blob random access still goes through Nat, not Nat64, so it does not solve the underlying ergonomics/performance issue anyway.

Why native `Nat64` indexing would help

This seems like a good case for allowing indexing APIs to work natively with Nat64 (or more generally fixed-width integer indices where appropriate):

Nat64 loops are clearly cheaper
converting to Nat purely for indexing adds overhead
blob-backed workarounds are much more expensive
the usual objection that Nat64 implies absurdly large allocatable arrays does not really apply in practice, because allocation is already constrained far below that by execution limits

In other words, this is not really a request for “arrays of size 2^64”. It is a request to let optimized code use the most efficient counter/index type end-to-end, without paying a conversion tax at the final array access.

What I think would be useful

At minimum, it would be nice if Motoko supported native Nat64 indexing for arrays/blob-like structures where bounds checks can still be enforced normally.

Even if this were introduced only for certain collection types, or only at the compiler/runtime level where it can be lowered efficiently, it would remove the current mismatch:

iteration is cheap with Nat64 - It looks like 3x faster than Nat if we could have a pure Nat64 index.
access requires Nat
therefore optimized code has to bounce between the two

That feels like an unnecessary penalty.

Again, to be clear: we do not care much about the blob-based recovery angle here. We only tried it to see whether an object design based on loading from a blob via Nat64 offsets could be cheap enough to matter. The answer appears to be no. The more important takeaway is that Nat64 iteration is cheap, and it would be useful if indexing could accept Nat64 natively so we do not have to convert to Nat just to make the reference work.

Curious whether others have run into the same issue, or whether native fixed-width indexing has already been discussed as a compiler/runtime optimization opportunity.

Bench code:

import Bench "mo:bench";
import Nat64 "mo:core/Nat64";
import Prim "mo:⛔";

module {
    public func init() : Bench.Bench {
        let bench = Bench.Bench();

        bench.name("Nat64 Index Cost: toNat vs Blob-backed");
        bench.description("Measure cost of Nat64.toNat() for array indexing vs alternatives");

        bench.cols(["10_reads", "100_reads"]);
        bench.rows([
            "nat_index_vararray",
            "nat64_toNat_index_vararray",
            "blob_backed_nat8_get",
            "blob_backed_nat64_get",
            "nat_loop_counter",
            "nat64_loop_counter",
            "nat64_toNat_once_per_iter",
        ]);

        // Setup: a 64-element VarArray of Nat64
        let arrSize = 64;
        let arr : [var Nat64] = Prim.Array_init<Nat64>(arrSize, 0);
        var k = 0;
        while (k < arrSize) {
            arr[k] := Prim.natToNat64(k * 0x12345678);
            k += 1;
        };

        // Setup: Blob-backed storage (simulate 64 Nat64s = 512 bytes)
        let blobSize = arrSize * 8;
        let blobArr : [var Nat8] = Prim.Array_init<Nat8>(blobSize, 0 : Nat8);
        var m = 0;
        while (m < arrSize) {
            let v = arr[m];
            let base = m * 8;
            let (b0, b1, b2, b3, b4, b5, b6, b7) = Nat64.explode(v);
            blobArr[base] := b0;
            blobArr[base + 1] := b1;
            blobArr[base + 2] := b2;
            blobArr[base + 3] := b3;
            blobArr[base + 4] := b4;
            blobArr[base + 5] := b5;
            blobArr[base + 6] := b6;
            blobArr[base + 7] := b7;
            m += 1;
        };

        // Helper: read Nat64 from blob-backed array at index (Nat)
        let n8to64 = func(v: Nat8) : Nat64 {
            Prim.nat32ToNat64(Prim.nat16ToNat32(Prim.nat8ToNat16(v)));
        };
        let blobGetNat64 = func(idx: Nat) : Nat64 {
            let base = idx * 8;
            n8to64(blobArr[base]) |
            n8to64(blobArr[base + 1]) << 8 |
            n8to64(blobArr[base + 2]) << 16 |
            n8to64(blobArr[base + 3]) << 24 |
            n8to64(blobArr[base + 4]) << 32 |
            n8to64(blobArr[base + 5]) << 40 |
            n8to64(blobArr[base + 6]) << 48 |
            n8to64(blobArr[base + 7]) << 56;
        };

        // Helper: read single Nat8 from blob array 
        let blobGetNat8 = func(idx: Nat) : Nat8 {
            blobArr[idx];
        };

        var sink : Nat64 = 0;
        var sinkNat : Nat = 0;

        bench.runner(func(row, col) {
            let iters : Nat = switch(col) {
                case "10_reads" 10;
                case "100_reads" 100;
                case _ 10;
            };

            switch(row) {

                // Baseline: Nat loop counter, index VarArray directly
                case "nat_index_vararray" {
                    var acc : Nat64 = 0;
                    var n = 0;
                    var iter = 0;
                    while (iter < iters) {
                        n := 0;
                        while (n < arrSize) {
                            acc +%= arr[n];
                            n += 1;
                        };
                        iter += 1;
                    };
                    sink := acc;
                };

                // Key test: Nat64 counter → Nat64.toNat() each iteration for index 
                case "nat64_toNat_index_vararray" {
                    var acc : Nat64 = 0;
                    var n64 : Nat64 = 0;
                    var iter = 0;
                    while (iter < iters) {
                        n64 := 0;
                        while (Nat64.toNat(n64) < arrSize) {
                            acc +%= arr[Nat64.toNat(n64)];
                            n64 += 1;
                        };
                        iter += 1;
                    };
                    sink := acc;
                };

                // Blob-backed: read single Nat8 values (simulate byte-level access)
                case "blob_backed_nat8_get" {
                    var acc : Nat64 = 0;
                    var n = 0;
                    var iter = 0;
                    while (iter < iters) {
                        n := 0;
                        while (n < blobSize) {
                            acc +%= n8to64(blobArr[n]);
                            n += 1;
                        };
                        iter += 1;
                    };
                    sink := acc;
                };

                // Blob-backed: reconstruct Nat64 from 8 bytes
                case "blob_backed_nat64_get" {
                    var acc : Nat64 = 0;
                    var n = 0;
                    var iter = 0;
                    while (iter < iters) {
                        n := 0;
                        while (n < arrSize) {
                            acc +%= blobGetNat64(n);
                            n += 1;
                        };
                        iter += 1;
                    };
                    sink := acc;
                };

                // Nat loop counter baseline (just the counter cost)
                case "nat_loop_counter" {
                    var acc : Nat = 0;
                    var iter = 0;
                    while (iter < iters) {
                        var n = 0;
                        while (n < 64) {
                            acc += n;
                            n += 1;
                        };
                        iter += 1;
                    };
                    sinkNat := acc;
                };

                // Nat64 loop counter (avoid toNat, compare as Nat64)
                case "nat64_loop_counter" {
                    var acc : Nat64 = 0;
                    var iter = 0;
                    while (iter < iters) {
                        var n : Nat64 = 0;
                        while (n < 64) {
                            acc +%= n;
                            n += 1;
                        };
                        iter += 1;
                    };
                    sink := acc;
                };

                // Nat64 counter but call toNat once per iteration (measure overhead)
                case "nat64_toNat_once_per_iter" {
                    var acc : Nat = 0;
                    var iter = 0;
                    while (iter < iters) {
                        var n : Nat64 = 0;
                        while (n < 64) {
                            acc += Nat64.toNat(n);
                            n += 1;
                        };
                        iter += 1;
                    };
                    sinkNat := acc;
                };

                case _ {};
            };
        });

        bench;
    };
};

quint · March 22, 2026, 2:15pm

I would like to see better subtyping for NatX/IntX.

nat8 <: nat16
etc.
nat64 <: nat

All Nat64.toNat() should just be handled at compile time imo, since it should be a subtype.

*the other way Nat to Nat64 can’t be automatic for abv. reasons.

I assume Nat indexing is an abstraction, probably not leb128 behind the scenes?

EDIT: I was wrong: Nat is not LEB128 behind the scenes for computation. It’s a bignum (arbitrary-precision integer using LibTomMath)

EDIT2: It cannot be a compile-time no-op with the current representation because:

Nat64 is an unboxed 64-bit word
Nat is a tagged/boxed bignum

They have different runtime representations, so coercion requires actual work (boxing the word into a bignum).

quint · March 22, 2026, 2:22pm

I did some digging…

Array indexing goes through bignum conversion

When you write arr[n], the compiler calls Arr.idx_bigint (compile_enhanced.ml:4453-4459), which:

Takes the Nat (bignum) index
Calls BigNum.to_word64_with to convert it to a raw 64-bit word (trapping on overflow)
Computes the element address

So every array access with a Nat index pays the cost of a bignum-to-word64 conversion, even if the value is small.

So it seem there is a real optimization opportunity here.

What @skilesare’s benchmarks actually demonstrate is that array indexing should accept Nat64 directly without going through bignum. Since idx_bigint already converts the bignum back to a word64 internally, a idx_word64 path that takes Nat64 directly would skip the round-trip (Nat64 → bignum → word64) entirely. That’s where the ~7% overhead in the benchmarks comes from.

quint · March 22, 2026, 3:30pm

Went ahead and implemented this. The change touches the type checker, both codegen backends (classical and enhanced), the IR checker, and both interpreters.

The approach: when arr[n] is used and n has type Nat64, the compiler emits a direct word64 index into the array, skipping the BigNum.to_word64_with conversion that Nat indices go through. Nat indexing is completely unchanged – purely additive.

Benchmarked on the IC via pocket-ic (256-element array, 1000 iterations):

Index type	Instructions	vs Nat
Nat (baseline)	25,687,306	–
Nat8	11,036,370	-57%
Nat16	11,079,329	-57%
Nat32	10,567,324	-59%
Nat64	17,991,325	-30%
Nat64 + toNat()	20,807,325	-19%

Nat8/16/32 are the cheapest because both the loop counter and the index avoid bignum overhead entirely. Nat64 is slightly more expensive due to the 64-bit comparison/increment cost on the EOP backend. All fixed-width types are significantly faster than Nat indexing.

Native Nat64 indexing is ~30% fewer instructions than Nat indexing and ~14% fewer than the toNat workaround. Lines up with your numbers showing Nat64 loop counters are ~3x cheaper – the conversion tax is real and adds up.

The implementation is straightforward. The type checker now accepts Nat64 in index position alongside Nat. At codegen time, if the index expression has type Nat64, it compiles as an UnboxedWord64 and calls Arr.idx directly instead of going through Arr.idx_bigint. Bounds checking still works normally since the array length fits in a word64 regardless.

Mutation (arr[n] := v where n : Nat64) works too. You can freely mix Nat and Nat64 indices on the same array.

Branch: GitHub - q-uint/motoko at quint/nat64-array-indexing · GitHub

github.com/caffeinelabs/motoko

feat: support fixed-width Nat types as array and blob indices (#5929)

master ← q-uint:quint/nat64-array-indexing

opened 04:11PM - 22 Mar 26 UTC

q-uint

+901 -17

## Summary - Allow Nat8, Nat16, Nat32, and Nat64 values as array indices (and… Nat64 for blobs), bypassing bignum conversion for direct word-sized index operations - All fixed-width Nat types compile to direct indexing, reducing IC instruction counts by 30-59% compared to Nat indices - Changes span the type checker, both codegen backends, IR checker, both interpreters, and include run/trap/fail/bench tests ## IC instruction counts (1000 iters x 256 elements) | Type | Instructions | vs Nat | |----------------|-------------|----------| | Nat (baseline) | 25,687,306 | | | Nat8 | 11,036,370 | -57% | | Nat16 | 11,079,329 | -57% | | Nat32 | 10,567,324 | -59% | | Nat64 | 17,991,325 | -30% | | Nat64+toNat() | 20,807,325 | -19% |

skilesare · March 22, 2026, 6:07pm

Holy crap. That is amazing!

In general I’ve found that the bignum has some overhead(understandable). I have some multiplication math that is much more efficient multiplying Arrays of [Nat64] (ie (Nat64, Nat64, Nat64, Nat64) for some 128bit math) using some limb based algos than doing Nat*Nat % Nat.

import Bench "mo:bench";
import Nat64 "mo:base/Nat64";
import Array "mo:base/Array";
import Int "mo:base/Int";
import CoreNat "mo:core/Nat";

module {

  // ═══════════════════════════════════════════════════════════
  //  BN128 field modulus (~254 bits)
  // ═══════════════════════════════════════════════════════════
  let P : Nat = 21888242871839275222246405745257275088696311157297823662689037894645226208583;

  // P as 4 Nat64 limbs, little-endian
  let P_L0 : Nat64 = 0x3c208c16d87cfd47;
  let P_L1 : Nat64 = 0x97816a916871ca8d;
  let P_L2 : Nat64 = 0xb85045b68181585d;
  let P_L3 : Nat64 = 0x30644e72e131a029;

  // Type: 4 × Nat64 limbs (little-endian: limb 0 = LSW)
  type L4 = (Nat64, Nat64, Nat64, Nat64);

  let P_L : L4 = (P_L0, P_L1, P_L2, P_L3);

  // ═══════════════════════════════════════════════════════════
  //  Strategy 1: NAT — current precompiles.mo approach
  // ═══════════════════════════════════════════════════════════

  func nat_modAdd(x: Nat, y: Nat) : Nat { (x + y) % P };
  func nat_modSub(x: Nat, y: Nat) : Nat {
    let xr = x % P;
    let yr = y % P;
    (xr + P - yr) % P;
  };
  func nat_modMul(x: Nat, y: Nat) : Nat { (x * y) % P };
  func nat_modInv(x: Nat) : Nat {
    var a = 0;
    var b = P;
    var u = 1;
    var t = x;
    while (t > 0) {
      let q = b / t;
      let t1 = t;
      let a1 = a;
      t := b % t;
      a := u;
      b := t1;
      u := a1 + (P - q) * u % P;
    };
    if (b == 1) a % P else 0;
  };

  // ═══════════════════════════════════════════════════════════
  //  Strategy 2: NAT — conditional subtract (no % for add/sub)
  // ═══════════════════════════════════════════════════════════

  func natcs_modAdd(x: Nat, y: Nat) : Nat {
    let s = x + y;
    if (s >= P) { s - P } else { s };
  };

  func natcs_modSub(x: Nat, y: Nat) : Nat {
    if (x >= y) { x - y } else { P + x - y };
  };

  func natcs_modMul(x: Nat, y: Nat) : Nat { (x * y) % P };

  // ═══════════════════════════════════════════════════════════
  //  Nat64 4-limb helpers
  // ═══════════════════════════════════════════════════════════

  func cmp4(a: L4, b: L4) : Int {
    if (a.3 != b.3) return if (a.3 > b.3) 1 else -1;
    if (a.2 != b.2) return if (a.2 > b.2) 1 else -1;
    if (a.1 != b.1) return if (a.1 > b.1) 1 else -1;
    if (a.0 != b.0) return if (a.0 > b.0) 1 else -1;
    0;
  };

  func ge4(a: L4, b: L4) : Bool { cmp4(a, b) >= 0 };

  func carry64(sum: Nat64, a: Nat64) : Nat64 {
    if (sum < a) (1 : Nat64) else (0 : Nat64);
  };

  func borrow64(a: Nat64, b: Nat64) : Nat64 {
    if (a < b) (1 : Nat64) else (0 : Nat64);
  };

  // Addition: a + b → (result, carry)
  func add4c(a: L4, b: L4) : (L4, Nat64) {
    let s0 = a.0 +% b.0;
    let c0 = carry64(s0, a.0);

    let s1a = a.1 +% b.1;
    let c1a = carry64(s1a, a.1);
    let s1 = s1a +% c0;
    let c1 = c1a +% carry64(s1, s1a);

    let s2a = a.2 +% b.2;
    let c2a = carry64(s2a, a.2);
    let s2 = s2a +% c1;
    let c2 = c2a +% carry64(s2, s2a);

    let s3a = a.3 +% b.3;
    let c3a = carry64(s3a, a.3);
    let s3 = s3a +% c2;
    let c3 = c3a +% carry64(s3, s3a);

    ((s0, s1, s2, s3), c3);
  };

  // Subtraction: a - b (wrapping)
  func sub4(a: L4, b: L4) : L4 {
    let d0 = a.0 -% b.0;
    let br0 = borrow64(a.0, b.0);

    let d1a = a.1 -% b.1;
    let br1a = borrow64(a.1, b.1);
    let d1 = d1a -% br0;
    let br1 = br1a +% borrow64(d1a, br0);

    let d2a = a.2 -% b.2;
    let br2a = borrow64(a.2, b.2);
    let d2 = d2a -% br1;
    let br2 = br2a +% borrow64(d2a, br1);

    let d3 = a.3 -% b.3 -% br2;
    (d0, d1, d2, d3);
  };

  // ═══════════════════════════════════════════════════════════
  //  Strategy 3: Nat64 4-limb modular add/sub
  // ═══════════════════════════════════════════════════════════

  func limb4_modAdd(a: L4, b: L4) : L4 {
    let (sum, c) = add4c(a, b);
    if (c > 0 or ge4(sum, P_L)) { sub4(sum, P_L) } else { sum };
  };

  func limb4_modSub(a: L4, b: L4) : L4 {
    if (ge4(a, b)) {
      sub4(a, b);
    } else {
      sub4(P_L, sub4(b, a));
    };
  };

  // ═══════════════════════════════════════════════════════════
  //  mulhi64: upper 64 bits of a × b
  // ═══════════════════════════════════════════════════════════

  func mulhi64(a: Nat64, b: Nat64) : Nat64 {
    let M : Nat64 = 0xFFFFFFFF;
    let a0 = a & M;
    let a1 = a >> 32;
    let b0 = b & M;
    let b1 = b >> 32;

    let cross1 = a1 * b0;
    let cross2 = a0 * b1;
    let lo_carry = (a0 * b0) >> 32;

    let mid = cross1 +% cross2;
    let mc1 = carry64(mid, cross1);

    let mid2 = mid +% lo_carry;
    let mc2 = carry64(mid2, mid);

    a1 * b1 +% (mid2 >> 32) +% ((mc1 +% mc2) << 32);
  };

  // ═══════════════════════════════════════════════════════════
  //  Schoolbook 4×4 → 8-limb multiplication
  // ═══════════════════════════════════════════════════════════

  func mul4x4(a: L4, b: L4) : [var Nat64] {
    let r = Array.init<Nat64>(8, (0 : Nat64));
    let al : [Nat64] = [a.0, a.1, a.2, a.3];
    let bl : [Nat64] = [b.0, b.1, b.2, b.3];

    var i = 0;
    while (i < 4) {
      var carry : Nat64 = 0;
      var j = 0;
      while (j < 4) {
        let pos = i + j;
        let lo = al[i] *% bl[j];
        let hi = mulhi64(al[i], bl[j]);

        let s1 = r[pos] +% lo;
        let c1 = carry64(s1, r[pos]);

        let s2 = s1 +% carry;
        let c2 = carry64(s2, s1);

        r[pos] := s2;
        carry := hi +% c1 +% c2;

        j += 1;
      };
      if (i + 4 < 8) {
        r[i + 4] := r[i + 4] +% carry;
      };
      i += 1;
    };
    r;
  };

  let POW64 : Nat = 0x10000000000000000; // 2^64

  func l8ToNat(r: [var Nat64]) : Nat {
    var n : Nat = 0;
    var idx : Int = 7;
    while (idx >= 0) {
      n := n * POW64 + Nat64.toNat(r[Int.abs(idx)]);
      idx -= 1;
    };
    n;
  };

  // ═══════════════════════════════════════════════════════════
  //  Strategy 4: Nat64 limb mul → Nat for final mod (hybrid)
  // ═══════════════════════════════════════════════════════════

  func limb4_modMul_hybrid(a: L4, b: L4) : L4 {
    let prod8 = mul4x4(a, b);
    let n = l8ToNat(prod8) % P;
    natToL4(n);
  };

  // ═══════════════════════════════════════════════════════════
  //  Conversions
  // ═══════════════════════════════════════════════════════════

  func natToL4(n: Nat) : L4 {
    (
      Nat64.fromNat(n % POW64),
      Nat64.fromNat((n / POW64) % POW64),
      Nat64.fromNat((n / POW64 / POW64) % POW64),
      Nat64.fromNat((n / POW64 / POW64 / POW64) % POW64),
    );
  };

  func _l4ToNat(l: L4) : Nat {
    var r = Nat64.toNat(l.3);
    r := r * POW64 + Nat64.toNat(l.2);
    r := r * POW64 + Nat64.toNat(l.1);
    r := r * POW64 + Nat64.toNat(l.0);
    r;
  };

  // ═══════════════════════════════════════════════════════════
  //  Scalar bit testing: Nat vs Nat64-limb
  // ═══════════════════════════════════════════════════════════

  func nat_scalarLoop(k: Nat) : Nat {
    var count : Nat = 0;
    var scalar = k;
    while (scalar > 0) {
      if (scalar % 2 == 1) { count += 1 };
      scalar := scalar / 2;
    };
    count;
  };

  func limb4_scalarLoop(k: L4) : Nat {
    var count : Nat = 0;
    var s0 = k.0;
    var s1 = k.1;
    var s2 = k.2;
    var s3 = k.3;
    while (s0 != 0 or s1 != 0 or s2 != 0 or s3 != 0) {
      if ((s0 & 1) == 1) { count += 1 };
      s0 := (s0 >> 1) | ((s1 & 1) << 63);
      s1 := (s1 >> 1) | ((s2 & 1) << 63);
      s2 := (s2 >> 1) | ((s3 & 1) << 63);
      s3 := s3 >> 1;
    };
    count;
  };

  // Scalar loop using Nat.bitshiftRight (Prim.shiftRight)
  func nat_bitshift_scalarLoop(k: Nat) : Nat {
    var count : Nat = 0;
    var scalar = k;
    while (scalar > 0) {
      if (scalar % 2 == 1) { count += 1 };
      scalar := CoreNat.bitshiftRight(scalar, 1);
    };
    count;
  };

  // natToL4 using bitshift instead of / POW64
  func natToL4_bitshift(n: Nat) : L4 {
    (
      Nat64.fromNat(n % POW64),
      Nat64.fromNat(CoreNat.bitshiftRight(n, 64) % POW64),
      Nat64.fromNat(CoreNat.bitshiftRight(n, 128) % POW64),
      Nat64.fromNat(CoreNat.bitshiftRight(n, 192) % POW64),
    );
  };

  // l4ToNat using bitshift instead of * POW64
  func _l4ToNat_bitshift(l: L4) : Nat {
    Nat64.toNat(l.0)
    + CoreNat.bitshiftLeft(Nat64.toNat(l.1), 64)
    + CoreNat.bitshiftLeft(Nat64.toNat(l.2), 128)
    + CoreNat.bitshiftLeft(Nat64.toNat(l.3), 192);
  };

  // ═══════════════════════════════════════════════════════════
  //  Combined pointAdd math simulation
  // ═══════════════════════════════════════════════════════════

  func nat_pointAddMath(x1: Nat, y1: Nat, x2: Nat, y2: Nat) : (Nat, Nat) {
    let m = nat_modMul(nat_modSub(y2, y1), nat_modInv(nat_modSub(x2, x1)));
    let x3 = nat_modSub(nat_modMul(m, m), nat_modAdd(x1, x2));
    let y3 = nat_modSub(nat_modMul(m, nat_modSub(x1, x3)), y1);
    (x3, y3);
  };

  func natcs_pointAddMath(x1: Nat, y1: Nat, x2: Nat, y2: Nat) : (Nat, Nat) {
    let m = natcs_modMul(natcs_modSub(y2, y1), nat_modInv(natcs_modSub(x2, x1)));
    let x3 = natcs_modSub(natcs_modMul(m, m), natcs_modAdd(x1, x2));
    let y3 = natcs_modSub(natcs_modMul(m, natcs_modSub(x1, x3)), y1);
    (x3, y3);
  };

  // ═══════════════════════════════════════════════════════════
  //  Bench harness
  // ═══════════════════════════════════════════════════════════

  func parseCount(col: Text) : Nat {
    switch(col) {
      case("10") 10;
      case("100") 100;
      case("1_000") 1_000;
      case(_) 1;
    };
  };

  public func init() : Bench.Bench {
    let bench = Bench.Bench();

    bench.name("BN128 Field Arithmetic Strategies");
    bench.description("Compares Nat bignum vs Nat64 4-limb for BN128 field ops");

    bench.cols(["10", "100", "1_000"]);

    bench.rows([
      "modAdd_nat",
      "modAdd_natcs",
      "modAdd_limb4",

      "modSub_nat",
      "modSub_natcs",
      "modSub_limb4",

      "modMul_nat",
      "modMul_natcs",
      "modMul_limb4_hybrid",

      "scalar_nat",
      "scalar_nat_bitshift",
      "scalar_limb4",

      "conv_natToL4_pow64",
      "conv_natToL4_bitshift",
      "conv_l4ToNat_pow64",
      "conv_l4ToNat_bitshift",

      "pointAdd_nat",
      "pointAdd_natcs",
    ]);

    let valA_mod : Nat = 0x1a0111ea397fe69a4b1ba7b6434bacd764774b84f38512bf6730d2a0f6b0f6241 % P;
    let valB_mod : Nat = 0x0e0a77c19a07df2f666ea36f7879462c0a78eb28f5c70b3dd35d438dc58f0d9d % P;

    let valA_L = natToL4(valA_mod);
    let valB_L = natToL4(valB_mod);

    let gx : Nat = 1;
    let gy : Nat = 2;
    let g2x : Nat = 1368015179489954701390400359078579693043519447331113978918064868415326638035;
    let g2y : Nat = 9918110051302171585080402603319702774565515993150576347155970296011118125764;

    let scalar : Nat = 0xDEADBEEFCAFEBABE1234567890ABCDEF0011223344556677AABBCCDDEEFF0099;
    let scalar_L = natToL4(scalar);

    bench.runner(func(row, col) {
      let n = parseCount(col);

      switch(row) {
        case("modAdd_nat") {
          var i = 0; var sink : Nat = 0;
          while (i < n) { sink := nat_modAdd(valA_mod, valB_mod); i += 1 };
        };
        case("modAdd_natcs") {
          var i = 0; var sink : Nat = 0;
          while (i < n) { sink := natcs_modAdd(valA_mod, valB_mod); i += 1 };
        };
        case("modAdd_limb4") {
          var i = 0; var sink = valA_L;
          while (i < n) { sink := limb4_modAdd(valA_L, valB_L); i += 1 };
        };

        case("modSub_nat") {
          var i = 0; var sink : Nat = 0;
          while (i < n) { sink := nat_modSub(valA_mod, valB_mod); i += 1 };
        };
        case("modSub_natcs") {
          var i = 0; var sink : Nat = 0;
          while (i < n) { sink := natcs_modSub(valA_mod, valB_mod); i += 1 };
        };
        case("modSub_limb4") {
          var i = 0; var sink = valA_L;
          while (i < n) { sink := limb4_modSub(valA_L, valB_L); i += 1 };
        };

        case("modMul_nat") {
          var i = 0; var sink : Nat = 0;
          while (i < n) { sink := nat_modMul(valA_mod, valB_mod); i += 1 };
        };
        case("modMul_natcs") {
          var i = 0; var sink : Nat = 0;
          while (i < n) { sink := natcs_modMul(valA_mod, valB_mod); i += 1 };
        };
        case("modMul_limb4_hybrid") {
          var i = 0; var sink = valA_L;
          while (i < n) { sink := limb4_modMul_hybrid(valA_L, valB_L); i += 1 };
        };

        case("scalar_nat") {
          var i = 0; var sink : Nat = 0;
          while (i < n) { sink := nat_scalarLoop(scalar); i += 1 };
        };
        case("scalar_nat_bitshift") {
          var i = 0; var sink : Nat = 0;
          while (i < n) { sink := nat_bitshift_scalarLoop(scalar); i += 1 };
        };
        case("scalar_limb4") {
          var i = 0; var sink : Nat = 0;
          while (i < n) { sink := limb4_scalarLoop(scalar_L); i += 1 };
        };

        case("conv_natToL4_pow64") {
          var i = 0; var sink = valA_L;
          while (i < n) { sink := natToL4(valA_mod); i += 1 };
        };
        case("conv_natToL4_bitshift") {
          var i = 0; var sink = valA_L;
          while (i < n) { sink := natToL4_bitshift(valA_mod); i += 1 };
        };
        case("conv_l4ToNat_pow64") {
          var i = 0; var sink : Nat = 0;
          while (i < n) { sink := _l4ToNat(valA_L); i += 1 };
        };
        case("conv_l4ToNat_bitshift") {
          var i = 0; var sink : Nat = 0;
          while (i < n) { sink := _l4ToNat_bitshift(valA_L); i += 1 };
        };

        case("pointAdd_nat") {
          var i = 0; var sink : (Nat, Nat) = (0, 0);
          while (i < n) { sink := nat_pointAddMath(gx, gy, g2x, g2y); i += 1 };
        };
        case("pointAdd_natcs") {
          var i = 0; var sink : (Nat, Nat) = (0, 0);
          while (i < n) { sink := natcs_pointAddMath(gx, gy, g2x, g2y); i += 1 };
        };

        case(_) {};
      };
    });

    bench;
  };
};

BN128 Field Arithmetic Strategies

Compares Nat bignum vs Nat64 4-limb for BN128 field ops

Instructions

	10	100	1_000
modAdd_nat	26_295 (0%)	255_529 (0%)	2_546_915 (0%)
modAdd_natcs	16_296 (0%)	152_020 (0%)	1_507_598 (0%)
modAdd_limb4	10_307 (0%)	90_141 (0%)	886_983 (0%)
modSub_nat	122_221 (0%)	1_209_605 (0%)	12_085_323 (0%)
modSub_natcs	16_382 (0%)	147_696 (0%)	1_459_174 (0%)
modSub_limb4	8_153 (0%)	63_417 (0%)	614_518 (0%)
modMul_nat	393_534 (0%)	3_917_578 (0%)	39_156_710 (0%)
modMul_natcs	393_925 (0%)	3_917_969 (0%)	39_156_747 (0%)
modMul_limb4_hybrid	2_143_417 (0%)	21_409_738 (0%)	214_111_977 (0%)
scalar_nat	60_639_152 (0%)	606_427_236 (0%)	6_064_621_828 (0%)
scalar_nat_bitshift	33_471_644 (0%)	334_694_088 (0%)	3_346_916_866 (0%)
scalar_limb4	208_894 (0%)	2_066_048 (0%)	20_635_926 (0%)
conv_natToL4_pow64	1_316_563 (0%)	13_143_377 (0%)	131_409_896 (0%)
conv_natToL4_bitshift	475_003 (0%)	4_723_367 (0%)	47_205_345 (0%)
conv_l4ToNat_pow64	114_257 (0%)	1_116_051 (0%)	11_132_329 (0%)
conv_l4ToNat_bitshift	96_867 (0%)	937_741 (0%)	9_344_819 (0%)
pointAdd_nat	4_952_756 (0%)	49_502_040 (0%)	494_993_218 (0%)
pointAdd_natcs	4_504_653 (0%)	45_017_077 (0%)	450_139_655 (0%)

Heap

	10	100	1_000
modAdd_nat	272 B (0%)	272 B (0%)	272 B (0%)
modAdd_natcs	272 B (0%)	272 B (0%)	272 B (0%)
modAdd_limb4	272 B (0%)	272 B (0%)	272 B (0%)
modSub_nat	272 B (0%)	272 B (0%)	272 B (0%)
modSub_natcs	272 B (0%)	272 B (0%)	272 B (0%)
modSub_limb4	272 B (0%)	272 B (0%)	272 B (0%)
modMul_nat	272 B (0%)	272 B (0%)	272 B (0%)
modMul_natcs	272 B (0%)	272 B (0%)	272 B (0%)
modMul_limb4_hybrid	272 B (0%)	272 B (0%)	272 B (0%)
scalar_nat	272 B (0%)	272 B (0%)	308 B (0%)
scalar_nat_bitshift	272 B (0%)	272 B (0%)	308 B (0%)
scalar_limb4	272 B (0%)	272 B (0%)	272 B (0%)
conv_natToL4_pow64	272 B (0%)	272 B (0%)	272 B (0%)
conv_natToL4_bitshift	272 B (0%)	272 B (0%)	272 B (0%)
conv_l4ToNat_pow64	272 B (0%)	272 B (0%)	272 B (0%)
conv_l4ToNat_bitshift	272 B (0%)	272 B (0%)	272 B (0%)
pointAdd_nat	272 B (0%)	272 B (0%)	272 B (0%)
pointAdd_natcs	272 B (0%)	272 B (0%)	272 B (0%)

Garbage Collection

	10	100	1_000
modAdd_nat	2.23 KiB (0%)	19.46 KiB (0%)	191.73 KiB (0%)
modAdd_natcs	1.3 KiB (0%)	10.09 KiB (0%)	97.98 KiB (0%)
modAdd_limb4	2.2 KiB (0%)	19.07 KiB (0%)	187.82 KiB (0%)
modSub_nat	8.21 KiB (0%)	79.23 KiB (0%)	789.38 KiB (0%)
modSub_natcs	1.26 KiB (0%)	9.7 KiB (0%)	94.07 KiB (0%)
modSub_limb4	1.26 KiB (0%)	9.7 KiB (0%)	94.07 KiB (0%)
modMul_nat	9.34 KiB (0%)	90.55 KiB (0%)	902.66 KiB (0%)
modMul_natcs	9.34 KiB (0%)	90.55 KiB (0%)	902.66 KiB (0%)
modMul_limb4_hybrid	85.4 KiB (0%)	851.1 KiB (0%)	8.31 MiB (0%)
scalar_nat	1.77 MiB (0%)	17.65 MiB (0%)	176.51 MiB (0%)
scalar_nat_bitshift	1.07 MiB (0%)	10.65 MiB (0%)	106.53 MiB (0%)
scalar_limb4	328 B (0%)	328 B (0%)	328 B (0%)
conv_natToL4_pow64	40.95 KiB (0%)	406.57 KiB (0%)	3.97 MiB (0%)
conv_natToL4_bitshift	17.08 KiB (0%)	167.9 KiB (0%)	1.64 MiB (0%)
conv_l4ToNat_pow64	7.04 KiB (0%)	67.51 KiB (0%)	672.2 KiB (0%)
conv_l4ToNat_bitshift	6.92 KiB (0%)	66.34 KiB (0%)	660.48 KiB (0%)
pointAdd_nat	159.23 KiB (0%)	1.55 MiB (0%)	15.52 MiB (0%)
pointAdd_natcs	129.97 KiB (0%)	1.27 MiB (0%)	12.66 MiB (0%)

Stopping replica…

I’m all for not optimizing before it is time, but perhaps 5 years in, it is time!

skilesare · March 23, 2026, 11:43pm

@quint, Say I want to give this a spin: do I just point mops toolchain to your repo with a tag? Or do I need to compile the compiler?

quint · March 24, 2026, 9:51am

You have to build the compiler on that branch:

nix develop --command make -C src moc

Producing the binary at src/_build/default/exes/moc.exe (with a symlink at src/moc ).

skilesare · March 25, 2026, 4:56am

@ggreif @claudio @kritzcreek Any chance we could actually get this into the main branch?

claudio · March 27, 2026, 10:34am

Interesting idea. Not sure we are keen on overloading indexing in this way but there might be other ways to get the perf back (perhaps even the convenient syntax back).

I added some comments to the PR feat: support fixed-width Nat types as array and blob indices by q-uint · Pull Request #5929 · caffeinelabs/motoko · GitHub.

Topic		Replies	Views
Seamless switching to 64 bit memory Motoko	22	783	June 2, 2023
How to convert [Nat8] to [Nat64] Language Support Motoko	3	485	March 16, 2023
Motoko, Array, Memory Programs & Applications	39	4816	December 29, 2022
ArrayBuffer equivalent in motoko? Developers	10	1344	May 25, 2021
New Motoko array sorting library Motoko	19	1123	December 6, 2025

Should motoko arrays be indexed by Nat64? Or could we get Nat64 indexed Arrays?

Benchmark summary

Why native Nat64 indexing would help

What I think would be useful

Array indexing goes through bignum conversion

Related topics

Why native `Nat64` indexing would help