Seamless switching to 64-bit memory

As far as I know, there is ongoing work on switching to 64-bit heap memory. Correct me if that is not true.

I have something like:

```motoko
stable var tree: BTree.BTree<Nat32, SubDB>;
```

Is it wise to use Nat64 instead, in order to streamline the switch to 64 bit? Would using Nat64 help with this in any way? If so, how exactly?

The two are unrelated. 64-bit Wasm is about how much non-stable memory your canister can use. What you are asking about is choosing between a 4-byte and an 8-byte Nat; the only difference between them is how large the numbers they can represent can get.

Also, you really want to use plain Nat. There is rarely a reason to use anything else.
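
For illustration, a quick sketch (nothing here is canister-specific): Nat is unbounded, while the fixed-size types have a hard ceiling and trap when a conversion or checked operation exceeds it.

```motoko
let unbounded : Nat = 2 ** 64;               // fine: Nat has no upper limit
let ceiling : Nat64 = 0xFFFF_FFFF_FFFF_FFFF; // the largest value a Nat64 can hold
// Nat64.fromNat(unbounded) would trap: 2 ** 64 does not fit in 8 bytes.
```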

You think so? Can you explain? I usually try to use the smallest nat/int for the job (in Azle/Kybra/Rust). Do you think your reasoning applies to the Rust Nat as well?

I know that in Motoko a Nat can be smaller in memory than a Nat64. Of course, it can be bigger as well.

So, do I understand correctly that, since 64-bit Wasm is about increasing heap memory, I may specify Nat64 now in the hope of being able to put more entries into the tree in the future?

Yes, that works. Alternatively, you can use a data structure that keeps its data in stable memory to get a larger maximum size already now, along the lines of the sketch below.
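
For reference, here is a minimal sketch of the stable-memory route using the Region API from base. The fixed 8-byte-slot layout and the function names here are just an illustration, not a ready-made library:

```motoko
import Region "mo:base/Region";

actor {
  // A growable region of stable memory holding fixed-size Nat64 slots.
  stable let slots = Region.new();
  stable var count : Nat64 = 0;

  public func push(value : Nat64) : async () {
    let offset = count * 8;
    // Region.size is in 64 KiB pages; grow when the next slot would not fit.
    if (offset + 8 > Region.size(slots) * 65536) {
      ignore Region.grow(slots, 1);
    };
    Region.storeNat64(slots, offset, value);
    count += 1;
  };

  public query func get(i : Nat64) : async Nat64 {
    Region.loadNat64(slots, i * 8);
  };
}
```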

Why not make it Nat?

Nat is a variable-size type. Thus it requires heap allocation, which diminishes both memory and time efficiency.

I don’t think that’s true. It depends on the values stored in them. For a value of the same size, Nat and Nat32/Nat64 behave essentially the same. I have benchmarked and cycle-optimized a few libraries and couldn’t measure a noticeable difference. If I remember correctly, the comparison operation can even be faster for Nat. Modular addition is faster than normal addition, which can be an advantage of NatX.

@qwertytrewq, that’s a common misconception. In fact, it’s almost the opposite. The memory word size in Wasm is 32 bit (and we need 1 bit in Motoko to distinguish pointers for the GC). So any value that requires more than 32/31 bits has to be boxed. Thus, Nat32 and Nat64 are boxed, i.e., require allocations in most circumstances, because they lack a variable-size representation. Small Nat values (< 2^31), on the other hand, are unboxed.

The only use cases for fixed-size NatX/IntX types in Motoko are for interaction with external data formats that use them, and for the implementation of algorithms that inherently use 32/64-bit integer arithmetic, e.g., many crypto algorithms.
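
As a sketch of the second use case: FNV-1a is specified in terms of wrapping 32-bit arithmetic, so Nat32 with the wrapping operators is the natural fit (this is an illustration, not a base-library function):

```motoko
import Nat32 "mo:base/Nat32";
import Nat8 "mo:base/Nat8";

// 32-bit FNV-1a: xor each byte into the state, then wrap-multiply by the FNV prime.
func fnv1a(b : Blob) : Nat32 {
  var h : Nat32 = 0x811c9dc5; // FNV offset basis
  for (byte in b.vals()) {
    h := (h ^ Nat32.fromNat(Nat8.toNat(byte))) *% 16777619;
  };
  h;
};
```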

I don’t understand:

  1. Nat32 is 32 bits; why does it need to be boxed?

  2. Nat64 is 64 bits; why is it not two (unboxed) 32-bit words?

Because Motoko is a GC'ed, polymorphic language, and as such it needs to use a uniform representation, where every type is representable in a single word, minus 1 bit to distinguish pointers. Representations can be flattened in some circumstances (e.g., in locals), but not in general (e.g., as individual values in memory).

What about working with [Nat8] and Blob? Is there a more efficient way to use Nats? Should we have conversions from [Nat8] to Blob? Or, since these are less than 32 bits, are they efficient enough?

Yes, Nat8/16 aren’t boxed.

It would be nice to have specialised representations for arrays with small element types. The trade-off is that it would make accesses to arrays of generic element type significantly more expensive, because they’d need to inspect and dispatch on the type. Hence Motoko currently does not do that, though it has been discussed repeatedly in the team, and may happen eventually.

But right now that means that a [Nat8] takes up 4x the space of a Blob in memory (though not on the wire). That’s the reason Blob was introduced in the first place.

FWIW, specialising arrays with larger element type, such as unboxing the elements in [Nat32] or [Nat64], would be even more costly, because then reads would involve a boxing allocation in many situations.
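
Concretely, the usual pattern is to keep bulk bytes as a Blob and convert at the edges with the base-library helpers (a sketch):

```motoko
import Blob "mo:base/Blob";

let bytes : [Nat8] = [0x01, 0x02, 0x03];    // one 4-byte word per element
let compact : Blob = Blob.fromArray(bytes); // one byte per element in memory
let back : [Nat8] = Blob.toArray(compact);  // expands again; prefer compact.vals()
```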

I think I have to correct myself here. I should have said: “For a value of the same small size, Nat and Nat32/Nat64 behave essentially the same.” As soon as the values cross into 31 bits, i.e., when boxing happens, the differences start to show. I measured storing Nat/Nat32/Nat64 in an array. When the values are <= 30 bits, it is 4 bytes per entry for all three types. When the values are 31 bits, it is 12 bytes per entry for Nat32, 16 bytes for Nat64, and 152 bytes for Nat.
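
A measurement along these lines can be reproduced roughly as follows (a sketch: rts_heap_size comes from the compiler's primitive module, and the per-entry numbers include header and allocator overhead):

```motoko
import Prim "mo:⛔";
import Array "mo:base/Array";
import Debug "mo:base/Debug";

// Heap growth per entry when filling an array with values of a given size.
func bytesPerEntry(gen : Nat -> Nat) : Nat {
  let before = Prim.rts_heap_size();
  let a = Array.tabulate<Nat>(100_000, gen);
  let after = Prim.rts_heap_size();
  ignore a; // keep the array alive past the measurement
  (after - before) / 100_000;
};

Debug.print(debug_show (bytesPerEntry(func i = i)));           // <= 30 bits: unboxed
Debug.print(debug_show (bytesPerEntry(func i = i + 2 ** 32))); // boxed BigInt path
```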

So I think the answer to OP’s question could be yes. If he uses Nat64 now, then it is as memory-efficient as it can be while there are < 2^31 keys (which will certainly be the case with 32-bit memory). When a switch to 64-bit memory happens and there are >= 2^31 keys, then Nat64 is more memory-efficient than Nat.

It is unclear from OP’s post whether the keys are purely internal or user-supplied. It should be noted that in Candid, Nat32 serializes to 4 bytes and Nat64 to 8 bytes, fixed length, regardless of the value inside, while Nat serializes to a variable length and so can be more efficient on the wire. That could be important for a high-volume application.
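
This is easy to check with to_candid (a sketch; each blob also carries the fixed "DIDL" header and a small type table, so compare the sizes relative to each other):

```motoko
import Debug "mo:base/Debug";

let w32 = to_candid (42 : Nat32); // payload: 4 bytes, fixed
let w64 = to_candid (42 : Nat64); // payload: 8 bytes, fixed
let wN  = to_candid (42 : Nat);   // payload: LEB128-encoded, 1 byte for 42
Debug.print(debug_show (w32.size(), w64.size(), wN.size()));
```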

I don’t believe that’s correct. When the values are <= 30 bits, it’s indeed 4 bytes for Nat, but it should still be 12 resp. 16 bytes for Nat32/Nat64, because those are always boxed in memory.

No, Nat32/Nat64 aren’t always boxed, to my knowledge and according to my measurements. I measured an array; does that make a difference for what we are saying? An array of n elements of type Nat32 takes 4*n + c bytes if all values are <= 30 bits.

I also see the same behavior in my measurements. Nat/Nat32 take the same amount of space when the values are <= 30 bits. Above that, Nat starts going down the BigInt path: cycle consumption for arithmetic operations increases hugely (especially for division and remainder, which become extremely expensive, on the order of 10x lower overall performance), and arithmetic operations start producing garbage. Even when values are <= 30 bits, arithmetic is a lot faster for Nat32 (especially the wrapping operations), because Nat needs to perform an overflow check on every arithmetic operation.
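
To make the last point concrete (a sketch): the checked operators trap on overflow, while the wrapping variants compile down to plain Wasm arithmetic with no check.

```motoko
let x : Nat32 = 0xFFFF_FFFF; // the largest Nat32
let y = x +% 1;              // wrapping add: yields 0, no overflow check
assert y == 0;
// x + 1 would trap here: checked `+` verifies the result still fits.
```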