Heap out of bounds when using Motoko

canister id : 5su7s-vqaaa-aaaai-abgua-cai
When upgrading, I get this error:
The invocation to the wallet call forward method failed with the error: An error happened during the call: 5: Canister … trapped: heap out of bounds

method:

        heap_size        = Prim.rts_heap_size();
        memory_size      = Prim.rts_memory_size();
        max_live_size    = Prim.rts_max_live_size();
        total_allocation = Prim.rts_total_allocation();
        reclaimed        = Prim.rts_reclaimed();
        rts_version      = Prim.rts_version();
        cycles           = Cycles.balance();
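
For reference, here is a minimal self-contained sketch of how such a stats method can be exposed; the actor wrapper, the method name stats, and the return type are assumptions for illustration, not the original code:

    import Prim "mo:⛔";
    import Cycles "mo:base/ExperimentalCycles";

    actor {
        // Query that reports runtime-system memory statistics and the cycle balance.
        public query func stats() : async {
            heap_size : Nat;
            memory_size : Nat;
            max_live_size : Nat;
            total_allocation : Nat;
            reclaimed : Nat;
            rts_version : Text;
            cycles : Nat;
        } {
            {
                heap_size        = Prim.rts_heap_size();
                memory_size      = Prim.rts_memory_size();
                max_live_size    = Prim.rts_max_live_size();
                total_allocation = Prim.rts_total_allocation();
                reclaimed        = Prim.rts_reclaimed();
                rts_version      = Prim.rts_version();
                cycles           = Cycles.balance();
            }
        };
    };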

Why is the heap out of bounds, and how can I solve this problem? This issue is causing me a lot of pain and frustration. I also want to know: do canisters written in Rust have this problem?


Is the canister's memory management similar to Linux's virtual memory?
public func push<T>(x : T, l : List<T>) : List<T> = ?(x, l);
When adding items to the linked list, is it similar to C's malloc implementation, allocating memory from a virtual address space?

Where can I find more information on Prim?

I’m getting this same error when I try to upgrade my canister locally.

Installing canisters...
Upgrading code for canister service, with canister_id rrkah-fqaaa-aaaaa-aaaaq-cai
The invocation to the wallet call forward method failed with the error: An error happened during the call: 5: Canister rrkah-fqaaa-aaaaa-aaaaq-cai trapped: heap out of bounds

Here is the canister info from the replica UI:

Scheduler state
last_full_execution_round	2220754
compute_allocation	0%
freeze_threshold (seconds)	2592000
memory_usage	768365333
accumulated_priority	-1600
Cycles balance	4000000000000

I’m surprised I’m running out of heap space, when I’m only using 768 MB of memory pre-upgrade. I knew Motoko stable variable serialization took up space, but I didn’t think it would take up 3+ GB.

@claudio @rossberg Do you know what may be going wrong here? To confirm, I didn’t modify any stable variable types (or make any changes) between the last canister upgrade and the current one.

EDIT: When I reduce the memory_usage of my canister (by reinstalling and populating it with fresh data) to 753 MB and run dfx deploy, it works… Is the pre-upgrade memory_usage cutoff really somewhere between 753 MB and 768 MB, above which canister upgrades don't work?

@ggreif or @ulan might have an idea what’s going on and what the message means.

When preupgrade runs on a fragmented heap, there may not be enough capacity to push all the information into stable variables and externalize them into a monolithic buffer (which is then copied to the stable memory space). We are aware of two issues here:

  • we need to run an explicit GC before preupgrade (and possibly after)
  • we need to use a streaming technique to write stable variables to stable memory.

I am working on the former and @claudio is about to tackle the latter.
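
To make the failure mode concrete, here is a minimal sketch (type and variable names are illustrative, not from the canister in question) of the common pattern where preupgrade copies an in-memory structure into a stable variable, temporarily duplicating the data on the heap before it is serialized into the single stable buffer:

    import HashMap "mo:base/HashMap";
    import Iter "mo:base/Iter";
    import Text "mo:base/Text";

    actor {
        // Large in-memory (non-stable) structure.
        var map = HashMap.HashMap<Text, Text>(16, Text.equal, Text.hash);

        // Stable staging area written during upgrade.
        stable var entries : [(Text, Text)] = [];

        system func preupgrade() {
            // Copying every entry briefly duplicates the data on the heap;
            // the whole stable variable is then serialized into one buffer.
            entries := Iter.toArray(map.entries());
        };

        system func postupgrade() {
            map := HashMap.fromIter<Text, Text>(entries.vals(), 16, Text.equal, Text.hash);
            entries := [];
        };
    };

On a fragmented or nearly full heap, that temporary copy plus the serialization buffer is what can push the canister over the limit.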


Thanks for the response.

I knew there was an intermediate buffer, but I didn’t know it would take that much space… From <800 MB to “heap out of bounds” (i.e. 3.2 GB used) seems quite extreme.

But yeah, those two workarounds sound good. Do you happen to have an ETA on either of those solutions? (No rush but would be helpful to plan.) Also, would you recommend users migrate to using the ExperimentalStableMemory Motoko library, which I assume avoids these problems?

That doesn't look like a Motoko allocation failure to me, which produces a different error message (IIRC); it looks more like an attempt to address Wasm memory that hasn't been pre-allocated by a memory.grow.

Is there any chance you could share the offending code or a smaller repro?

If you have a repro, is there any chance you could share the code? I’d like to check we don’t have a bug.

More likely than not, you have changed the data structure used in your stable variables, which violates the upgrade rules. Please see a previous discussion: Dfx deploy --network=ic heap out of bounds - #11 by PaulLiu
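
For readers unfamiliar with those rules, a small illustrative sketch (hypothetical variable names) of an incompatible stable-variable change next to a compatible one:

    actor {
        // Version 1 declared:
        //   stable var owners : [Principal] = [];
        //
        // Changing its type to something unrelated in version 2, e.g.
        //   stable var owners : [Text] = [];
        // violates the stable-variable compatibility rules, because the old
        // values cannot be read back at the new type.
        //
        // A compatible evolution keeps (or widens) the type, for example by
        // adding a new stable variable alongside the old one:
        stable var owners : [Principal] = [];
        stable var ownerNames : [Text] = [];
    };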


This issue still seems to be unresolved. When I upgrade with the same wasm (Motoko), a canister with little memory usage upgrades successfully, but a canister with ~70 MB of memory usage cannot be upgraded; it fails with "Canister qhssc-vaaaa-aaaak-adh2a-cai trapped: heap out of bounds". I have run into this problem several times.

Memory allocation: 0
Compute allocation: 0
Freezing threshold: 2_592_000
Memory Size: Nat(70311386)
Balance: 14_558_822_050_955 Cycles
Module hash: 0xa2edf6ff68c0b00dc9e971602d7542f542289a62c6d108d60dda64273406e8b0


Thanks for the report! Though 70 MB doesn’t seem to be so high that the heap would be in danger due to serialisation of stable variables. Would you mind giving us the moc version used and a few hints about how the heap structure differs between the “little memory” and the “70M memory”? Ideally, if this is a test canister, a reproducible example of canister code + message sequence leading to the problem would allow us to come up with the fastest bug fix.


Is there any chance your data structure has a lot of internal sharing of immutable values, i.e. is a graph, not tree, of data? The serialisation step can turn a graph into a tree by duplicating shared values, which might exhaust memory on deserialisation.
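
As a concrete illustration of that blow-up (hypothetical names and sizes), a single immutable value shared by reference many times on the heap becomes many independent copies in the serialized form:

    import Array "mo:base/Array";
    import Blob "mo:base/Blob";

    actor {
        // One 32-byte blob, built once and shared by reference.
        let big : Blob = Blob.fromArray(Array.tabulate<Nat8>(32, func _ = 0));

        // On the heap this costs roughly 10_000 pointers plus a single copy of `big`.
        stable var refs : [Blob] = Array.tabulate<Blob>(10_000, func _ = big);

        // During an upgrade, stable-variable serialization loses the sharing and
        // writes 10_000 separate 32-byte copies, inflating the serialized size.
    };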


After repeated tests, we found that:

  • If all the stable vars are left unchanged, the upgrade can be done successfully.
  • This issue occurs if a stable var (a trie structure) holding thousands of entries is deleted. It seems to trigger a reorganization of the whole stable memory, because we saw canister memory increase to more than 100 MB during the upgrade.

Would you be able to provide a repro for the problem? It would be interesting to figure out what is going wrong.


I wrote an ICRC1 contract, f2r76-wqaaa-aaaak-adpzq-cai, which now uses 27 MB of memory with 18,590 records in a Trie.
I didn't change any stable vars, and this issue occurred when I upgraded. It seems to be related to the number of records.

Upgraded code (is the problem that a single array element cannot hold this much data?):

    // upgrade
    private stable var __drc202Data: [DRC202.DataTemp] = [];
    system func preupgrade() {
        __drc202Data := [drc202.getData()];
    };
    system func postupgrade() {
        if (__drc202Data.size() > 0){
            drc202.setData(__drc202Data[0]);
            __drc202Data := [];
        };
    };

Looking at DRC_standards/DRC202 at main · iclighthouse/DRC_standards · GitHub, which I'm guessing is what you are using, I think the problem might be the 64-byte (?) transaction ids (Blobs):

DRC_standards/DRC202Types.mo at cb25d11a61526531c6fd897cbb35682b1c6a4806 · iclighthouse/DRC_standards · GitHub, which are used as keys for various maps in DataTemp.

In memory, a Blob is represented by a 4-byte pointer to a byte vector, and references to the same blob share the vector using an indirection.

Your tries use those blobs as keys in a number of places (the fields of DataTemp).

When multiple references to the same blob are serialized during upgrade, each reference is serialized to its own vector of bytes in the stream, losing all the sharing that was present in the in-memory representation.

I think that may be the problem here. It's a known issue that the current stable representation of data can blow up due to loss of sharing. Unfortunately, I think you may have been bitten by this.

If that is the problem, the only workaround I can see would be to introduce another map that maps each TxId to a small integer, storing each large blob just once as a key, and then using the small integer as the key in the other tries.
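
A rough sketch of that workaround (module, type, and function names are all hypothetical): intern each TxId blob once, handing out a small Nat that the other tries then use as their key:

    import Blob "mo:base/Blob";
    import Trie "mo:base/Trie";

    module {
        public type TxId = Blob;

        // Intern table: each 32-byte TxId blob is stored once, mapped to a small Nat.
        public type Interner = {
            var next : Nat;
            var ids : Trie.Trie<TxId, Nat>;
        };

        func key(t : TxId) : Trie.Key<TxId> = { hash = Blob.hash(t); key = t };

        // Return the existing index for `t`, or assign and record a fresh one.
        public func intern(s : Interner, t : TxId) : Nat {
            switch (Trie.find(s.ids, key(t), Blob.equal)) {
                case (?i) { i };
                case null {
                    let i = s.next;
                    s.next += 1;
                    s.ids := Trie.put(s.ids, key(t), Blob.equal, i).0;
                    i
                };
            }
        };
    };

Tries keyed by the small Nat index then serialize each 32-byte blob only once (inside the interner) instead of once per reference.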

If it’s any help, you can use this Internet Computer Content Validation Bootstrap to calculate the size of memory needed for stable variables.


The key is a 32-byte Blob, stored in a trie structure; inside the trie the key is 4 bytes (generated using Blob.hash).
Does this cause any problem? 32-byte keys are common in blockchain applications.
Our keys are not incrementing integers; they are generated by a rule that requires 32 bytes to avoid collisions, as in BTC and ETH.
If this is the reason, what can be done to improve it?

Should I use the ExperimentalStableMemory tool? But Motoko has no didc-like toolkit to convert structured data to a Blob.

Right, but I think the leaves of the trie will actually still store the full 32-byte (not bit) key, to resolve collisions. So each trie will contain its own reference to the key blob, which is then unshared on upgrade. But maybe I’m barking up the wrong tree.

The AccountIds are also Blobs - how large are those?

You could try to use ExperimentalStableMemory to store the data.

Motoko provides to_candid and from_candid operations to map values of shared types to/from blobs:
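
For example, a minimal sketch (the record type, offsets, and function names are assumptions for illustration) of serializing one value with to_candid and writing/reading it via ExperimentalStableMemory:

    import ESM "mo:base/ExperimentalStableMemory";
    import Nat64 "mo:base/Nat64";

    actor {
        type Record = { id : Nat; owner : Text };

        // Serialize a shared value to a Blob and store it at offset 0,
        // prefixed with its length so it can be read back later.
        func save(r : Record) {
            let blob = to_candid(r);
            let needed : Nat64 = 8 + Nat64.fromNat(blob.size());
            let pages = (needed + 65535) / 65536;
            if (ESM.size() < pages) {
                ignore ESM.grow(pages - ESM.size());
            };
            ESM.storeNat64(0, Nat64.fromNat(blob.size()));
            ESM.storeBlob(8, blob);
        };

        // Read the length prefix, load the Blob, and decode it back to a Record.
        func load() : ?Record {
            if (ESM.size() == 0) { return null };
            let len = Nat64.toNat(ESM.loadNat64(0));
            let blob = ESM.loadBlob(8, len);
            let r : ?Record = from_candid(blob);
            r
        };
    };

The length prefix here is just one possible layout; anything that records where each blob lives would work.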

But I think it would be wiser to figure out where the blow-up, if any, is coming from before rewriting everything to use stable memory.

I think getting a small repro of the behaviour would be best to figure out how to fix it.