Heap out of bounds, using Motoko

When preupgrade runs in a fragmented heap, there may not be enough capacity to push all the information into stable variables and externalize them into a monolithic buffer (which is then copied to stable memory). We are aware of two necessary fixes:

  • run an explicit GC before preupgrade (and possibly after)
  • use a streaming technique to write stable variables to stable memory

I am working on the former and @claudio is about to tackle the latter.


Thanks for the response.

I knew there was an intermediate buffer, but I didn't know it would take that much space… Going from under 800 MB to "heap out of bounds" (i.e. 3.2 GB used) seems quite extreme.

But yeah, those two workarounds sound good. Do you happen to have an ETA on either of those solutions? (No rush but would be helpful to plan.) Also, would you recommend users migrate to using the ExperimentalStableMemory Motoko library, which I assume avoids these problems?

That doesn't look like a Motoko allocation failure to me, which produces a different error message (IIRC); it looks more like an attempt to address Wasm memory that hasn't been pre-allocated by a memory grow.

Is there any chance you could share the offending code or a smaller repro?

If you have a repro, is there any chance you could share the code? I’d like to check we don’t have a bug.

More likely than not, you have changed the data structure used in your stable variables in a way that violates the upgrade rules. Please see a previous discussion: Dfx deploy --network=ic heap out of bounds - #11 by PaulLiu


This issue still seems to be unresolved. When I upgrade with the same Wasm (Motoko), a canister with little memory usage upgrades successfully, but a canister with 70 MB of memory usage cannot be upgraded and reports the error "Canister qhssc-vaaaa-aaaak-adh2a-cai trapped: heap out of bounds". I have encountered this problem several times.

Memory allocation: 0
Compute allocation: 0
Freezing threshold: 2_592_000
Memory Size: Nat(70311386)
Balance: 14_558_822_050_955 Cycles
Module hash: 0xa2edf6ff68c0b00dc9e971602d7542f542289a62c6d108d60dda64273406e8b0


Thanks for the report! Though 70 MB doesn’t seem to be so high that the heap would be in danger due to serialisation of stable variables. Would you mind giving us the moc version used and a few hints about how the heap structure differs between the “little memory” and the “70M memory”? Ideally, if this is a test canister, a reproducible example of canister code + message sequence leading to the problem would allow us to come up with the fastest bug fix.


Is there any chance your data structure has a lot of internal sharing of immutable values, i.e. is it a graph, not a tree, of data? The serialisation step can turn a graph into a tree by duplicating shared values, which might exhaust memory on deserialisation.
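To make that concrete, here is a minimal sketch (hypothetical code, not yours): a single immutable blob referenced from several places is stored once on the heap, but the stable-variable stream ends up with one copy per reference.

    // Hypothetical sketch: one immutable Blob shared by two array elements.
    actor Sharing {
        // The 32 bytes live once on the heap; each tuple holds a pointer to them.
        let key : Blob = "\00\01\02\03\04\05\06\07\08\09\0A\0B\0C\0D\0E\0F\10\11\12\13\14\15\16\17\18\19\1A\1B\1C\1D\1E\1F";

        // On upgrade, serialisation writes the 32 bytes once per reference,
        // so the stream holds two copies of `key`, not one.
        stable var entries : [(Blob, Nat)] = [(key, 1), (key, 2)];
    };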


After repeated tests, we found that:

  • If all the stable vars are left unchanged, the upgrade succeeds.
  • The issue occurs if a stable var (a trie structure) holding thousands of entries is deleted. This seems to trigger a reorganization of the whole stable memory, because we saw canister memory grow to more than 100 MB during the upgrade.

Would you be able to provide a repro for the problem? It would be interesting to figure out what is going wrong.


I wrote an ICRC1 contract f2r76-wqaaa-aaaak-adpzq-cai which now uses 27 MB of memory, with 18,590 records in a Trie.
I didn't change any stable vars, and this issue occurred when I upgraded. It seems to be related to the number of records.

Upgrade code (perhaps a single array element cannot hold too much data?):

    // upgrade
    // Snapshot of the DRC202 module state, wrapped in a one-element array.
    private stable var __drc202Data: [DRC202.DataTemp] = [];
    system func preupgrade() {
        // Serialised into the stable-variable stream during the upgrade.
        __drc202Data := [drc202.getData()];
    };
    system func postupgrade() {
        if (__drc202Data.size() > 0){
            // Restore the module state, then release the snapshot.
            drc202.setData(__drc202Data[0]);
            __drc202Data := [];
        };
    };

Looking at DRC_standards/DRC202 at main · iclighthouse/DRC_standards · GitHub, which I'm guessing is what you are using, I think the problem might be the 64-byte (?) transaction ids (Blobs):

DRC_standards/DRC202Types.mo at cb25d11a61526531c6fd897cbb35682b1c6a4806 · iclighthouse/DRC_standards · GitHub, which are used as keys for various maps in DataTemp.

In memory, a Blob is represented by a 4-byte pointer to a byte vector, and references to the same blob share the vector through that indirection.

Your tries use those blobs as keys in a number of places (the fields of DataTemp).

When multiple references to the same blob are serialized during upgrade, each reference is serialized to its own vector of bytes in the stream, losing all the sharing that was present in the in-memory representation.

I think that may be the problem here. It's a known issue that the current stable representation of data can blow up due to loss of sharing. Unfortunately, I think you may have been bitten by this.

If that is the problem, the only workaround I can see would be to introduce another map that maps each TxId to a small integer, storing each large blob just once as a key, and then use the small integer as the key in the other tries.
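A minimal sketch of that interning idea (all names are mine, not DRC202's; Trie, Blob, Hash, and Nat are from mo:base):

    import Trie "mo:base/Trie";
    import Blob "mo:base/Blob";
    import Hash "mo:base/Hash";
    import Nat "mo:base/Nat";

    actor Interning {
        stable var nextId : Nat = 0;
        // Blob -> Nat: the only trie that stores the large blobs as keys.
        stable var idOf : Trie.Trie<Blob, Nat> = Trie.empty();
        // Nat -> Blob: the reverse direction, for lookups by id.
        stable var blobOf : Trie.Trie<Nat, Blob> = Trie.empty();

        func blobKey(b : Blob) : Trie.Key<Blob> { { hash = Blob.hash(b); key = b } };
        func natKey(n : Nat) : Trie.Key<Nat> { { hash = Hash.hash(n); key = n } };

        // Return the existing id for a TxId, or assign a fresh one.
        func intern(txid : Blob) : Nat {
            switch (Trie.get(idOf, blobKey(txid), Blob.equal)) {
                case (?id) { id };
                case null {
                    let id = nextId;
                    nextId += 1;
                    idOf := Trie.put(idOf, blobKey(txid), Blob.equal, id).0;
                    blobOf := Trie.put(blobOf, natKey(id), Nat.equal, txid).0;
                    id
                };
            }
        };
    };

With that, every other trie is keyed on the small Nat, so each 32-byte blob is serialized at most twice (once per direction of this map) rather than once per reference.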

If it’s any help, you can use this Internet Computer Content Validation Bootstrap to calculate the size of memory needed for stable variables.
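Relatedly, if your moc/base version provides it, ExperimentalStableMemory.stableVarQuery lets a canister estimate from the inside how many bytes its current stable variables would occupy when serialised; a rough sketch (the wrapper is mine):

    import StableMemory "mo:base/ExperimentalStableMemory";

    actor Meter {
        stable var data : [Blob] = [];

        // Estimate the serialised size of the current stable variables.
        public func stableSize() : async Nat64 {
            let usage = StableMemory.stableVarQuery();
            (await usage()).size
        };
    };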

The key is a 32-byte blob stored in a trie structure, and inside the trie the key is 4 bytes (generated using Blob.hash).
Does this cause any problem? 32-byte keys are common in blockchain applications.
Our keys are not incrementing integers; they are generated according to a rule that requires 32 bytes to avoid collisions. As you know, BTC and ETH use such rules.
If this is the reason, what can be done to improve it?

Should I use the ExperimentalStableMemory tool? But there is no didc-like toolkit in Motoko to convert structured data to a blob.

Right, but I think the leaves of the trie will actually still store the full 32-byte (not bit) key, to resolve collisions. So each trie will contain its own reference to the key blob, which is then unshared on upgrade. But maybe I’m barking up the wrong tree.
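For context, the relevant types in mo:base/Trie look roughly like this (paraphrased; check your base version):

    // Each stored key carries the full key value alongside its 32-bit hash,
    // so every leaf holds its own reference to the key blob.
    public type Key<K> = { hash : Hash.Hash; key : K };
    public type Leaf<K, V> = { size : Nat; keyvals : AssocList.AssocList<Key<K>, V> };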

The AccountIds are also Blobs - how large are those?

You could try to use ExperimentalStableMemory to store the data.

Motoko provides to_candid and from_candid operations to map values of shared types to/from blobs:
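For example, here is a minimal sketch combining the two (the Tx type, offsets, and helper names are illustrative, not DRC202's):

    import StableMemory "mo:base/ExperimentalStableMemory";
    import Nat64 "mo:base/Nat64";

    actor Store {
        type Tx = { sender : Blob; receiver : Blob; amount : Nat };

        let pageSize : Nat64 = 65536;

        // Grow stable memory until it can hold `offset + size` bytes.
        func ensure(offset : Nat64, size : Nat64) {
            let needed = (offset + size + pageSize - 1) / pageSize;
            let have = StableMemory.size();
            if (needed > have) {
                ignore StableMemory.grow(needed - have);
            };
        };

        // Serialise a shared value to a candid Blob and write it at `offset`;
        // returns the number of bytes written.
        func writeTx(offset : Nat64, tx : Tx) : Nat64 {
            let b : Blob = to_candid (tx);
            let size = Nat64.fromNat(b.size());
            ensure(offset, size);
            StableMemory.storeBlob(offset, b);
            size
        };

        // Read `size` bytes back and decode; returns null if decoding fails.
        func readTx(offset : Nat64, size : Nat) : ?Tx {
            from_candid (StableMemory.loadBlob(offset, size))
        };
    };

You would still need to track each record's offset and size yourself (e.g. in a small stable index), which is part of why this is more work.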

But I think it would be wiser to figure out where the blow-up, if any, is coming from before rewriting everything to use stable memory.

I think getting a small repro of the behaviour would be best to figure out how to fix it.

All blob keys are 32 bytes

This is an issue that is bound to happen. Anyone can deploy a token canister and generate 20,000 transactions and then try to upgrade.

I don’t disagree that what Motoko provides is insufficient and we actually want to improve it in future. But that won’t happen soon.

Given the current state, I can make these observations:

There are a few more stable Tries indexed on Blobs, and each transaction, of which there could be many, stores a sender and receiver Blob, many of which I’d imagine to be shared between transactions (same sender or receiver).

Can you not maintain a two-way map from Blob to BlobId internally, and then just use the small BlobId to reference that particular Blob from other data structures (like a database table and its keys)? That would, I hope, mitigate the explosion on upgrade.

If easy to arrange, maintaining the large data in stable memory directly would also avoid the issue, but that’s probably a lot more work.


It may work, but it doesn't solve the underlying problem. I don't think a 32-byte blob is a big overhead, compared to the 32 bytes per int on the EVM. This feels like maintaining a database, but there is no database engine in a canister, and the subnet cannot allocate enough computational resources to every canister.
So I would like to make a suggestion.
Be upfront about reality: a canister is first and foremost a blockchain construct; it is not a server or a database. It is not the same as the EVM, but it is blockchain-based, so draw on Solidity's years of experience and solutions. Provide more support for key-value map structures and restrict full-table traversal and comparison. There are enough dapps on ETH to prove that most business logic that relies on full-table matching can be reworked into key-value reads and writes. The IC is more scalable than ETH, so there is an opportunity to support this even better.

Mechanisms can be designed to guide developers into a new paradigm. For example, a controller could be allowed to deploy a group of canisters on a subnet, with calls between the canisters in that group treated as internal calls that support atomicity. A developer could then use a proxy-impl-data (MVC) pattern to separate the entry point, business logic, and data into different canisters; in practice, only the impl canister would need frequent upgrades.


After some experimentation with upgrades, I managed to provoke the same error. It is a bug in the deserialization code, which requires more Rust/C stack than is available, as well as a better stack guard.

A fix is under review that should mitigate the failures, if not rule them out.
