Heap out of bounds, using Motoko

All blob keys are 32 bytes

This is an issue that is bound to happen. Anyone can deploy a token canister and generate 20,000 transactions and then try to upgrade.

I don’t disagree that what Motoko provides is insufficient and we actually want to improve it in future. But that won’t happen soon.

Given the current state, I can make these observations:

There are a few more stable Tries indexed on Blobs, and each transaction (of which there could be many) stores a sender and a receiver Blob, many of which I'd imagine are shared between transactions (same sender or receiver).

Can you not maintain a two-way map from Blob to BlobId internally, and then just use the small BlobId to reference that particular Blob from other data structures (like a database table and key)? That would, I hope, mitigate the explosion on upgrade.
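A minimal sketch of such an interning table (names are illustrative, not from any existing codebase; note that TrieMap and Buffer are not stable themselves, so a real version would convert them to/from stable arrays in preupgrade/postupgrade):

```motoko
import Blob "mo:base/Blob";
import Buffer "mo:base/Buffer";
import TrieMap "mo:base/TrieMap";

// Forward map: Blob -> compact id.
let ids = TrieMap.TrieMap<Blob, Nat>(Blob.equal, Blob.hash);
// Reverse map: id -> Blob. Ids are dense, so a growable array suffices.
let blobs = Buffer.Buffer<Blob>(0);

// Return the existing id for `b`, or assign the next one.
func intern(b : Blob) : Nat {
  switch (ids.get(b)) {
    case (?id) id;
    case null {
      let id = blobs.size();
      blobs.add(b);
      ids.put(b, id);
      id
    };
  };
};

// Recover the original 32-byte key from its id.
func lookup(id : Nat) : Blob = blobs.get(id);
```

Other structures (transactions, indexes) would then store the small Nat id instead of repeating the 32-byte Blob.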

If easy to arrange, maintaining the large data in stable memory directly would also avoid the issue, but that’s probably a lot more work.


It may work, but it doesn't solve the problem at all. I don't think a 32-byte blob is a big overhead compared to the 32 bytes per int on the EVM. This feels like maintaining a database, but there is no database engine in a canister, and it is impossible for the subnet to allocate enough computational resources to each canister.
So I would like to make a suggestion.
Be upfront with reality: a canister is first and foremost a blockchain construct; it is not a server or a database. It is not the same as the EVM, but it is blockchain-based, so draw on Solidity's years of experience and solutions. Provide more support for k-v map structures and restrict full-table traversal comparisons. There are enough dapps on ETH to prove that most business logic that relies on full-table matching can be achieved with k-v reads and writes after technical modifications. The IC has higher scalability than ETH, so there is an opportunity to support this better.

Mechanisms can be designed to guide developers into a new paradigm. For example, it is possible to allow a controller to deploy a group of canisters on a subnet, and calls between this group of canisters are treated as internal calls, supporting atomicity. Then the developer can design the proxy-impl-data (MVC) pattern to separate the entry, business, and data in different canisters, and in fact the developer just needs to upgrade the impl canister frequently.


After some experimentation with upgrades, I managed to provoke the same error. It is a bug in the deserialization code, which requires more Rust/C stack than it has available, as well as a better stack guard.

A fix is under review that should mitigate the failures, if not rule them out.


Although we just released 0.8.2, which mentions an improvement in this area, the second half of the fix is still under review but will hopefully be out in 0.8.3. Just to set expectations correctly.

The PR to watch is this one:


Ok, Motoko 0.8.3 is out which I am hoping will fix these particular heap out of bounds issues, at least to the point where deserialization should no longer stack overflow when serialization does not stack overflow.

Note that storing large amounts of data in stable variables (I’m thinking 1.5GB or more) may well fail to deserialize due to memory exhaustion, but that’s a different issue which we hope to tackle shortly.


@claudio @ZhenyaUsenko @skilesare I’ve got a similar problem.
At first I thought it was my Mac, but then I deployed to the Motoko playground and got the same results.

https://m7sm4-2iaaa-aaaab-qabra-cai.ic0.app/?tag=2112554174

repo:
https://github.com/infu/bug_x_weird

You will probably need the "Internet Base" VS Code extension so you can run ./check.blast (from the repo) to populate it with 100k records.
These records look like this:

Once they get inserted, the canister won't upgrade. Sometimes it throws "stack overflow" and, on rare occasions, "heap out of bounds". The memory is ~52 MB.

I’ve tried various dfx versions and also hashmap’s previous version.

I’ve tried using nhash instead of thash (Nat keys). I’ve tried removing skills:[Text].
Tried removing the id from the document as well so it doesn’t repeat.
I’ve also tried inserting the records in smaller portions, 100 per request instead of 1000.
Nothing seems to help with the error. Everything seems to work fine until you try to upgrade it.


Taking a look.

Have you tried using ExperimentalStableMemory.stableVarQuery() to determine the serialized size of the stable variables without actually doing the serialization?
Serialization can sometimes non-linearly expand the size of the source data if it contains a lot of internal sharing.
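For reference, a minimal sketch of calling it from an actor (the method name stableSize is illustrative):

```motoko
import ESM "mo:base/ExperimentalStableMemory";

actor {
  stable var data : [Nat] = [];

  // Reports the size the stable variables would occupy when serialized
  // on upgrade, without actually performing the upgrade serialization.
  public func stableSize() : async Nat64 {
    (await ESM.stableVarQuery()()).size
  };
}
```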

My other suspicion is that there is some large linked list (or unbalanced tree) in the data that is causing serialization or deserialization to run out of stack.

It would also be good to figure out whether this happens in pre-upgrade or post-upgrade, using dfx (not the playground) and some Debug.print calls in a postupgrade system method.
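A minimal way to tell the two phases apart (a sketch, using nothing beyond base's Debug):

```motoko
import Debug "mo:base/Debug";

actor {
  stable var docs : [Text] = [];

  system func preupgrade() {
    // Runs before stable variables are serialized; if this prints but the
    // upgrade still traps, serialization (or later) is the culprit.
    Debug.print("preupgrade reached");
  };

  system func postupgrade() {
    // Runs after stable variables are deserialized; if this prints,
    // deserialization succeeded.
    Debug.print("postupgrade reached");
  };
}
```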

Thanks for the tip, I’ll add it to the stats function and see.

I've just tried @timo 's Vector with the same record type and managed to get to 1.4 million records (haven't tested with more); upgrades work.
canister stats:
“documents”: “1400000”,
“memory_size”: “608043008”,
“max_live_size”: “246481600”,
“stable_size”: “112850245”,
“heap_size”: “246483248”,
“total_allocation”: “359333728”,
“reclaimed”: “112850480”

With the Map, I get “stack overflow” when running the same stableVarQuery that works in the Vector test.
stable_size = (await Prim.stableVarQuery()()).size;

So I think it might be a problem with the motoko_hash_map data structure.

Indeed, someone seems to have reported a similar issue with version 8 of that library: "trapped: stack overflow" on canister upgrade on dfx 0.11.2 · Issue #1 · ZhenyaUsenko/motoko-hash-map · GitHub.

I looked at the code on master and it seems to both rely on internal sharing and use a recursive type that could degenerate to a linked list, causing the stack overflow when deep.

Version 7 uses a simpler data structure, I think, so might survive serialization better.

I noticed your dfx.json is using a rather old version of Motoko; it would be worth upgrading Motoko or dfx, since we have done some work on improving the capacity of deserialization (see above: Heap out of bounds , use motoko - #27 by claudio).

I've tried with multiple versions; the repo is using the older one because I tried it last. The latest tests are with dfx 0.14.1.
Yes, confirmed. I've just tried @icme 's BTree and it also works.
I'd be lying if I said I completely understood what "internal sharing and a recursive type that could degenerate to a linked list" means.

However, I am thinking of having all indexes (BTrees) point to the same object. I suppose that will put a reference to the object rather than cloning it. I could also store the text key instead, but that will make things slower when filtering. Maybe that's what you mean by internal sharing, and it may be bad?
Anyway I am about to find out soon if it works

Yeah, sorry that was a bit obscure.

The problem is that we use Candid for serialization of stable variables.

In Motoko, in-memory data structures are represented as graphs. So multiple references to the same object are represented as small pointers to the one object.

Candid, unfortunately, can only represent trees, not graphs, so multiple Motoko references to the same object in memory will get expanded to several serialized copies of that object in Candid data.

If you don't start with a graph-like data structure containing multiple references to a shared object, you're OK. But if you do, the size of the data can blow up (exponentially, in fact).
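A toy illustration of that blow-up (a sketch; exact serialized sizes depend on the format): each level of pairing doubles the number of serialized copies of the shared value.

```motoko
actor {
  // One value in memory, referenced 2^n times after n pairings.
  let leaf = ("some", "payload");
  let l1 = (leaf, leaf);   // 2 references to leaf
  let l2 = (l1, l1);       // 4 references to leaf
  let l3 = (l2, l2);       // 8 references to leaf

  // Serializing this writes 8 copies of `leaf`, because the Candid-based
  // stable-variable format cannot express sharing of immutable values.
  stable var snapshot = l3;
}
```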

Independently, the overflow can happen because Candid serialization is a recursive algorithm, driven by the (static) type of the data being serialized. If the data has a recursive type, and the value is deeply recursive, serialization can blow the stack (like any recursive algorithm).
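For example, a long chain of a recursive type (a sketch; the depth at which the stack actually overflows depends on the available stack):

```motoko
actor {
  // A recursive type whose values can form a long chain.
  type List = ?(Nat, List);

  stable var saved : List = null;

  // Each link adds one level of recursion to the (de)serialization of
  // `saved` on upgrade; a deep enough chain can exhaust the stack.
  public func fill(n : Nat) : async () {
    var i = 0;
    while (i < n) {
      saved := ?(i, saved);
      i += 1;
    };
  };
}
```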

It's not really the data structure's fault here, but the fact that Motoko uses an unsuitable format for stable-variable serialization. We'd like to fix that, but it's not easy given all the requirements we need to meet.

The serialization format we use is actually a mild extension of Candid that supports some sharing, but only for mutable arrays and mutable fields. All Motoko references to the same mutable array or field are represented by a single object in the stable-variable format, so that we can preserve the identity of those values on deserialization. However, we only preserve sharing of mutable values, not immutable ones, and it's unfortunately not trivial to do more than that in the current scheme.
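As a sketch, only the mutable-array aliasing below survives an upgrade intact:

```motoko
actor {
  // Mutable arrays have identity: both stable variables serialize as
  // references to one shared object and still alias after upgrade.
  let cell : [var Nat] = [var 0];
  stable var a = cell;
  stable var b = cell;

  // Immutable values have no identity: these serialize as two copies.
  let pair = (1, 2);
  stable var x = pair;
  stable var y = pair;
}
```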


Thanks for clarifying. So I should not link to the same object from all indexes, because when it tries to upgrade it will expand too much, and after the upgrade it will probably end up as cloned objects.

I suppose then using Vector to store all data and placing Nat indexes to it in Btrees and Maps will be the best current solution. My only problem with Vector is that I’ll be leaving empty Array cells when someone deletes things, which makes it a bit opinionated.


I suppose then using Vector to store all data and placing Nat indexes to it in Btrees and Maps will be the best current solution. My only problem with Vector is that I’ll be leaving empty Array cells when someone deletes things, which makes it a bit opinionated.

I think that’s one solution, yes. Another (untried by me) would be to wrap each object in a singleton, mutable array and then reference the arrays, not the objects. The arrays will get shared because they have identity. Might be awkward though.
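A sketch of that wrapping idea (the type and function names are illustrative):

```motoko
type Doc = { id : Nat; name : Text };

// A one-element mutable array acts as a box with identity.
type Boxed = [var Doc];

func box(d : Doc) : Boxed = [var d];

// Store the same Boxed value in every index; since mutable arrays keep
// their identity through stable-variable serialization, the record would
// be written once rather than once per index.
```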


Ha, interesting hack. I may try it out. So [var E], OK.

Well, it didn't fix the Map. With Map.set(mmm, thash, doc.id, [var doc]) I still get upgrade errors.

I suppose if I want to reuse old Vector cells, I can keep track of them and insert new records there
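A sketch of that free-list idea, assuming mo:vector's new/add/put/size API and base's List (names from memory, so treat them as assumptions): deleted slots are pushed onto a free list and the next insert reuses one instead of growing the vector.

```motoko
import List "mo:base/List";
import Vector "mo:vector";

type Doc = { id : Nat; name : Text };

// Slots hold ?Doc so a deleted slot can be emptied and reused.
let docs = Vector.new<?Doc>();
var free : List.List<Nat> = List.nil<Nat>();

// Insert into a free slot if one exists, otherwise append.
func insert(d : Doc) : Nat {
  switch (List.pop(free)) {
    case (?i, rest) { free := rest; Vector.put(docs, i, ?d); i };
    case (null, _) { Vector.add(docs, ?d); Vector.size(docs) - 1 };
  };
};

// Empty the slot and remember it for reuse.
func delete(i : Nat) {
  Vector.put(docs, i, null);
  free := List.push(i, free);
};
```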


No, I don't think this will fix motoko-hash-map unless used internally in the implementation. I was suggesting it more for your own use. I'm also not 100% certain it will help rather than push the problem elsewhere.

Does the issue persist even with map v7?

…I’ll look into fixing the issue for v8. Will share my findings a bit later

To be fair, I thought the --rts-stack-pages <n> option introduced in moc 0.8.2 should have eliminated this. @claudio, do I misunderstand its purpose? @infu, did you try increasing it?