Heap out of bounds, using Motoko

All blob keys are 32 bytes

This is an issue that is bound to happen. Anyone can deploy a token canister and generate 20,000 transactions and then try to upgrade.

I don’t disagree that what Motoko provides is insufficient and we actually want to improve it in future. But that won’t happen soon.

Given the current state, I can make these observations:

There are a few more stable Tries indexed on Blobs, and each transaction (of which there could be many) stores a sender and a receiver Blob, many of which I'd imagine are shared between transactions (same sender or receiver).

Can you not maintain a two-way map from Blob to BlobId internally, and then just use the small BlobId to reference that particular Blob from other data structures (like a database table and key)? That would, I hope, mitigate the explosion on upgrade.
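A minimal sketch of such an interning table (names are illustrative, not from any existing codebase; note that TrieMap and Buffer are not stable themselves, so a real version would convert them to/from stable arrays in preupgrade/postupgrade):

```motoko
import Blob "mo:base/Blob";
import Buffer "mo:base/Buffer";
import TrieMap "mo:base/TrieMap";

// Forward map: Blob -> compact id.
let ids = TrieMap.TrieMap<Blob, Nat>(Blob.equal, Blob.hash);
// Reverse map: id -> Blob. Ids are dense, so a growable array suffices.
let blobs = Buffer.Buffer<Blob>(0);

// Return the existing id for `b`, or assign the next one.
func intern(b : Blob) : Nat {
  switch (ids.get(b)) {
    case (?id) id;
    case null {
      let id = blobs.size();
      blobs.add(b);
      ids.put(b, id);
      id
    };
  };
};

// Recover the original 32-byte key from its id.
func lookup(id : Nat) : Blob = blobs.get(id);
```

Other structures (transactions, indexes) would then store the small Nat id instead of repeating the 32-byte Blob.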

If easy to arrange, maintaining the large data in stable memory directly would also avoid the issue, but that’s probably a lot more work.


It may work, but it doesn't solve the problem at all. I don't think a 32-byte blob is a big overhead compared to the 32 bytes per int on the EVM. This feels like maintaining a database, but there is no database engine in a canister, and it is impossible for the subnet to allocate enough computational resources to each canister.
So I would like to make a suggestion.
Be upfront with reality: a canister is first and foremost a blockchain construct; it is not a server or a database. It is not the same as the EVM, but it is blockchain-based, so draw on Solidity's years of experience and solutions. Provide more support for k-v map structures and restrict full-table traversal comparisons. There are enough dapps on ETH to prove that most business logic that relies on full-table matching can be achieved with k-v reads and writes after technical modifications. The IC has higher scalability than ETH, so there is an opportunity to support this better.

Mechanisms can be designed to guide developers into a new paradigm. For example, it is possible to allow a controller to deploy a group of canisters on a subnet, and calls between this group of canisters are treated as internal calls, supporting atomicity. Then the developer can design the proxy-impl-data (MVC) pattern to separate the entry, business, and data in different canisters, and in fact the developer just needs to upgrade the impl canister frequently.


After some experimentation with upgrades, I managed to provoke the same error. It is a bug in the deserialization code, which requires more Rust/C stack than it has available, as well as a better stack guard.

A fix is under review that should mitigate the failures, if not rule them out.


Although we just released 0.8.2, which mentions an improvement in this area, the second half of the fix is still under review but will hopefully be out in 0.8.3. Just to set expectations correctly.

The PR to watch is this one:


Ok, Motoko 0.8.3 is out which I am hoping will fix these particular heap out of bounds issues, at least to the point where deserialization should no longer stack overflow when serialization does not stack overflow.

Note that storing large amounts of data in stable variables (I’m thinking 1.5GB or more) may well fail to deserialize due to memory exhaustion, but that’s a different issue which we hope to tackle shortly.


@claudio @ZhenyaUsenko @skilesare I’ve got a similar problem.
At first I thought it was my Mac, but then I deployed to the Motoko playground and got the same results.

https://m7sm4-2iaaa-aaaab-qabra-cai.ic0.app/?tag=2112554174

repo:
https://github.com/infu/bug_x_weird

You will probably need the "Internet Base" VS Code extension so you can run ./check.blast (from the repo) to populate it with 100k records.
These records look like this:

Once they get inserted, the canister won't upgrade. Sometimes it throws "stack overflow" and, on rare occasions, "heap out of bounds". The memory is ~52 MB.

I’ve tried various dfx versions and also hashmap’s previous version.

I’ve tried using nhash instead of thash (Nat keys). I’ve tried removing skills:[Text].
Tried removing the id from the document as well so it doesn’t repeat.
I’ve also tried inserting the records in smaller portions, 100 per request instead of 1000.
Nothing seems to help with the error. Everything seems to work fine until you try to upgrade it.


Taking a look.

Have you tried using ExperimentalStableMemory.stableVarQuery() to determine the serialized size of the stable variables without actually doing the serialization?
Serialization can sometimes non-linearly expand the size of the source data if it contains a lot of internal sharing.
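For reference, a minimal sketch of calling it from an actor (the method name stableSize is illustrative):

```motoko
import ESM "mo:base/ExperimentalStableMemory";

actor {
  stable var data : [Nat] = [];

  // Reports the size the stable variables would occupy when serialized
  // on upgrade, without actually performing the upgrade serialization.
  public func stableSize() : async Nat64 {
    (await ESM.stableVarQuery()()).size
  };
}
```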

My other suspicion is that there is some large linked list (or unbalanced tree) in the data that is causing serialization or deserialization to run out of stack.

It would also be good to figure out whether this happens in pre-upgrade or post-upgrade, using dfx (not the playground) and some Debug.print calls in a postupgrade system method.
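A minimal way to tell the two phases apart (a sketch, using nothing beyond base's Debug):

```motoko
import Debug "mo:base/Debug";

actor {
  stable var docs : [Text] = [];

  system func preupgrade() {
    // Runs before stable variables are serialized; if this prints but the
    // upgrade still traps, serialization (or later) is the culprit.
    Debug.print("preupgrade reached");
  };

  system func postupgrade() {
    // Runs after stable variables are deserialized; if this prints,
    // deserialization succeeded.
    Debug.print("postupgrade reached");
  };
}
```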

Thanks for the tip, I’ll add it to the stats function and see.

I've just tried @timo 's Vector with the same record type and managed to get to 1.4 million records (haven't tested with more); upgrades work.
canister stats:
“documents”: “1400000”,
“memory_size”: “608043008”,
“max_live_size”: “246481600”,
“stable_size”: “112850245”,
“heap_size”: “246483248”,
“total_allocation”: “359333728”,
“reclaimed”: “112850480”

With the Map, I get “stack overflow” when running the same stableVarQuery that works in the Vector test.
stable_size = (await Prim.stableVarQuery()()).size;

So I think it might be a problem with the motoko_hash_map data structure.

Indeed, someone seems to have reported a similar issue with version 8 of that library: "trapped: stack overflow" on canister upgrade on dfx 0.11.2 · Issue #1 · ZhenyaUsenko/motoko-hash-map · GitHub.

I looked at the code on master and it seems to both rely on internal sharing and use a recursive type that could degenerate to a linked list, causing the stack overflow when deep.

Version 7 uses a simpler data structure, I think, so might survive serialization better.

I noticed your dfx.json is using a rather old version of Motoko; it would be worth upgrading Motoko or dfx, since we have done some work on improving the capacity of deserialization (see above: Heap out of bounds , use motoko - #27 by claudio).

I've tried with multiple versions; the repo is using the older one because I tried it last. The latest tests are with dfx 0.14.1.
Yes, confirmed. I've just tried @icme 's BTree and it also works.
I'd be lying if I said I completely understood what "internal sharing and a recursive type that could degenerate to a linked list" means.

However, I am thinking of having all indexes (BTrees) point to the same object. I suppose that will put a reference to the object rather than cloning it. I could also store the text key instead, but that will make things slower when filtering. Maybe that's what you mean by internal sharing, and it may be bad?
Anyway I am about to find out soon if it works

Yeah, sorry that was a bit obscure.

The problem is that we use Candid for serialization of stable variables.

In Motoko, in-memory data structures are represented as graphs. So multiple references to the same object are represented as small pointers to the one object.

Candid, unfortunately, can only represent trees, not graphs, so multiple Motoko references to the same object in memory will get expanded to several serialized copies of that object in Candid data.

If you don't start with a graph-like data structure containing multiple references to a shared object, you're OK. But if you do, the size of the data can blow up (exponentially, in fact).
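A toy illustration of that blow-up (a sketch; exact serialized sizes depend on the format): each level of pairing doubles the number of serialized copies of the shared value.

```motoko
actor {
  // One value in memory, referenced 2^n times after n pairings.
  let leaf = ("some", "payload");
  let l1 = (leaf, leaf);   // 2 references to leaf
  let l2 = (l1, l1);       // 4 references to leaf
  let l3 = (l2, l2);       // 8 references to leaf

  // Serializing this writes 8 copies of `leaf`, because the Candid-based
  // stable-variable format cannot express sharing of immutable values.
  stable var snapshot = l3;
}
```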

Independently, the overflow can happen because Candid serialization is a recursive algorithm, driven by the (static) type of the data being serialized. If the data has a recursive type, and the value is deeply recursive, serialization can blow the stack (like any recursive algorithm).
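For example, a long chain of a recursive type (a sketch; the depth at which the stack actually overflows depends on the available stack):

```motoko
actor {
  // A recursive type whose values can form a long chain.
  type List = ?(Nat, List);

  stable var saved : List = null;

  // Each link adds one level of recursion to the (de)serialization of
  // `saved` on upgrade; a deep enough chain can exhaust the stack.
  public func fill(n : Nat) : async () {
    var i = 0;
    while (i < n) {
      saved := ?(i, saved);
      i += 1;
    };
  };
}
```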

It's not really the data structure's fault here, but the fact that Motoko uses an unsuitable format for stable-variable serialization. We'd like to fix that, but it's not easy given all the requirements we need to meet.

The serialization format we use is actually a mild extension of Candid that supports some sharing, but only for mutable arrays and mutable fields. All Motoko references to the same mutable array or field are represented by a single object in the stable-variable format, so that we can preserve the identity of those values on deserialization. However, we only preserve sharing of mutable values, not immutable ones, and it's unfortunately not trivial to do more than that in the current scheme.
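As a sketch, only the mutable-array aliasing below survives an upgrade intact:

```motoko
actor {
  // Mutable arrays have identity: both stable variables serialize as
  // references to one shared object and still alias after upgrade.
  let cell : [var Nat] = [var 0];
  stable var a = cell;
  stable var b = cell;

  // Immutable values have no identity: these serialize as two copies.
  let pair = (1, 2);
  stable var x = pair;
  stable var y = pair;
}
```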


Thanks for clarifying. So I should not link to the same object from all indexes, because when it tries to upgrade it will expand too much, and after the upgrade it will probably end up as cloned objects.

I suppose then using Vector to store all data and placing Nat indexes to it in Btrees and Maps will be the best current solution. My only problem with Vector is that I’ll be leaving empty Array cells when someone deletes things, which makes it a bit opinionated.


I suppose then using Vector to store all data and placing Nat indexes to it in Btrees and Maps will be the best current solution. My only problem with Vector is that I’ll be leaving empty Array cells when someone deletes things, which makes it a bit opinionated.

I think that’s one solution, yes. Another (untried by me) would be to wrap each object in a singleton, mutable array and then reference the arrays, not the objects. The arrays will get shared because they have identity. Might be awkward though.
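A sketch of that wrapping idea (the type and function names are illustrative):

```motoko
type Doc = { id : Nat; name : Text };

// A one-element mutable array acts as a box with identity.
type Boxed = [var Doc];

func box(d : Doc) : Boxed = [var d];

// Store the same Boxed value in every index; since mutable arrays keep
// their identity through stable-variable serialization, the record would
// be written once rather than once per index.
```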


Ha, interesting hack. I may try it out. So [var E], OK.

Well, it didn't fix the Map. With Map.set(mmm, thash, doc.id, [var doc]) I still get upgrade errors.

I suppose if I want to reuse old Vector cells, I can keep track of them and insert new records there
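A sketch of that free-list idea, assuming mo:vector's new/add/put/size API and base's List (names from memory, so treat them as assumptions): deleted slots are pushed onto a free list and the next insert reuses one instead of growing the vector.

```motoko
import List "mo:base/List";
import Vector "mo:vector";

type Doc = { id : Nat; name : Text };

// Slots hold ?Doc so a deleted slot can be emptied and reused.
let docs = Vector.new<?Doc>();
var free : List.List<Nat> = List.nil<Nat>();

// Insert into a free slot if one exists, otherwise append.
func insert(d : Doc) : Nat {
  switch (List.pop(free)) {
    case (?i, rest) { free := rest; Vector.put(docs, i, ?d); i };
    case (null, _) { Vector.add(docs, ?d); Vector.size(docs) - 1 };
  };
};

// Empty the slot and remember it for reuse.
func delete(i : Nat) {
  Vector.put(docs, i, null);
  free := List.push(i, free);
};
```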


No, I don't think this will fix motoko-hash-map unless used internally in the implementation. I was suggesting it more for your own use. I'm also not 100% certain it will help rather than push the problem elsewhere.

Does the issue persist even with map v7?

…I’ll look into fixing the issue for v8. Will share my findings a bit later

To be fair, I thought the --rts-stack-pages <n> option introduced in moc 0.8.2 should have eliminated this. @claudio, do I misunderstand its purpose? @infu, did you try increasing it?