Hey there!
I’m implementing a stable BTreeMap for ic-stable-memory and there is a problem. ic-stable-memory allows developers to react to OutOfMemory exceptions programmatically. This is important, because it gives an opportunity to continue auto-scaling horizontally, once you’re out of stable memory.
This exception is designed so that when you catch it, you can be sure there were no side-effects - the change you were going to make is reverted completely and the stable collection you’re working with stays operable (yes, you can’t insert more values, but you’re able to delete and search).
Up until now this feature was easy to implement, since I only worked with array-based collections (e.g. vec, hashmap, binaryheap). But BTreeMap is different - all of its algorithms are recursive and I have no ability to go back down the stack and revert changes once the OOM is encountered on my way back to the tree root.
I was thinking about a possible solution, but nothing I can do on my side seems acceptable:
- I could implement my own stack and use it in order to revert changes, but it looks too complicated, since I would have to implement some way of determining whether a change was made to a node during this updating session and what exactly that change was;
- I could give up on side-effects for this collection, but that would mean that when such an exception occurs, the collection might break, which will lead to data loss (and memory leaks);
- (my current solution) I could give up on programmatic error handling support for this collection and just trap() each time the canister can’t grow() more stable memory; this would make the collection free of side-effects, but the user won’t be able to execute any logic after such an exception - so no automatic horizontal scaling.
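To make the first option a bit more concrete, here is a minimal sketch of a byte-level undo journal. It is not how ic-stable-memory works; `Journal`, `write` and `rollback` are hypothetical names, and a plain byte slice stands in for stable memory. It sidesteps the "which node changed and how" question by saving the old bytes before every write, at the cost of extra copying, and it does not cover undoing node allocations.

```rust
/// Hypothetical undo journal for one update session.
/// Each entry holds (offset, bytes that were there before the write).
pub struct Journal {
    entries: Vec<(usize, Vec<u8>)>,
}

impl Journal {
    pub fn new() -> Self {
        Journal { entries: Vec::new() }
    }

    /// Saves the bytes that are about to be overwritten, then performs the write.
    /// `mem` is a mock of stable memory (an assumption made for this sketch).
    pub fn write(&mut self, mem: &mut [u8], offset: usize, data: &[u8]) {
        let end = offset + data.len();
        let old = mem[offset..end].to_vec();
        self.entries.push((offset, old));
        mem[offset..end].copy_from_slice(data);
    }

    /// Restores the saved bytes in reverse order, undoing every journaled write.
    pub fn rollback(&mut self, mem: &mut [u8]) {
        while let Some((offset, old)) = self.entries.pop() {
            mem[offset..offset + old.len()].copy_from_slice(&old);
        }
    }
}
```

On OOM the insertion would call `rollback` before returning the error; on success the journal is simply dropped.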
I understand that for now I’m maybe the only person who is bothered by this, but anyway.
There are things the Foundation can do to help me resolve this issue:
Proposal #1 - stable64_can_grow(u64) -> bool
This is a function that receives an amount of bytes as an argument and returns a boolean value indicating whether or not this canister can call stable64_grow() with the same argument and be sure that there will be no error returned.
I could use this function in order to determine if it would be possible to allocate enough stable memory for the worst case scenario (all nodes of BTreeMap are already full - insertion will lead to an allocation of logN new nodes). And if it’s not - I will just throw OOM as a preventive measure.
This is not ideal, but this will work.
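Roughly, the preventive check described above could look like the sketch below. Everything here is an assumption made for illustration: `can_grow` mocks the proposed stable64_can_grow() (it does not exist on the IC today), `MockStableMemory` replaces the real system API so the code runs off-chain, and the worst case is taken as one new node per level plus a new root. The check is pessimistic, since it ignores free space the allocator may already hold.

```rust
/// Size of one stable memory page in bytes (64 KiB).
const PAGE_SIZE: u64 = 65_536;

#[derive(Debug, PartialEq)]
pub struct OutOfMemory;

/// Mock of stable memory, so the sketch does not depend on the real system API.
pub struct MockStableMemory {
    pub allocated_pages: u64,
    pub max_pages: u64,
}

impl MockStableMemory {
    /// HYPOTHETICAL: stands in for the proposed stable64_can_grow().
    /// Takes a size in bytes, as in the proposal, and changes nothing.
    pub fn can_grow(&self, bytes: u64) -> bool {
        let pages = (bytes + PAGE_SIZE - 1) / PAGE_SIZE; // round up to whole pages
        self.allocated_pages + pages <= self.max_pages
    }
}

/// Assumed worst case for one insertion: every node on the root-to-leaf path
/// is full and splits (one new node per level), plus one new root.
pub fn worst_case_bytes(tree_height: u64, node_size_bytes: u64) -> u64 {
    (tree_height + 1) * node_size_bytes
}

/// Returns OutOfMemory before anything is modified if the worst case
/// would not fit; otherwise the real insertion may proceed.
pub fn preflight_insert(
    mem: &MockStableMemory,
    tree_height: u64,
    node_size_bytes: u64,
) -> Result<(), OutOfMemory> {
    if mem.can_grow(worst_case_bytes(tree_height, node_size_bytes)) {
        Ok(())
    } else {
        Err(OutOfMemory)
    }
}
```

Since the check runs before the first write, there is nothing to revert when it fails, and the caller can still react to the error.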
Yes, I can use the current stable64_grow(u64) -> Result<u64, OutOfMemory> in order to make this check, but this way the canister will always have some unused (but paid) stable memory, which is not great.
Also, having such a function may actually be a good thing, since there is no stable_shrink() function and sometimes you only want to check whether you can have this memory, but not necessarily start paying for it immediately.
Proposal #2 - trap_handled(u64, &str) function and post_trap(u64) canister hook
In addition to the existing trap(&str) function, I propose to add a new trap_handled(u64, &str) function that also receives a user-defined error code. When such a function is called, everything behaves the same as with the common trap(): the canister’s state is reverted, the client-user (message sender) receives an error response; BUT a special canister hook post_trap(u64) is invoked with the same error code as was passed to trap_handled().
A developer can define some custom error handling logic in this hook: from logging the error to another canister, to scaling horizontally (like in my case).
This would solve my issue completely (and, I assume, this would solve any “revert all changes, but do something else instead” issue there is), but it looks much harder to implement.
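To illustrate the intended semantics, here is a small off-chain simulation. None of this exists on the IC: `MockCanister`, `execute` and the error code constant are hypothetical, a handler returning `Err((code, message))` stands in for a call to the proposed trap_handled(), and I assume that changes made inside post_trap() are kept while the changes made by the failed message are not.

```rust
/// HYPOTHETICAL error code a developer would pass to the proposed trap_handled().
pub const ERR_OUT_OF_STABLE_MEMORY: u64 = 1;

/// Ok, or the (code, message) pair that would be passed to trap_handled().
pub type HandlerResult = Result<(), (u64, String)>;

/// Mock canister: `state` stands in for everything a trap would revert.
pub struct MockCanister {
    pub state: Vec<u64>,
    pub scale_out_requests: u32,
}

impl MockCanister {
    /// HYPOTHETICAL post_trap(u64) hook with user-defined handling logic.
    fn post_trap(&mut self, code: u64) {
        if code == ERR_OUT_OF_STABLE_MEMORY {
            // A real canister might spawn or notify another canister here.
            self.scale_out_requests += 1;
        }
    }

    /// Simulates how the system would process one message.
    pub fn execute<F>(&mut self, handler: F) -> Result<(), String>
    where
        F: FnOnce(&mut Vec<u64>) -> HandlerResult,
    {
        let snapshot = self.state.clone();
        match handler(&mut self.state) {
            Ok(()) => Ok(()),
            Err((code, message)) => {
                self.state = snapshot; // revert the state, as with trap()
                self.post_trap(code);  // then run the hook with the same code
                Err(message)           // the sender still gets an error response
            }
        }
    }
}
```

For example, a handler that pushes a value and then fails with `ERR_OUT_OF_STABLE_MEMORY` leaves `state` untouched, while `scale_out_requests` is incremented once.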
What do you think? Please, share your feedback. Not calling for any action, just a discussion.
Linking only @domwoe, since it looks like a topic for you (because of the working group).