Right now, if pre_upgrade fails for whatever reason, your canister is stuck, even if you try to push new canister upgrades. The only option is a complete reinstall, which will WIPE ALL STATE. That is not ideal, to say the least.
Today, if your app holds any significant amount of user data, upgrades are a scary and gut-wrenching experience. Even with precautions and stringent conventions being followed, it takes only a simple slip-up, such as changing the memory layout of existing data structures, to corrupt your entire app state.
What I'm proposing is the introduction of another hook, something like try_pre_upgrade, that runs the logic in the hook. If everything works as expected, it will behave exactly like pre_upgrade currently does.
However, if there's a panic during the pre_upgrade step, it will simply revert to the last working checkpoint, exactly how panics during update calls behave right now. I imagine this will require minimal changes to the platform since you're borrowing behaviour from update calls on how to handle panics.
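To make the intended semantics concrete, here is a minimal sketch in plain Rust. It is only a model that runs on a native target: `CanisterState` and this `try_pre_upgrade` function are hypothetical names invented for illustration, not an existing IC or ic_cdk API, and a real canister compiled to Wasm cannot catch panics this way (the platform itself would have to do the rollback).

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical stand-in for a canister's state; not a real IC type.
#[derive(Clone, Debug, PartialEq)]
pub struct CanisterState {
    pub heap: Vec<u64>,
    pub stable: Vec<u64>,
}

/// Models the proposed hook: run the upgrade logic against a working copy,
/// commit it on success, and keep the last checkpoint untouched on panic.
/// Returns `true` if the hook completed and its changes were committed.
pub fn try_pre_upgrade<F>(state: &mut CanisterState, hook: F) -> bool
where
    F: FnOnce(&mut CanisterState),
{
    // The original `state` plays the role of the last working checkpoint.
    let mut working = state.clone();

    // Run the hook on the copy and intercept any panic it raises.
    let outcome = panic::catch_unwind(AssertUnwindSafe(|| hook(&mut working)));

    match outcome {
        Ok(()) => {
            // Success: behave like today's pre_upgrade and keep the changes.
            *state = working;
            true
        }
        // Panic: discard the working copy; `state` is unchanged.
        Err(_) => false,
    }
}
```

Calling it with a hook that copies the heap into stable storage commits the change; calling it with a hook that panics halfway leaves the state exactly as it was before the call, which is the behaviour update calls already have.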
Additionally, add another upgrade mode alongside the existing upgrade, install, and reinstall, called lifecycle_upgrade, that lets us upgrade only the lifecycle hooks in canisters while retaining their current heap. I imagine this can also borrow implementation logic from how canisters are put to sleep and woken up on the execution layer. Something around saving the entire heap memory contents to stable storage.
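As a rough way to compare the modes, the sketch below tabulates what each one keeps. The `InstallMode` enum, the `Retained` struct and `retained_state` are names made up for this post; only install, reinstall and upgrade exist today, and the `LifecycleUpgrade` variant is the proposal, not a platform feature.

```rust
/// The three existing install modes plus the proposed one.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum InstallMode {
    Install,
    Reinstall,
    Upgrade,
    /// Proposed in this post; does NOT exist on the platform today.
    LifecycleUpgrade,
}

/// Which parts of a canister's state survive the operation.
#[derive(Debug, PartialEq, Eq)]
pub struct Retained {
    pub heap: bool,
    pub stable_memory: bool,
}

/// Summarises what each mode keeps, under the semantics described above.
pub fn retained_state(mode: InstallMode) -> Retained {
    match mode {
        // Fresh install or reinstall: the canister starts from empty state.
        InstallMode::Install | InstallMode::Reinstall => Retained {
            heap: false,
            stable_memory: false,
        },
        // Regular upgrade: heap is dropped, stable memory is kept.
        InstallMode::Upgrade => Retained {
            heap: false,
            stable_memory: true,
        },
        // Proposed mode: only the lifecycle hooks change, so both survive.
        InstallMode::LifecycleUpgrade => Retained {
            heap: true,
            stable_memory: true,
        },
    }
}
```

The point of the fourth row is that a broken pre_upgrade could be replaced without first having to run it successfully.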
I imagine this entire feature can be incremental without requiring breaking changes to the existing APIs.
Thoughts?
@ulan
Tagging a couple of other devs who I've recently chatted with regarding canister backups and upgrades.
@senior.joinu
Bruce from @AstroX
@icme
P.S. This request is born from a real-life use case for us at Hot or Not. I recently had to upgrade a canister with around 51 GB of memory used. The backstory involves an implementation that was leaking stable memory, so migrating the data to heap memory required us to use the ic_cdk stable APIs. What we were unaware of was that those APIs are limited to the 32-bit internal API, since the heap maxes out at 4 GB. That's fine in itself. However, since our canister had exceeded that size, our pre_upgrade was failing with no way for us to recover the canister, ultimately forcing us to reinstall the entire canister.
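For a sense of scale, the arithmetic behind that failure can be sketched as below. The helper names are hypothetical, the 4 GiB bound is taken from the limit described above, and treating "51 GB" as 51 GiB is an assumption made only to get round numbers.

```rust
/// Size of one Wasm memory page in bytes (64 KiB).
pub const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Largest size addressable with 32-bit offsets: 2^32 bytes (4 GiB).
pub const MAX_32BIT_BYTES: u64 = 1 << 32;

/// Number of 64 KiB pages needed to hold `bytes`, rounded up.
pub fn pages_for(bytes: u64) -> u64 {
    (bytes + WASM_PAGE_BYTES - 1) / WASM_PAGE_BYTES
}

/// Whether memory of this size is fully reachable with 32-bit offsets.
pub fn reachable_with_32bit_offsets(bytes: u64) -> bool {
    bytes <= MAX_32BIT_BYTES
}

/// The canister size from the post, assuming "51 GB" means 51 GiB.
pub const CANISTER_BYTES: u64 = 51 << 30;
```

With these definitions, `reachable_with_32bit_offsets(CANISTER_BYTES)` is false: the canister is more than twelve times past the 4 GiB boundary, so any pre_upgrade path that depends on 32-bit stable offsets cannot cover it.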
A horror story that we hope to never have to repeat.