Right now, if pre_upgrade fails for whatever reason, your canister is stuck, even if you try to push new canister upgrades. The only option is a complete reinstall, which will WIPE ALL STATE. Not ideal, to say the least.
Today, if your app holds any significant amount of user data, upgrades are a scary, gut-wrenching experience. Even with precautions and stringent conventions in place, a single slip-up that changes the memory layout of existing data structures is enough to corrupt your entire app state.
What I’m proposing is the introduction of another hook, something like try_pre_upgrade, that runs the logic in the hook. If everything works as expected, it behaves exactly like pre_upgrade currently does.
However, if there’s a panic during the pre_upgrade step, the canister simply reverts to the last working checkpoint, exactly how panics during update calls behave right now. I imagine this will require minimal changes to the platform, since it borrows the existing update-call behaviour for handling panics.
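To make the semantics I have in mind concrete, here is a minimal off-chain sketch in plain Rust. It is only a simulation: CanisterState, UpgradeOutcome and try_pre_upgrade are hypothetical names, not existing ic_cdk or platform APIs, and catch_unwind merely stands in for the way the execution layer discards state changes when an update call traps.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical stand-in for a canister's state (heap plus stable memory).
#[derive(Clone, Debug, PartialEq)]
pub struct CanisterState {
    pub heap: Vec<u8>,
    pub stable: Vec<u8>,
}

/// Outcome of the simulated hook run.
#[derive(Debug, PartialEq)]
pub enum UpgradeOutcome {
    /// The hook finished; the upgrade would continue as it does today.
    HookSucceeded,
    /// The hook panicked; state was restored to the last checkpoint.
    RolledBack,
}

/// Runs `hook` against `state`. If the hook panics, every change it made
/// is discarded and the state from before the call is restored.
/// Assumption: this mirrors the proposal only; it is not how the IC
/// implements rollback internally.
pub fn try_pre_upgrade<F>(state: &mut CanisterState, hook: F) -> UpgradeOutcome
where
    F: FnOnce(&mut CanisterState),
{
    // Take a checkpoint before running any hook logic.
    let checkpoint = state.clone();

    // Run the hook and catch a panic instead of letting it propagate.
    let result = panic::catch_unwind(AssertUnwindSafe(|| hook(state)));

    match result {
        Ok(()) => UpgradeOutcome::HookSucceeded,
        Err(_) => {
            // Revert to the last working checkpoint.
            *state = checkpoint;
            UpgradeOutcome::RolledBack
        }
    }
}
```

With this shape, a hook that panics halfway through serialization leaves the canister exactly as it was before the upgrade attempt, so a fixed version can be pushed afterwards instead of reinstalling.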
Additionally, add another upgrade mode alongside the existing ones, something like lifecycle_upgrade, that lets us upgrade only the lifecycle hooks in a canister while retaining its current heap. I imagine this can also borrow implementation logic from how canisters are put to sleep and woken up on the execution layer. Something along the lines of saving the entire heap memory contents to stable storage.
I imagine this entire feature can be rolled out incrementally, without breaking changes to the existing APIs.
Tagging a couple of other devs who I’ve recently chatted with regarding canister backups and upgrades.
Bruce from @AstroX
P.S. This request is born from a real-life use case for us at Hot or Not. I recently had to upgrade a canister with around 51 GB of memory in use. The backstory involves an implementation that was leaking stable memory, so migrating the data to heap memory required us to use the ic_cdk stable APIs. What we were unaware of was that those APIs are limited to a 32-bit internal API. Since the heap maxes out at 4 GB, that is normally fine. However, our canister had grown past that size, so our pre_upgrade kept failing with no way for us to recover the canister, which ultimately led us to reinstall the entire canister.
A horror story that we hope to never have to repeat.
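To put rough numbers on the P.S. above, here is a small Rust sketch of the arithmetic. It assumes 64 KiB WebAssembly pages and treats "51 GB" as 51 GiB; the constant and function names are mine, purely for illustration, and are not part of ic_cdk.

```rust
/// Size of one WebAssembly page in bytes (64 KiB).
pub const WASM_PAGE_SIZE_BYTES: u64 = 64 * 1024;

/// Largest span a 32-bit address can cover: 2^32 bytes = 4 GiB.
pub const MAX_32_BIT_BYTES: u64 = 1 << 32;

/// Number of Wasm pages needed to hold `bytes`, rounded up.
pub fn pages_for(bytes: u64) -> u64 {
    (bytes + WASM_PAGE_SIZE_BYTES - 1) / WASM_PAGE_SIZE_BYTES
}

/// True if `bytes` of memory can still be addressed through a 32-bit API.
pub fn fits_32_bit_api(bytes: u64) -> bool {
    bytes <= MAX_32_BIT_BYTES
}

/// Hypothetical helper: the size from the P.S., taken as 51 GiB.
pub fn canister_size_bytes() -> u64 {
    51 * 1024 * 1024 * 1024
}
```

A check like fits_32_bit_api(canister_size_bytes()) comes out false, since 51 GiB needs far more pages than a 32-bit address space can cover. A guard of this kind, run on a regular basis well before an upgrade, would have told us early that pre_upgrade had to move to a 64-bit stable memory API.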