Prioritize safe instantaneous canister upgrades

nomeata · November 23, 2021, 4:43pm

The current incident, where a problem on one subnet means that the ledger possibly cannot be upgraded, puts a spotlight on a long-standing fundamental issue of our system, and hopefully is a wake-up call. I’d say it’s crucial that developers of an important canister like the ledger are able to program it so that it can be upgraded safely even with outstanding calls.

And we know how to do it:

1. Change (or extend, for compatibility) the system API to have named entry points for the response callbacks. A spec proposal for that has been floating around the relevant repository for a good while.
2. Change the ledger canister to use the new interface. This will require not using await and instead implementing callback handlers explicitly, but for a canister like the ledger this is not a significant hurdle. In fact, it may make the code cleaner and easier to understand, and reduce the risk of await-related pitfalls.
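To make the idea concrete, here is a minimal toy model in Python, not the real system API. It assumes that each outstanding call is recorded as a callback name plus plain serialized context instead of a closure, so a new code version can still find the right handler. All names here (`Canister`, `upgrade`, `on_transfer_reply`) are hypothetical.

```python
import json


class Canister:
    """Toy model: pending calls store a callback *name* and plain data."""

    def __init__(self, handlers, state=None):
        # handlers: callback name -> function, supplied by the installed code version
        self.handlers = handlers
        # state stands in for whatever survives an upgrade
        self.state = state if state is not None else {
            "balance": 0, "pending": {}, "next_id": 0}

    def call(self, callback_name, context):
        """Record an outgoing call; no closure is captured, only data."""
        call_id = str(self.state["next_id"])
        self.state["next_id"] += 1
        self.state["pending"][call_id] = {
            "callback": callback_name,
            "context": json.dumps(context),
        }
        return call_id

    def deliver_response(self, call_id, reply):
        """Look the handler up by name in the *current* code version."""
        record = self.state["pending"].pop(call_id)
        handler = self.handlers[record["callback"]]
        handler(self.state, json.loads(record["context"]), reply)


def upgrade(old, new_handlers):
    """Swap the code, keep the state, without waiting for outstanding calls."""
    return Canister(new_handlers, old.state)


def on_transfer_reply_v1(state, context, reply):
    state["balance"] += context["amount"]


def on_transfer_reply_v2(state, context, reply):
    state["balance"] += context["amount"]
    state.setdefault("log", []).append(reply)


v1 = Canister({"on_transfer_reply": on_transfer_reply_v1})
call_id = v1.call("on_transfer_reply", {"amount": 5})

# Upgrade while the call is still outstanding.
v2 = upgrade(v1, {"on_transfer_reply": on_transfer_reply_v2})
v2.deliver_response(call_id, "ok")

print(v2.state["balance"], v2.state["log"])  # 5 ['ok']
```

The late response is handled by the new version's handler. The price is that the context has to be written out explicitly as data, which is what await would otherwise capture implicitly.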

It will likely not be possible to use this from Motoko, or from Rust when using await, and that’s okay: it just must be possible to have safe instantaneous (i.e. no stopping) upgrades for those who can’t afford to have their upgradability at the whims of possibly malicious other canisters.

Yes, it’s not a sexy feature, and it’s unsatisfying that it’s not compatible with async/await. But it’s still sexier than getting stuck with an un-upgradeable ledger (and then having to resort to patching the replica to synthesize the responses, as happened before).

jorgenbuilder · November 23, 2021, 9:02pm

Hear, hear. Seems like something that we should decide to make time for. If it would reduce drag on the process of upgrading the ledger, the sooner it’s done the more time that could be saved. Also, reducing the likelihood of a foundation-backed “full stop” event on a proposal seems quite valuable. Such occurrences are bad PR, and I’m curious to see if the voting turnout is as good the second time around.

jzxchiang · November 24, 2021, 6:34am

Can you explain why named entry points for response callbacks will solve the problem of upgrading canisters with outstanding call contexts?

My understanding is that right now inter-canister updates execute reply and reject callback functions that are stored in a WebAssembly Table, which itself is stored inside the callee canister’s wasm module. These callback functions are looked up using known table entry indexes (I think).
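A toy Python model of the concern, assuming (as described above) that an outstanding call remembers only a table index. The functions and table layouts are made up for illustration and say nothing about how a real compiler arranges its table.

```python
def credit(state, amount):
    state["balance"] += amount


def refund(state, amount):
    state["balance"] -= amount


def audit(state, amount):
    state["audited"] = True


# Version 1: the outstanding call remembers the slot of its reply callback.
table_v1 = [credit, refund]
pending_slot = table_v1.index(credit)  # slot 0

# Version 2 adds a function, and the table happens to be laid out differently.
table_v2 = [audit, credit, refund]

by_index = {"balance": 10}
table_v2[pending_slot](by_index, 5)  # slot 0 is now `audit`, not `credit`

# With a name instead of a slot, the lookup still finds the intended handler.
names_v2 = {"audit": audit, "credit": credit, "refund": refund}
by_name = {"balance": 10}
names_v2["credit"](by_name, 5)

print(by_index)  # {'balance': 10, 'audited': True}
print(by_name)   # {'balance': 15}
```

In this model the index-based lookup silently runs the wrong function after the upgrade, while the name-based lookup is stable as long as the new version keeps a handler under that name.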

How does using entry points (i.e. ic0.reply_callback and ic0.reject_callback) actually help?

The core problem of:

Some canisters may not be able to make sense of callbacks after upgrades

still isn’t addressed.

Or are you saying that the upcoming work in safe canister upgrades will fix this because now the callback function is part of the callee canister’s Candid interface and can thus be statically analyzed for breaking changes?

akhilesh.singhania · November 24, 2021, 12:42pm

Thanks for the note @nomeata! I agree. I will take an action item to pull up the relevant PR and post it on the forum. I am still hoping that the ic-ref repo will be open sourced soon and then publishing such PRs will be easier.

nomeata · November 24, 2021, 2:09pm

Yes, there are lots of answers to @jzxchiang’s questions in that repo. Maybe I’ll wait to see if that becomes available soon, instead of typing long texts on the phone.

rckprtr · November 25, 2021, 12:05am

Have the ability for a controller to download the entire state of a canister and analyze/process it.

Hazel · November 25, 2021, 12:41am

^ This please. Can’t stress how important this will eventually be.

rckprtr · November 25, 2021, 12:52am

Pre upgrade can fail, post upgrade can fail, chunking can fail… many ways to lose everything.

akhilesh.singhania · November 25, 2021, 9:02am

We have a feature request for that. I am hoping to be able to prioritise it next year.

nomeata · November 26, 2021, 1:16pm

~~Slightly related: Motoko canisters can currently easily be rendered un-upgradeable by a single trap in a callback~~ (Motoko issue).

False alarm, sorry.

jzxchiang · April 16, 2022, 6:19pm

Was @nomeata’s suggestion ever implemented?

nomeata · April 17, 2022, 4:55pm

No, but with the Interface Specification repo accepting public contributions, we can at least take some steps towards that goal without relying only on DFINITY devs. This PR is a first step, but I still need to write down what named callbacks would look like:

github.com/dfinity/interface-spec

System API: Introduce a canister generation counter

dfinity:master ← nomeata:joachim/canister-generations

opened 11:01AM - 01 Apr 22 UTC

nomeata

+72 -24

## TL;DR

A counter bumped on each code installation is introduced. This is one ingredient to make safe instantaneous upgrades as described in <https://www.joachim-breitner.de/blog/789-Zero-downtime_upgrades_of_Internet_Computer_canisters> possible.

## What?

This change introduces a _canister generation counter_, a simple natural number incremented upon each canister code installation. It is available to the canister via a system API call. Additionally, when handling a response, the canister is told at which generation it has issued the call. This allows the canister to recognize when it is handling a call across a reinstallation or upgrade.

## Why?

Reinstalling or upgrading a canister with outstanding callbacks is currently dangerous, as the late response could confuse or corrupt the canister. Canisters that choose to be instantaneously upgradable can now detect such an old response and handle it safely (possibly by simply trapping).

The concept introduced will likely be useful beyond this. For example, to observe autonomous canisters it may not be enough to know their _current_ code module hash, but it may also be important to be able to tell that the canister has not had a _different_ code re-installed in between. Observing that the generation counter has not changed is one way to check that no (possibly malicious) code has been installed.
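A rough Python sketch of how a canister might use such a counter, as a toy model only. The class, method names and return strings are hypothetical; the PR describes the concept, not this interface.

```python
class Canister:
    """Toy model of a canister with a generation counter."""

    def __init__(self):
        self.generation = 0  # bumped by the system on each code installation
        self.pending = {}    # call id -> generation at which the call was issued
        self.next_id = 0
        self.balance = 0

    def install_code(self):
        """Model an upgrade or reinstall: only the counter matters here."""
        self.generation += 1

    def call(self, amount):
        call_id = self.next_id
        self.next_id += 1
        self.pending[call_id] = {"issued_at": self.generation, "amount": amount}
        return call_id

    def handle_response(self, call_id):
        record = self.pending.pop(call_id)
        # The system tells the canister at which generation the call was issued;
        # in this model that information lives in `record`.
        if record["issued_at"] != self.generation:
            # A real canister might trap here instead.
            return "ignored: issued by an older generation"
        self.balance += record["amount"]
        return "applied"


c = Canister()
old_call = c.call(5)
c.install_code()  # upgrade while `old_call` is still outstanding
new_call = c.call(7)

print(c.handle_response(old_call))  # ignored: issued by an older generation
print(c.handle_response(new_call))  # applied
print(c.balance)                    # 7
```

The stale response is detected and dropped, and only the call made by the current generation changes the state.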

levi · April 17, 2022, 5:23pm

Genius

Is there more to it than passing the name of the callback method when making a call?

nomeata · April 19, 2022, 6:48am

The callback method names would be separate from the public methods (they receive different data, and it would be dangerous if they could be called by others willy-nilly), so it requires a new kind of entry point to the canister. No rocket science, it just needs doing. (The doing would be easier to fuel with motivation if I knew that DFINITY actually planned to implement it, but writing it down may increase their motivation to do it, so we’ll see :-))
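A small Python toy model of that separation, with hypothetical names throughout: public methods and callback entry points live in different namespaces, and only the system's response delivery can reach the latter.

```python
class Canister:
    """Toy model: public methods and callback entry points are kept apart."""

    def __init__(self):
        self.credited = 0
        self.public_methods = {"transfer": self.transfer}
        self.callbacks = {"on_transfer_reply": self.on_transfer_reply}

    def transfer(self, caller, amount):
        return f"transfer of {amount} requested by {caller}"

    def on_transfer_reply(self, reply):
        self.credited += reply

    def ingress(self, caller, method, arg):
        """External callers can only reach the public namespace."""
        if method not in self.public_methods:
            raise LookupError(f"no public method named {method!r}")
        return self.public_methods[method](caller, arg)

    def system_response(self, callback, reply):
        """Only the system delivers responses, and only to callback entry points."""
        return self.callbacks[callback](reply)


c = Canister()
try:
    c.ingress("mallory", "on_transfer_reply", 100)
    blocked = False
except LookupError:
    blocked = True

c.system_response("on_transfer_reply", 7)
print(blocked, c.credited)  # True 7
```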
