Thanks a lot for your posts. We hope to be able to clarify all the questions above – in particular why the introduction of timeouts is not a breaking change – with the explanation below. Also, thanks a lot for your suggestions on next steps to discuss. We like the suggestions but would propose to discuss them in separate threads.
Canister messaging guarantees
The IC protocol provides two guarantees for canister-to-canister messages:
- Request ordering: Successfully delivered requests are received in the order in which they were sent. In particular, if a canister A sends m1 and m2 to canister B in that order, then, if both are accepted, m1 is executed before m2.
- Guaranteed responses: Every request is guaranteed a response (either from the canister, or synthetically produced by the protocol).
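As a rough illustration of these two guarantees, here is a minimal toy model in Python. It is only a sketch: `ToyReceiver`, `send_all`, and the queue capacity are hypothetical names and values made up for this example, not part of the IC protocol or any SDK.

```python
from collections import deque


class ToyReceiver:
    """Hypothetical receiver with a bounded input queue (not the real IC implementation)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.executed = []

    def deliver(self, request):
        # Returns None if the request was accepted, or a synthetic
        # reject response if the input queue is already full.
        if len(self.queue) >= self.capacity:
            return ("reject", request, "input queue full")
        self.queue.append(request)
        return None

    def run(self):
        # Accepted requests are executed in the order they were enqueued.
        responses = []
        while self.queue:
            request = self.queue.popleft()
            self.executed.append(request)
            responses.append(("reply", request, "ok"))
        return responses


def send_all(receiver, requests):
    """Send requests in order and collect exactly one response per request."""
    responses = []
    for request in requests:
        rejected = receiver.deliver(request)
        if rejected is not None:
            responses.append(rejected)
    responses.extend(receiver.run())
    return responses


receiver = ToyReceiver(capacity=2)
responses = send_all(receiver, ["m1", "m2", "m3"])
```

In this model m1 is executed before m2, and m3 still gets a response even though it was never executed: a synthetic reject.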
(No) Guaranteed request delivery
There are no guarantees regarding successful request delivery. A request may fail synchronously (e.g. if the sender’s output queue is full, or if the sender’s subnet is at its memory limit when the request is enqueued) or asynchronously (if the receiver’s input queue is full, the receiver is stopped or frozen, or the receiver’s subnet is at its memory limit). If the receiving canister panics while handling the request (which may happen even in the absence of bugs in the canister code, e.g. when the canister exceeds the instruction limit or runs out of memory), the effect is similar to the request never having been delivered in the first place: an asynchronous reject response.
One additional potential source of request delivery failure is being added in the form of request timeouts (enabled in the release elected this week). Apart from the exact error message, a timeout will look the same to the caller as a full input queue at the receiving end (or the receiver having trapped, having rejected the request, or being stopped/frozen, or the receiving subnet being at its memory limit). Before timeouts, one would also have seen errors in these cases; the only difference now is that backlogs are cleared sooner. Nothing changes from the perspective of the canisters: these failure modes needed to be handled before, and they need to be handled now.
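To make the "nothing changes for the caller" point concrete, here is a small sketch of caller-side handling. The `Reply` and `Reject` types, the `settle` function, and the reject reason strings are assumptions made up for this illustration; real canister code would use whatever response and reject types its SDK provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reply:
    data: str


@dataclass(frozen=True)
class Reject:
    reason: str  # hypothetical wording, e.g. "input queue full" or "timed out"


def settle(reserved, request_id, response):
    """Resolve a pending request; every reject takes the same code path."""
    amount = reserved.pop(request_id)
    if isinstance(response, Reply):
        return ("committed", amount)
    # Any reject, whatever its reason, is handled identically:
    # the caller undoes whatever it reserved for this request.
    return ("rolled back", amount)


# Amounts reserved by the caller for three outstanding requests.
reserved = {1: 10, 2: 20, 3: 30}
outcomes = [
    settle(reserved, 1, Reply("ok")),
    settle(reserved, 2, Reject("input queue full")),
    settle(reserved, 3, Reject("timed out")),
]
```

A caller written this way before timeouts existed needs no change: the timeout reject falls into the branch that already handled a full input queue.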
There was a blog post introducing the concept of one-way messages, which essentially suggests ways to ignore a response. That post also explicitly mentions that one-way calls can only be used if one does not care about potential failures:
“A one-way call is a call where you don’t care about the response; neither the replied data, nor possible failure conditions.”
So again, in cases where one cares about potential failure modes, one needs to implement an explicit confirmation mechanism for such calls on the receiving side. Timeouts do not change that. The blog post also illustrates this with an example:
“Maybe you want to add archiving functionality, where the ledger canister streams its data to an archive canister. There, again, instead of using successful responses to confirm receipt, the archive canister can ping the ledger canister with the latest received index directly.”
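The pattern from the quote can be sketched as follows. This is a toy model only: the `Ledger` and `Archive` classes, their methods, and the `drop` parameter (which simulates lost one-way sends) are all hypothetical and do not reflect any actual ledger or archive canister interface.

```python
class Archive:
    def __init__(self):
        self.blocks = []

    def receive(self, index, block):
        # Accept only the next expected block; ignore gaps and duplicates.
        if index == len(self.blocks):
            self.blocks.append(block)

    def latest_received_index(self):
        # -1 means nothing has been received yet.
        return len(self.blocks) - 1


class Ledger:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.acked = -1  # highest index confirmed by the archive

    def stream(self, archive, drop=()):
        # One-way sends: no response is awaited, and sends listed in
        # `drop` are lost to simulate failed deliveries.
        for index in range(self.acked + 1, len(self.blocks)):
            if index not in drop:
                archive.receive(index, self.blocks[index])

    def on_ping(self, latest_index):
        # Explicit confirmation, sent by the archive as its own message.
        self.acked = max(self.acked, latest_index)


ledger = Ledger(["b0", "b1", "b2", "b3"])
archive = Archive()

ledger.stream(archive, drop={2})                 # block 2 is lost in transit
ledger.on_ping(archive.latest_received_index())  # archive confirms up to index 1
ledger.stream(archive)                           # ledger resends from index 2
ledger.on_ping(archive.latest_received_index())  # archive confirms up to index 3
```

The ledger never inspects a response to its sends. It learns what arrived only from the archive's ping, so a lost or timed-out send is repaired on the next round.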
Going back to basics, we also want to note that guaranteed message delivery is impractical for a few reasons:
- It requires infinite resources at the receiving end.
- Even if infinite memory were available, it would lead to arbitrarily high latency.
- Even if a canister were willing to accept arbitrarily high latency in exchange for guaranteed message delivery, there is no way of insulating other canisters sharing the same XNet stream from similarly high latency.
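The first two points can be illustrated with a small deterministic simulation. It assumes a hypothetical receiver that executes one request per round while two requests arrive per round; `simulate` and its parameters are made up for this sketch and do not model real IC scheduling.

```python
from collections import deque


def simulate(rounds, arrivals_per_round, capacity=None):
    """Receiver executes one request per round.

    Returns (max_wait_in_rounds, rejected_count, final_backlog).
    capacity=None models guaranteed delivery (an unbounded queue).
    """
    queue = deque()  # stores the round in which each request arrived
    max_wait = 0
    rejected = 0
    for now in range(rounds):
        for _ in range(arrivals_per_round):
            if capacity is not None and len(queue) >= capacity:
                rejected += 1  # bounded queue: reject instead of buffering
            else:
                queue.append(now)
        if queue:
            max_wait = max(max_wait, now - queue.popleft())
    return max_wait, rejected, len(queue)


unbounded = simulate(rounds=100, arrivals_per_round=2)
bounded = simulate(rounds=100, arrivals_per_round=2, capacity=10)
```

With the unbounded queue, both the backlog (memory) and the longest wait (latency) keep growing with the number of rounds. With the bounded queue, some requests are rejected, but the wait for accepted requests stays below the queue capacity.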