Messaging guarantees

Right. I guess a better question would have been “is this technically possible / feasible”? Additionally, would it help, or would it break other promises that the protocol is based on?

It would definitely be possible technically to time out responses. Problem is, it would break canister messaging guarantee #2:

In the current implementation (and specification) this is interpreted as "exactly one response is produced for any request (whether by the destination canister if possible, or else by the protocol) and that exact response is delivered to the originator."

End-to-end message timeouts would mean the possibility of delivering a different response from the one produced by the callee. And that is a much bigger nut to crack than just timing out requests that never left the calling canister. Among other things, there are applications that depend on this behavior for correctness: a reject response of any kind is a guarantee that the callee did not (successfully) process the request, whether because it was never delivered or because the callee trapped while processing it.

Do note though that this is not as solid a guarantee as it may look: if the callee makes a downstream call of its own, it processes two independent messages (the original request and the downstream response). If it then panics while handling the latter, only the mutations made by that message are rolled back; any mutations made while handling the first message persist. And regardless of how carefully one codes their canister, the system may cause a canister to panic for something as minor as failing to allocate one byte of heap.
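
To make this concrete, here is a minimal Motoko sketch (the callee principal and its ping method are placeholders; any inter-canister call behaves the same way):

```motoko
import Debug "mo:base/Debug";

actor {
  var counter : Nat = 0;

  // Hypothetical callee; the principal is a placeholder.
  let callee = actor ("aaaaa-aa") : actor { ping : () -> async () };

  public func run() : async () {
    counter += 1;        // mutation while handling the original request
    await callee.ping(); // commit point: the first += 1 is now persisted
    counter += 1;        // mutation while handling the downstream response
    Debug.trap("oops");  // rolls back only the second += 1; counter stays at 1
  };
}
```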

3 Likes

I suspect it’s the status polling message from agent-js. We keep sending the request status message until we get the response from the IC. We probably never update the timestamp on the request status message, so when the call takes longer than 5 min to finish, we get this error. But the call is still progressing.

1 Like

As a developer, I find it acceptable not to have a full messaging guarantee (if it is difficult to achieve 100%).

However, it is important to explicitly list the cases in which the messaging guarantee will not be met. Also, please make sure that try/catch catches all errors.

We are developing ICTC on the assumption that the system provides no messaging guarantees, reintroducing the Saga model at the business layer so that exceptions can be handled by a manager/DAO.
The issue we have encountered is that try/catch cannot catch all errors, so in special cases ICTC ends up in an invalid state.
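
A minimal Motoko sketch of the kind of case we hit (the service principal and its doWork method are hypothetical):

```motoko
import Error "mo:base/Error";

actor {
  // Hypothetical remote service.
  let svc = actor ("aaaaa-aa") : actor { doWork : () -> async () };

  public func run() : async Text {
    // Note: a *synchronous* failure, e.g. the call cannot even be enqueued
    // because the output queue is full, traps instead of throwing in the
    // Motoko version discussed here, so it bypasses this catch entirely.
    try {
      await svc.doWork(); // async errors (reject, callee trap) surface here
      "ok"
    } catch (e) {
      "caught: " # Error.message(e)
    }
  };
}
```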

7 Likes

Thanks for bringing this up, @bitbruce. I think the issue you are mentioning is related to Motoko’s handling of synchronous (not caught by try/catch) vs. asynchronous (caught by try/catch) errors. IIUC there was some discussion in the Motoko team around this issue but I’m not aware of any conclusions. Could you maybe comment @claudio?

We’ve shipped the new ‘async*’/‘await*’ feature in Motoko 0.7.4 to reduce the pressure on the self-send queue when abstracting out asynchronous code.

I’m now actively working on turning synchronous send failures into catchable errors (exceptions), but that’s trickier than I expected, so it will take another week or so.

If I can’t make that work, some fallbacks would be to set an internal flag and allow the user to test the flag with a primitive, and/or to record the failure in the async value (thrown eagerly on await), though the latter is no use for one-way messages and blurs the commit point.
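
For context, a minimal sketch of the async*/await* pattern (names here are illustrative):

```motoko
import Buffer "mo:base/Buffer";

actor {
  let log = Buffer.Buffer<Text>(0);

  // An `async*` function is a delayed computation, not a separate message:
  // calling it enqueues no self-send, and `await*` runs it inline.
  func audited(action : Text) : async* () {
    log.add(action);
  };

  public func doThing() : async () {
    await* audited("doThing"); // same message execution, no extra queue slot
  };
}
```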

2 Likes

I wanted to bring up the priority of the named callbacks feature, now that the Bitcoin integration work seems to be winding down.

Enabling this will further incentivize project teams to collaborate with one another and to more easily trust communication with 3rd party canister services on the IC.

The lack of this is bogging down applications, forcing them to devise temporary, inefficient, and wasteful SYN/ACK-style protocols on top of the IC.

I see the named callbacks feature crept back onto the roadmap at some point (correct me if I’m wrong here), but it would be great to get a sense of the urgency and prioritization at DFINITY around its delivery.

4 Likes

@icme Thank you for bringing this up again and reinforcing its importance.

I’d like to make clear that we’re quite aware of the issues this problem creates when integrating with 3rd party canisters and we understand it’s blocking certain use cases – this is why we have indeed bumped it up on our priority list (if you check out Sam’s last roadmap update, you’ll see it there, it’s listed as “Safe canister upgrades”).

As you mentioned, I expect that now that the BTC integration tail work is wrapping up, the execution team will have the cycles to pick this up soon.

2 Likes

Does the order guarantee extend to the responses as well?

I am looking at the following situation: Canister A sends m1 and m2 to canister B in that order. Both are delivered and executed. Let’s assume B responds during the execution of each of them (instead of waiting and responding later). Is it then guaranteed that A receives and executes the response to m1 before the response to m2?

In general, B could make subsequent inter-canister calls to different canisters while executing m1 and m2, then suspend execution, await the responses of the subsequent calls, and then respond to A only during the continuation. In that case, the responses to m1 and m2 could be generated (scheduled) out of order. But for the sake of this question I am assuming that does not happen. I am assuming that the responses are generated in order. The question is if the responses are then also guaranteed to be delivered in order, just like the calls were.

A second question is whether there are any scenarios, other than the one given above involving subsequent inter-canister calls, that could lead to responses being scheduled out of order.

In practice (and in most situations) responses will be delivered in the order in which they were produced. So in particular, in your first example (two requests, m1 and m2, each producing a response before triggering any downstream canister calls), the responses will be delivered in the same order. What the implementation does is that, for every successful message execution, all downstream requests and any response are enqueued onto per-destination FIFO queues. And since canisters are single-threaded actors, messages to a given canister (requests and responses) will be enqueued in the order in which they were produced (I am not sure about the relative ordering of the requests and the response generated within one message execution).
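
To sketch the scenario (B's principal and its process method are placeholders):

```motoko
actor A {
  // Hypothetical canister B that responds without making downstream calls.
  let b = actor ("aaaaa-aa") : actor { process : Nat -> async Nat };

  public func fire() : async () {
    let r1 = b.process(1); // request m1, enqueued first
    let r2 = b.process(2); // request m2, enqueued second
    // The spec guarantees m1 is delivered before m2. The current
    // implementation also delivers the responses in order, but the spec
    // does not promise that, so don't build correctness on it.
    ignore (await r1);
    ignore (await r2);
  };
}
```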

However: the spec only guarantees in-order delivery of requests, not of responses. So while the implementation currently retains the ordering of both requests and responses, we only provide a guarantee that this will not change for requests. E.g. as part of subnet splitting, which we’re currently working on, we are going to great lengths (and paying a significant price both in terms of complexity and latency) to ensure that requests will still be delivered in order even if the original subnet still has a backlog of requests to deliver while the canisters have already started up on the new subnet. This will not be the case for responses.

Also, in the future, we may consider having separate streams for requests and responses, meaning that the response for m2 may be delivered before a request triggered by m1. Or the other way around. We may choose to deliver responses entirely out of order (e.g. optimize for throughput and deliver smaller responses first). So I would definitely not rely on response ordering for correct functioning of your application.

Honestly, if it were up to me, I would also drop request ordering guarantees. Message ordering (request and/or response) can be trivially implemented by a library and, as said, it imposes significant opportunity costs on the protocol implementation. And I seriously doubt that most canisters require it for correctness. E.g. IIRC the ICP ledger currently relies on it, but it would work just as well with unordered, unique transaction identifiers instead.
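
For illustration, a minimal sketch (purely hypothetical, not how the ICP ledger works) of a receiver that enforces ordering itself via per-message sequence numbers:

```motoko
import Hash "mo:base/Hash";
import HashMap "mo:base/HashMap";
import Nat "mo:base/Nat";

actor {
  var next : Nat = 0; // sequence number of the next message to deliver
  let pending = HashMap.HashMap<Nat, Text>(8, Nat.equal, Hash.hash);

  func deliver(payload : Text) {
    // Application logic goes here.
    ignore payload;
  };

  // The sender tags every message with a sequence number; we buffer gaps
  // and deliver in order, so the transport may reorder freely.
  public func receive(seq : Nat, payload : Text) : async () {
    pending.put(seq, payload);
    label flush loop {
      switch (pending.remove(next)) {
        case (?p) { deliver(p); next += 1 };
        case null { break flush };
      };
    };
  };
}
```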

Out of curiosity, what’s the use case that would require in-order delivery of responses? I cannot really think of one.

Ok, understood that I can’t rely on response ordering. It might be good to say this explicitly in the spec, because some readers may be wondering.

From the perspective of the application programmer (not the one having to implement the protocol) I strongly disagree. We already have very limited guarantees, only the two that you mentioned. And getting asynchronous communication right with only those two is already incredibly hard. I would expect that most ad-hoc written protocols will end up being buggy unless they are formally verified or at least have a proof written down, just because it is so easy to make mistakes. Or, all that has to be hidden in libraries.

I find the ordering guarantee extremely helpful. It reduces the space of designs and edge cases from completely open (like IP) to something ordered (like TCP). I often like to think about canisters as embedded systems (for other reasons, simply because of the resource constraints and the need to think about provable memory bounds). But staying in that picture, the communication between two canisters is then like a wire/serial interface between two chips. The kind of protocols that you write in that context can rely on ordering. Bytes may get lost/skipped but not re-ordered.

I am programming against the ICRC1 interface. My canister is a client of the ICRC1 ledger. Users can make deposits into dedicated subaccounts of my canister at any point in time. My canister calls transfer (to move funds out of those subaccounts) and then balance, in that order. If I have ordering of responses then I can trivially detect user deposits even if they come in concurrently. Without ordering of responses it seems at least more complicated. I will see. I am trying to avoid extra rounds of calls (expensive) or waiting with locks (latency, risk).
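
Roughly, the pattern looks like this (a trimmed, hypothetical slice of the ICRC1 interface; the real transfer argument has more fields and the real return type is a Result):

```motoko
import Principal "mo:base/Principal";

actor Client {
  type Account = { owner : Principal; subaccount : ?Blob };

  // Hypothetical, trimmed ledger interface; the principal is a placeholder.
  let ledger = actor ("aaaaa-aa") : actor {
    icrc1_transfer : { from_subaccount : ?Blob; to : Account; amount : Nat } -> async Nat;
    icrc1_balance_of : Account -> async Nat;
  };

  public func drain(sub : Blob, dest : Account, amount : Nat) : async Nat {
    // Both requests go out back to back; requests are delivered in order,
    // so the ledger sees transfer before balance_of.
    let t = ledger.icrc1_transfer({ from_subaccount = ?sub; to = dest; amount = amount });
    let b = ledger.icrc1_balance_of({
      owner = Principal.fromActor(Client);
      subaccount = ?sub;
    });
    ignore (await t);
    // If responses were also ordered, this balance would consist exactly of
    // deposits that arrived after the transfer executed. Without that
    // guarantee, it may have been read before the transfer.
    await b
  };
}
```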

Do note that if you don’t have full control over the canister that produces the responses, any response ordering guarantees provided by the protocol could be meaningless. That canister may make arbitrary downstream calls of its own (or start doing so following an upgrade) and produce a response arbitrarily late. So it is only in the very limited case of two tightly controlled canisters that you would be able to meaningfully rely on response ordering.

As with guaranteed response delivery, the actual usefulness of message ordering guarantees is limited to a very, very constrained design space. And there are pitfalls aplenty. E.g. even if a transfer request is guaranteed to be delivered before a balance request, in the presence of downstream canister (or system API) calls, it may well be that the latter request is executed before (most of) the former. This would be very much visible if the CDK forced canister developers to use explicit callbacks to handle canister responses. But an await buried deep within some library function is much too easy to miss. AFAICT, these guarantees are more of a footgun to the average developer than anything.
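
To illustrate with a hypothetical ledger that makes a downstream call while handling transfer:

```motoko
actor Ledger {
  var balance : Nat = 100;

  // Hypothetical downstream dependency (e.g. an archive canister).
  let archive = actor ("aaaaa-aa") : actor { append : Text -> async () };

  public func transfer(amount : Nat) : async () {
    await archive.append("transfer"); // suspends; transfer is now half done
    balance -= amount;                // runs only once the archive responds
  };

  public func balance_of() : async Nat {
    // May execute between transfer's await and its continuation, observing
    // the pre-transfer balance even though transfer was delivered first.
    balance
  };
}
```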

2 Likes

It may be worth mentioning, Timo, that we are implementing a protocol for withdrawals and deposits. Canisters can inherit this protocol freely using the canister_sdk. We would welcome your feedback. It seems similar to what you are working on.

https://github.com/infinity-swap/canister-sdk/compare/main...maxim/CPROD-1541

1 Like

I got a similar error when I redeployed an old Rust project recently:

```
Installing canisters...
Error: The replica returned an HTTP Error: Http Error: status 400 Bad Request, content type "text/plain", content: Specified ingress_expiry not within expected range:
Minimum allowed expiry: 2023-05-08 08:38:29.045786880 UTC
Maximum allowed expiry: 2023-05-08 08:43:59.045786880 UTC
Provided expiry:        2023-05-08 04:22:21.415577847 UTC
Local replica time:     2023-05-08 08:38:29.045787761 UTC
```

How do I fix that?

This means your system’s time is not in sync with the IC’s time. If you reset your system’s time to something close to correct, you should be good to go.

:wave: Just wanted to bump this up, as I currently want to integrate with a 3rd party canister but am wary of the risks involved. It’s resulting in me designing a double-oneshot proxy canister, which is a bit overkill when just a simple inter-canister call request timeout would suffice.

Any progress or timeline on the named callbacks feature?

@icme I’ve been meaning to start a forum thread where we can discuss named callbacks and a few other alternatives we’ve been pondering that can address the problem of integrating with a 3rd party canister safely (spoiler alert: I believe named callbacks is the right step now, but I want to make sure we have buy-in from the community and that we are not missing another approach that would be preferable). Unfortunately, I’ve been working on this on and off due to other responsibilities, but I’ll try to get it going by the end of next week.

As for a timeline: I’m not sure we can commit to strict estimates given the other work that’s on the team’s plate and has also been highly requested by the community, but I expect that it’ll be O(months), not O(weeks), to get support in the replica (assuming we’re talking about the named callbacks solution here), so I would suggest planning accordingly.

2 Likes

Tagging @levi because I know you’ve been eagerly waiting as well :slight_smile:

We have this built in Motoko: Assigned: ICDevs.org Bounty #39 - Async Flow - One Shot - Motoko - $6,000

It is also being ported to Rust at the moment.

I think this pattern is much safer when building with third-party canisters, as it segments where your state changes and forces you to think in an async manner. It also provides an economic lever for managing cycles, as an ack can come loaded with cycles to pay for the op (this is nice in some use cases).

I’m not saying we don’t need named callbacks, but hopefully in the meantime this can simplify and standardize development.

Sample: GitHub - fury02/example-async-data-deliveries
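
For a flavor of the shape (a bare-bones sketch with a placeholder receiver; the bounty implementation linked above is far more complete):

```motoko
import Buffer "mo:base/Buffer";

actor Sender {
  // Hypothetical receiver exposing a one-way (oneshot) endpoint.
  let receiver = actor ("aaaaa-aa") : actor {
    deliver : (Nat, Text) -> (); // one-way: there is no response to await
  };

  // State is committed before the send; the ack arrives later as a separate
  // one-way call (optionally carrying cycles), not as a response.
  let outbox = Buffer.Buffer<(Nat, Text)>(0);

  public func send(id : Nat, payload : Text) : async () {
    outbox.add((id, payload));     // record first, so the message can be re-sent
    receiver.deliver(id, payload); // fire and forget
  };

  // Called back by the receiver to acknowledge delivery.
  public func ack(id : Nat) : async () {
    // Drop `id` from the outbox here; anything left unacked can be retried
    // from a timer.
    ignore id;
  };
}
```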

@domwoe “eagerly waiting” are the wrong words for what I’m doing.

When this feature comes I will use it in the CYCLES-TRANSFER-STATION.