Scalable Messaging Model

I’m not sure I fully understand your proposal. Are you suggesting a combination of requests with guaranteed delivery and best-effort pub-sub? Or is it something else?

Does “small” mean the outgoing message is small, or the response is small? Your post sounds like the outgoing message is small as well (i.e. both are). But doesn’t the problem of wasted allocations only require the response to be small? Why does the outgoing message have to be small as well? After all, scheduling the call can already fail synchronously if there isn’t enough space in the out-queue.

In practice I frequently make canister calls where I already know that the response is going to be small, often just an ACK. So any excessive allocation for the response would indeed be wasted.

1 Like

Well, I guess a better first question would be: “Have you all ever considered replica-level Pub-Sub? And if so, why was it ruled out or de-prioritized?”

If the answer is “Oh yeah… that’s in the pipeline,” then I’d ask: if the target is increased scalability, why not go down that avenue first? It would likely converge on all the implementation scenarios a developer would otherwise need, whether Guaranteed Messages with a set size and a system timeout, or Guaranteed Messages with a programmer-specified timeout. (In the guaranteed case you can handle the timeout in-line; with pub-sub you have to do some application-level bookkeeping to clean up and throw out responses that have timed out.)

My guess is that there are plenty of answers to the first question that make the second not worth asking. Sorry for the confusion. I’m a total amateur at the replica level here so I’m sure there are some dumb assumptions and/or impossible/ridiculously hard things that I don’t know are impossible/ridiculously hard.

:point_up_2: Yeah… in most cases I know I’m returning WAY less than 1 kB and could easily tell you that beforehand (although it might rule out using batch). But if I want to ship a file from one canister to another, I’m going to be sending a bunch of 2 MB chunks and just getting back the OKs.

What do you mean by “no possible guarantee”? Also, I’m not sure how either topic-based or content-based pub/sub would work, given the IC’s sharded nature; it’s not clear how a single subnet would learn of all the recipients interested in a particular message.

We have discussed something à la message queues, namely one-way messages, but this was discarded primarily because it would be a large change to the current programming model, including the external user interaction (right now both are request/response based). Guaranteed (i.e., exactly-once) delivery of such messages would also likely require a large refactoring of our streams, which would need to become per-canister-pair in order to implement backpressure. An additional concern is cycles: first, the receiver can be frozen or out of cycles, which could violate the delivery guarantees. Furthermore, the receiver might itself implement some kind of rate limiting to protect itself from DoS attacks (for example, the ICP ledger starts rejecting transactions if there are too many in some time window), which could also violate delivery guarantees.

2 Likes

The reason we proposed a small request-small response approach is that it allows providing canisters with guaranteed incoming and outgoing quotas (whether in terms of messages or bytes). But that only works because we can guarantee each of 100k canisters something up to 100 kB (we settled on 50 kB, with the remaining memory in a shared pool, but it doesn’t have to be a 50-50 split).
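Spelled out, the arithmetic implied by these numbers (the 10 GB total is an inference from the 50-50 remark, not a figure stated in the proposal):

$$ 100{,}000 \times 50\ \text{kB} = 5\ \text{GB guaranteed}, \qquad 100{,}000 \times 100\ \text{kB} = 10\ \text{GB implied total} $$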

If the model allowed arbitrary size requests, we could no longer guarantee a quota, incoming or outgoing. All you’d get out of it would be a small reservation and fractionally lower costs (in the average case; if the response takes hours or days to materialize, the difference would be more significant).

And providing some sort of guaranteed messaging resources to canisters (in the same way that they can e.g. rely on reserved storage) was the whole point of the proposal. If your calls fail synchronously, you are pretty much forced to terminate the call context, which makes it impossible to provide a reliable synchronous API (make call, receive response) and would force callers to poll for the status of their requests, and callees to use an event loop driven by timers/heartbeats. Some canisters may require the latter, but we believe it should not be the default approach to canister programming.

1 Like

I don’t think we have. As @oggy said, how would one make pub-sub work across subnets? Sure, there may be some way of broadcasting a message to all canisters on the subnet (which is what I understand by “at the replica level”), but doing it at the protocol level would either require subnets to broadcast messages to all other subnets (imagine what that means when/if we get to hundreds of thousands of subnets or more); or maintain potentially huge lists of subscribers and still end up broadcasting to pretty much every other subnet.

Maybe you should consider using best-effort calls for those, since there is no request delivery guarantee anyway. With best-effort calls, no reservation is needed (and a tiny response should be delivered with very high probability).

1 Like

Yep… that makes sense. It becomes an exponential problem if the publishers/subscribers don’t have to register themselves (and perhaps even if they do). In our application-level version, we require publishers and subscribers to register themselves, and thus each publisher is assigned a broadcast canister that manages where all the messages need to go. Making this canister subnet-aware, so it can route in a way that avoids multicasting across subnets, is part of the problem we are facing in keeping cycle costs down.

Makes sense! Thanks to you and @oggy for humoring me. :slight_smile:

1 Like

No worries. That’s why we posted this, to get feedback. And we very much appreciate it.

For a bit more context on why we haven’t seriously considered pub-sub: you already know that the IC is (for the time being) significantly bandwidth constrained. Every piece of data used in a replicated computation by a subnet must either already be present in the subnet’s storage, or be inducted through a block, limited to 4 MB in size at a rate of ~1 block/second.
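In other words, the replicated input bandwidth of a subnet is bounded by roughly

$$ 4\ \text{MB/block} \times 1\ \text{block/s} \approx 4\ \text{MB/s} $$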

We have ideas for how to work around that (basically having a side-channel for the actual data and only including hashes in blocks, although it’s quite a bit more elaborate than that). But until we make some progress in that area, anything that requires large amounts of bandwidth (such as broadcasting) is a bit out of reach.

This is also one of the reasons why we chose to go with “small messages” as one solution and “best-effort” as another: given limited bandwidth, one can either try to reduce the amount of data being passed around; or, under the assumption of low traffic in the average case, allow for arbitrarily-sized payloads backed by fair load shedding when necessary.

2 Likes

The proposal looks great. Most calls that need guaranteed responses are smaller than 1 kB, so this is a great way to balance the response guarantees with the subnets’ memory reservations. I like how this lets each canister have a guaranteed quota of concurrent incoming and outgoing calls.

If a caller makes an outgoing call using the small guaranteed message type and the callee sends back a response bigger than 1 kB, then the callee would change its state but the caller would not be able to receive the response and would not know that the callee changed its state, breaking the response guarantee. So as far as I can tell the method annotations are a requirement, and if a caller sends a request type different from the callee’s method type, the call must not reach the callee (so the callee can’t change its state) and the caller gets an error response. This of course only applies to the small-message-guaranteed-response type. Is this a correct conclusion, or are there better ways to handle this scenario?

1 Like

We have discussed a number of options here.

The obvious one was to cause the callee to trap when producing a response larger than the reservation. In theory, this would both roll back the callee’s state changes and inform the caller of the failure. Unfortunately, this only works reliably if the callee does not make any downstream calls; if it does, and it persists any state before doing so, then those changes to the state would not be rolled back. At the extreme, this could be used as an attack, causing canisters that do not expect to trap while responding to remain in inconsistent states. Even if that is not so and the callee takes care to roll back all changes (or rather to only apply them when producing the response), it still consumes cycles for zero work done.

The other option was, as you described, to use annotations and/or to require callees to invoke a system API with the semantics of “I promise to produce a response smaller than X bytes” as the first thing the request handler does. If the annotation or the system API call specifies a maximum response size larger than what the caller requested, the call traps. The callee still burns some cycles (if we go with the API approach); or the system must be aware of the maximum response size of every endpoint of every canister before inducting a request; but there is no possibility of a partial transaction being applied by the callee. However, this does require all callees that want to support small guaranteed messages to actively do something about it (i.e., it is opt-in). Which, in the best case, means that it will take a long time before small messages become a significant fraction of IC traffic.
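To make the opt-in flavor concrete, here is a minimal sketch of what a callee-side declaration could look like, assuming a hypothetical `promise_max_response_size` wrapper and an ic-cdk-style update handler (none of this is a finalized interface):

```rust
// Hypothetical system API wrapper: the callee promises that its response
// will not exceed `limit` bytes; the call would trap if the caller
// requested a smaller maximum. NOT a real ic0 call; illustrative only.
fn promise_max_response_size(limit: u64) {
    let _ = limit; // stand-in for the eventual system call
}

#[ic_cdk::update]
fn transfer_ack() -> bool {
    // First thing in the handler: declare that this endpoint only ever
    // produces a tiny response, so callers using the small
    // guaranteed-response flavor know their reservation suffices.
    promise_max_response_size(1024);
    // ... apply state changes, make downstream calls, etc. ...
    true // a small ACK, well under the promised limit
}
```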

The final option we came up with so far is to make it the caller’s responsibility to know whether the endpoint they are calling is capable of producing only small responses; or else to deal with the eventuality of a large response. The call would succeed on the callee side (i.e. no trapping, no errors), but a response larger than the reservation may (potentially, though not necessarily always) be replaced with a specific reject response saying “response too large”. This might be seen as acceptable behavior, since it’s a bad idea to make guaranteed-response calls to untrusted/unknown canisters anyway: said canister may “never” produce a response, meaning that your canister is non-upgradeable and has likely locked some resources that now cannot be unlocked. Also, if the endpoint you’re calling is idempotent and you get a “response too large” error, you can retry it as a “large guaranteed response” call; or, if you don’t actually care about the response, a “response too large” reject implies that the call succeeded, which may be all you want to know. Still, this does mean that we no longer have a clear cut between a response meaning the call succeeded and a reject meaning the call failed (although there is already a lot of nuance there: a response may simply be an error produced by the canister; and a reject could mean that the callee trapped while trying to produce the response, after changes had already been applied).
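Caller-side, the retry path for idempotent endpoints might look roughly like this sketch, where `call_guaranteed` and the reject code are hypothetical stand-ins for whatever API the proposal ends up with:

```rust
use candid::Principal;

// Hypothetical reject code for "response larger than the reservation".
const SYS_RESPONSE_TOO_LARGE: u32 = 42;

// Hypothetical call wrapper: `max_response_bytes` selects between the
// small (1 kB) and large (2 MB) guaranteed-response flavors.
async fn call_guaranteed(
    callee: Principal,
    method: &str,
    arg: Vec<u8>,
    max_response_bytes: u64,
) -> Result<Vec<u8>, u32> {
    unimplemented!("stand-in for the eventual system API")
}

async fn lookup(callee: Principal, arg: Vec<u8>) -> Result<Vec<u8>, u32> {
    // Try the cheap path first: a small guaranteed response.
    match call_guaranteed(callee, "lookup", arg.clone(), 1024).await {
        // The callee executed fine, but the response didn't fit the 1 kB
        // reservation; since the endpoint is idempotent, retry as a
        // "large guaranteed response" call.
        Err(SYS_RESPONSE_TOO_LARGE) => {
            call_guaranteed(callee, "lookup", arg, 2 * 1024 * 1024).await
        }
        other => other,
    }
}
```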

So yeah, that’s something else that we would appreciate feedback on. Do any of the above approaches seem reasonable? Do you have other suggested approaches that we haven’t considered?

2 Likes

Could you elaborate a little more on the differences between sending under the small request-small response approach vs. the current approach? In my understanding, when I schedule a call it first goes into an out-queue which belongs to my canister and lives in my canister’s memory. In both approaches I am the sole controller of that space, and if there isn’t any space then the call fails synchronously. So far there is no difference between the approaches.

Then from there it goes to the subnet and I can be confronted with noisy neighbours on my subnet. What exactly is the difference now assuming noisy neighbours? I believe in the current approach my message will remain in my canister’s own queue indefinitely. And in the small request-small response approach a certain quota of my messages will make it through “fast”. Is that right?

Then at the destination side I believe I have no guarantees, because the quotas that we are talking about are not “point to point”. So I could have other canisters on my subnet or on a different subnet overloading my target canister, and then even the small requests will be delayed indefinitely, right?

But when there is a response then I will get it fast again because of the quota. Or not? I find it hard to imagine any guarantees of quotas when it goes cross-subnet. I can imagine a guarantee relative to the other 100k canisters on the same subnet. But my target subnet could be hammered by traffic from arbitrarily many other subnets.

1 Like

Not exactly. Because messages are always resident in memory (whether they are in canister queues or in subnet streams), we keep track of all message memory utilization across the subnet. So that message’s memory counts against your canister’s memory utilization, but it also counts against the subnet’s memory utilization from the moment you make the call. Actually, if it’s a request, we make a 2 MB reservation for the expected response, and that is what we account for as the memory utilization of the request, regardless of how small it is. As soon as some memory limit is hit, your request will fail synchronously.
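A toy model of that accounting, as I understand it from the description above (simplified and hypothetical names, not the actual replica code):

```rust
/// The reservation made for the expected response of every request.
const RESPONSE_RESERVATION_BYTES: u64 = 2 * 1024 * 1024; // 2 MB

struct MessageMemory {
    used: u64,
    limit: u64,
}

impl MessageMemory {
    /// Returns false (i.e. the call fails synchronously) if the
    /// reservation does not fit under the limit.
    fn try_reserve_request(&mut self, request_bytes: u64) -> bool {
        // Even a tiny request is charged the full 2 MB response
        // reservation from the moment the call is made.
        let charge = request_bytes.max(RESPONSE_RESERVATION_BYTES);
        if self.used + charge > self.limit {
            return false;
        }
        self.used += charge;
        true
    }
}
```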

Again, not exactly. If a request makes it into your canister’s output queue, it will eventually be routed into the appropriate stream (unless it times out before that, which happens after 5 minutes). The destination subnet will then attempt to induct it and, unless there isn’t enough memory for it, likely succeed.

Unfortunately, no (I’m running out of ways to start my answers disappointingly (o: ). The subnet cannot guarantee that your messages will make it through “fast”. Even if it did, at 50 kB per canister and 100k canisters, that would be 5 GB worth of messages. Even if the other subnet had no other messages to induct (no ingress, no calls from other subnets), that would take at least 20 minutes. And since the other subnet may have arbitrarily many other messages to handle, it can take arbitrarily long to deliver even your 50-small-message quota.
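For reference, the arithmetic behind “at least 20 minutes”, using the ~4 MB/s block-space bound from earlier in the thread:

$$ 100{,}000 \times 50\ \text{kB} = 5\ \text{GB}, \qquad \frac{5\ \text{GB}}{4\ \text{MB/s}} = 1250\ \text{s} \approx 21\ \text{minutes} $$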

What this quota guarantees is that you can make the 50 calls without failing synchronously. Not that they will be delivered. Fast or ever.

We are considering introducing bidding for message prioritization (which would actually make it very likely that your message is delivered “fast”, whatever that means), but right now we’re doing round-robin across canisters when routing into streams and some sort of round-robin over streams (with a new randomly picked starting point every time) when including them into blocks.
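As a toy illustration of that last policy (simplified; the replica’s actual code is more involved), round-robin over streams with a fresh random starting point might look like:

```rust
use rand::Rng;

// Visit all streams in round-robin order, starting from a random index,
// so that no stream is systematically favored when the block fills up.
// Assumes num_streams > 0.
fn stream_order(num_streams: usize) -> Vec<usize> {
    let start = rand::thread_rng().gen_range(0..num_streams);
    (0..num_streams).map(|i| (start + i) % num_streams).collect()
}
```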

So basically we have no quota of any kind now and no deadline for delivery. When you make a call, it may fail synchronously, it may time out after 5 minutes if the stream is backlogged; or it may be rejected when inducting it at the receiver end. You are guaranteed to get a response (whether from the local timeout; from the failure to induct it on the other side; execution trapping; or the callee producing a response), but again there cannot be any kind of guarantee regarding how long it takes to deliver it back to you.

With both of the newly proposed models (best effort and small guaranteed response) you would have a quota limited to how many calls you can make before they may start failing synchronously; and how many concurrent calls you can handle before they start being rejected by the system. But you still have no guarantee regarding “fast” delivery, because that is impossible to provide. Bidding on call priority (which would likely apply to both the request and the response) might change that. But even if we implement it, it will be a long way off.

3 Likes

In my current thinking (not final), Option #1 that you mentioned would make the callee dependent on the caller, which sounds like a bad entanglement; and, as you mentioned, it would require all methods to account for a possible (uncontrollable by the callee) trap when producing the final response message.

Option #3 sounds better than #1 but would remove the guarantee that there will be one globally unique response for this message type, and would remove the guarantee that once the call reaches the callee, the caller will always receive the response produced by the callee.

Option #2 sounds best in my current view. I think the system-level guarantee for the small-guaranteed-message type, that once the call reaches the callee the caller will always receive the response produced by the callee, is a great help for the canister-writer and a good tool to help make sure the states of the two canisters can never go out of sync. I think of it in the same way that the system reserves the caller’s cycles for a response when an outgoing call is made, to make sure the caller will be able to handle the response.

None that I can think of at the moment, I’ll post if I do.

At least for me, knowing that the “old” (current as of the time of this post) message type will cost up to 1000x more puts strong pressure on me to opt in to the new message types. I for one will do so as soon as possible.

1 Like

A great help, yes. A guarantee, no. If the caller traps while handling the response (which may happen through no fault of its own, e.g. if the subnet storage fills up) it will be as if it never received that response. Another way of looking at it is that response delivery is only guaranteed if you can ensure you never trap (which is not under your control). It’s a minute possibility, but it is not zero. Which means it’s not a guarantee, just very likely.

Only the message memory will become more expensive. Initiating the call will continue to be the majority of the cost, as long as your call doesn’t take hours to complete.

1 Like

The canister can reserve the memory it needs in the canister settings. The stable_grow API returns a result that the canister can handle without trapping. This is within the canister’s control. Are there specific examples where the canister can trap from something outside the canister’s control? If so, I want to know about them.
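For readers following along, the pattern being referred to looks roughly like this, assuming ic-cdk’s stable memory API (the exact signature varies between cdk versions):

```rust
use ic_cdk::api::stable;

/// Grow stable memory by `pages` 64 KiB pages without trapping.
/// Returns the previous size in pages, or None if the limit was hit.
fn grow_or_report(pages: u64) -> Option<u64> {
    // stable_grow returns an Err instead of trapping when growth fails,
    // so the canister can degrade gracefully (e.g. reject the request).
    match stable::stable_grow(pages) {
        Ok(prev_pages) => Some(prev_pages),
        Err(_) => None,
    }
}
```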

1 Like

It may well be possible to cover all these edge cases (I might not even know all of them). But the fact that you can reserve the memory you think you need is not exactly a mathematical certainty that you will never run out of memory. E.g. every request in your input queue currently makes a 2 MB reservation in your output queue. So if someone floods you with requests, your canister’s memory usage could easily (temporarily) jump by tens of GB. Considering that a subnet only provides a total of something like 600 GB of storage (I don’t remember the exact number), this means that only a few dozen canisters per subnet could even reserve sufficient memory to prevent this eventuality. But subnets support up to 100k canisters.

Similarly, you can always run out of cycles. Yes, you can go to great lengths to avoid it, but it’s not precisely a guarantee. Not in the same sense that, e.g., atomic message execution or Ethereum’s atomic transactions are.

To put it differently, I would think twice about trusting my life savings to a blackholed DeFi app (i.e. one that could not be upgraded and did not support any form of manual intervention) that relied exclusively on guaranteed responses for correctness / consistency.

1 Like

Not disappointing at all. I knew my understanding was wrong and I am grateful to have my misunderstandings corrected.

So in summary, the difference between the current behaviour and the small request-small response approach lies in fewer synchronous sending failures; indeed, guaranteed freedom from synchronous failures up to the quota.

In my thinking I was simply misled by taking the viewpoint of a single canister. When you take 2 MB each for 500 messages you get 1 GB, and that appears small relative to the heap memory limit (4 GB) and the stable storage limit. But of course if 100k canisters do the same thing, the story is different. 100k canisters cannot all allocate 2 MB for a message at the same time. But they also cannot all have a few MB of wasm memory. The limit in both cases is subnet memory.

So is it fair to say that if a subnet runs into synchronous message sending failures due to subnet memory limits then its canisters will likely also run into memory grow errors? Because they are both caused by the same limit?

1 Like

One more question here. I am not arguing against the proposed new approaches, just trying to understand the rationale in detail.

Here, I think you mean the situation in which a canister receives a call and wants to make a downstream call that fails synchronously. Now the canister has to terminate the call context (respond), because there is nothing else it can do; a retry does not make sense if making the call just failed synchronously, so the caller has to retry later. But with the small request-small response approach I can still have asynchronous failures. So what exactly is gained by it? The situation seems to be the same. Or is the difference that after an asynchronous failure it makes sense to retry, so the canister does not have to terminate the call context right away?

I have the feeling I did not understand your scenario or who the caller/callee are and what the call context is. Can you elaborate please?

1 Like

This is all fine in the trivial case where your canister acts more or less as a proxy to a single downstream canister (maybe also doing something on the side, such as recording the payment, if that’s what it is). But if you consider e.g. the case of a DEX, or any canister that makes multiple downstream calls in a multipart transaction, where the first call succeeded and now the second call fails synchronously, you will be forced to essentially commit half a transaction. (Or fall back on timers and asynchronous APIs.) Whereas if the second call fails asynchronously, you have the option of retrying until it eventually succeeds, and can still provide a straightforward synchronous API.
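To make the contrast concrete, here is a hedged sketch of that two-leg transaction, with hypothetical `debit`/`credit` endpoints and an illustrative bounded retry policy (none of these names come from the proposal):

```rust
use candid::Principal;

enum CallError {
    Sync,          // failed synchronously: the call never left this canister
    Async(String), // failed asynchronously: a reject came back
}

// Hypothetical one-shot call; a stand-in for the eventual API.
async fn call(canister: Principal, method: &str, amount: u64) -> Result<(), CallError> {
    unimplemented!("illustrative only")
}

async fn swap(ledger_a: Principal, ledger_b: Principal, amount: u64) -> Result<(), String> {
    // First leg of the multipart transaction.
    call(ledger_a, "debit", amount)
        .await
        .map_err(|_| "first leg failed; nothing committed yet".to_string())?;

    // Second leg. With a guaranteed quota this should not fail
    // synchronously; asynchronous failures can be retried while the
    // call context stays open.
    for _attempt in 0..10 {
        match call(ledger_b, "credit", amount).await {
            Ok(()) => return Ok(()),
            Err(CallError::Async(_)) => continue, // retry until it succeeds
            Err(CallError::Sync) => {
                // Without a quota this can happen mid-transaction, leaving
                // no option but to give up with half the transaction applied.
                return Err("committed half a transaction".to_string());
            }
        }
    }
    Err("gave up after retries; manual reconciliation needed".to_string())
}
```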

Yes, if the subnet runs into one of its resource limits and your canister does not have an allocation for said resource (e.g. you can allocate storage but, for now, you cannot allocate message memory), then whatever it was you were doing will either return an error (if it’s some explicit API) or trap (if you’re allocating memory on the heap).

1 Like