Scalable Messaging Model

Background

The IC’s messaging model (as of January 2024) is conceptually an attractive proposition: remote procedure calls (RPCs) with reliable replies and automatic call bookkeeping through call contexts. Unpacking it a bit:

  • Canisters interact via RPCs
    • A ⇾ B request (message); followed by B ⇾ A response (message).
    • Every call being handled (Motoko shared function; or Rust update function) is backed by a call context.
  • Best-effort requests
    • With ordering guarantees: if canister A makes two calls, c1 and c2 (in that order) to canister B, then c2 will not be delivered before c1.
  • Guaranteed responses
    • There is globally exactly one response (reply or reject) for every request.
    • This response is eventually delivered to the caller.
  • Backpressure mechanism
    • Canister A may only have a limited number of outstanding calls to canister B.

Problem Statement

The long term goal of messaging on the IC is to ensure that canisters can rely on canister-to-canister messaging, irrespective of subnet load and with reasonable operational costs.

This goal is impossible to achieve with the current messaging model, to the extent that there were already discussions about increasing the prices for messaging on the IC. These discussions were paused to take a step back and see whether there are any other variables that could be tweaked to achieve the goal. This post presents the results of these discussions.

The reasons why messaging on the IC does not satisfy the long-term goal are the following:

  • Guaranteed response delivery implies unbounded response times. Concrete examples where this is a problem include calling into untrusted canisters, safe canister upgrades, and responsive applications in general. It also makes (true) fire-and-forget messages impossible. Later – once the IC supports more diverse subnet topologies – calling into dishonest subnets will also become a problem because of these guarantees.
  • Relatively large upper bound on the size of requests/replies (2MB while the mean message size observed on mainnet is 1kB). In combination with guaranteed replies, this requires reserving orders of magnitude more memory than necessary in most practical cases, increasing costs both to the canister and to the system.

Proposal

We propose to extend the current messaging model in two directions: small messages with guaranteed responses, and best-effort messages. The extensions will require explicit opt-in from canister developers so that backwards compatibility is maintained. Canisters that do not take any action will simply keep sending messages with the current semantics and guarantees.

Small Messages with Guaranteed Responses

Small messages with guaranteed responses have the same semantics as existing canister-to-canister messages, except that payloads are limited to 1 kB. Besides being significantly less resource-intensive, the size restriction opens up the possibility of ensuring every canister a quota of messages and thus a much more predictable environment. The current thinking is that it should be possible to give every canister guaranteed quotas of 50 kB of incoming and 50 kB of outgoing small messages that can be in flight at the same time, plus use of an optimistically shared pool of 5 GB*. (Note that the guarantee of being able to produce an outgoing request does not change the fact that delivery of requests is best-effort.) With the initial 1 kB payload limit, these quotas translate to 50 incoming and 50 outgoing small messages per canister, plus 5 M messages in the shared pool; given demand, this can be made more flexible later.

Small guaranteed-response messages still have the issue of potentially unbounded response times, but this may be an acceptable tradeoff in certain situations.

* We assume an upper bound of 100k canisters per subnet. More would only be reasonable if they are part of one or more canister groups. At that point, per-canister quotas are no longer so important.

Best-Effort Messages

For best-effort messages, both request and response delivery would be best-effort, which opens up the possibility for the system to ensure fairness even in high load scenarios via fair load shedding. Because the system may drop requests or responses under heavy load, memory reservations for responses are unnecessary. From a canister’s perspective every request still gets exactly one response. But the system does not guarantee that it is a globally unique response (e.g. the callee may produce a reply while the caller sees a timeout reject).

This means that canister developers who choose to rely on best-effort messages may have to handle the case where they cannot infer from the response whether the callee changed its state or not. In other words, best-effort messages allow developers to make a choice between bounded response times and the (potential) requirement to handle this additional case.

Additionally, every call has a deadline, set either explicitly by the caller or implicitly by the system. When a response does not materialize before the deadline (whether because the callee did not produce one or because the response got dropped), the subnet generates and delivers a timeout reject response.
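
As a rough caller-side illustration of this tradeoff, here is a minimal sketch in Rust. The Call builder, with_timeout and the timeout reject code below are placeholder names mirroring the proposal, not an existing CDK API:

    use std::time::Duration;
    use candid::Principal;

    // Sketch only: `Call`, `with_timeout` and `RejectCode::Timeout` are
    // placeholders for an API whose details are still TBD.
    async fn bump_counter(counter: Principal) -> Result<(), String> {
        let result = Call::new(counter, "bump")
            .with_timeout(Duration::from_secs(10)) // explicit per-call deadline
            .call()
            .await;

        match result {
            // The callee replied before the deadline.
            Ok(_reply) => Ok(()),
            // A timeout reject only says that no response arrived in time; the
            // callee may or may not have executed the call, so the outcome must
            // be treated as unknown (reconcile via a later query, or retry
            // through an idempotent endpoint).
            Err((RejectCode::Timeout, msg)) => Err(format!("outcome unknown: {msg}")),
            // Any other reject is an ordinary failure.
            Err((code, msg)) => Err(format!("call rejected ({code:?}): {msg}")),
        }
    }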

Similarly to small guaranteed-response messages, canisters would be guaranteed a quota of 50 concurrent best-effort calls, complemented by an optimistically shared pool of 5M calls.

How to use the new message types

Canisters will continue using the async-await programming model, with no structural changes required. Switching between “guaranteed response messages”, “small guaranteed-response messages” and “best-effort messages” could be as simple as, e.g., defaulting to “guaranteed response messages” (for backwards compatibility) and calling with_size_limit(1024) or with_timeout(duration), respectively, when building a request; or setting a global flag for all of a canister’s outgoing calls (details TBD).
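
For illustration, the opt-in could look roughly like the following in Rust; the builder, the callee and do_work identifiers, and the method names are taken from the examples above and are in no way final:

    // Default: a guaranteed-response call with today's semantics and limits.
    let a = Call::new(callee, "do_work").call().await;

    // Opt into a small (1 kB) guaranteed-response call.
    let b = Call::new(callee, "do_work").with_size_limit(1024).call().await;

    // Opt into a best-effort call with a 30-second deadline.
    let c = Call::new(callee, "do_work").with_timeout(Duration::from_secs(30)).call().await;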

There will likely also be a need for global flags or per-canister-method annotations to signal which message types a canister is willing to accept (details TBD).

What’s in It for Me?

A predictable environment in terms of messaging, responsiveness, scalability, fairness, upgradeable canisters, safe interactions with untrusted canisters and malicious subnets. Eventually, sensible retries.

In a bit more detail:

  • For both new message types, per-canister quotas ensure that canisters can operate in a predictable environment in terms of how many messages they can reliably send and receive.
  • For best-effort messages we additionally have that:
    • Subnet-generated timeout rejects ensure bounded response times; make it safe to interact with untrusted canisters and malicious subnets; and allow canister upgrades at all times.
    • The lack of response reservations allows applications to handle orders of magnitude more concurrent calls; reduces storage use; and reduces the overhead of simulating one-way messages.
    • The possibility of dropping messages allows for graceful degradation under heavy load and provides the opportunity for fair load shedding (e.g. when a canister uses more than its fair share of XNet bandwidth).
    • Callers’ ability to enforce deadlines means that (within the context of best-effort calls) it would be acceptable for the system to provide a sleep primitive, which in turn would enable sensible retries with back-off (see the sketch below).
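
A rough sketch of what such retries could look like, assuming the same placeholder Call builder as above, a hypothetical sleep primitive (which does not exist today), and an idempotent target endpoint:

    use std::time::Duration;
    use candid::Principal;

    // Retry a best-effort call a bounded number of times, doubling the wait
    // between attempts. `sleep` stands in for the hypothetical system primitive.
    async fn call_with_backoff(target: Principal, method: &str) -> Result<Vec<u8>, String> {
        let mut delay = Duration::from_secs(1);
        for _attempt in 0..5 {
            match Call::new(target, method)
                .with_timeout(Duration::from_secs(10))
                .call()
                .await
            {
                Ok(reply) => return Ok(reply),
                Err((_code, _msg)) => {
                    // Only safe against idempotent endpoints: a timed-out call
                    // may already have been executed by the callee.
                    sleep(delay).await;
                    delay *= 2;
                }
            }
        }
        Err("gave up after 5 attempts".to_string())
    }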

Conclusion & Next Steps

The new extensions to the messaging model will provide an environment that canisters can rely on and, hence, make it easier to implement reliable and/or consistent cross-canister applications.

We will keep working on the interface details and follow up as soon as they are worked out. With that in place, the goal is to start working on a first iteration towards an MVP in the replica; and expose it in Motoko and the Rust CDK.

Further work can be prioritized based on real-world needs: besides completing the vision outlined in this post, we believe that there will be demand for fair load shedding; a sleep API; rejecting pending calls on stop; dropping the payloads of expired messages from blocks; bidding for message priority; etc.

26 Likes

This is a really good improvement! Having to reserve 2MB for every single call is definitely not feasible long term.

Also, this is how all web2 services work, so developers will already be used to having to retry failed requests in an idempotent way.

In OpenChat we will switch to using “best effort” messages for almost everything.

I guess we may use “small guaranteed-response messages” for calls to the ledgers, but even then it may be better to use “best effort” because we have integrated with many tokens whose ledgers could potentially become malicious and prevent us from upgrading.

10 Likes

@icme @lastmjs @levi @skilesare @saikatdas0790 Please have a look at this proposal.

Note that this is also considered an alternative to named callbacks for ensuring safe canister upgrades.

4 Likes

Note that best-effort calls would not make named callbacks entirely superfluous, but they would cover quite a few use cases.

More specifically, they would only guarantee safe upgrades for canisters that rely exclusively on best-effort calls (or only make guaranteed calls to “fast” endpoints of trusted canisters, however one would define that). This is because best-effort calls come with a hard upper bound on response latency, so once a canister is Stopping, all its outstanding calls would be guaranteed to have a response within this (TBD) time bound.

The other important point is that a canister can still shoot itself in the foot if it simply retries failed calls forever: a canister in the Stopping state can no longer receive calls, but is still allowed to make downstream calls; so if the canister simply follows up every timeout response with a new call, then it will never stop. A reasonably implemented canister would not have this issue, but it might still retry failed requests for some limited time (or some limited number of times), which would correspondingly extend how long the canister takes to stop. So as long as there exists a limit on retries, there also exists an upper bound on canister stopping time.

Finally, a more speculative idea might be to allow canisters that can deal with this to opt into an “instant stop” mode, where all their outstanding best-effort calls are immediately aborted (i.e. they receive something akin to an early timeout reject) when the canister transitions into the Stopping state; and the canister is prevented from making any more downstream calls. Such a canister would only require a couple of rounds to handle all those abort responses, after which it would stop.

Named callbacks OTOH would implicitly allow for instant upgrades; and would also apply to guaranteed response calls. Which would clearly make them the superior option. But they would also require rewriting canisters to use explicit callback functions, as opposed to the nice (and potentially misleading) async-await structure that is most widely used.

7 Likes

👏👏👏

Looking forward to being able to utilize this

1 Like

There is a bunch in this post across a wide range of topics…sorry for the length…let me know if anything isn’t clear.

For some context, we’ve been working on a bunch of ‘batch by default’ and batch standards for fungible tokens and NFTs. Once these are out, I’d expect that the average message size may significantly increase as wallets, marketplaces, etc. begin batching their requests. (An NFT market can now pull the whole list of NFTs for a collection and then make a request with all those IDs to a metadata endpoint and expect to get back, depending on the collection, a large response. So fewer requests, but bigger payloads.)

It is probably just a variable to stick into our calculus, but hopefully, we are developing some patterns that will extend to much more complicated standards than just Tokens.

Also, a question: is the 1 kB limit for both incoming and outgoing payloads?

We have metadata variables that we give out on the token canisters, like ICRC4:max_balance_batch, that are there specifically to keep users under the current 2MB limit. Will there be complications where the client doesn’t know what kind of response handling the server has implemented? Will the candid (did) expose it? What if one ICRC4 canister uses small messages with guaranteed responses, but another uses best-effort and supports incoming batches of up to 2MB? I guess, phrased another way: how will canister clients know which method to use, and/or are we going to have to go back and add stuff to existing ICRCs to handle this?

Oh man…this makes my head ache a bit trying to think of all the ways this could go sideways for folks who don’t know what they are doing, but I’d imagine some well-developed patterns would help here. We are already handling some deduplication on the ICRC canisters, and I guess this pushes us to move to some kind of request id generated client-side (or a deterministic key, as is the case with ICRC1/2/4/7/37 transactions). It feels like getting back to a failed response might be a tough one. If the output was written to an ICRC3 log and dedup works, then hopefully you get a nice duplicate-of response. Still, you’re really going to need to make sure all relevant data is in that log in order to serialize it back into an expected object and inject it back into your processing pipeline.

Much of that feels like it leads to some ‘code smells’, but I guess you get to select this option intentionally. My concern is for the old pathway…will it get more expensive as we move forward if we don’t opt into these new modes?

I’d be interested in @dfx-json’s, @claudio’s, @luc-blaeser’s, and the rest of the Motoko team’s thoughts on how this would actually look in Motoko.

    //calling
    let foo = await.with_size_limit(1024) myactor.transfer(...);

    //declaring
    public guaranteed(msg) func transfer(...) : async Bool {
    };

    or

    public shared(msg) func transfer(...) : async.best_effort Bool {
    };

    or something else (we don't really have decorators yet).

We have been solving for these issues at the application level and have an alpha of a system that isn’t really ‘best effort’, but that assumes an event-based programming model, including archiving, replay, etc. It is specifically designed to let canisters ‘publish’ events to a trusted canister and not have to worry about any of the ‘untrusted’ stuff. The Broadcasters do everything via one-shots to subscribers and don’t wait for responses. If a subscriber is offline, it can catch up later by checking the nonce of the event stream.

The one thing it isn’t super good at for now is subnet awareness and/or optimizations, so it is possible to do something expensive like send an event to 100 subscribers on a different subnet instead of relaying to a broadcaster on the subnet and having it distribute to the 100 subscribers. I was hoping to get to that after the alpha.

It lays the foundations of some other cool features like cycle-neutral messaging, tokenization of data streams, etc. Given all of that and some grand designs that I may never actually have time to build…I would have actually loved something like this at the protocol layer.

Ethereum has events, but your contracts can’t respond to them. This event system fixes that glitch and lets you write programs in an event messaging style. When you do that you don’t have to stress about ‘what happens if I miss a message’ or ‘did the canister get it and do an update that I don’t have access to’ because you just assume that architecture from the beginning.

module {

   // Dispatch table mapping event names to their handlers.
   let dispatchEvent = EventHandlerHelper([
        ("icrc1.transfer", onTransfer),
        ("icrc2.approve", onApprove),
        ...
   ]);

   var lastTransfer = 0;
   var lastApprove = 0;

   // One-shot entry point: the broadcaster calls this without awaiting a reply.
   public shared(msg) func handleEvent(eventId: Nat, publisherId: Principal, eventName: Text, payload: Candy.CandyValue) {
     dispatchEvent(eventId, publisherId, eventName, payload);
   };

   private func onTransfer(eventId: Nat, publisherId: Principal, eventName: Text, payload: Candy.CandyValue) {
     if (eventId != lastTransfer + 1) { //this won't work if you use a filter at the event system level
        // Catch up on any events we missed, then replay them in order.
        let missedEvents = eventSystem.catchUp(eventName, lastTransfer + 1, eventId);
        for (thisItem in missedEvents) {
          onTransfer(thisItem..);
        }
     };
     //transfer logic
   };

   ///etc

};

It certainly adds some state requirements and probably isn’t conducive to implementation in the replica, but I’d much rather be able to do the following than have to write a bunch of retry code inline with the classic async/await approach:

   public event({caller, msg}) some_namespace_schema{
     //handle the event
   }; //and even better if the replica/motoko took care of making sure I didn't miss messages somehow.
5 Likes

With “best effort” and batch methods in icrc4 and 7, I suppose the transfer request deduplication will become more crucial to handle cases where there’s a timeout.

Should we consider supporting both batch and non-batch methods in icrc7, where the former uses guaranteed responses and the latter uses best effort? Something similar could be done with icrc1 and 4 canisters.

The 1 kB limit would apply to both the request and the response payloads, yes. Just as the existing 2 MB limit does.

We have discussed the possibility of having arbitrary small-message size limits (hence the 50 kB quota, as opposed to a quota of 50 small messages); as well as allowing canisters to reserve larger quotas (so e.g. with a 1 MB quota you could handle 20 concurrent 50 kB messages). But this is all very speculative and will likely require significant time and effort to achieve. And, depending on popular demand and whatever bottlenecks subnets run into in the future, these may well end up being implemented after other features that we have currently labeled as “future work” (i.e. low priority), e.g. dropping expired payloads from streams, fair load shedding, ingress history size limits, etc.

All very good questions that I do not have a definite answer to. SDK / Motoko people and the community should get to decide how to best address them.

My very evasive answer is that in general callers would be expected to know what canister they are calling into and what its limitations are (i.e. does it support small responses? best effort?). Standards should explicitly define what’s supported and what isn’t. And for standards completed before the new message types are released, I would say that the only reasonable expectation should be that they can handle the old 2 MB guaranteed response calls.

Another perspective may be that if a canister’s API (or implemented standard) is idempotent, then it should be safe (within limits, such as the ICP ledger’s 24 hour deduplication window) to use best-effort calls. If there’s no idempotent API, then one must assume that retrying a timeout response could result in a double spend, with the implication that the respective canister only supports (small or large) guaranteed calls.
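
To make the idempotent case concrete: ICRC-1 ledgers deduplicate transfers that carry the same created_at_time and memo within their dedup window, so a transfer whose response timed out can be retried with identical arguments without risking a double spend. A rough Rust sketch, assuming the icrc_ledger_types crate and today's guaranteed-response ic_cdk::api::call::call (exact types and signatures may differ between versions):

    use candid::{Nat, Principal};
    use ic_cdk::api::call::call;
    use icrc_ledger_types::icrc1::account::Account;
    use icrc_ledger_types::icrc1::transfer::{Memo, TransferArg, TransferError};

    // Retry a transfer whose outcome is unknown. Because `created_at_time` and
    // `memo` are identical on every attempt, the ledger either executes the
    // transfer once or answers `Duplicate { duplicate_of }` for the retries.
    async fn transfer_idempotent(
        ledger: Principal,
        to: Account,
        amount: Nat,
        created_at_time: u64, // fixed when the transfer is first built
        memo: Memo,           // client-chosen deduplication key
    ) -> Result<Nat, String> {
        let arg = TransferArg {
            from_subaccount: None,
            to,
            amount,
            fee: None,
            created_at_time: Some(created_at_time),
            memo: Some(memo),
        };

        for _attempt in 0..3 {
            let result: Result<(Result<Nat, TransferError>,), _> =
                call(ledger, "icrc1_transfer", (arg.clone(),)).await;
            match result {
                Ok((Ok(block_index),)) => return Ok(block_index),
                // An earlier (timed-out) attempt actually went through.
                Ok((Err(TransferError::Duplicate { duplicate_of }),)) => return Ok(duplicate_of),
                Ok((Err(other),)) => return Err(format!("transfer failed: {other:?}")),
                Err((_code, _msg)) => { /* reject or timeout: retry with the same args */ }
            }
        }
        Err("gave up after 3 attempts; outcome still unknown".to_string())
    }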

The truth is that guaranteed response calls cannot be made to work reliably. Even with small guaranteed response messages, if the subnet is badly backlogged your canister may end up with an output queue full of responses (whatever the quota) and, as a result, be prevented from receiving more calls. In a very real sense (both in terms of “can I make / receive a call now” and “will I ever receive a response to my call”) guaranteed response calls are best-effort: you are guaranteed to get a response, but it may not be what you want and it may “never” materialize.

OTOH traditional systems have been relying on best-effort messaging (TCP, UDP, HTTP) forever and people have built reliable, available, consistent distributed applications on top of them. These were not “folks that [didn’t] know what they are doing”, but building such applications on best-effort messaging is entirely possible. You just have to start from the right set of assumptions.

It may well do so. As mentioned at the top of the thread, we had already suggested increasing message memory fees by ~1000x and were about to do so, before we took a step back to reconsider our options. In reality, this 1000x increase would have been experienced as much less than a 2x increase, as long as calls completed in reasonable time. But message memory is a much more limited resource than storage in general (messages are always loaded in memory, routed, checked and inducted), so a price increase once an alternative exists cannot be excluded. Which is where a “small guaranteed response message” may come in handy, even in cases where the call takes a lot longer than what e.g. a reasonable HTTP request may take.

Ethereum has a lot of features (such as atomic transactions) that a sharded network could never provide outside of very narrow use cases (e.g. atomic single message execution). This is because Ethereum is a single (and single threaded?) virtual machine, so most of these things are trivial to achieve. But it also means that Ethereum is very, very tightly limited in terms of scalability. We had someone on the team look at messaging models across Ethereum rollups and a bunch of sharded blockchains as part of coming up with a proposal and the findings were (to put it mildly) not encouraging.

You may be able to implement reliable event notifications in a very, very tightly controlled environment, but a general solution is IMHO impossible, since it would be equivalent to guaranteed request delivery. Regardless of volume, load, number of participants, number of virtual machines.

So while I fully agree with the sentiment (it would be very nice to have guaranteed message delivery, atomic transactions and high throughput all at once), I don’t think that is even vaguely realistic.

2 Likes

Quite possible. Although, as said, it may take a while for all of the above to materialize and it may do so out of order. In the meantime it is entirely reasonable to stick with the existing 2 MB guaranteed response calls. And maybe consider making allowances for future support of those messaging models that make sense for the standard.

1 Like

Out of curiosity, has the team looked into Radix? They claim to have come up with an approach that allows the network’s throughput to scale linearly by adding more shards, while still preserving atomicity and composability even across different shards. Their whitepaper is quite technical and has supposedly been peer reviewed. I’m not an expert on the matter, so most of it is beyond my understanding, but I’d be curious to know whether you already took it into consideration and whether there is any merit to it.

This is the first time I’ve seen this limitation mentioned in relation to named callbacks. Aren’t there any alternatives that would allow us to continue using the async/await pattern? Perhaps the CDK could perform some compile time magic to abstract this process?
Regardless, I still believe that named callbacks are an essential part of the puzzle. Being able to safely interact with third party services, irrespective of the messaging type, is crucial for composability.
Even if the limitation can’t be avoided, it’d still be better than not having them at all. We could even view this positively, as explicit callbacks might make some potential reentrancy issues more noticeable.

3 Likes

Not that I know of. But I did take a quick look at their quite useful infographic series. At first look, it does seem as if their approach scales linearly. They have something like UTXOs to hold everything from tokens to smart contracts with their data. And every such UTXO ends up in its own separate shard. There is always a transaction to create the UTXO and one to consume it (and potentially replace it with a slightly different UTXO that ends up in a different shard).

On paper it looks quite reasonable, as they can pull together any number of UTXOs into one atomic transaction. However, I can imagine that if we’re talking about a UTXO containing even a few GB of state (say the equivalent of an IC canister and its state), any mutation will result in a new, slightly altered copy of said UTXO / canister being created in a new shard. And the old shard is never deleted. I.e. such a transaction would be agonizingly slow; and would result in another few GB of state that must be persisted forever.

My personal conclusion (which may well be flawed) is that while their protocol does scale linearly with the number of nodes, it is only realistically usable for maintaining a ledger (or other small chunks of data), and it actually requires linear (or likely higher) growth just to manage the ever-growing set of UTXOs. Without continuous growth, it all just falls over.

No disagreement from me there. (o:

Also agreed. The async-await model is too misleadingly simple for its own good. But to be honest, explicitly carrying over state from callback to callback is not trivial. Even though I was already a solid software engineer at the time, I remember the confusion and annoyance of dealing with explicit callbacks in my first Big Tech job. So I can see how they could be exceedingly difficult for someone just getting started with coding.

6 Likes

Yes, their protocol is mainly focused on providing a scalable layer for DeFi, rather than serving as a general purpose crypto cloud like ICP.
Could something similar be considered in the future by introducing the concept of DeFi canisters/subnets with a different set of tradeoffs from regular ones?

1 Like

Hard to say. If you were to implement something like this with canisters instead of physical nodes, my knee jerk reaction would be that latency would increase by at least one order of magnitude (hundreds of milliseconds roundtrip for geographically distributed nodes; multiple seconds roundtrip for canisters across multiple subnets). So if their claim of 5 second finality is accurate, you could expect something like 1 minute finality if you were to implement it with canisters. Not great.

You could also (presumably) have a subnet that essentially runs Radix underneath but somehow exposes the same interface as an IC subnet. But at that point, why not simply have canisters on the IC interact directly with the real Radix? (Similar to how ckBTC interacts directly with Bitcoin.)

3 Likes

Sounds good. DeFi & ledgers are probably the ones that rely on guaranteed responses the most and generate most of the calls right now.

How about this: if a canister utilizes 300 simultaneous calls over a period of time (weeks), its quota rises to 300. Then someone who tries to exploit this by flooding the network to cause other canisters’ calls to fail will not be able to do that easily.

Perhaps CDKs should provide some kind of test mode where 30% of calls artificially fail, throwing all possible errors, to simulate working under heavy load. Otherwise, we will only find out how good our ecosystem and our code are when under attack.
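
For what it’s worth, something along these lines can already be approximated in application code today. Below is a rough Rust sketch of a wrapper that fails a configurable fraction of outgoing calls; the wrapper, its names, and the use of the system time as a cheap randomness source are purely illustrative, not an existing CDK feature:

    use std::sync::atomic::{AtomicBool, Ordering};

    use candid::Principal;
    use ic_cdk::api::call::{call_raw, CallResult, RejectionCode};

    // Flip this on (e.g. via a controller-only update method) to exercise
    // failure-handling paths on mainnet.
    static TEST_MODE: AtomicBool = AtomicBool::new(false);

    // Fraction of calls to fail artificially while in test mode.
    const FAILURE_PERCENT: u64 = 30;

    // Route outgoing calls through this wrapper; in test mode roughly
    // FAILURE_PERCENT of them are rejected before ever leaving the canister.
    // Note: time() is constant within a round, so calls made in the same round
    // share the same fate; good enough for a crude chaos test.
    async fn chaos_call(target: Principal, method: &str, args: Vec<u8>) -> CallResult<Vec<u8>> {
        if TEST_MODE.load(Ordering::Relaxed) && ic_cdk::api::time() % 100 < FAILURE_PERCENT {
            return Err((
                RejectionCode::SysTransient,
                "injected failure (test mode)".to_string(),
            ));
        }
        call_raw(target, method, args, 0).await
    }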

3 Likes

Heartbeat and timer callbacks are not considered messages, correct? These limitations won’t apply to them and they are still guaranteed? Also, calls to the management canister aaaaa-aa are not messages?

2 Likes

As currently envisioned, you would get a guaranteed quota of 50 concurrent calls; plus access to an optimistically shared pool of 5 M concurrent calls. So you would be far from limited to 50.

One possible future extension might be to allow reserving chunks of that 5 M concurrent calls pool, similar to how you currently reserve storage. Or introducing some form of canister groups, within which the 50 call quota of each of N canisters in the group would turn into an N*50 call quota to be shared by the group (which would presumably consist of canisters designed to collaboratively share resources). A dynamic quota based on historical usage may also work, but it could e.g. prevent you from handling a spike in traffic.

That is a wonderful idea. We should definitely consider something like that. You can bring it up as a separate forum topic; or I can do it internally.

Heartbeats and timers are indeed system tasks, not messages. The only resource they would use (in terms of what we considered as resources in our proposed messaging model) is one call context. But there can be at most one such call context per canister without callbacks, namely the one currently executing (whether it is triggered by a message or a system task), and only while the canister is executing something. Which means that we can safely ignore it in the grand scheme of things. So yeah, no limitations on timers and heartbeats.

Management canister calls are in fact messages. We have a separate set of input and output queues for the management canister, just as with regular canisters. The management canister itself is not really a canister, it’s just functionality provided by the system, but in order to support asynchronous operations (e.g. canister installs) it looks for all intents and purposes the same as a canister.

2 Likes

Not sure it needs a separate forum topic. I would just add: the test mode has to be placed inside the wasm so it can be put on IC mainnet. Placing it in the local replica won’t work for us. Most of our canisters are calling basically hundreds of different canisters on the IC. It’s nearly impossible to replicate everything locally to test things out.

2 Likes

I made it a feature request over here on the feedback board. I suspect such a capability would be put in PocketIC and not into the CDK, so it is not limited to one specific CDK.

1 Like

That’s great to have, but I think we need it in the CDK as well. Trying to replicate some NNS canisters, XRC, a few SNSes, some ledgers of different types, and DEXes and DeFi canisters, and to populate them with realistic data, while all of them constantly change, some are closed source, and the whole process can’t be automated: not sure that’s possible.

2 Likes

I agree! And I’m not saying that it shouldn’t be done, but like the above-mentioned protocols, significant attention will be needed to make it easier for users to consume the new protocols…good examples…good libraries…all of that. Totally doable…mostly thinking out loud and seeking consensus on the implications.

I may have done a bad job of conveying my point. It wasn’t that Ethereum was better or that the IC should copy it, but more that I liked that in Ethereum I could write an event that others could consume later without me having to know who they were…and beyond that it was a bummer that your contract couldn’t just ‘subscribe’ to that event (and of course you can’t, because too many subscribers would blow out your gas limit).

What I meant to convey is that, instead of trying to shove three modes of execution into supporting one paradigm, perhaps two different paradigms would be better. (And at the same time I acknowledge that the current one doesn’t scale, so maybe it shouldn’t be one of the two, which complicates things since it is already live.)

Mode 1: Async Await

Mode 2: Event-Based Pub-Sub - not ‘best effort’, not ‘guaranteed’, but ‘no possible guarantee’, so that developers begin with that simple paradigm in mind.

Postulate: A guaranteed message with unbounded return time converges on the preparation and handling of a publication with guaranteed delivery to a known single subscribing verified smart contract that MUST respond unless it is turned off.

So if you got the above plus one-to-many pub-sub on the replica, it would be cool and likely very powerful. (The issue that I came up with was that you might have to keep a bunch of data lying around, but the event system could offload that and have it come back later if a witness were provided…basically a data availability problem that the IC is super good at solving.)

So the problem would change from ‘How do we scale guaranteed message delivery’ to “if you need scale, use pub-sub.”

Obviously, very different paradigms, and I guess event-based programming can be more difficult, but as someone up above implied, it may make you more disciplined at handling the way a multi-actor asynchronous multi-verse “actually” works.

1 Like