Pasting the contents of the document here for better readability.
Background
The Internet Computer (IC) has two types of messages: updates and queries. As shown in the table below, queries are fast because they are read-only and don’t have to go through consensus.
| | Update | Query |
|---|---|---|
| State changes persisted | ✔ | ✘ |
| Goes through consensus | ✔ | ✘* |
| Low latency, high throughput | ✘ | ✔ |
| Inter-canister calls | ✔ | ✘** |
(*) A user may choose to run a query in an update context, where queries are executed on all replicas along with update messages. The state changes of queries are discarded, but the results go through consensus. We will refer to queries as replicated and non-replicated depending on whether they run in an update context or not. Replicated queries should not be confused with certified queries that also go through consensus but not in an update context. Note that certified queries currently exist only as an idea and are not implemented.
An update message can call methods of other canisters even if they are located on different subnets. Such inter-canister calls are essential for composing canisters into scalable applications.
(**) Queries do not support inter-canister calls. There is an incomplete implementation of inter-canister query calls (ICQC) that is enabled on verified subnets, but it is not generally available for the reasons explained below.
Requirements
The main requirement for ICQC is consistency with the existing inter-canister calls in update messages. Developers already know how inter-canister calls work and have certain expectations. Specifically:
A. [async/await] A call in a query should have the same semantics as a call in an update message. More concretely, it should be possible to support async/await for query calls.
B. [caller-id] The callee of an inter-canister call should be able to read and trust the identity of the caller.
C. [replicated mode] A call in a query should work both in replicated and non-replicated modes.
D. [cross-subnet] It should be possible to call a canister in another subnet.
In addition to the four consistency requirements above, we want queries to remain fast and to not consume a lot of memory:
E. [fast execution] Queries are faster than update messages because they don’t need to keep track of modified memory pages. Ideally, ICQC does not regress this.
F. [low memory usage] Ideally, ICQC does not introduce additional memory overhead.
Note that requirements E and F are important for the stability and availability of the IC.
Prototype Implementation
Verified application subnets have an incomplete prototype implementation of ICQC. The prototype was implemented under time pressure before the launch of the IC. There was not enough time to think through the design and to write a specification. The prototype satisfies only two requirements: [async/await] and [caller-id]. The other requirements are not satisfied:
- it works only in non-replicated mode,
- it works only for same-subnet calls,
- it is up to 2x slower because of the need to keep track of modified pages,
- it has to hold on to the state until all calls complete, so in some cases it may double the memory usage of the replica.
Trade-off Space
The bad news is that the requirements are in conflict with each other. We identified two trade-off pairs:
- [async/await] vs [replicated mode, fast execution, low memory usage].
- [caller-id] vs [cross-subnet].
In each trade-off pair we can choose only one alternative. For example, the prototype implementation corresponds to [async/await] + [caller-id]. It seems that [async/await] is non-negotiable. Sacrificing it would result in a strange, non-intuitive programming model. Given that, the only other viable combination is [async/await] + [cross-subnet], where all inter-canister query calls are effectively anonymous/public.
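The choice above can be spelled out as a small enumeration. This is only an illustrative sketch: the type and function names are made up for this post, and the "viable" filter simply encodes the assumption stated above that [async/await] is non-negotiable.

```rust
/// Trade-off pair 1: exactly one of the two alternatives can be chosen.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Pair1 {
    AsyncAwait,
    ReplicatedFastLowMemory,
}

/// Trade-off pair 2: exactly one of the two alternatives can be chosen.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Pair2 {
    CallerId,
    CrossSubnet,
}

/// Every design picks one alternative from each pair.
fn all_designs() -> Vec<(Pair1, Pair2)> {
    let mut designs = Vec::new();
    for a in [Pair1::AsyncAwait, Pair1::ReplicatedFastLowMemory] {
        for b in [Pair2::CallerId, Pair2::CrossSubnet] {
            designs.push((a, b));
        }
    }
    designs
}

/// Keep only the designs that preserve async/await.
fn viable_designs() -> Vec<(Pair1, Pair2)> {
    all_designs()
        .into_iter()
        .filter(|(a, _)| *a == Pair1::AsyncAwait)
        .collect()
}
```

Of the four combinations, two keep async/await: the one with caller-id (the prototype) and the one with cross-subnet calls.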
Explanation of Trade-off 1
Consider the following generic async query function that calls another query:
    async fn query_foo(input: Input) -> Output {
        let data = pre_process(input);
        let result = call(canister_bar, "query_bar", data).await;
        post_process(result)
    }
It prepares some data, issues a call to another query, awaits the result, and finally returns a processed result. The IC executes the function as two separate messages. The first message runs from the start of the function to the await point. At the await point, the language runtime (Rust/Motoko) saves all necessary information (future, task, environment) in the WebAssembly memory, so that when the reply message of the call comes back, execution can continue from the await point to the end of the function.
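To make the two-message split concrete, here is a hand-written state machine that roughly mirrors what the compiler and runtime generate for query_foo. It is a simplified sketch, not the actual runtime: the QueryFoo and Step types, the integer payloads, and the bodies of pre_process and post_process are all placeholders invented for illustration.

```rust
// Hypothetical stand-ins for the types used in the example above.
type Input = u32;
type Output = String;

/// What has to live in Wasm memory between the two messages.
#[derive(Debug, PartialEq)]
enum QueryFoo {
    /// Nothing has run yet.
    NotStarted,
    /// The call to `query_bar` is in flight; the continuation is waiting.
    AwaitingBar,
    /// The function has returned.
    Done,
}

/// Outcome of executing one message.
#[derive(Debug, PartialEq)]
enum Step {
    /// An outgoing call was issued; execution stops at the await point.
    Call { method: &'static str, payload: u32 },
    /// The query finished and produced its result.
    Reply(Output),
}

// Placeholder bodies; the real functions are application specific.
fn pre_process(input: Input) -> u32 {
    input + 1
}

fn post_process(result: u32) -> Output {
    format!("result={}", result)
}

impl QueryFoo {
    /// First message: from the start of the function to the await point.
    fn on_request(&mut self, input: Input) -> Step {
        let data = pre_process(input);
        // This write is the state change that can no longer be discarded.
        *self = QueryFoo::AwaitingBar;
        Step::Call { method: "query_bar", payload: data }
    }

    /// Second message: from the await point to the end of the function.
    fn on_reply(&mut self, result: u32) -> Step {
        assert!(matches!(self, QueryFoo::AwaitingBar), "no call in flight");
        *self = QueryFoo::Done;
        Step::Reply(post_process(result))
    }
}
```

If the write to `self` in on_request were thrown away, as happens with the state changes of an ordinary query, on_reply would have nothing to resume from.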
The crucial part here is “save all necessary information in memory”, which means that the state changes are important and can no longer be discarded. Thus, a query becomes similar to an update message until the call completes. Figure 1 visualizes the effect of ICQC on canister states. Normally, the state of a canister evolves linearly, changing only from one round to the next. A query that supports ICQC introduces a branch in the chain of states. This doesn’t work in replicated mode because the replicated state supports only one state per canister. The need to keep track of state changes makes execution slower and increases memory usage.
Figure 1. A query call creates a branch in the linear chain of canister states.
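The branching can also be shown with a toy model. This is a sketch under simplifying assumptions: the Canister struct, the page-map heap, and the full clone of the state are invented for illustration (a real replica would track modified pages rather than copy everything), but the bookkeeping cost has the same shape.

```rust
use std::collections::HashMap;

/// Toy canister heap: page number -> page contents (a stand-in for Wasm memory).
type Heap = HashMap<u32, u8>;

/// Hypothetical model of one canister on one replica.
struct Canister {
    /// The linear chain: only the latest committed state is kept.
    committed: Heap,
    /// Branched states held by in-flight queries that made a call.
    branches: Vec<Option<Heap>>,
}

impl Canister {
    fn new() -> Self {
        Canister { committed: Heap::new(), branches: Vec::new() }
    }

    /// An update message advances the committed state.
    fn update(&mut self, page: u32, value: u8) {
        self.committed.insert(page, value);
    }

    /// A query that awaits a call: its changes (the saved continuation)
    /// must be kept, so the state is branched instead of discarded.
    fn query_with_call(&mut self, continuation_page: u32) -> usize {
        let mut branch = self.committed.clone();
        branch.insert(continuation_page, 1);
        self.branches.push(Some(branch));
        self.branches.len() - 1
    }

    /// The reply arrived: hand back the branch so execution can resume on it.
    fn complete(&mut self, id: usize) -> Option<Heap> {
        self.branches.get_mut(id).and_then(|slot| slot.take())
    }

    /// Number of states the replica holds for this canister right now.
    fn live_states(&self) -> usize {
        1 + self.branches.iter().filter(|b| b.is_some()).count()
    }
}
```

While a call is outstanding, the replica holds the committed state plus one branch per in-flight query, which is where the extra memory comes from. The committed chain keeps advancing independently, so the branch cannot be merged back into it.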
Explanation of Trade-off 2
All messages in the IC are signed by the key of the sender, which can be either a user or a subnet. Figure 2 shows the scenario where a user sends a non-replicated query to a single replica. If the query calls a canister on another subnet, then that message cannot be signed because the replica does not have the key of the subnet (by design). This means that the callee cannot trust the id of the caller and has to treat the incoming query as anonymous or public.
Figure 2. The user sends a non-replicated query to one replica in the first subnet, which in turn sends an inter-canister query to a replica in another subnet.
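From the callee's point of view, the rule could look roughly like the sketch below. All names here (Caller, Origin, IncomingQuery, effective_caller) are hypothetical and do not correspond to any IC API, and no real signature check is modelled. The same-subnet case reflects the prototype, which keeps caller-id for same-subnet calls.

```rust
/// Who the callee should believe it is talking to.
#[derive(Debug, Clone, PartialEq)]
enum Caller {
    Anonymous,
    Canister(String),
}

/// How the incoming message was produced (hypothetical classification).
enum Origin {
    /// Produced in replicated execution and signed with the subnet key.
    SubnetSigned,
    /// Produced by one replica serving a non-replicated query;
    /// a single replica does not hold the subnet key.
    SingleReplica,
}

struct IncomingQuery {
    claimed_caller: String,
    origin: Origin,
    same_subnet: bool,
}

/// Decide whether the claimed caller id can be trusted.
fn effective_caller(q: &IncomingQuery) -> Caller {
    match (&q.origin, q.same_subnet) {
        // A subnet signature vouches for the caller id.
        (Origin::SubnetSigned, _) => Caller::Canister(q.claimed_caller.clone()),
        // Same-subnet case: the prototype keeps caller-id here.
        (Origin::SingleReplica, true) => Caller::Canister(q.claimed_caller.clone()),
        // Cross-subnet and unsigned: the claim cannot be verified.
        (Origin::SingleReplica, false) => Caller::Anonymous,
    }
}
```

With this rule, a cross-subnet query issued from a non-replicated query always arrives as anonymous, no matter what caller id it claims.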
Conclusion
Based on the trade-off space, we have the following options:
- ICQC in non-replicated mode and only on the same subnet. This corresponds to the existing prototype implementation. If we go with this option, then we would enable the prototype everywhere.
- ICQC in non-replicated mode without caller-id. In this case the callee canister has to assume that the request is anonymous and respond only with publicly visible data, which greatly reduces the utility of ICQC.
- Do not support ICQC. Developers would have to move logic that calls other canisters to the client side.