Hi folks. In preparation for the NNS proposal, I would like to share our current understanding of the problem. The main change since my previous post about the trade-offs is that we no longer think that async/await conflicts with replicated mode, because supporting replicated execution for ICQC (inter-canister query calls) seems feasible, albeit very difficult.
Background & Concepts
Execution mode
A message on the IC can be executed in two different modes: replicated and non-replicated. The following table summarizes the differences between them.
| Replicated execution | Non-replicated execution |
|---|---|
| High-latency | Low-latency |
| Runs on all nodes | Runs on a single node |
| Goes through consensus | Doesn’t go through consensus |
| Result is signed by the subnet | Result is signed by the node |
Side note: a third mode currently exists only as an idea. It performs non-replicated execution on at least n/3+1 nodes, where n is the number of nodes in the subnet. This mode is known as certified, secure, or repeated execution.
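As a rough illustration, the modes can be compared by how many nodes of an n-node subnet execute a message. The `ExecutionMode` enum and `min_executing_nodes` function are hypothetical names, not part of any IC API. Reading "n/3+1" as integer division is an assumption, and the certified mode is modelled only as the idea described above.

```rust
/// Hypothetical model of the execution modes; not an IC API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecutionMode {
    Replicated,
    NonReplicated,
    /// Certified/secure/repeated execution, which currently exists only as an idea.
    Certified,
}

impl ExecutionMode {
    /// Minimum number of nodes that execute a message in a subnet of `n` nodes.
    fn min_executing_nodes(self, n: usize) -> usize {
        match self {
            // Every node executes the message, and the result goes through consensus.
            ExecutionMode::Replicated => n,
            // A single node executes the message and signs the result.
            ExecutionMode::NonReplicated => 1,
            // At least n/3 + 1 nodes (assumed: integer division).
            ExecutionMode::Certified => n / 3 + 1,
        }
    }
}

fn main() {
    let n = 13; // example subnet size
    for mode in [ExecutionMode::Replicated, ExecutionMode::NonReplicated, ExecutionMode::Certified] {
        println!("{:?}: {} node(s)", mode, mode.min_executing_nodes(n));
    }
}
```

With n = 13 this prints 13, 1, and 5 nodes for the three modes respectively.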
Canister method types
Canisters have two types of methods: updates and queries. The following table summarizes the differences between them.
| Query | Update |
|---|---|
| Read-only | Modifies state |
| Isolated from other queries | Sees changes of other updates |
| Supports all execution modes | Replicated execution only |
| No calls | Cross- and same-subnet calls |
The second property, query isolation, is crucial for reasoning about ICQC. It means that state changes made by one query are not visible to other queries. The property currently holds trivially because queries are read-only: all state changes are discarded after execution.
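The difference between the two method types can be sketched with a toy model in which a query runs on a throwaway copy of the state while an update mutates it in place. `CanisterState`, `execute_query`, and `execute_update` are hypothetical names, and cloning the whole state is used purely for illustration; it is not meant to describe how the replica actually implements this.

```rust
/// Hypothetical, simplified canister state used only for illustration.
#[derive(Debug, Clone, PartialEq)]
struct CanisterState {
    counter: u64,
}

/// An update mutates the canister state in place, so later messages see the change.
fn execute_update<R>(state: &mut CanisterState, method: impl FnOnce(&mut CanisterState) -> R) -> R {
    method(state)
}

/// A regular query runs against a throwaway copy; every change is discarded
/// when the copy is dropped, which is why query isolation holds trivially.
fn execute_query<R>(state: &CanisterState, method: impl FnOnce(&mut CanisterState) -> R) -> R {
    let mut scratch = state.clone();
    method(&mut scratch)
    // `scratch` is dropped here, together with all changes the query made.
}

fn main() {
    let mut state = CanisterState { counter: 0 };

    // The query increments its scratch copy and returns the new value...
    let seen = execute_query(&state, |s| {
        s.counter += 1;
        s.counter
    });
    // ...but the real state is untouched.
    assert_eq!((seen, state.counter), (1, 0));

    // The update's change persists.
    execute_update(&mut state, |s| s.counter += 1);
    assert_eq!(state.counter, 1);
}
```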
Objective & Requirements
Our goal is to allow a query method to call other query methods of the same or other canisters. Let’s refer to the new queries that have this ability to make calls as “ICQC queries” and to the existing queries without calls as “regular queries”.
What makes ICQC queries really challenging to implement is that they combine the most difficult properties of regular queries and updates, as shown in the following table:
| ICQC query | Regular query | Update |
|---|---|---|
| Modifies state | Read-only | Modifies state |
| Isolated from other queries | Isolated from other queries | Sees changes of other updates |
| Supports all execution modes | Supports all execution modes | Replicated execution only |
| Cross- and same-subnet calls | No calls | Cross- and same-subnet calls |
In the following sections we discuss each of these properties in detail.
State modification
In order to support async/await, an ICQC query has to retain canister state changes until all pending calls return. In other words, while its calls are pending, an ICQC query behaves like an update. Once all calls have returned, the state changes are discarded and the ICQC query behaves like a query.
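The lifecycle can be sketched as a small state machine: the execution owns a private copy of the state, keeps it while calls are pending, and drops it when the last call has returned and no further calls were issued. `IcqcExecution`, `start`, and `on_reply` are hypothetical names; modelling each piece of code between `await`s as a closure that returns the number of calls it issued is an assumption made to keep the sketch self-contained.

```rust
/// Hypothetical, simplified canister state used only for illustration.
#[derive(Debug, Clone, PartialEq)]
struct CanisterState {
    data: Vec<u8>,
}

/// One in-flight ICQC query execution with its private copy of the state.
struct IcqcExecution {
    scratch: Option<CanisterState>, // `Some` while calls are pending
    pending_calls: usize,
}

impl IcqcExecution {
    /// Runs the code before the first `await` on a copy of the state.
    /// `entry` returns how many calls it issued.
    fn start(state: &CanisterState, entry: impl FnOnce(&mut CanisterState) -> usize) -> Self {
        let mut scratch = state.clone();
        let pending_calls = entry(&mut scratch);
        // While calls are pending, the modified copy must be kept (update-like).
        let scratch = if pending_calls > 0 { Some(scratch) } else { None };
        IcqcExecution { scratch, pending_calls }
    }

    /// Runs the continuation for one reply; it may issue further calls.
    fn on_reply(&mut self, continuation: impl FnOnce(&mut CanisterState) -> usize) {
        assert!(self.pending_calls > 0, "reply without a pending call");
        self.pending_calls -= 1;
        let state = self.scratch.as_mut().expect("execution already finished");
        self.pending_calls += continuation(state);
        if self.pending_calls == 0 {
            // All calls returned: discard the changes (query-like).
            self.scratch = None;
        }
    }

    fn is_finished(&self) -> bool {
        self.scratch.is_none()
    }
}

fn main() {
    let canister = CanisterState { data: vec![1, 2, 3] };
    // The entry point modifies its copy and issues two calls.
    let mut exec = IcqcExecution::start(&canister, |s| {
        s.data.clear();
        2
    });
    assert!(!exec.is_finished());
    // The first reply still sees the change made before the `await`.
    exec.on_reply(|s| {
        assert!(s.data.is_empty());
        0
    });
    exec.on_reply(|_| 0);
    assert!(exec.is_finished());
    // The canister's own state was never touched.
    assert_eq!(canister.data, vec![1, 2, 3]);
}
```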
Query isolation
Query isolation means that effects of one ICQC query execution are not visible to other ICQC and regular queries. To see why this property is important, consider the following valid ICQC query that calls another query and then destroys the state of the canister.
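```rust
// Rust-like pseudo-code: `call`, `bar`, `Input`, and `Output` are placeholders.
async fn query_foo(input: Input) -> Output {
    // Call a query method of another canister and wait for the reply.
    let result = call(bar, "query", input).await;
    // Wipe this canister's state. Other queries are unaffected.
    destroy_all_global_state();
    result
}
```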
Other queries should work without any problems after this ICQC query finishes.
It follows from the query isolation and state modification properties that we need to clone the canister state in order to execute an ICQC query. If the ICQC query calls another ICQC query, we need to clone the state of the other canister as well.
In general, consider a call graph where nodes are canisters and edges are query calls. Each ICQC query edge requires a clone of the callee canister's state, so the total number of clones equals the number of ICQC query edges in the graph.
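The counting rule can be sketched as follows, assuming one clone of the callee's state per ICQC query edge and treating the user's initial request as an edge from a pseudo-node `"user"`. The `Edge` and `QueryKind` types, `clones_per_canister`, and the example graph are all hypothetical.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum QueryKind {
    Regular, // read-only, no calls: no long-lived clone needed
    Icqc,    // may call other queries: needs its own clone of the callee's state
}

/// One edge of the call graph: `caller` invokes a query of kind `kind` on `callee`.
#[allow(dead_code)] // `caller` is kept only to make the example graph readable
struct Edge {
    caller: &'static str,
    callee: &'static str,
    kind: QueryKind,
}

/// Number of state clones per canister: one for every ICQC query edge targeting it.
fn clones_per_canister(edges: &[Edge]) -> BTreeMap<&'static str, usize> {
    let mut clones = BTreeMap::new();
    for edge in edges.iter().filter(|e| e.kind == QueryKind::Icqc) {
        *clones.entry(edge.callee).or_insert(0) += 1;
    }
    clones
}

fn main() {
    let graph = [
        Edge { caller: "user", callee: "A", kind: QueryKind::Icqc },
        Edge { caller: "A", callee: "B", kind: QueryKind::Icqc },
        Edge { caller: "A", callee: "C", kind: QueryKind::Icqc },
        Edge { caller: "B", callee: "C", kind: QueryKind::Icqc },
        Edge { caller: "C", callee: "D", kind: QueryKind::Regular },
    ];
    let clones = clones_per_canister(&graph);
    let total: usize = clones.values().sum();
    println!("{:?}, total = {}", clones, total); // {"A": 1, "B": 1, "C": 2}, total = 4
}
```

Canister C is cloned twice because two different ICQC queries target it, while D, reached only through a regular query, needs no clone.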
Replicated execution
Currently, the replicated state contains each canister exactly once. Conceptually, we can think of it as a mapping from canister ID to canister state: CanisterId → CanisterState.
In order to support replicated execution of ICQC queries, we need a way to keep multiple versions of the same canister in the replicated state. This means that our mapping becomes something like CanisterId → CallContextId → CanisterState, where CallContextId corresponds to an ICQC query with pending calls.
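A minimal sketch of the two mappings, using hypothetical stand-in types (`CanisterId` and `CallContextId` as plain integers, a toy `CanisterState`). Keeping the canonical state separate from the per-call-context versions is an assumption made for readability; the text only says the mapping becomes "something like" the nested form.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the real replica types.
type CanisterId = u64;
type CallContextId = u64;

#[derive(Clone, Debug, PartialEq)]
struct CanisterState {
    memory: Vec<u8>,
}

/// Today: exactly one state per canister.
type ReplicatedStateToday = BTreeMap<CanisterId, CanisterState>;

/// With replicated ICQC: additionally, one state version per ICQC query
/// that still has pending calls, keyed by its call context.
#[derive(Default)]
struct ReplicatedStateWithIcqc {
    canisters: BTreeMap<CanisterId, CanisterState>,
    icqc_versions: BTreeMap<CanisterId, BTreeMap<CallContextId, CanisterState>>,
}

impl ReplicatedStateWithIcqc {
    /// An ICQC query starts: snapshot the canister's current state for it.
    fn begin_icqc(&mut self, canister: CanisterId, ctx: CallContextId) {
        let snapshot = self.canisters[&canister].clone();
        self.icqc_versions.entry(canister).or_default().insert(ctx, snapshot);
    }

    /// All calls of the ICQC query returned: drop its version.
    fn finish_icqc(&mut self, canister: CanisterId, ctx: CallContextId) {
        if let Some(versions) = self.icqc_versions.get_mut(&canister) {
            versions.remove(&ctx);
            if versions.is_empty() {
                self.icqc_versions.remove(&canister);
            }
        }
    }

    /// Canonical state plus one version per in-flight ICQC query.
    fn version_count(&self, canister: CanisterId) -> usize {
        1 + self.icqc_versions.get(&canister).map_or(0, |v| v.len())
    }
}

fn main() {
    let today: ReplicatedStateToday = BTreeMap::from([(1, CanisterState { memory: vec![0; 4] })]);
    let mut state = ReplicatedStateWithIcqc { canisters: today, ..Default::default() };
    state.begin_icqc(1, 100);
    state.begin_icqc(1, 101); // two concurrent ICQC queries on canister 1
    assert_eq!(state.version_count(1), 3);
    state.finish_icqc(1, 100);
    state.finish_icqc(1, 101);
    assert_eq!(state.version_count(1), 1);
}
```

Every extra version must itself be part of the replicated state, which is one reason the change touches the state manager, message routing, and execution alike.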
To implement this, we would need to change core components of the IC such as the state manager, message routing, and execution. That would be a large engineering effort (1 to 2 years) with a lot of complexity and many unknowns.
Cross-subnet calls
In order to support cross-subnet calls, we would need to rewrite the existing prototype implementation to use an asynchronous, distributed algorithm to traverse the call graph. That is a medium- to large-sized engineering problem.
The question of the caller_id also remains open: a cross-subnet query call is issued by a single node rather than by the subnet as a whole, so the callee cannot fully rely on the caller_id it receives. We either need to find a way to ensure its trustworthiness or accept reduced trustworthiness.
Conclusion & Next steps
Our recommendation is to release the existing prototype implementation as a new ICQC query type without support for cross-subnet calls or replicated execution, and then to work on those two features separately. Introducing a separate query type ensures that we don’t break existing queries, which may be executed in replicated mode.
A draft of a motion proposal describing this plan will be shared soon.