Inter-Canister Query Calls (Status: done)

We are working on a proposal to group canisters first. The proposal of ICQC will come after that.


I personally don’t like option 1 and would rather have what @nomeata proposed. One of the selling points of the IC’s subnets is that they are transparent to devs, and if we go with option 1 that would no longer be the case.


Thanks for the input @Zane! Since each option has its own disadvantages, I think the community will settle this by voting.

Did we ever get a chance to vote on an ICQC proposal?


Sorry, not much progress on ICQC itself. @bogwar is working on the canister groups proposal, which turned out to be much more difficult than we anticipated. Option 1 depends on the outcome of that.

I am currently working on deterministic time slicing. Other people who could help with ICQC are busy with the BTC integration. Once both projects ship, ICQC will progress faster.


I will give an update in today's public Global R&D live session and a more detailed technical presentation in the Scalability & Performance WG on October 20 (see "Technical Working Group: Scalability & Performance - #19 by esquivada").

The plan is to release ICQC incrementally:

  1. Get the existing prototype into a production ready state with two main limitations: no support for cross-subnet calls and no support for replicated execution.
  2. Work on adding cross-subnet support. The main challenge here, as mentioned before, is the caller_id problem: we either need to find a solution that ensures its trustworthiness or accept reduced trustworthiness. Another challenge is rewriting the prototype to use a more complex async/distributed algorithm.
  3. Work on replicated execution. This step may be infeasible. To be on the safe side, we need to introduce a new query type for ICQC that doesn’t allow replicated execution before we release the prototype.

After gathering feedback here and in the Scalability & Performance WG session, I’ll prepare and submit a motion proposal with the ICQC roadmap.


Thanks for the update

I might be missing something, but aren’t queries already non-replicated? Does this mean we can only use ICQC to serve data to users, and not to get data from another canister and then run replicated logic on it?

“(*) A user may choose to run a query in an update context, where queries are executed on all replicas along with update messages. The state changes of queries are discarded, but the results go through consensus. We will refer to queries as replicated and non-replicated depending on whether they run in an update context or not. Replicated queries should not be confused with certified queries that also go through consensus but not in an update context. Note that certified queries currently exist only as an idea and are not implemented.”


I thought they only pass the certificate along with the data. At what point do they go through consensus? Maybe the “but not in an update context” part explains that, although I’m not sure what it means.

You’re talking about certified variables.


True, I mixed that up.

What’s a certified query, though? Is there any write-up on the idea, or is it the concept discussed here?

Where are you quoting from?


Hi folks. In preparation for the NNS proposal I would like to share our current understanding of the problem. The main change since my previous post about the trade-offs is that we no longer think that [async/await] conflicts with [replicated mode] because supporting replicated execution for ICQC seems feasible (even though very difficult).


Background & Concepts

Execution mode

A message on the IC can be executed in two different modes: replicated and non-replicated. The following table summarizes the differences between them.

| Replicated execution           | Non-replicated execution      |
|--------------------------------|-------------------------------|
| High latency                   | Low latency                   |
| Runs on all nodes              | Runs on a single node         |
| Goes through consensus         | Doesn’t go through consensus  |
| Result is signed by the subnet | Result is signed by the node  |

Sidenote: there is a third mode that currently exists only as an idea: run non-replicated execution on at least n/3+1 nodes. This mode is known as certified/secure/repeated execution.
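To make the n/3+1 threshold concrete, here is a small calculation sketch. It assumes integer division and assumes the usual Byzantine fault model in which fewer than n/3 nodes of a subnet are faulty. Under those assumptions, running on n/3+1 nodes guarantees that at least one of them is honest. The helper names are hypothetical, not IC APIs.

```rust
/// Hypothetical helper: how many nodes would run a query in the
/// certified/secure/repeated mode under the "n/3+1" rule (integer division assumed).
fn certified_quorum(subnet_size: u32) -> u32 {
    subnet_size / 3 + 1
}

/// Largest number of faulty nodes f that still satisfies 3f < n
/// (assumed fault model, not stated in the post).
fn max_faulty(subnet_size: u32) -> u32 {
    subnet_size.saturating_sub(1) / 3
}

/// True if the quorum is larger than the number of possibly faulty nodes,
/// so any set of that many nodes contains at least one honest node.
fn quorum_contains_honest_node(subnet_size: u32) -> bool {
    certified_quorum(subnet_size) > max_faulty(subnet_size)
}
```

For example, a 13-node subnet would need `certified_quorum(13) = 5` nodes, while at most `max_faulty(13) = 4` nodes can be faulty.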

Canister method types

Canisters have two types of methods: updates and queries. The following table summarizes the differences between them.

| Query                        | Update                        |
|------------------------------|-------------------------------|
| Read-only                    | Modifies state                |
| Isolated from other queries  | Sees changes of other updates |
| Supports all execution modes | Replicated execution only     |
| No calls                     | Cross- and same-subnet calls  |

The second property, query isolation, is crucial for reasoning about ICQC. It means that state changes made by one query are not visible to other queries. The property currently holds trivially for queries because they are read-only and discard all state changes after execution.
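Why isolation holds trivially today can be shown with a toy model: a regular query runs against a scratch copy of the state, and that copy is dropped afterwards. This is a minimal sketch assuming canister state is a plain key-value map. `CanisterState`, `run_update`, and `run_query` are hypothetical names, not the replica's actual implementation.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified canister state: a key-value map.
#[derive(Clone, Debug, Default, PartialEq)]
struct CanisterState {
    data: HashMap<String, u64>,
}

impl CanisterState {
    /// Update semantics: the method mutates the state and the changes persist.
    fn run_update<R>(&mut self, method: impl FnOnce(&mut CanisterState) -> R) -> R {
        method(self)
    }

    /// Query semantics: the method runs against a scratch copy, so any writes
    /// it makes are discarded and can never be seen by other queries.
    fn run_query<R>(&self, method: impl FnOnce(&mut CanisterState) -> R) -> R {
        let mut scratch = self.clone();
        method(&mut scratch)
        // `scratch` is dropped at the end of this function, discarding the writes.
    }
}
```

A query that overwrites a value sees its own write during execution, but the committed state is unchanged afterwards.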

Objective & Requirements

Our goal is to allow a query method to call other query methods of the same or other canisters. Let’s refer to the new queries that have this ability to make calls as “ICQC queries” and to the existing queries without calls as “regular queries”.

What makes ICQC queries really challenging to implement is that they combine the most difficult properties of regular queries and updates as shown in the following table:

| ICQC query                   | Regular query                | Update                        |
|------------------------------|------------------------------|-------------------------------|
| Modifies state               | Read-only                    | Modifies state                |
| Isolated from other queries  | Isolated from other queries  | Sees changes of other updates |
| Supports all execution modes | Supports all execution modes | Replicated execution only     |
| Cross- and same-subnet calls | No calls                     | Cross- and same-subnet calls  |

In the following sections we discuss each of these properties in detail.

State modification

In order to support async/await, an ICQC query has to keep canister state changes until all pending calls return. In other words, an ICQC query behaves like an update while the calls are pending. Once all calls return, the state changes are discarded and the ICQC query behaves like a query.
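The update-then-query lifecycle described above can be modeled as a small state machine. This is a hypothetical sketch, not the prototype's code. The execution owns a private copy of the state, keeps changes to it while calls are pending, and discards it once the query finishes. `IcqcQuery` and its methods are illustrative names.

```rust
#[derive(Clone, Debug, PartialEq)]
struct CanisterState {
    counter: u64,
}

/// Hypothetical model of one ICQC query execution.
struct IcqcQuery {
    /// Private copy of the canister state, kept while the query is running.
    scratch: Option<CanisterState>,
    pending_calls: usize,
}

impl IcqcQuery {
    /// Start executing by taking a private copy of the committed state.
    fn start(committed: &CanisterState) -> Self {
        IcqcQuery { scratch: Some(committed.clone()), pending_calls: 0 }
    }

    /// The query code modifies its private copy (update-like behavior).
    fn state_mut(&mut self) -> &mut CanisterState {
        self.scratch.as_mut().expect("execution already finished")
    }

    /// The query calls another query; the copy must survive the `await`.
    fn make_call(&mut self) {
        self.pending_calls += 1;
    }

    /// A reply to one of the pending calls arrives.
    fn on_reply(&mut self) {
        self.pending_calls = self.pending_calls.checked_sub(1).expect("no pending call");
    }

    /// The query code has run to completion. Only allowed once every call
    /// has returned; the private copy is then discarded (query-like behavior).
    fn finish(&mut self) {
        assert_eq!(self.pending_calls, 0, "calls still pending");
        self.scratch = None;
    }

    fn is_finished(&self) -> bool {
        self.scratch.is_none()
    }
}
```

Changes made before a call are still visible when the reply arrives, yet the committed state never observes them.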

Query isolation

Query isolation means that effects of one ICQC query execution are not visible to other ICQC and regular queries. To see why this property is important, consider the following valid ICQC query that calls another query and then destroys the state of the canister.


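```rust
// Pseudocode: `call`, `bar`, and `destroy_all_global_state` are placeholders.
async fn query_foo(input: Input) -> Output {
    let result = call(bar, "query", input).await; // Calls another canister's query.
    destroy_all_global_state(); // No problem for other queries.
    result
}
```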

Other queries should work without any problems after this ICQC query finishes.

What follows from the query isolation and state modification properties is that we need to clone or copy the canister state in order to execute an ICQC query. If the ICQC query calls another ICQC query, then we need to clone the other canister state as well.

In general, if we have a call graph where nodes are canisters and edges are queries, then canisters will be cloned as many times as there are ICQC query edges in the graph.
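The counting rule can be checked on a small example graph. In this sketch every ICQC query edge clones its callee once, and regular query edges are counted as needing no clone, following the rule as stated. Treating the user's entry call as an edge, and the example graph itself, are our own illustrative assumptions.

```rust
use std::collections::BTreeMap;

/// Kind of query method invoked along an edge of the call graph.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum QueryKind {
    Icqc,    // may make calls, so the callee's state is cloned
    Regular, // no calls; no clone counted, following the stated rule
}

/// One edge of the call graph: `caller` invokes a query on `callee`.
struct Edge {
    caller: &'static str,
    callee: &'static str,
    kind: QueryKind,
}

/// Total number of canister clones: one per ICQC query edge.
fn total_clones(edges: &[Edge]) -> usize {
    edges.iter().filter(|e| e.kind == QueryKind::Icqc).count()
}

/// Clones per canister: a canister reached by several ICQC edges
/// is cloned several times.
fn clones_per_canister(edges: &[Edge]) -> BTreeMap<&'static str, usize> {
    let mut counts = BTreeMap::new();
    for e in edges.iter().filter(|e| e.kind == QueryKind::Icqc) {
        *counts.entry(e.callee).or_insert(0) += 1;
    }
    counts
}
```

With the edges user→A, A→B, A→C, B→C (all ICQC) and B→D (regular), there are four clones in total, and canister C is cloned twice because two different ICQC queries call it.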

Replicated execution

Currently the replicated state contains each canister exactly once. Conceptually we can think of it as a mapping from the canister id to canister state: [CanisterId → CanisterState].

In order to support replicated execution of ICQC queries, we need a way to keep multiple versions of the same canister in the replicated state. This means that our mapping becomes something like [CanisterId → CallContextId → CanisterState] where CallContextId corresponds to an ICQC query with pending calls.
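The change in shape from [CanisterId → CanisterState] to [CanisterId → CallContextId → CanisterState] can be sketched with nested maps. This is a hypothetical data-structure sketch only. The ID types are placeholders, and storing the committed version in a separate map next to the in-flight versions is our own assumption about how the two would coexist.

```rust
use std::collections::BTreeMap;

// Placeholder types; the real IC types are richer.
type CanisterId = u64;
type CallContextId = u64;

#[derive(Clone, Debug, PartialEq)]
struct CanisterState {
    counter: u64,
}

/// Today: [CanisterId → CanisterState], one version per canister.
type ReplicatedStateToday = BTreeMap<CanisterId, CanisterState>;

/// Sketch of the extended layout: the committed version plus one version per
/// ICQC query with pending calls, i.e. [CanisterId → CallContextId → CanisterState].
#[derive(Default)]
struct ReplicatedStateWithIcqc {
    committed: ReplicatedStateToday,
    in_flight: BTreeMap<CanisterId, BTreeMap<CallContextId, CanisterState>>,
}

impl ReplicatedStateWithIcqc {
    /// Creates (or returns) the version of `canister` used by call context `ctx`,
    /// starting from a clone of the committed state.
    fn fork(&mut self, canister: CanisterId, ctx: CallContextId) -> Option<&mut CanisterState> {
        let copy = self.committed.get(&canister)?.clone();
        Some(self.in_flight.entry(canister).or_default().entry(ctx).or_insert(copy))
    }

    /// Drops the version once the ICQC query's calls have all returned.
    fn finish(&mut self, canister: CanisterId, ctx: CallContextId) {
        let now_empty = match self.in_flight.get_mut(&canister) {
            Some(per_ctx) => {
                per_ctx.remove(&ctx);
                per_ctx.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.in_flight.remove(&canister);
        }
    }

    /// Number of extra versions of `canister` currently kept in the state.
    fn versions_in_flight(&self, canister: CanisterId) -> usize {
        self.in_flight.get(&canister).map_or(0, |m| m.len())
    }
}
```

Even this toy version hints at why the real change is invasive: every component that assumes exactly one version per canister would have to learn about the extra level.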

To implement this, we would need to change the core components of the IC such as the state manager, message routing, and execution. That would be a large engineering effort (1 - 2 years) with a lot of complexity and unknowns.

Cross-subnet calls

In order to support cross-subnet calls, we would need to rewrite the existing prototype implementation to use an asynchronous/distributed algorithm to traverse the call graph. That is a medium- to large-sized engineering problem.

The question of the caller_id remains open. We either need to find a solution to ensure its trustworthiness or accept the reduced trustworthiness.
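Because the first release would not support cross-subnet calls, one way to picture the restriction is a check that every canister reachable from the root of the call graph lives on the root's subnet. The sketch below is hypothetical. It uses a simple synchronous depth-first traversal, which is only possible when everything is local. We do not know how the prototype itself traverses the graph, and the string IDs are placeholders.

```rust
use std::collections::{BTreeMap, BTreeSet};

type CanisterId = &'static str; // placeholder
type SubnetId = &'static str; // placeholder

/// Returns true if every canister reachable from `root` via query calls
/// is hosted on the same subnet as `root`.
fn same_subnet_only(
    root: CanisterId,
    calls: &BTreeMap<CanisterId, Vec<CanisterId>>,
    subnet_of: &BTreeMap<CanisterId, SubnetId>,
) -> bool {
    let home = match subnet_of.get(root) {
        Some(subnet) => *subnet,
        None => return false, // unknown canister
    };
    let mut visited = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(canister) = stack.pop() {
        if !visited.insert(canister) {
            continue; // already checked; also guards against call cycles
        }
        if subnet_of.get(canister) != Some(&home) {
            return false; // a cross-subnet (or unknown) callee
        }
        if let Some(callees) = calls.get(canister) {
            stack.extend(callees.iter().copied());
        }
    }
    true
}
```

Supporting cross-subnet calls means replacing this in-memory loop with message exchanges between subnets, which is where the asynchronous algorithm and the caller_id question come in.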

Conclusion & Next steps

Our recommendation is to release the existing prototype implementation as a new ICQC query type without support for cross-subnet calls and replicated execution, and then to work on those two features separately. The new query type ensures that we don’t break existing queries that may be used in replicated mode.
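The effect of introducing a separate query type can be summarized as a small compatibility check. This sketch reflects our reading of the plan: updates run replicated only, existing queries keep supporting both modes, and the new ICQC query type is initially non-replicated only. The enum and function names are hypothetical and are not part of any IC interface.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MethodType {
    Update,
    Query,     // existing regular query
    IcqcQuery, // proposed new query type that may make calls
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ExecutionMode {
    Replicated,
    NonReplicated,
}

/// Which (method type, execution mode) combinations the first release
/// would allow, as we read the proposal.
fn is_allowed(method: MethodType, mode: ExecutionMode) -> bool {
    match (method, mode) {
        (MethodType::Update, ExecutionMode::Replicated) => true,
        (MethodType::Update, ExecutionMode::NonReplicated) => false,
        (MethodType::Query, _) => true,
        (MethodType::IcqcQuery, ExecutionMode::NonReplicated) => true,
        (MethodType::IcqcQuery, ExecutionMode::Replicated) => false,
    }
}
```

Keeping `Query` unchanged is the point of the design: an update that calls an existing query still runs it in replicated mode, so giving regular queries the ability to make calls would break that path.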

A draft of a motion proposal describing this plan will be shared soon.


Why can we even call queries in replicated mode? Shouldn’t it be up to the canister controller to decide by specifying either a query or a “normal” function?



Absolutely! The owner of the canister decides whether a method is a query or an update by annotating it correspondingly in the Wasm source code: queries are exported as canister_query <query_name> and updates as canister_update <update_name>.

That said, it is possible and sometimes useful to run a query in replicated mode. The query still keeps its query semantics in the sense that it is read-only and discards state changes, but the execution happens on all nodes of the subnet. This means that the result of execution goes through consensus and is signed by the subnet key, so the result is more trustworthy than with non-replicated execution.

Another case when a query runs in replicated mode is when an update method calls a query method.


But can the caller trigger running a query in replicated mode? If so, doesn’t that open the door to cycle-draining attacks? I know that there are plans to charge for queries in the future, but I’d still assume that running a query in replicated mode is more expensive than running it in non-replicated mode.


If the caller is running as an update method, then yes. It has always been possible to call a query from an update. If the query performs some very expensive computation, then there is potential for a cycle-draining attack. One way to protect against it would be to inspect the caller_id and refuse to do the expensive computation. Ideally, queries are fast and do not perform expensive computations.
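The caller_id check mentioned above could look roughly like the following. This is a hypothetical sketch and does not use the real Rust CDK. `Principal` is a stand-in alias for the caller's principal, `expensive_query` and its allowlist are illustrative, and the summation is a placeholder for real work. Such a guard is only as trustworthy as the caller_id it inspects, which is exactly the open question for cross-subnet ICQC.

```rust
/// Stand-in for the caller's principal (caller_id); hypothetical alias.
type Principal = String;

/// Hypothetical guarded query: only allowlisted callers get the expensive
/// computation, everyone else is refused before any work is done.
fn expensive_query(caller: &Principal, allowlist: &[Principal], n: u64) -> Result<u64, String> {
    if !allowlist.contains(caller) {
        return Err(format!("caller {caller} is not allowed to run this query"));
    }
    // Placeholder for an expensive computation.
    Ok((1..=n).sum())
}
```

Refusing early keeps the cost of a rejected call small, which is what matters when the query may be executed on every node of the subnet.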


I posted the draft of the motion proposal about the new query type and releasing the prototype implementation here:

Thank you for the excellent write-up!

What follows from the query isolation and state modification properties is that we need to clone or copy the canister state in order to execute an ICQC query. If the ICQC query calls another ICQC query, then we need to clone the other canister state as well.

Once this canister cloning feature is built, does that mean it will unlock the possibility for a broader change in async/await semantics for update calls? For example, update calls currently commit canister state whenever they initiate an inter-canister call (to either a query or update method). That limitation can be quite troublesome for canister developers, as they need to worry about state rollbacks and non-atomic update methods.

If this feature lands, am I correct in assuming that limitation could (technically) be lifted?

I’m looking through the interface spec trying to find a canonical location for this information. Where is this documented?