Inter-Canister Query Calls (Status: done)

You’re talking about certified variables

True, I mixed that up.

What’s a certified query though? Is there any writeup on the idea or is it the concept discussed here?

Where are you quoting from?

Hi folks. In preparation for the NNS proposal I would like to share our current understanding of the problem. The main change since my previous post about the trade-offs is that we no longer think that [async/await] conflicts with [replicated mode] because supporting replicated execution for ICQC seems feasible (even though very difficult).


Background & Concepts

Execution mode

A message on the IC can be executed in two different modes: replicated and non-replicated. The following table summarizes the differences between them.

| Replicated execution | Non-replicated execution |
| --- | --- |
| High-latency | Low-latency |
| Runs on all nodes | Runs on a single node |
| Goes through consensus | Doesn’t go through consensus |
| Result is signed by the subnet | Result is signed by the node |

Sidenote: there is a third mode that currently exists as only an idea: run non-replicated execution on at least n/3+1 nodes. This mode is known as certified/secure/repeated execution.

Canister method types

Canisters have two types of methods: updates and queries. The following table summarizes the differences between them.

| Query | Update |
| --- | --- |
| Read-only | Modifies state |
| Isolated from other queries | Sees changes of other updates |
| Supports all execution modes | Replicated execution only |
| No calls | Cross- and same-subnet calls |

The second property - query isolation - is crucial for reasoning about ICQC. It means that state changes made by one query are not visible to the other queries. The property currently holds trivially for queries because they are read-only and discard all state changes after the execution.

Objective & Requirements

Our goal is to allow a query method to call other query methods of the same or other canisters. Let’s refer to the new queries that have this ability to make calls as “ICQC queries” and to the existing queries without calls as “regular queries”.

What makes ICQC queries really challenging to implement is that they combine the most difficult properties of regular queries and updates as shown in the following table:

| ICQC query | Regular query | Update |
| --- | --- | --- |
| Modifies state | Read-only | Modifies state |
| Isolated from other queries | Isolated from other queries | Sees changes of other updates |
| Supports all execution modes | Supports all execution modes | Replicated execution only |
| Cross- and same-subnet calls | No calls | Cross- and same-subnet calls |

In the following sections we discuss each of these properties in detail.

State modification

In order to support async/await, an ICQC query has to keep canister state changes until all pending calls return. In other words, an ICQC query behaves like an update while the calls are pending. Once all calls return, then the state changes are discarded and the ICQC query behaves like a query.

Query isolation

Query isolation means that effects of one ICQC query execution are not visible to other ICQC and regular queries. To see why this property is important, consider the following valid ICQC query that calls another query and then destroys the state of the canister.


async fn query_foo(input: Input) -> Output {
  let result = call(bar, "query", data).await;
  destroy_all_global_state(); // No problem for other queries.
  return result;
}

Other queries should work without any problems after this ICQC query finishes.

What follows from the query isolation and state modification properties is that we need to clone or copy the canister state in order to execute an ICQC query. If the ICQC query calls another ICQC query, then we need to clone the other canister state as well.
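To make the cloning argument concrete, here is a minimal sketch in plain Rust, not the actual IC implementation. `CanisterState`, `call_bar_query`, and `run_icqc_query` are hypothetical names invented for illustration. The query works on a scratch copy, so a change made before the call is still visible after the call returns, yet nothing leaks into the committed state.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a canister's heap and globals.
#[derive(Clone, Debug, PartialEq)]
struct CanisterState {
    counters: HashMap<String, u64>,
}

/// Hypothetical stand-in for the callee's query method.
fn call_bar_query(input: u64) -> u64 {
    input * 2
}

/// Runs an ICQC-style query against a scratch copy of `committed`.
/// Changes made before the call stay visible after it (update-like),
/// but the scratch copy is dropped on return (query-like).
fn run_icqc_query(committed: &CanisterState, input: u64) -> u64 {
    let mut scratch = committed.clone();
    // A state change made before the call...
    scratch.counters.insert("pending_input".to_string(), input);
    let reply = call_bar_query(input);
    // ...is still visible in the continuation after the call returns.
    let saved = scratch.counters["pending_input"];
    // Even destroying everything only affects the scratch copy.
    scratch.counters.clear();
    reply + saved
}
```

Because the function only borrows the committed state immutably, any other query that reads it afterwards sees exactly what it would have seen before.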

In general, if we have a call graph where nodes are canisters and edges are queries, then canisters will be cloned as many times as the number of ICQC query edges in the graph.
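The counting rule can be sketched as follows. This is an illustration under assumptions: the edge representation and the names `QueryEdge`, `clones_per_canister`, and `total_clones` are made up here, and the initial call from the user is assumed to count as an edge into the first canister. Edges that target a regular query are assumed to need no clone.

```rust
use std::collections::BTreeMap;

/// (caller, callee, callee method is an ICQC query). Names are illustrative.
type QueryEdge = (&'static str, &'static str, bool);

/// Counts how many clones each canister needs: one per incoming ICQC edge.
fn clones_per_canister(edges: &[QueryEdge]) -> BTreeMap<&'static str, usize> {
    let mut clones = BTreeMap::new();
    for &(_caller, callee, callee_is_icqc) in edges {
        if callee_is_icqc {
            *clones.entry(callee).or_insert(0) += 1;
        }
    }
    clones
}

/// Total number of clones equals the number of ICQC query edges.
fn total_clones(edges: &[QueryEdge]) -> usize {
    clones_per_canister(edges).values().sum()
}
```

For example, if canister A calls an ICQC query on B twice, B is cloned twice, once per edge, rather than once per canister.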

Replicated execution

Currently the replicated state contains each canister exactly once. Conceptually we can think of it as a mapping from the canister id to canister state: [CanisterId → CanisterState].

In order to support replicated execution of ICQC queries, we need a way to keep multiple versions of the same canister in the replicated state. This means that our mapping becomes something like [CanisterId → CallContextId → CanisterState] where CallContextId corresponds to an ICQC query with pending calls.
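A rough sketch of what such a mapping could look like as a data structure. All types here are simplified, hypothetical stand-ins (the real replicated state is far richer), and keeping the committed state in a separate map is a choice made for this illustration, not something stated above.

```rust
use std::collections::BTreeMap;

// Hypothetical simplified identifiers.
type CanisterId = u64;
type CallContextId = u64;

#[derive(Clone, Debug, PartialEq)]
struct CanisterState {
    heap: Vec<u8>,
}

#[derive(Default)]
struct ReplicatedState {
    /// Today: exactly one committed state per canister.
    committed: BTreeMap<CanisterId, CanisterState>,
    /// Sketch: extra versions keyed by the ICQC query's call context.
    icqc_versions: BTreeMap<CanisterId, BTreeMap<CallContextId, CanisterState>>,
}

impl ReplicatedState {
    /// Forks the committed state when an ICQC query starts waiting on calls.
    /// Returns None if the canister does not exist.
    fn fork(&mut self, canister: CanisterId, ctx: CallContextId) -> Option<&mut CanisterState> {
        let base = self.committed.get(&canister)?.clone();
        Some(
            self.icqc_versions
                .entry(canister)
                .or_default()
                .entry(ctx)
                .or_insert(base),
        )
    }

    /// Drops the version once all calls of the ICQC query have returned.
    fn discard(&mut self, canister: CanisterId, ctx: CallContextId) -> Option<CanisterState> {
        let versions = self.icqc_versions.get_mut(&canister)?;
        let removed = versions.remove(&ctx);
        if versions.is_empty() {
            self.icqc_versions.remove(&canister);
        }
        removed
    }
}
```

Even in this toy form, every component that reads or certifies the state would have to know about the extra level of keys, which hints at why the change touches so many parts of the system.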

To implement this, we would need to change the core components of the IC such as the state manager, message routing, and execution. That would be a large engineering effort (1-2 years) with a lot of complexity and unknowns.

Cross-subnet calls

In order to support cross-subnet calls, we would need to rewrite the existing prototype implementation to use an asynchronous/distributed algorithm to traverse the call graph. That is a medium to large size engineering problem.

The question of the caller_id remains open. We either need to find a solution to ensure its trustworthiness or accept the reduced trustworthiness.

Conclusion & Next steps

Our recommendation is to release the existing prototype implementation as a new ICQC query type without the support for cross-subnet calls and replicated execution and then work on those two features separately. The new query type ensures that we don’t break the existing queries that may be used in replicated mode.

A draft of a motion proposal describing this plan will be shared soon.

Why can we even call a query in replicated mode? Shouldn’t it be up to the canister controller to decide by specifying either a query or “normal” function?

Shouldn’t it be up to the canister controller to decide by specifying either a query or “normal” function?

Absolutely! The owner of the canister decides whether a method is a query or an update by annotating it correspondingly in the Wasm source code: queries are exported as canister_query <query_name> and updates as canister_update <update_name>.

That said, it is possible and sometimes useful to run a query in replicated mode. The query still keeps its query semantics in the sense that it is still read-only and discards state changes, but the execution happens on all nodes of the subnet. That means that the result of execution goes through consensus and is signed by the subnet key, so the result is more trustworthy compared to non-replicated execution.

Another case when a query runs in replicated mode is when an update method calls a query method.

But can the caller trigger running a query in replicated mode? If so, doesn’t that open the gates for some cycle draining attacks? I know that there are plans to charge for queries in the future, but still I’d assume running a query in replicated mode is more expensive than in non-replicated mode.

But can the caller trigger running a query in replicated mode?

If the caller is running as an update method, then yes. It has always been possible to call a query from an update. If the query performs some very expensive computation, then there is a potential for a cycle draining attack. One way to protect against it would be to inspect the caller_id and refuse to do the expensive computation. Ideally, queries are fast and do not perform expensive computations.
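As a rough illustration of the caller check, here is a sketch in plain Rust. `CallerId`, `ExpensiveQuery`, and `handle` are hypothetical names; a real canister would obtain the caller from the system API rather than take it as a parameter, and the allow-list is just one possible policy.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a principal.
type CallerId = String;

struct ExpensiveQuery {
    /// Callers that are allowed to trigger the expensive path.
    allowed: HashSet<CallerId>,
}

impl ExpensiveQuery {
    /// Rejects unknown callers before doing any costly work.
    fn handle(&self, caller: &str, n: u64) -> Result<u64, String> {
        if !self.allowed.contains(caller) {
            return Err(format!("caller {} may not run this query", caller));
        }
        // Placeholder for the expensive computation.
        Ok((1..=n).sum())
    }
}
```

The point of the design is that the check runs first, so a rejected caller costs only a set lookup.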

I posted the draft of the motion proposal about the new query type and releasing the prototype implementation here:

Thank you for the excellent write-up!

What follows from the query isolation and state modification properties is that we need to clone or copy the canister state in order to execute an ICQC query. If the ICQC query calls another ICQC query, then we need to clone the other canister state as well.

Once this canister cloning feature is built, does that mean it will unlock the possibility for a broader change in async/await semantics for update calls? For example, update calls currently commit canister state whenever they initiate an inter-canister call (to either a query or update method). That limitation can be quite troublesome for canister developers, as they need to worry about state rollbacks and non-atomic update methods.

If this feature lands, am I correct in assuming that limitation could (technically) be lifted?

I’m looking through the interface spec trying to find a canonical location for this information. Where is this documented?

Also how do you call a query in replicated execution mode? Do you just submit a call HTTPS API request to a query method?

I actually don’t know of a canonical location of this information. It might be a good idea to incorporate it into the interface spec. Or maybe in this page which already describes the differences between update and query calls and we can also include this table comparison.

Also how do you call a query in replicated execution mode? Do you just submit a call HTTPS API request to a query method?

Correct. The other way is if you make an update call to a canister which then in turn calls another canister’s query method – in that case the query is also executed in replicated mode.

It’s documented here in the interface spec under Request: Call and Request: Query call. Do you have suggestions on how to improve these sections?

Also how do you call a query in replicated execution mode? Do you just submit a call HTTPS API request to a query method?

Correct, that’s all you need to do.

I tried searching through the interface spec for replicated mode, execution modes, or other language that @ulan had shared with his tables and I didn’t find anything useful. I think a section dedicated to these modes and discussing how to invoke them would be helpful.

This topic is in “consideration” status, but here (Internet Computer Loading) it is marked as done.

Are they the same?

What’s the actual status now?

@bytesun, thanks for raising this point. The feature is done. More info is available in the DFINITY post on Medium (The Internet Computer Review): “Composite Queries: Horizontal Scaling for Multi Canister Dapps”.

@diegop: is it possible to mark this forum thread as “done” somehow?

Thanks @ulan for the quick response. I feel it’s different from what I want. Please correct me if I am wrong.

To use this new feature, a developer needs to create a new structure (frontend → backends), and it only covers the partitioned-storage scenario, right?

In my case, I have NFT metadata in one canister (A) and assets in another canister (B). I would like a single query to get all the information (metadata and assets), which means query call A → query call B. It doesn’t look like this can be done with this implementation, right?

I marked it as solved as well

I will let @ulan answer