Inter-Canister Query Calls (Status: done)

There is no concrete design for certified queries, so I am going to speculate here. If we disallow cross-subnet calls, then I don’t see any blockers; I think the existing prototype would also work for certified queries. If we do support cross-subnet calls, then we probably need multiple rounds of consensus, which would increase the complexity of certified queries but would at the same time allow signing of the cross-subnet messages (as @nomeata noted in the very first comment of this thread).

Also, I hate to ask this, but assuming option 1 is favored by the community (perhaps in a proposal vote), when do you expect ICQC to be generally available? Would it be in a matter of 1-2 months or much much longer?

In terms of engineering work, 1-3 SWE-months sounds like the right ballpark. Please note that this doesn’t directly translate into calendar months because other high-priority projects are queued up and the teams are overloaded.

Some of the remaining work:

Since the calls are limited to the same subnet, we would need some mechanism to create canisters on the same subnet and also keep them together in case of subnet splitting. I think this is the main blocker.
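For context, here is a minimal sketch of the one same-subnet mechanism that exists today: a canister calling the management canister’s `create_canister` gets the new canister on its own subnet. This assumes recent ic-cdk management-canister bindings (the exact signature has varied across versions); the missing piece is keeping such canisters co-located across subnet splits.

```rust
use ic_cdk::api::management_canister::main::{
    create_canister, CanisterIdRecord, CreateCanisterArgument,
};

// Calling create_canister from *inside* a canister places the new canister
// on the caller's own subnet; a "canister group" mechanism would additionally
// keep the two together if the subnet is ever split.
#[ic_cdk::update]
async fn spawn_sibling() -> CanisterIdRecord {
    let arg = CreateCanisterArgument { settings: None };
    // 2T cycles as an arbitrary example endowment for the new canister.
    let (record,) = create_canister(arg, 2_000_000_000_000)
        .await
        .expect("create_canister failed");
    record
}
```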

We need to consider whether we want to introduce some way of charging for queries because the execution cost of all queries is going to increase significantly. That’s because we don’t know beforehand whether a query is going to call another query or not, so we would have to keep track of modified memory for all queries. (We could also introduce some annotation for developers to indicate that a query is going to use ICQC, but that seems a bit ad-hoc to me).
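To make the annotation idea concrete, here is a hypothetical sketch in ic-cdk style. The `composite = true` attribute is the syntax ic-cdk later adopted for composite queries; all other names and the storage layout are invented for illustration.

```rust
use candid::Principal;
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    static STORE: RefCell<HashMap<String, String>> = RefCell::new(HashMap::new());
}

// A plain query: it never calls out, so the replica could keep serving it
// on the fast path with memory tracking disabled.
#[ic_cdk::query]
fn plain_lookup(key: String) -> Option<String> {
    STORE.with(|s| s.borrow().get(&key).cloned())
}

// The annotated variant: the developer declares upfront that this query may
// call other queries, so only queries like this pay the tracking overhead.
#[ic_cdk::query(composite = true)]
async fn aggregated_lookup(other: Principal, key: String) -> Option<String> {
    // Hypothetical remote method with the same name and signature.
    let (remote,): (Option<String>,) =
        ic_cdk::call(other, "plain_lookup", (key.clone(),)).await.ok()?;
    remote.or_else(|| STORE.with(|s| s.borrow().get(&key).cloned()))
}
```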

We need to introduce limits to guarantee that all queries terminate within a reasonable deadline.

1 Like

The cross-subnet query calls would indeed fail if we go with option 1. I don’t see a working fallback here.

For example, what if the developer has the caller canister’s code generate a key that is sent to the callee; the caller and callee canister can then both store this key and agree upon it/update it as needed. An improved version of the same idea would be to use a public/private key pair instead of just a single key, where the callee has the public key and a transformation function, and the caller sends the private key as a function parameter.

Thanks for sharing the idea! I am assuming you meant that the caller signs the message using the private key and sends the signature instead of sending the private key. I think it would work and seems to have somewhat similar security properties to the scheme proposed by @nomeata (for the case when a malicious replica obtains the private key):

Good point! I haven’t considered it from this angle. I think the answer depends on the security/threat model. In theory we trust the replica to keep the state of X confidential, but we can never exclude bugs in the implementation. In the very worst case the replica exploits such bugs and manages to leak the confidential data. Do we want to limit the damage to the subnet of the malicious replica or allow it to spread to other subnets?

I like having the subnet boundary as an additional security barrier, but maybe I am too paranoid here. I wonder if the security folks have an opinion here. @robin-kunzler
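To make the sign-and-verify variant concrete, here is a minimal off-chain sketch using the ed25519-dalek crate (an assumption; inside a canister, key generation would need `raw_rand` or threshold keys rather than `OsRng`):

```rust
use ed25519_dalek::{Signature, Signer, SigningKey, Verifier, VerifyingKey};
use rand::rngs::OsRng;

// Caller side: generate the key pair once; only the verifying (public) key
// is shared with the callee. The private key never leaves the caller.
fn caller_setup() -> (SigningKey, VerifyingKey) {
    let signing_key = SigningKey::generate(&mut OsRng);
    let verifying_key = signing_key.verifying_key();
    (signing_key, verifying_key)
}

// Caller side: sign the request payload and send payload + signature.
fn caller_sign(sk: &SigningKey, payload: &[u8]) -> Signature {
    sk.sign(payload)
}

// Callee side: serve the query only if the signature verifies against the
// stored public key, instead of trusting caller_id alone.
fn callee_accepts(vk: &VerifyingKey, payload: &[u8], sig: &Signature) -> bool {
    vk.verify(payload, sig).is_ok()
}
```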

4 Likes

… or run them opportunistically in the fast, non-tracking mode, and if they do make calls, abort and re-run them in the memory-tracking mode. If the fraction of calling queries is low enough (as it likely will be), determinism allows all kinds of neat optimizations :slight_smile:
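A hypothetical sketch of that two-pass strategy (all names are invented; the real logic lives in the replica’s query handler, linked a couple of posts below):

```rust
// Outcome of one execution attempt of a query.
enum Outcome {
    Completed(Vec<u8>),
    MadeCall, // the query tried to call another canister
}

fn execute_query(payload: &[u8]) -> Vec<u8> {
    // Pass 1: fast path with memory tracking disabled. If the query never
    // calls out, we are done and paid no tracking overhead.
    match run(payload, /* track_memory */ false) {
        Outcome::Completed(result) => result,
        // Pass 2: the query did call out, so abort and re-run with tracking
        // on. Determinism guarantees the replay behaves identically up to
        // the first call.
        Outcome::MadeCall => match run(payload, true) {
            Outcome::Completed(result) => result,
            Outcome::MadeCall => unreachable!("tracking mode supports calls"),
        },
    }
}

// Stub so the sketch compiles; a real implementation would execute Wasm.
fn run(payload: &[u8], track_memory: bool) -> Outcome {
    let _ = track_memory;
    Outcome::Completed(payload.to_vec())
}
```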

It doesn’t really “spread” in the sense that it is still only data accessible to X that is under threat – data owned by an unrelated canister Y is unaffected. The data may live somewhere else, but conceptually that’s not a big deal – or at least small enough not to sacrifice the programming model for it :slight_smile:

Yep, that’s a neat optimization and I implemented it a while ago :smiley: ic/rs/execution_environment/src/query_handler/query_context.rs at master · dfinity/ic · GitHub
The tradeoff is that queries using ICQC become even slower, so we might have to disable the optimization if popular canisters start relying on ICQC (not sure how likely it is).

@manu made a good point in our offline discussion yesterday: since a malicious replica can forge any query, it can inject a query call that would be impossible in a valid execution. For example, if the caller performs the access check and the callee trusts caller_id, then the malicious replica can obtain data that shouldn’t be accessible to it. I know it is a bad idea to have access checks on the caller side, but this just highlights that the issue is subtle and that trusting caller_id may be a security footgun.
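A hypothetical sketch of the footgun (names invented for illustration):

```rust
use candid::Principal;

// ANTI-PATTERN: the callee hands out data purely because caller_id matches
// a trusted canister, assuming that canister already performed the real
// access check. A malicious replica can forge exactly this call, so the
// check is worthless in the (non-replicated) query path.
fn callee_read(
    caller_id: Principal,
    trusted_frontend: Principal,
    secret: &str,
) -> Option<String> {
    if caller_id == trusted_frontend {
        // Reachable via a forged query, even though no honest execution of
        // the frontend would have made this call.
        return Some(secret.to_string());
    }
    // Safer designs validate the end user (or a capability/signature) here,
    // rather than relying on which canister relayed the request.
    None
}
```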

Hmm, good example. Although it’s still within “only data accessible to canister X is at risk”, even if an uncompromised canister wouldn’t usually query that data.

And note that the leak you describe is comparable in impact (actually less) to the callee’s subnet having a single compromised node.

I’m still not convinced that this is worth breaking (or at least worsening) the programming model over a small strengthening of data privacy guarantees in a corner case. It’s odd if a normal call has the correct sender, but a call to a query function doesn’t.

3 Likes

We need to consider whether we want to introduce some way of charging for queries because the execution cost of all queries is going to increase significantly. That’s because we don’t know beforehand whether a query is going to call another query or not, so we would have to keep track of modified memory for all queries. (We could also introduce some annotation for developers to indicate that a query is going to use ICQC, but that seems a bit ad-hoc to me).

My personal opinion is that having an annotation wouldn’t be too bad in terms of developer experience if it can save them some cycles.

I think charging for queries may change the decision calculus for voters. Do you plan on submitting a proposal for Option 1 before implementation? I think it’d be nice to get community feedback on whether this tradeoff is worth it. (A proposal with a timer also makes it more likely for people to read and comment.)

1 Like

Do you plan on submitting a proposal for Option 1 before implementation?

Absolutely! We are currently looking into the first blocker: a way to group canisters. We will propose that as a separate feature because it is useful for subnet splitting.

1 Like

Because of this limitation, I’m pretty firmly against Option 1. If an app on the IC is massively successful or runs into a crowded subnet, it makes for a difficult breaking point in the future, where a successful app now needs to plan huge data migrations on the IC – that sounds rough!

3 Likes

this could be retired if the IC adopted a modern, capability-based access model

What’s blocking you from achieving that? Technological constraints or ideological ones?

1 Like

@ulan Do you plan on submitting a proposal for this, or is this still lower priority than other projects?

1 Like

We are working on a proposal to group canisters first. The proposal of ICQC will come after that.

3 Likes

I personally don’t like option 1 and would rather have what @nomeata proposed. One of the selling points of the IC’s subnets is that they are transparent to devs; if we go with option 1, that would no longer be the case.

2 Likes

Thanks for the input @Zane! Since each option has its own disadvantages, I think the community will settle this by voting.

Did we ever get a chance to vote on an ICQC proposal?

Did we ever get a chance to vote on an ICQC proposal?

Sorry, not much progress on ICQC itself. @bogwar is working on the canister groups proposal, which turned out to be much more difficult than we anticipated. Option 1 depends on the outcome of that.

I am currently working on deterministic time slicing. Other people who could help with ICQC are busy with the BTC integration. Once both projects ship, there will be faster progress on ICQC.

6 Likes

I will give an update in the public Global R&D today (Live Sessions) and a more detailed technical presentation in the Scalability & Performance WG on October 20: Technical Working Group: Scalability & Performance - #19 by esquivada

The plan is to release ICQC incrementally:

  1. Get the existing prototype into a production-ready state with two main limitations: no support for cross-subnet calls and no support for replicated execution.
  2. Work on adding cross-subnet support. The main challenge here, as mentioned before, is the caller_id problem: we either need to find a solution that ensures its trustworthiness or accept the reduced trustworthiness. Another challenge is rewriting the prototype to use a more complex async/distributed algorithm.
  3. Work on replicated execution. This step may turn out to be infeasible. To be on the safe side, we need to introduce a new query type for ICQC that doesn’t allow replicated execution before we release the prototype.

After gathering feedback here and in the Scalability & Performance WG session, I’ll prepare and submit a motion proposal with the ICQC roadmap.

4 Likes

Thanks for the update!

I might be missing something, but aren’t queries already not replicated? Does that mean we can only use ICQC to serve data to users and not to get data from another canister and then run replicated logic on it?

“(*) A user may choose to run a query in an update context, where queries are executed on all replicas along with update messages. The state changes of queries are discarded, but the results go through consensus. We will refer to queries as replicated and non-replicated depending on whether they run in an update context or not. Replicated queries should not be confused with certified queries that also go through consensus but not in an update context. Note that certified queries currently exist only as an idea and are not implemented.”
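A minimal sketch of that difference from the client side, assuming a recent version of the ic-agent Rust crate (`call_and_wait`’s signature has changed across releases): the same exported query method is executed once as a plain query and once in an update context.

```rust
use candid::Principal;
use ic_agent::Agent;

async fn demo(
    agent: &Agent,
    canister_id: Principal,
    arg: Vec<u8>,
) -> Result<(), Box<dyn std::error::Error>> {
    // Non-replicated: answered by a single replica, fast, no consensus.
    let plain = agent
        .query(&canister_id, "get_data")
        .with_arg(arg.clone())
        .call()
        .await?;

    // Replicated: the *same* method submitted as an update, so it runs on
    // all replicas; state changes are discarded but the result is certified.
    let replicated = agent
        .update(&canister_id, "get_data")
        .with_arg(arg)
        .call_and_wait()
        .await?;

    assert_eq!(plain, replicated); // queries are deterministic
    Ok(())
}
```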

2 Likes

I thought they only pass the certificate along with the data; at what point do they go through consensus? Maybe it’s the “but not in an update context” part that explains that, although I’m not sure what it means.