Inter-Canister Query Calls (Status: done)

Great example!

For an auth canister, I assume that you also want the response to be secure, right? But query calls are inherently insecure, as they are executed by a single, possibly malicious node. And this problem does not disappear just because the call originates in a canister.

Sometimes query calls can be made more secure using certified variables, but that requires serious effort from the application developer on both the server and the client side (see the second half of my explainer video on that, but note that the slides shown are somewhat off; a fixed video is being prepared by DFINITY and should be uploaded any time now).

Of course, one could say “give us inter-canister query calls, even if they are less secure”; after all, that would only be consistent with what we get with normal query calls. But then the question is: what should happen when a canister running in the replicated environment, meaning it was invoked via a regular (a.k.a. update) call or a regular inter-canister call, wants to make a query call? Just doing a normal query call from the replicated execution engine could easily break determinism and thus the stability of the system.
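
To make the determinism concern concrete, here is a toy Python model. It is not IC code: `Replica`, `honest_node`, and `faulty_node` are names made up for this sketch, and the returned values are arbitrary. Each replica executes the same update but resolves the embedded query by asking a single node, so one differing answer is enough for the replicas to diverge.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical stand-ins for single nodes answering a non-replicated query.
def honest_node(key: str) -> int:
    return 42


def faulty_node(key: str) -> int:
    # A malicious or stale node may return anything.
    return 7


@dataclass
class Replica:
    """One replica executing an update call in the replicated environment."""
    query_node: Callable[[str], int]  # the single node this replica asks
    state: int = 0

    def execute_update(self) -> None:
        # The new state depends on an unreplicated query answer.
        self.state = self.query_node("balance") + 1


def states_after_update(nodes) -> List[int]:
    replicas = [Replica(query_node=n) for n in nodes]
    for replica in replicas:
        replica.execute_update()
    return [replica.state for replica in replicas]


def is_deterministic(states) -> bool:
    # All replicas must agree on the resulting state.
    return len(set(states)) == 1
```

With three honest nodes all replicas agree; swapping in one `faulty_node` makes `is_deterministic` return `False`.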

This is just a glimpse of why inter-canister query calls, done correctly, are harder than they look.

7 Likes

Thanks @nomeata for clarifying the confusion and also giving an example why inter-canister query calls (ICQC) are hard.

For many people the feature sounds like a simple extension of what we already have for update calls. The original post in this thread also gives that impression. I think supporting ICQC is a much harder problem (probably by orders of magnitude). There are many roadblocks in specification, performance, and memory usage for which we currently don’t have good solutions.

As @diegop mentioned above, the team is busy with higher priority projects. For example, I am working 100% on canister sandboxing and don’t have enough cycles to think through all these issues.

To set the right expectations: please don’t expect much progress here in the near future (several months) unless there are volunteers from the community to drive this forward.

7 Likes

@ulan @diegop any update here on progress?

Our team is also blocked by this

3 Likes

Hi Dan,

The status of the feature is the same as before. There was no progress due to other higher-priority projects.

That said, thank you for the input! I’ll bring this up in our next team meeting after the holidays.

Would you mind sharing more details about your use case? One design that we are considering is executing the response callback of an inter-canister query call against the latest state of the canister (which differs from the state when the call was performed). That would resolve the main performance/memory blockers of ICQC but would make the programming model more complicated because the response callback would not see the memory changes done by the original query.
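
A toy Python model of that design may help show the consequence. It is only a sketch under my own assumptions: `ToyCanister`, its `committed` dictionary, and the `note` and `counter` fields are invented for illustration and do not correspond to any real API.

```python
import copy


class ToyCanister:
    def __init__(self):
        self.committed = {"counter": 0}  # stand-in for the committed state

    def run_query(self):
        """First half of a query: runs on a scratch copy of the state."""
        scratch = copy.deepcopy(self.committed)
        scratch["note"] = "set before the call"  # change made by the query
        return scratch  # at this point the query issues its call

    def run_callback(self, reply):
        """Response callback under the considered design: it starts from the
        latest committed state, not from the scratch copy of run_query."""
        latest = copy.deepcopy(self.committed)
        return {
            "reply": reply,
            "note": latest.get("note"),
            "counter": latest["counter"],
        }


def demo():
    canister = ToyCanister()
    scratch = canister.run_query()
    canister.committed["counter"] += 1  # an update commits while the call is in flight
    return scratch, canister.run_callback("pong")
```

In `demo()`, the callback sees the newer `counter` value but `note` is `None`, because the change made before the call was never part of the committed state.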

Cheers,
Ulan.

3 Likes

Thank you for the response!

We’re working on making updates to Internet Identity and would like to store registered usernames from all of an anchor’s logins so that a user could select which username to “continue as” when authenticating to dapps.

Among other things, we need to communicate between the target canister and ours, likely across subnets.

Do you anticipate us running into any issues running cross-canister, cross-subnet calls like this in the future?

2 Likes

Perhaps that would be fine if the main use case is ingress queries making inter-canister query calls, since there are no state changes involved.

If ingress updates make inter-canister query calls, that would indeed be more complicated.

Although ingress updates can probably tolerate additional latency from inter-canister updates (i.e. the client already is doing optimistic updates), so maybe that’s OK?

2 Likes

Sorry for the delayed response. I was on vacation.

The latency of a cross-subnet query will be higher than that of a same-subnet query. The main issue I see is that it may take a long time until cross-subnet queries are supported. We have a prototype for same-subnet queries, but there is no clear solution for cross-subnet queries yet. It will likely take several months to come up with the solution and implement it.

I wonder if there is a way to unblock you in the meantime. Did you consider moving the cross-canister communication to client-side JavaScript in the browser, so that the JavaScript calls your canister and the other canister and combines the results? Alternatively, can you use update calls instead of query calls? The latency will be higher, but cross-subnet/cross-canister update calls are already supported.
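
The client-side composition could look roughly like the sketch below. It is written in Python only to show the shape of the logic; in a browser this would be JavaScript using an agent library. The two query functions are hypothetical stubs with made-up return values, not real canister methods.

```python
import asyncio


# Hypothetical stubs standing in for query calls issued by the client.
async def query_own_canister(anchor: int) -> list:
    await asyncio.sleep(0)  # placeholder for network latency
    return ["alice", "alice-work"]


async def query_target_canister(anchor: int) -> dict:
    await asyncio.sleep(0)  # placeholder for network latency
    return {"alice": True, "alice-work": False}


async def combined_view(anchor: int) -> list:
    # The client issues both queries concurrently and joins the results
    # itself, so neither canister has to call the other.
    names, active = await asyncio.gather(
        query_own_canister(anchor),
        query_target_canister(anchor),
    )
    return [name for name in names if active.get(name, False)]
```

The trade-off is that the joining logic and any validation of the two responses move into the client.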

Yes, I think that should be OK. The called query would be a replicated query and would need to use the latest state anyway.

3 Likes

We have a prototype for same-subnet queries, but there is no clear solution for cross-subnet queries yet.

If I understand correctly and you already have a prototype for same-subnet, inter-canister queries, then when will that prototype be productionized and fully launched on the mainnet?

I thought inter-canister queries were too difficult to implement. I’m surprised you already have a prototype for it.

2 Likes

The prototype works only for same-subnet queries. To make it production-ready we need to support cross-subnet queries. Unfortunately, the prototype does not generalize to cross-subnet queries, so we need a completely new approach. In addition to that, we need to solve the spec issues raised in this thread and the state explosion problem due to callers holding on to the old states. All this makes ICQC a very difficult problem.

2 Likes

One design that we are considering is executing the response callback of an inter-canister query call against the latest state of the canister (which differs from the state when the call was performed). That would resolve the main performance/memory blockers of ICQC but would make the programming model more complicated because the response callback would not see the memory changes done by the original query.

Does this design not work for cross-subnet queries? I’m curious what makes cross-subnet queries that much more challenging than same-subnet queries.

I think having some limited, working version of ICQC, even if not complete, could still be very useful.

2 Likes

Yes, from the implementation and performance point of view that design is the most promising. However, it is more difficult for developers to use. For example, it is not compatible with the async Rust API because the response closure is stored on the heap, but the heap changes would not be preserved in the proposed design when the response arrives. This means that only static functions can be used as response callbacks. I am also worried that developers will miss the subtle point that all memory changes are discarded by the time the response callback runs, so it will be a major footgun.
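
The closure problem can be sketched with a toy Python model. Everything here is an assumption made for illustration: `ToyQueryContext`, its `heap` dictionary, and `STATIC_TABLE` are invented names, and a Python lambda stands in for the continuation that an async runtime would keep on the heap.

```python
import copy


# Static callbacks live in the code, not on the heap, so they survive a rollback.
def static_on_reply(reply):
    return f"static callback got {reply}"


STATIC_TABLE = {"static_on_reply": static_on_reply}


class ToyQueryContext:
    def __init__(self):
        self.heap = {"callbacks": {}}

    def call_with_closure(self, call_id, prefix):
        # The continuation (a closure capturing `prefix`) is stored on the
        # heap until the reply arrives.
        self.heap["callbacks"][call_id] = lambda reply: f"{prefix}: {reply}"

    def discard_changes(self, snapshot):
        # Proposed semantics: heap changes made by the query are not kept.
        self.heap = copy.deepcopy(snapshot)

    def on_reply(self, call_id, reply, static_name=None):
        if static_name is not None:
            return STATIC_TABLE[static_name](reply)
        callback = self.heap["callbacks"].get(call_id)
        if callback is None:
            raise LookupError("continuation was discarded with the heap")
        return callback(reply)


def demo():
    ctx = ToyQueryContext()
    snapshot = copy.deepcopy(ctx.heap)  # heap as it was when the query started
    ctx.call_with_closure(1, "hello")
    ctx.discard_changes(snapshot)
    try:
        closure_result = ctx.on_reply(1, "pong")
    except LookupError as err:
        closure_result = str(err)
    static_result = ctx.on_reply(1, "pong", static_name="static_on_reply")
    return closure_result, static_result
```

In `demo()`, the closure registered before the rollback can no longer be found, while the statically named callback still runs.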

2 Likes

Interestingly, if canisters program closer to the actor model, e.g. to be always upgradeable, even with outstanding calls, then such a model becomes a bit more plausible.

2 Likes

Great insight and write up, @nomeata! Indeed, the design looks more reasonable with the actor model.

1 Like

I chatted with Ulan about one-shot messaging a bit more. I really like the idea of using one-shot messages for update calls, but for ICQC they can be tricky. Suppose a user queries canister A, A sends a one-shot message to B, and A then waits for a response: the system doesn’t know that A is waiting, so it doesn’t know that it should wait too.

I suppose we will need some mechanism for A to tell the system that it may still produce a response in the above case.

4 Likes

For example, it is not compatible with async Rust API because the response closure is stored on the heap, but the heap changes would not be preserved in the proposed design when the response arrives.

Can you explain why this is a problem with cross-subnet queries but not same-subnet queries? I’m not sure I understand why the heap is cleared for cross-subnet but not same-subnet.

I don’t think there will be any difference in the programming model / semantics depending on whether you are going cross-subnet or same-subnet. The async Rust API difficulties will be present in both cases.

2 Likes

Can you explain why this is a problem with cross-subnet queries but not same-subnet queries? I’m not sure I understand why the heap is cleared for cross-subnet but not same-subnet.

What we have currently is a prototype implementation of same-subnet ICQC that keeps all state changes in memory until all called inter-canister queries return. The main blocker for this prototype implementation is theoretically unbounded memory consumption (every call kind of forks the chain of state changes). The problem becomes worse with cross-subnet queries that have much higher latencies.
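
As a rough illustration of why latency matters here, the following Python sketch counts how many forked states are held at once if every in-flight query keeps its own state until its callee replies. It is a deliberately simplified model with made-up parameters (a fixed arrival rate and a fixed latency, measured in abstract ticks), not a description of the prototype.

```python
from collections import deque


def peak_retained_states(rate_per_tick: int, latency_ticks: int, ticks: int) -> int:
    """Return the peak number of states held at once when each in-flight
    query retains its forked state until the reply arrives."""
    in_flight = deque()  # completion tick of each pending query, in order
    peak = 0
    for now in range(ticks):
        # Release states whose replies have arrived.
        while in_flight and in_flight[0] <= now:
            in_flight.popleft()
        # Each new query forks the state and holds it until the reply.
        for _ in range(rate_per_tick):
            in_flight.append(now + latency_ticks)
        peak = max(peak, len(in_flight))
    return peak
```

In this model the peak grows as rate times latency, so a call that takes twenty times longer to return pins twenty times as many states, and nested calls would multiply that further.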

One way to fix the memory consumption problem is to change the semantics of the calls to discard the state changes. But that makes the programming model difficult to use (the async Rust API problem). The problem applies to same- and cross-subnet calls equally, as @akhilesh.singhania mentioned.

2 Likes

Let’s say you have an ICQC that’s made in the context of a query.

Queries already discard state updates to canisters.

Are you saying that even local variables will be discarded within the context of a query that makes an ICQC?

1 Like

Queries already discard state updates to canisters.

A query discards the state updates after its execution is fully complete. What to do with the state updates when the query has a pending call (i.e. the query performed the call, but its response callback did not run yet) is an open question. Discarding the state changes leads to a confusing programming model.

Are you saying that even local variables will be discarded within the context of a query that makes an ICQC?

I guess you mean the local variables in async code while awaiting the result of a call. The local variables are stored in memory across the await points, so discarding the state changes would also discard the local variables. Even worse: the implementation of async/await in Rust stores internal information in memory, so discarding the state changes would make awaiting impossible (that’s what I meant by “not compatible with async Rust API” earlier).

3 Likes

Thanks, this makes sense.

When I make an inter-canister update call, the IC runtime is able to save (i.e. fork) the current state of the canister at the time the call is made, and then resume from that state when the call returns.

Is it difficult to apply that same code to inter-canister query calls? You mention unbounded memory consumption as a blocker. But what I don’t quite understand is how the IC’s implementation of inter-canister update calls was able to avoid that problem. It seems to work fine for both same-subnet and cross-subnet updates.

1 Like