The biggest problem with the IC - InterCanister Calls

The biggest problem on the IC seems to be that inter-canister calls render upgrades impossible in a large number of cases. The way tokens, the ledger, and other canisters have been set up means that you are expected to use inter-canister calls, for instance calling into various NFT or token standards on the IC such as EXT and IS20, the ledger, etc.

The fact that any malicious actor can replace the code of the downstream canister with code that prevents your canister from being upgraded safely is a major impediment to the safety and growth on the IC.

I heard that the Foundation had to change the code on each replica/node because the ledger canister was stopped from being upgraded. If this same problem persists, it is not acceptable for developers on the IC.

We should urgently as a matter of course address this issue. We need safe inter canister calls and upgrades.

I think one solution to the problem, and I wonder why the Foundation has not looked into this, is rejecting responses that come in too late, i.e. much later than when the call was made. Most canisters don’t need to wait more than 1h for a response to some message. I think the alternative mentioned in the post, essentially changing the programming model to the actor model, would be difficult now that the IC has become used to async and await.

Thank you @nomeata for the enlightening post.

12 Likes

+1. This is actually a big concern to me as well (and should be for most IC developers).

If your dapp canister makes inter-canister calls to a token canister (as this official DEX sample code does), then you are at the mercy of the token canister. If the token canister is somehow malicious, then you may be unable to upgrade your dapp canister.

Even if the token canister isn’t malicious, I worry that infrastructure-level issues may prevent a subnet from sending messages to other subnets. This happened last November on subnet pjljw. If the token canister happens to be running on that subnet, then your dapp canister is, once again, non-upgradeable.

Has any progress been made since last November on this front? @JensGroth I wonder if you may know.

7 Likes

Why does another canister being malicious make you unable to upgrade your own canister?

To upgrade a canister you need to stop it.

To stop a canister, it needs to have processed all its outstanding messages. If you make an inter-canister call and use the await pattern, you are at the mercy of the other canister, which could block the answer for as long as it wants. Your canister would still have some suspended computation (due to the callback context), making it impossible to stop.

https://www.joachim-breitner.de/blog/789-Zero-downtime_upgrades_of_Internet_Computer_canisters

This issue is also related to how the Actor model is implemented at the language level.
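The stop-before-upgrade dependency described above can be illustrated with a toy model. This is only a sketch in plain Python, not IC code: `ToyCanister`, its status strings, and its method names are hypothetical and exist purely to show why one unanswered call keeps a canister in the "stopping" state.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCanister:
    """Hypothetical model: tracks calls that still await a response."""
    status: str = "running"  # "running" | "stopping" | "stopped"
    pending_calls: set = field(default_factory=set)

    def call(self, call_id: str) -> None:
        # An awaited inter-canister call leaves an open callback behind.
        self.pending_calls.add(call_id)

    def receive_response(self, call_id: str) -> None:
        # The callee finally answers, so the open callback is closed.
        self.pending_calls.discard(call_id)
        self._maybe_finish_stopping()

    def request_stop(self) -> None:
        self.status = "stopping"
        self._maybe_finish_stopping()

    def _maybe_finish_stopping(self) -> None:
        # Stopping only completes once no callback is outstanding.
        if self.status == "stopping" and not self.pending_calls:
            self.status = "stopped"


dapp = ToyCanister()
dapp.call("transfer-to-token-canister")
dapp.request_stop()
status_while_waiting = dapp.status  # the callee has not replied yet
dapp.receive_response("transfer-to-token-canister")
status_after_reply = dapp.status
```

In this model the caller has no way to reach "stopped" on its own; only the callee's reply unblocks it, which is the dependency the posts above worry about.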

5 Likes

Moreover, maybe await calls need to be configurable for timeouts, retryAttempts, retryDelay, failoverMethods/Attempts, etc.

4 Likes

This is fairly solvable with the right architecture. The pattern is already used by the agent, where your update is sent from the client and you get back the request ID. The agent then polls the status of the request until a return value is ready.

This needs to be abstracted into how you call functions inter-canister. Send a fire-and-forget call to a canister with a session ID that knows which callback to use. When that canister is done, it will call back and deliver the payload.
In the meantime you can call a query that returns the status. I’ve been thinking a lot about this pattern lately and I’ll try to write it up soon.

It breaks the programming pattern a bit, as you are not able to use await and have all your logic in one spot.
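A rough sketch of the session-ID pattern described in this post, written as plain Python objects rather than canisters. Everything here (`Worker`, `Caller`, `submit`, `status`, `run_pending`, `on_result`) is a hypothetical name chosen for illustration. The sketch assumes the initial submission returns immediately and does not model the fact that, on the IC today, the submission itself is still a call that expects a response.

```python
import itertools


class Worker:
    """Hypothetical callee: accepts a job, works later, then calls back."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.jobs = {}  # session_id -> job record

    def submit(self, payload, reply_to):
        # Fire-and-forget entry point: hand back a session ID right away.
        session_id = next(self._ids)
        self.jobs[session_id] = {
            "status": "pending",
            "payload": payload,
            "reply_to": reply_to,
        }
        return session_id

    def status(self, session_id):
        # Query-style status check the caller can poll in the meantime.
        return self.jobs[session_id]["status"]

    def run_pending(self):
        # Later, in a separate message: do the work and deliver the
        # payload through a fresh call to the registered callback.
        for session_id, job in self.jobs.items():
            if job["status"] == "pending":
                job["status"] = "done"
                job["reply_to"](session_id, job["payload"].upper())


class Caller:
    """Hypothetical caller: remembers which continuation each session uses."""

    def __init__(self, worker):
        self.worker = worker
        self.continuations = {}  # session_id -> continuation name
        self.results = {}

    def start(self, payload):
        session_id = self.worker.submit(payload, self.on_result)
        self.continuations[session_id] = "store_result"
        return session_id

    def on_result(self, session_id, result):
        # Unknown or already-handled session IDs are ignored.
        if self.continuations.pop(session_id, None) == "store_result":
            self.results[session_id] = result


worker = Worker()
caller = Caller(worker)
sid = caller.start("mint")
status_before = worker.status(sid)
worker.run_pending()
status_after = worker.status(sid)
```

The cost mentioned above is visible here: the logic that starts the job (`start`) and the logic that consumes the result (`on_result`) live in two separate places instead of one awaited function.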

2 Likes

not ideal imo, as we’d want canisters to directly call each other. Especially if Dfinity is aiming for world computer…
Don’t have half the capability of Ethereum if a user has to intermediate between canisters for an update
Also canister heartbeat uses a lot of gas, and self-polling calls are discouraged as potentially unsafe, so this would not be ideal for most canisters.
…

Also, you currently cannot safely use query calls in an update method, so I reckon you would still need to make an update call to get cross-canister state.

1 Like

Yes, it is built into the communication model that an inter-canister call is guaranteed to give an answer, and indeed this creates a risk that you get stuck waiting for a non-responding canister. For this reason the best practice guidance for secure canister development (“Rust Canister Development Security Best Practices”) recommends talking only to trustworthy canisters.
This guidance leaves open the question of how to determine that a canister is (and remains) trustworthy, and it excludes communication with canisters you have not verified. For the former, we have a feature on Verifiable Canisters that may help increase trustworthiness of canisters, and for the latter we have a feature proposal for One-shot Messaging. We have started some of the work on Verifiable Canisters, but the One-shot Messaging feature has not kicked off.
There is also a feature proposal Ensure Canisters Can Always be Upgraded, which targets the specific problem of getting stuck in upgrading. Wonder if this latter would solve your problem, or whether you’d rather see us change the communication model and enable one-shot messaging?
The title of the thread is “the biggest problem with the IC”, it would also be interesting to me whether others share that sentiment, because there is always a priority competition wrt what DFINITY works on.

3 Likes

I would be very curious to read more about these feature proposals:

  1. Verifiable Canisters
  2. One-shot Messaging
  3. Ensure Canisters Can Always be Upgraded

If you can share more details, that would be awesome! FWIW, I thought one-shot messaging could be accomplished on the application layer using ignore await foo() in Motoko, for example.

The title of the thread is “the biggest problem with the IC”, it would also be interesting to me whether others share that sentiment, because there is always a priority competition wrt what DFINITY works on.

It might be worth submitting a proposal to ask the community what they think are the biggest problems. But in my experience, the uncertainty and “gotchas” around inter-canister calls (most notably voiced in @nomeata’s article) definitely puts it among the top concerning issues for me.

The only thing that I find even more problematic than inter-canister calls when it comes to IC development are the current limitations with canister storage and stable memory.

7 Likes

  1. Verifiable canisters
    The concrete work currently being done relates to reproducibility of canisters: a developer wants to make it verifiable that a canister has been compiled from particular source code. We want to update the instructions to remedy some cases where they fail and to make reproducibility more robust over time.
    We are also working on how to more conveniently display information about canisters on the IC (e.g. controllers, perhaps ownership history, etc.).
    Finally, there is an exploratory dimension: we want to think more broadly about what guarantees the IC can provide about canisters. Example: perhaps one could have an option to specify that a call must only be executed if a particular wasm instance is running on the canister (to prevent execution against a changed smart contract). For this last part I may reach out on the forum to solicit some brainstorming.
  2. One-shot messaging.
    The motivation for this is the same as for this thread: non-responses are a risk and could happen due to attacks or faults. Guaranteed responses also create a risk of cycles in the call graph, and by introducing one-shot messaging you can eliminate that risk. This feature is not currently being worked on, though, and I’m guessing it will be quite a lot of work, hence my question of how big a concern it is. The problem is that it happens at the platform layer, so even if your canister ignores the response, under the hood the part of the IC that manages your canister will still expect a response (and e.g. get unspent cycles back).
  3. Ensure canisters can always be upgraded.
    The internal description is short; it essentially says governance canisters should be upgradeable and application canisters should be upgradeable. The feature is not active right now, but for the first goal I know some of my colleagues are working on improving upgradeability of the ledger.

2 Likes

Hi, I would say a prioritisation of one-shot messaging over verifiable canisters would be preferable. The problem with verifiable.

Hi Jens, thank you for taking an interest in this topic.

  1. I like the idea for verifiable canisters, and think this would work well for stand-alone canisters but less well for token canisters, or any class of canisters where the requirement is just to implement a certain api; often users want to modify or extend a token, making it pretty infeasible for canisters to track and validate all the WASM hashes. In essence, you would be relying on manual checks to validate every new token canister.
    An alternate idea wrt. verifiability could be to have some automated checks on the cycles usage of a method. For instance, if a method had no loops, was below a certain size in bytes, and made no further external API calls, then you could perhaps bound the compute time and cycles use for that method. I think this would be preferable for a great many kinds of canisters, and could be coupled with WASM verifiability, which may be more useful for users.

  2. Nomeata had some suggestions that could introduce one-shot messaging as a sort of hack (all such messages result in a trap) and also has a Motoko implementation I believe. However, I think there are more ergonomic ways that this could be done, and he has also some comments about how to change the IC spec to allow for this, which would be well received.

I think the issue is, we are an open computing platform, and if the IC decides to change the communication model between canisters later rather than sooner, then this could (a) prevent developers from writing safe code early on and (b) cause a lot of pain upgrading to the new programming model later down the line. In short, it results in a lot of technical debt for the IC ecosystem if not addressed sooner rather than later.

6 Likes

I agree. One-shot messages are very simple, conceptually, and can be implemented easily I assume. The other issue is much more open to interpretation and it’s not immediately obvious how to best solve it. In the meantime, one-shot messages can also be used to emulate abortable calls etc. using a library on the canister side.

1 Like

I am currently pursuing a pattern of inter-canister calls among trusted sources and messaging with a verifiable receipt for untrustworthy canisters. I have some folks recruited and we’re going to build it for Supernova. We’ll see how it turns out. If it goes right, it will also enable @nomeata’s always-upgradable pattern.

4 Likes

Hey, relax, it’s the weekend. Take it easy. Whatever fuck-up (or maybe not) you think you made is already done. Just take your girlfriend or boyfriend out for a drink, get a massage or something (if you are in Hawaii, surfing helps). Stop watching CMC. You are not going to influence anybody in this forum. Smart people hang around here!

  1. One-shot messaging.
    The motivation for this is the same as for this thread: non-responses are a risk and could happen due to attacks or faults. Guaranteed responses also create a risk of cycles in the call graph, and by introducing one-shot messaging you can eliminate that risk. This feature is not currently being worked on, though, and I’m guessing it will be quite a lot of work, hence my question of how big a concern it is. The problem is that it happens at the platform layer, so even if your canister ignores the response, under the hood the part of the IC that manages your canister will still expect a response (and e.g. get unspent cycles back).

I’m somewhat surprised to hear that this would take a lot of work. Is it primarily a Message Routing Layer change, an Execution Layer change, or both?

  1. Ensure canisters can always be upgraded.
    The internal description is short; it essentially says governance canisters should be upgradeable and application canisters should be upgradeable. The feature is not active right now, but for the first goal I know some of my colleagues are working on improving upgradeability of the ledger.

I’d be very interested in learning about how your colleagues plan on improving the ledger canister’s upgradeability.

If this feature can land, then I think one-shot messaging won’t be as urgently needed. If this feature doesn’t come with many “gotchas”, then I think it may be the most important out of the 3 features you shared.

1 Like

I don’t think “it” will be a lot of work, at least if we talk about the same “it”: when a canister performs a one-shot call, it says it does not want to have to deal with any response (because that can cause issues with upgrading). So all the system has to do is remember that fact, not consider that callback a blocker for stopping the canister, and not invoke the canister when the system-level response comes back. It can still do all the other things as usual, like reimbursing left-over cycles. So it seems this only affects the execution environment in a local way, and “the part of the IC that manages your canister will still expect a response” still holds: it will get one as usual.

Maybe there are ways the system could avoid even the internal response for one-shot messages, but that’d be an optional optimization, and not a requirement for this feature.
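The bookkeeping described in this post can be sketched as a toy model. This is plain Python under stated assumptions, not the execution environment: `ToySystem`, `PendingCall`, and the `one_shot` flag are hypothetical names, and the cycle amounts are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PendingCall:
    one_shot: bool        # caller declared it does not want the response
    cycles_attached: int


@dataclass
class ToySystem:
    """Hypothetical per-canister bookkeeping kept by the system."""
    cycles_balance: int = 0
    pending: dict = field(default_factory=dict)  # call_id -> PendingCall
    callbacks_invoked: list = field(default_factory=list)

    def call(self, call_id, cycles, one_shot=False):
        self.cycles_balance -= cycles
        self.pending[call_id] = PendingCall(one_shot, cycles)

    def can_stop(self):
        # One-shot calls are remembered but never block stopping.
        return all(c.one_shot for c in self.pending.values())

    def deliver_response(self, call_id, cycles_used):
        call = self.pending.pop(call_id)
        # Left-over cycles are reimbursed as usual in both cases.
        self.cycles_balance += call.cycles_attached - cycles_used
        # Only ordinary calls get their callback invoked.
        if not call.one_shot:
            self.callbacks_invoked.append(call_id)


system = ToySystem(cycles_balance=100)
system.call("notify", cycles=10, one_shot=True)
stoppable_with_one_shot_pending = system.can_stop()
system.call("transfer", cycles=20)
stoppable_with_awaited_call_pending = system.can_stop()
system.deliver_response("notify", cycles_used=4)
system.deliver_response("transfer", cycles_used=5)
```

The system-level response still arrives for the one-shot call, so the refund path is unchanged; the only differences are in `can_stop` and in whether the callback runs.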

2 Likes

Wouldn’t the issue be partially solved if we could set custom timeouts for inter-canister calls?

2 Likes

I don’t see how a timeout could work. Let’s say the timeout is 5 seconds: should the callee canister trap in the middle of its function if it’s not done?
Let’s say the callee makes another call to callee2, and the original caller’s timeout expires in the middle of callee2’s function: does the state of callee2 and of callee roll back, even though callee already committed its state?

A timeout can “simply” be implemented by not relaying the response to your own canister. The crux of the problem with upgrades while having pending requests is that the message response can call into your canister at an arbitrary address, where it expects the callback handler but where other data now resides because you upgraded the canister. That’s what can potentially lead to memory problems or undefined behavior.

With ic0.canister_generation and ic0.call_env_append, and if there are named callbacks, you will be able to reject any callbacks that come in after a certain amount of time as well as callbacks that come in after an upgrade.
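The idea of rejecting late or post-upgrade callbacks might look roughly like the following toy model. It does not use `ic0.canister_generation` or `ic0.call_env_append`, whose signatures are not given in this thread; `CallEnv`, `ToyCanister`, and the numeric timestamps are hypothetical stand-ins for "data recorded at call time" and "a counter bumped on every upgrade".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallEnv:
    """Hypothetical data recorded when the call is made."""
    callback_name: str   # named callback instead of a raw address
    generation: int      # canister generation at call time
    deadline: float      # latest time at which a response is accepted


class ToyCanister:
    def __init__(self):
        self.generation = 0
        self.handled = []
        # Named callbacks are looked up by name when a response arrives.
        self.callbacks = {"on_transfer": self.handled.append}

    def make_call(self, callback_name, now, timeout):
        return CallEnv(callback_name, self.generation, now + timeout)

    def upgrade(self):
        # Every upgrade starts a new generation.
        self.generation += 1

    def on_response(self, env, now, payload):
        if env.generation != self.generation:
            return "rejected: call was made before an upgrade"
        if now > env.deadline:
            return "rejected: response arrived too late"
        self.callbacks[env.callback_name](payload)
        return "accepted"


canister = ToyCanister()
in_time = canister.make_call("on_transfer", now=0.0, timeout=5.0)
too_slow = canister.make_call("on_transfer", now=0.0, timeout=5.0)
r1 = canister.on_response(in_time, now=3.0, payload="ok")
r2 = canister.on_response(too_slow, now=9.0, payload="late")
stale = canister.make_call("on_transfer", now=10.0, timeout=5.0)
canister.upgrade()
r3 = canister.on_response(stale, now=11.0, payload="old code")
```

Note that this only decides what the caller does with a response; it does not address the earlier question of what happens to the callee's already committed state.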

1 Like