Unknown Token Canister Query Strategies

I have a use case where I want to support arbitrary ICRC-2 tokens in a service that is not easily upgraded. I would really like to cut down on the governance mechanisms for this service, so the idea was to allow any canister to be provided as the token canister; if we had not seen it before, we would query its supported standards (ICRC-1/ICRC-10) and icrc1_metadata endpoints for info about the token.
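For illustration, that vetting flow might look something like the sketch below. The method names follow the ICRC-1/ICRC-10 standards, but the `call` helper is a stand-in for an inter-canister call and the response shapes are simplified:

```python
# Sketch of the token-vetting flow: check supported standards, then
# pull metadata. `call` is a stand-in for an inter-canister call; the
# response shapes are illustrative, not exact Candid types.

def vet_token(call, canister_id):
    # icrc1_supported_standards returns a list of
    # {"name": ..., "url": ...} records.
    standards = call(canister_id, "icrc1_supported_standards")
    names = {s["name"] for s in standards}
    if "ICRC-2" not in names:
        return None  # approve/transfer_from flow not supported
    # icrc1_metadata returns (key, value) pairs such as
    # ("icrc1:symbol", ...), ("icrc1:decimals", ...).
    return dict(call(canister_id, "icrc1_metadata"))
```

The whole attack discussed below is that either of those two calls may simply never return.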

My concern is an attack where the target canister never returns a value and renders my canister un-upgradeable.

I’m trying to come up with some strategies for this and would love some help thinking through it.

Idea 1: Create a wasm that I know is good and allow my service to query any canister running that wasm, which can go get the data for me and push it back when successful. That way I only call known-good canisters and I can trust them. (Similar to how @icme uses the blackhole to query cycles securely.) This is kind of a pain and involves deploying a utility canister that could run out of cycles. I also don’t know what happens if this canister makes a bunch of calls to a bad-actor canister that never returns…will they build up until the canister is bricked? If that happens I can deploy another canister with a known-good wasm module and start over, but that seems pretty hacky.

Idea 2: I can query some SNS that already does this, like ICPSwap or OpenChat, which are managing these canisters through governance. This makes me dependent on their service, and there is nothing to keep an attacker from getting through their governance. If they have a different standard than I do and let in any memecoin built in whatever language with whatever customizations, we could increase the attack surface.

Idea 3: Wait for the updates to the scalable messaging model and have the utility canister serve this with a timeout, but call an old-style query on the token canister (is this possible? Or will such canisters be limited to only calling other scalable messages?). cc @free and @derlerd-dfinity1 (Scalable Messaging Model - #52 by free)

Idea 4: Piggyback on NNS governance by querying motion proposals and looking for flags in the description that would allow my services to add tokens that way. (Which suggests an Idea 5, namely to standardize this as a utility itself…kind of a ‘known-good’ token registry.)

Any thoughts/ideas would be welcome!


Maybe I’m missing the problem with the scalable messaging model, but what’s wrong with curating your own list in the utility canister? Allow anyone to add a token canister_id manually, and then have autonomous checks that remove entries from the list if the metadata endpoints return something undesirable. Then have the hard-to-upgrade canister fetch the updated list on a periodic timer.
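A minimal sketch of that curated-list idea (the names, the timer wiring, and the actual metadata check are all made up for illustration):

```python
class TokenRegistry:
    """Curated allow-list: anyone may add; autonomous checks prune."""

    def __init__(self, check_metadata):
        # check_metadata(canister_id) -> True if the token looks acceptable.
        self.check_metadata = check_metadata
        self.tokens = set()

    def add(self, canister_id):
        # Open submission; bad entries are pruned by the periodic check.
        self.tokens.add(canister_id)

    def autonomous_check(self):
        # Run from a timer; drop anything whose metadata looks wrong.
        self.tokens = {t for t in self.tokens if self.check_metadata(t)}

    def snapshot(self):
        # What the hard-to-upgrade canister pulls on its own timer.
        return sorted(self.tokens)
```

The point of the split is that the hard-to-upgrade canister only ever talks to the registry (which you control), never to an arbitrary token canister directly.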


I would create a canister that forwards calls to any canister. Besides the target canister and arguments, also specify a (random?) id and a callback canister to be called once a response has been received.

This basically turns the async call/response model into an async call/callback model.
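Modeled in asyncio (as a sketch only; the proxy function, id scheme, and callback signature are all invented for illustration), the call/callback pattern looks like this:

```python
import asyncio
import itertools

_ids = itertools.count(1)  # stand-in for the (random?) id scheme

async def proxy_call(target, method, args, callback):
    """Forward a call and deliver the response via a callback instead
    of making the caller await it. The caller gets an id immediately;
    the proxy is the one left holding any call that never returns."""
    call_id = next(_ids)

    async def relay():
        result = await target(method, args)  # may hang forever
        await callback(call_id, result)      # push the response back

    asyncio.ensure_future(relay())  # fire and forget
    return call_id
```

If `target` never responds, only the proxy's `relay` task hangs; the caller already moved on with its id in hand, which is exactly the decoupling described above.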

As far as I understand, if a call is never responded to, it would only prevent upgrades*. I don’t remember reading that it would block other incoming calls.

* I remember reading somewhere you can still upgrade if you manually stop the canister first. If anyone remembers more details regarding this, let me know.

Edit: if you uninstall the code from the canister, calls are dropped and you can install new code again.

Edit 2: This approach is basically the utility canister approach mentioned in the post. The scalable message model would, indeed, resolve this issue by having calls time out.


Herein lies the problem…those autonomous checks to a canister that you don’t control give that canister the ability to block your upgrade. See this post: Can you upgrade a canister without stopping it? Yes, eh, no, eh … maybe

My assumption was autonomous checks once a week or something, with the canister holding the list under your control. Only upgrade the latter when you know no checks are occurring. But I guess yours is a special use case if you need it both uncontrolled and without voting.

Best-effort messages are getting there (I have most of the code on a development branch; the big core change is under review; but it will likely require a couple of months of testing and benchmarking before it can be deployed to mainnet). And, unless the callee explicitly checks whether the call it got is a best-effort call and actively refuses to handle it, you will be able to call any update or query method with a best-effort call.

Until then, you can use your option (1). Or you can make it synchronous by having the utility canister call the intended destination, but in the same call context also loop (e.g. by repeatedly calling raw_rand()) until some deadline has expired; if it got the downstream reply, it can relay it upstream; else, if the deadline has expired, it can produce a timeout response. It could get rather expensive, though, if you use this a lot. Regardless, when your utility canister reaches some number of hung outgoing calls, you would uninstall it and switch over to another utility canister.
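The loop-until-deadline trick can be modeled in asyncio as follows (a sketch under stated assumptions: on the IC the loop body would be a cheap round-trip like raw_rand(), for which a short sleep stands in here, and an IC canister cannot actually cancel a hung call the way this model can):

```python
import asyncio

async def call_with_deadline(downstream, timeout_s, tick_s=0.01):
    """Start the downstream call, then loop on a cheap round-trip
    (raw_rand() on the IC; a short sleep here) until it either
    completes or the deadline passes."""
    task = asyncio.ensure_future(downstream())
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    while not task.done():
        if loop.time() >= deadline:
            task.cancel()             # on the IC the call stays hung;
            return ("timeout", None)  # here we can at least cancel it
        await asyncio.sleep(tick_s)   # stand-in for one round-trip
    return ("ok", task.result())
```

Note the cost concern from above maps directly onto `tick_s`: each iteration of the loop is a paid round-trip, so a slow token canister burns cycles on every tick.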

Hmmm…I think this works, but I’m not sure if I can do this from Motoko. @claudio, is there a way to check a future to see if it has been fulfilled without actually awaiting it? I guess I can use a timer before the await and populate a variable elsewhere. If it does work, and since almost any ICRC-2 canister should return those functions very quickly, I shouldn’t have to loop too many times. raw_rand() forces a round of waiting, doesn’t it? In other situations I’ve made a known xnet call (like checking a balance on ICP) for the same effect. I’ll have to inspect which is cheaper.

Just so I’m clear…my property-checker canister would be timing out the call context, so my main canister would not be in danger…but what about the property checker? If it has made a call to a bad actor in one of these requests and then the bad canister’s call goes out of context…do the canister and replica drop it, thus removing the hanging reference that would block upgrade of the checker? And what about the queue? Is it an attack vector to call one of those a bunch of times such that the queue gets full? If I’m timing out and the canister stops taking requests, then I guess they’d eventually clear, but it still seems it could get DDoSed. If this canister only listens to my other main canister, then I guess I can rate-limit there.

The next question would be a cross-over to my Using NnsCanisterUpgrade Proposal types question, which is whether the NNS can uninstall a canister through a proposal. I think if I track the wasm and maybe query the upgrade history from inside the canister, I can give myself security even if I let the controller be the anonymous identity, but it might ‘feel’ better to others if only the NNS could do it.

This is great to hear…I was concerned about service fragmentation and having to redo a bunch of ICRCs once we had a new request type.

It does, indeed.

I’m not sure I understand the question. But if your proxy canister calls out to a malicious canister that never produces a response, it would have a call context and a callback that are never closed, which would prevent an upgrade. But you can always uninstall the canister.

It may be. But a queue is between a pair of canisters, so someone else calling your proxy canister would not affect the queue between your main canister and the proxy canister. As for throughput and avoiding blocking, you can always have multiple proxy canisters and load-balance across them. And when one of them has more than, say, 100 hanging calls, wait for them to time out, uninstall it and replace it with another proxy canister.
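That retire-and-replace policy can be sketched roughly as follows (all names are invented; `spawn_proxy` stands in for creating and installing a fresh proxy canister, and the real version would wait for the hung calls to time out before uninstalling):

```python
class ProxyPool:
    """Route calls to the proxy with the fewest hung calls; retire any
    proxy that crosses the hang threshold and spin up a replacement."""

    def __init__(self, spawn_proxy, max_hung=100):
        self.spawn_proxy = spawn_proxy   # creates a fresh proxy id
        self.max_hung = max_hung
        self.hung = {spawn_proxy(): 0}   # proxy id -> hung-call count

    def pick(self):
        proxy = min(self.hung, key=self.hung.get)
        if self.hung[proxy] >= self.max_hung:
            # Even the least-loaded proxy is saturated: retire it
            # (wait for timeouts + uninstall) and add a fresh one.
            del self.hung[proxy]
            proxy = self.spawn_proxy()
            self.hung[proxy] = 0
        return proxy

    def record_hang(self, proxy):
        if proxy in self.hung:
            self.hung[proxy] += 1
```

Because the main canister only ever holds calls against proxies it is about to discard anyway, a bad-actor token canister can brick a proxy but never the main canister.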

It is all very involved as a one-off setup, though. So unless you need something working ASAP, I would rather wait for best-effort calls. Then you can simply specify a deadline for your call and ensure you won’t be waiting forever. And maybe retry a couple of times after a while, in case the canister / subnet was temporarily unavailable.


Very helpful. Thanks!