Is there a way to give precedence to certain function executions if there is an active queue in the canister's / subnet's execution?

I do.

Noted. I’ll just crank allocation up and reduce other functions’ execution demands to a minimum. Only time will tell how far that takes us in terms of DoS-attack prevention and time-critical execution guarantees.

The natural question then is whether it’s possible to obtain, from within the canister, the maximum allocation currently available before attempting to set the value (it presumably can’t always be set to 100 if, e.g., 2 other canisters on the same subnet are already set to 100 each). And if not much or any allocation is available, as might be the case in moments of large market movements on a defi subnet where many canisters attempt at once to maximise their allocation, how might one prevent protocols and the funds in their custody from collapsing, especially when the “denial-of-increased-allocation attack” is executed together with a standard DoS attack that spams the canister’s queue?
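For concreteness, the sort of probing I have in mind might look like the sketch below (Rust, assuming ic-cdk’s management-canister bindings and that the canister is one of its own controllers; `grab_best_allocation` and the candidate list are made-up names, not an existing API: as far as I can tell there is no call that returns the subnet’s remaining allocatable capacity, so the best one can do is attempt `update_settings` and fall back on failure):

```rust
// Sketch only. Assumes an ic-cdk version whose CanisterSettings has
// exactly these fields, and that this canister is one of its own
// controllers (otherwise update_settings on itself is rejected).
use candid::Nat;
use ic_cdk::api::management_canister::main::{
    update_settings, CanisterSettings, UpdateSettingsArgument,
};

/// Hypothetical helper: try successively smaller compute allocations
/// until the subnet accepts one; returns the granted percentage.
async fn grab_best_allocation(candidates: &[u64]) -> Option<u64> {
    for &pct in candidates {
        let args = UpdateSettingsArgument {
            canister_id: ic_cdk::id(),
            settings: CanisterSettings {
                controllers: None,
                compute_allocation: Some(Nat::from(pct)),
                memory_allocation: None,
                freezing_threshold: None,
            },
        };
        // Fails if the subnet cannot satisfy the request, e.g. because
        // other canisters already hold the allocatable capacity.
        if update_settings(args).await.is_ok() {
            return Some(pct);
        }
    }
    None
}

// Usage (e.g. from a timer): grab_best_allocation(&[100, 50, 20, 10, 5]).await
```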

More broadly, how do we prevent the attacker from doing exactly that at the time when he needs the time-critical execution to be delayed:

The (for illustration) simplified steps for such an attack might be:

  • deploy 2 canisters on the subnet
  • increase their allocation to 100% each on some cost-effective basis for the attack’s duration
  • DoS attack the defi canister to overwhelm its small allocation execution capacity

One advantage for the attacker is that the defi canister has no idea when the attack might start. It therefore has no way of attempting to secure a large allocation until after the attack has started, when it is by definition no longer possible to increase it (other than keeping the allocation prohibitively high at all times, at a cost that itself increases on the larger subnets that defi requires).


There were a few minor breaking changes, so your code might not compile at first, but after fixing it, the recompiled code should not cause problems. Feel free to upgrade.


More broadly, how do we prevent the attacker from doing exactly that at the time when he needs the time-critical execution to be delayed:

In this example, I would point out that 2 canisters might have taken 2 cores of the subnet’s available capacity, but there are 2 more cores that can be used by the remaining canisters. In order for the attack to be effective, you would actually need to ensure that more canisters have “work to do” on top of yours to really slow it down. Sending your canister more and more requests doesn’t necessarily prevent it from running – if it’s the only canister that has work to do, it’ll keep getting scheduled even if it has a “best-effort” allocation (and your measures to ensure you process the time-sensitive tasks should help you complete them on time).

So, in short, the DoS attack scenario you’re afraid of would be very expensive to pull off, as it would need to involve more coordinating canisters than just the 2 that max out their allocation. Of course, you can still argue that, depending on the stakes (i.e. how popular and thus valuable your defi app has become), it might still be tempting for someone with enough funds behind them, but I don’t see that as a very likely scenario.

Finally, we are planning to eventually increase the number of cores available for execution of messages, probably to something like 10 cores or so. No specific ETA for this that I can offer you but it should also help bring down the likelihood of attacks like the one you described as it’s gonna be even more expensive for someone to trigger all the right conditions to effectively block your canister from running.

Wouldn’t two 100s reach the capped allocation of 50% per subnet, hence not leaving any room for further allocation increases?

not “more canisters”: of course you’d make them busy programmatically, by making them call themselves or by aggressively calling them from the outside to compute heavy nonsense or otherwise delay execution; and you’d spam other canisters on the subnet (they can be already-existing, owned by others) to use up the other, “unallocated” 50% and prevent execution of the canister under attack.

Any app with a TVL of more than a billion is somewhat attractive, and of course the IC aims for far more than that in its core defi apps. I certainly wouldn’t consider working on one with an expected TVL lower than a billion. Indeed, as you correctly implicitly detect, one worries about this sort of thing if and only if one’s serious about it.

It might be worth doing some sort of structured cost-benefit analysis for 1 million, 10 million, 100 million, 1 billion and 10 billion TVL, to know whether and under which conditions a protocol on the IC is capable of securely hosting those numbers, or otherwise to get an idea of the max TVL it’s currently and/or foreseeably capable of securely managing, or alternatively perhaps to reduce the complexity of the system to a point where such an analysis turns out to be unnecessary.

Yes, but as you also said

of course you’d make them busy programmatically, by making them call themselves or by aggressively calling them from the outside to compute heavy nonsense or otherwise delay execution; and you’d spam other canisters on the subnet (they can be already-existing, owned by others) to use up the other, “unallocated” 50% and prevent execution of the canister under attack

So, what I meant is that it’s not enough to only have the 2 canisters claiming the allowed allocation for the subnet; you would also need more canisters to exist, and you would need to keep them busy. Hence why I said that the attack would be much more expensive to perform.

Any app with a TVL of more than a billion is somewhat attractive, and of course the IC aims for far more than that in its core defi apps. I certainly wouldn’t consider working on one with an expected TVL lower than a billion. Indeed, as you correctly implicitly detect, one worries about this sort of thing if and only if one’s serious about it.
It might be worth doing some sort of structured cost-benefit analysis for 1 million, 10 million, 100 million, 1 billion and 10 billion TVL, to know whether and under which conditions a protocol on the IC is capable of securely hosting those numbers, or otherwise to get an idea of the max TVL it’s currently and/or foreseeably capable of securely managing, or alternatively perhaps to reduce the complexity of the system to a point where such an analysis turns out to be unnecessary.

Very good points. I think doing the analysis you described would certainly be interesting. Let me see if I can get something going on that front.


and I’d argue strictly necessary for the existence of responsibly launched “serious” defi. In part because it could turn out to be unsafe, and in part because the very fact that it could turn out to be unsafe is likely to inhibit large fund transfers into the system. It’s a bit chicken-and-egg, and to an extent no (solid theoretical) guarantees > no funds. And this itself (doubt can be enough) could inhibit “serious” devs from embarking on building such protocols in the first place.

If the root is transparently healthy, good things may come.

It is definitely possible for a malicious third party to reserve the maximum possible amount of compute (2 out of 4 cores; or 5 out of 10); and flood the rest of the subnet with high load in order to make it as unlikely as possible for your canister to get scheduled; and flood your canister with noise, so even when it gets scheduled, it is likely to do useless work.

I guess a defense-in-depth approach is needed here. The first step is ensuring a small (but large enough) compute allocation ahead of time. You don’t need to grab 100% compute allocation just in case. If you want your heartbeat to run at least once every 30 seconds, 3% should be sufficient (make that 5%, to account for any drops in block rate under heavy load).
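To spell out the arithmetic behind those numbers: a compute allocation of N% guarantees the canister is scheduled at least once every 100/N rounds, and a round takes roughly one second. So 3% guarantees a round about every 33 seconds, and 5% tightens that to about every 20 seconds, which is where the headroom for drops in block rate comes from.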

Second, you’d need a way to prioritize useful work getting done within your canister. You could take the somewhat unwieldy approach you described (add a check at the top of every update call and after every await and bail out if the heartbeat has not executed within X seconds). Or the system could provide an equivalent way of either prioritizing some calls; or calls from the canister to itself; or based on some other criterion.
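A rough sketch of that guard, for illustration only (assuming a recent ic-cdk; `LAST_HEARTBEAT`, `MAX_HEARTBEAT_GAP_NS` and the 30-second threshold are made-up names and values, not an existing API):

```rust
use std::cell::Cell;

const MAX_HEARTBEAT_GAP_NS: u64 = 30_000_000_000; // 30 s, in nanoseconds

thread_local! {
    static LAST_HEARTBEAT: Cell<u64> = Cell::new(0);
}

#[ic_cdk::heartbeat]
fn heartbeat() {
    LAST_HEARTBEAT.with(|t| t.set(ic_cdk::api::time()));
    // ... the time-critical work ...
}

/// Hypothetical guard: call at the top of every update handler and
/// after every await. Reads 0 until the first heartbeat has run, so
/// it is conservative right after install/upgrade.
fn heartbeat_is_healthy() -> bool {
    LAST_HEARTBEAT
        .with(|t| ic_cdk::api::time().saturating_sub(t.get()) < MAX_HEARTBEAT_GAP_NS)
}

#[ic_cdk::update]
fn some_update_call() {
    if !heartbeat_is_healthy() {
        // Shed non-critical load so the next scheduled slots go to the
        // messages that matter.
        ic_cdk::trap("busy: deferring non-critical work");
    }
    // ... normal handling ...
}
```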

Third (and this is, I believe, the hardest part) you’d want to be able to guarantee some amount of message throughput. Unless everything happens within your canister (which I doubt is the case with a DeFi app), it’s no use for your canister to be able to process the right thing at the right time if whatever action it is trying to take as a result only happens arbitrarily late because the request was not delivered in time. I don’t think that you as a canister developer/controller can do much about the latter, except maybe rent a subnet. Which may make sense if you’re managing $1B in assets, but not while you are ramping up. One could again imagine system-provided solutions, such as multiple streams between a pair of subnets in order to provide quality-of-service guarantees for some of them, but this would definitely require serious effort to pull off. (And FWIW, we’ve seen virtually zero use of compute allocation thus far, so there’s no good reason to assume significant use of messaging allocation if it were implemented. So it’s unlikely to become a high-priority feature any time soon.)


Actually not terrible: 5% on a 34-node subnet would only cost 26,000,000 × 5 × 60 × 60 × 24 × 365 / 10^12 ≈ 4,099.68T cycles per year (taking the compute-allocation fee to be roughly 26M cycles per allocated percent per second on a 34-node subnet). Very doable.

Something along these lines would help, but as you sort of implicitly point out, it doesn’t exist (today).

I agree this is the big problem.

In general, unless I’m being unduly stringent (please say so if it’s not as threatening as it may seem), we probably need to find a way to produce such guarantees, at least to a probabilistically satisfying standard corresponding to the TVL we aspire to (as individual projects and as the IC as a whole), before responsible and successful launches can happen. In a sense, responsible meaning us (intra-IC) having conviction in its security, and successful meaning others having that conviction too.

If that is so, and we want such launches within call it a year, are we on track? Or better said, if we aren’t, what do we need to do to be on track?

Thing is, it’s really hard to provide a guarantee of that kind. For one, the calls that the heartbeat handler makes may take arbitrarily long to complete. There’s no way the system can guarantee that the heartbeat handler will complete within N rounds if some downstream call it makes takes N+1 rounds (apart from timing out downstream calls, something we’re looking into, but I’m not sure that’s what you want here).

Furthermore, the very guarantees that the protocol provides get in the way of a strong guarantee regarding heartbeat completion time. Even if the heartbeat handler only sends requests to the canister itself, those requests will get enqueued behind any other messages that the canister sent itself. And the protocol guarantees in-order delivery of requests. I imagine that one could shortcut said requests by bailing out early on; but there’s no way for the system to guarantee that; the canister would have to do it. And this doesn’t even consider requests sent to other canisters.
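To make the “bailing out early” idea concrete, here is a sketch of what that canister-side shortcut could look like (the method name and deadline parameter are hypothetical; the point is just that stale self-requests return immediately, so the in-order queue behind them drains quickly):

```rust
// Sketch: tag each self-request with a deadline, and drop it on
// arrival if it is already stale. In practice you'd also verify that
// ic_cdk::caller() == ic_cdk::id() before trusting the deadline.
#[ic_cdk::update]
fn do_deferred_work(deadline_ns: u64) {
    if ic_cdk::api::time() > deadline_ns {
        // Too late to be useful; return immediately so the next queued
        // message gets its turn this round instead.
        return;
    }
    // ... the actual time-sensitive work ...
}
```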

I’m not saying what you want is impossible. But it would require fundamental changes to the protocol itself. That is something we’re definitely exploring (e.g. by adding support for messages that trade off response delivery guarantees for the ability to time out said messages/call contexts), but it is unlikely to be a quick fix.

I don’t claim it’s possible; indeed, I ask you these questions precisely because I don’t know the answers. But if it’s not possible, it may be that (serious) defi is not possible on the IC. That would possibly be ok and would not necessarily threaten the project as a whole, but it would be good to spell it out and decide we’re going to do A and not B.

Nothing is nor can be perfect, naturally, and the IC is a spectacular, ground-breaking machine. But for defi to exist on it, we need to provide guarantees that are, in effect and for all practical purposes (whether we like it or not, no shortcuts allowed, however smart), comparable to those of other systems that host defi today.

Again, not saying it is impossible. And the protocol is evolving all the time, so as soon as DeFi dapps become a significant part of the IC, these changes are going to get made. Just (very likely) not in the next few months.

In the meantime, the combination of private subnets (something that would require some, but not incredibly much, work; and is definitely on the radar) and subnet splitting (something we are actively working on) would give a growing DeFi dapp the option of grabbing a subnet all to itself, after the fact. At which point, it would be pretty much up to the dapp itself to ensure it gets the throughput and latency it needs. Anything more elaborate than that will take time to build.

Would this (or other planned improvements for the next 6-12 months) provide / enable guarantees comparable to existing systems, or only increase the throughput requirement of the attacker?

Isn’t security likely a precondition of them becoming significant? The situation described is something like “you’re welcome to use this defi protocol we built over the last several months. It’s not secure today but it probably will be later if enough people use it and others like it”.

I have no view on the matter. I sincerely want it to work (I’m building on it). The questions are not of my choosing. They are there.

AFAICT the only thing an attacker could do at that point would be to flood your subnet/canister with ingress messages. As long as that doesn’t cause your canister to do stupid things such as sending out tons of requests to itself and other canisters (and thus DoS itself), even the existing simple input scheduler (which selects an ingress message, then a local canister message, then a remote canister message, round-robin) would be sufficient to defeat a DoS attack via ingress messages. Or remote subnet canister messages, since you’d be guaranteed that at least one out of every 3 messages you’d be handling would be from your own subnet.
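As a toy model of why that round-robin defeats ingress flooding (this is not the actual replica code, just the one-in-three argument in executable form):

```rust
// Toy model of the input scheduler described above: round-robin over
// ingress, local-subnet and remote-subnet queues. Even if an attacker
// floods the ingress queue, local messages still get every third slot.
use std::collections::VecDeque;

#[derive(Debug)]
enum Source {
    Ingress,
    Local,
    Remote,
}

struct InputQueues {
    ingress: VecDeque<String>,
    local: VecDeque<String>,
    remote: VecDeque<String>,
    next: usize, // round-robin cursor
}

impl InputQueues {
    fn pop_next(&mut self) -> Option<(Source, String)> {
        // Try each class once, starting from the cursor, so a flooded
        // class cannot starve the other two.
        for i in 0..3 {
            let class = (self.next + i) % 3;
            let msg = match class {
                0 => self.ingress.pop_front().map(|m| (Source::Ingress, m)),
                1 => self.local.pop_front().map(|m| (Source::Local, m)),
                _ => self.remote.pop_front().map(|m| (Source::Remote, m)),
            };
            if msg.is_some() {
                self.next = (class + 1) % 3;
                return msg;
            }
        }
        None
    }
}
```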

So there (almost, modulo the two work items mentioned above) already exists a way of getting the guarantees that you want. It just isn’t as granular as you would maybe like it to be (e.g. “give me 5% compute allocation and 5% messaging allocation”), so it’s more expensive than it would be under ideal conditions.


Alright, I feel like there are a lot of subdependencies for everything to fall into place, and I just hope this high-precision rocketship is indeed millimetrically calibrated for a safe and faraway landing.

My unsolicited advice would be this: if and when it can be shown with transparent clarity that the IC can host defi to guarantees comparable to those of existing systems, and under which conditions (e.g. using your own subnet), make that visible and encourage strong adversarial critiques from the (technically competent parts of the) broader crypto/defi ecosystem.

This probably does mean, at least, that “blackholed” defi protocols are not possible today unless they start with their own subnet. It might therefore be good to tell people that upgradability is for now a requirement, so that it can be taken into account when designing individual systems.
