The alternative would be to reduce the number of canisters per subnet to a ridiculously low number (say, dozens), just in case all of them wanted to go full tilt all at once. That would mean orders of magnitude higher fees, to cover the cost of running an almost always idle subnet.
Actually, you can already have that: you can reserve 100 compute allocation (and, for good measure, a couple hundred GB of storage) for your canister. That will give you guaranteed sub-2 second latency (modulo drops in block rate, but we are working on addressing those in the near future).
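For concreteness, here is a minimal sketch of what that reservation can look like when done from a controller canister, by calling the management canister's update_settings method directly. This assumes a pre-0.18 ic-cdk (for ic_cdk::api::call::call), and the argument records are trimmed to just the fields used here; in practice most people would simply run dfx canister update-settings with the equivalent flags.

```rust
// Sketch only: reserve compute/storage for a canister you control by calling the
// management canister's update_settings method. The caller must be a controller of
// `target`. Omitted optional settings fields (controllers, freezing_threshold, ...)
// simply decode as null on the management canister's side.
use candid::{CandidType, Nat, Principal};

#[derive(CandidType)]
struct Settings {
    compute_allocation: Option<Nat>,
    memory_allocation: Option<Nat>,
}

#[derive(CandidType)]
struct UpdateSettingsArgs {
    canister_id: Principal,
    settings: Settings,
}

#[ic_cdk::update]
async fn reserve_allocation(target: Principal) {
    let args = UpdateSettingsArgs {
        canister_id: target,
        settings: Settings {
            // 100 = a full core, every round; you pay for it whether you use it or not.
            compute_allocation: Some(Nat::from(100u64)),
            // ~200 GiB of reserved storage, in bytes.
            memory_allocation: Some(Nat::from(200u64 * (1u64 << 30))),
        },
    };
    let result: Result<(), _> = ic_cdk::api::call::call(
        Principal::management_canister(),
        "update_settings",
        (args,),
    )
    .await;
    result.expect("update_settings was rejected");
}
```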
But whether you look at this from a technical perspective (see my post above) or from the economics point of view (each of 100k canisters all expecting reasonable throughput and 2 second latency for cents a day, independent of what the other 99.9k canisters are doing) I don’t think there’s any mystery as to why things may not work as desired at all times.
The European subnet has fully 20k canisters (out of a total of 21.5k) that all have something to execute at all times. Luckily AFAICT the vast majority of said canisters have run out of cycles and are frozen. So we “only” have some 700 canisters that have something to execute (mostly heartbeats) and have the cycles to do it. And because of whatever it is they are doing, at full load the subnet can only execute about 60 of them per round (or about 120 canisters per second).
In other words, the subnet is about 12x oversubscribed. I.e. even given instant canister migration and automatic load balancing, we would need 6-12 subnets to handle that load while still providing 2 second latency for everyone. For something someone did on a whim (or, more likely, was unaware of) and that now costs them a few hundred dollars a day to run. (Still, I doubt they’re paying it out of pocket, it’s more likely some friendly competition a la Bob, with participants each putting up something like 10 ICP or whatnot.)
Edit: This is actually a perfect illustration of my point above: with only 700 active canisters, each of which only requires a modest 1/15th of a core, you get latencies of 20 seconds.
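For the curious, the rough arithmetic behind those numbers is sketched below; the 4 scheduler cores per subnet is my assumption, everything else uses the figures quoted above.

```rust
// Back-of-the-envelope check of the numbers above. The 4 scheduler cores per subnet is
// an assumption; the other inputs are the figures quoted in this thread.
fn main() {
    let active_canisters = 700.0; // canisters with something to execute and cycles to pay
    let executed_per_round = 60.0; // what the subnet manages at full load
    let rounds_per_second = 2.0; // ~120 canisters/s divided by ~60 per round
    let scheduler_cores = 4.0; // assumption

    let oversubscription = active_canisters / executed_per_round; // ~11.7x, i.e. ~12x
    let core_share = scheduler_cores / executed_per_round; // ~1/15 of a core each
    let idealized_wait_s = oversubscription / rounds_per_second; // ~6 s assuming perfect
    // round-robin scheduling and a healthy block rate; with rounds slowing down under
    // load, the observed latency ends up around the quoted 20 s.

    println!("oversubscription: {:.1}x", oversubscription);
    println!("core share per canister: 1/{:.0}", 1.0 / core_share);
    println!("idealized wait: {:.1} s", idealized_wait_s);
}
```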
Thanks again for those insights, and please don’t take this the wrong way.
I appreciate your work and that of Dfinity; I’ve been a big fan since the Genesis event. I’m still eager to learn more, but I’m looking for practical applications to transition Web2 IT infrastructure to the Internet Computer, with real projects and real customers located in the European Union.
But what does all this mean for developers? The problem is now likely more understandable for all of us, but the question remains: what does this mean in practice for developers who are trying to replace traditional IT infrastructure?
In my experience, with a 20-second update time, you can’t replace traditional IT infrastructure.
What works very well, however, is hosting static websites. We have four static websites running on the European Subnet and we can say they meet our expectations; the timeout problem during re-deployment seems to have been solved by the latest improvements.
No worries, I just get intense when I talk about technical stuff that I’m involved with. (o:
Apologies on my end too.
I agree that given the current limitations, this is a difficult proposition.
I’ve heard ideas being discussed to allow canister controllers to (manually) migrate away from busy subnets, by simply remapping the canister ID to a different subnet and then letting the controller use some variation of canister snapshots to restore their canister on the new subnet. This is less than ideal (load can follow you anywhere; and this would likely be a manually intensive process, especially if you’re dealing with more than a couple of canisters). But it may well be useful as a short-term mitigation.
There are also discussions regarding deprioritizing batch workloads in favor of interactive ones (so an ingress message would take priority over a heartbeat).
But most importantly in my opinion, as someone who has been pushing for scalability (best-effort messages, ingress bandwidth, immutable storage, fair scheduling) since I’ve been here, upper management is now 100% behind addressing these issues. For a long time we were mostly adding features and smoothing rough edges, trying to make the platform attractive to developers like you (and DeFi, and DAOs, to be fair). But now that we have your interest, we really need to deliver the goods. It’s going to take a lot of time; and in the meantime the tools at your disposal will be rather crude (to the extent that they will be there at all). But we are intentionally moving in that direction.
It’s a serious concern if an attacker can render a subnet unusable for such a small cost. All the talk about being tamper-proof and unstoppable goes out the window. (For user-facing dapps, updates taking more than two seconds feel like an eternity.)
This also sort of makes sense, considering that any web2 server also has some form of ingress protection, not just on their webpage but also e.g. on their public APIs. For example, spamming a Discord API from my server will also get me rate limited.
But the interesting difference here is that heartbeats and timers are messages to your own canister, intended e.g. for background data processing. So basically, this would mean rate limiting calls to your own canister.
From a subnet load perspective, I suppose calls from a canister to itself or to another canister have the same impact?
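To illustrate what that kind of self-imposed rate limiting can look like in practice, here is a minimal sketch, assuming a recent 0.x ic-cdk plus the ic-cdk-timers crate; do_background_work and the intervals are placeholders.

```rust
// Sketch: two ways to keep self-addressed work (heartbeats/timers) from eating a core.
// `do_background_work` stands in for whatever the canister actually does in the background.
use std::cell::Cell;
use std::time::Duration;

thread_local! {
    static ROUND_COUNTER: Cell<u64> = Cell::new(0);
}

fn do_background_work() {
    // ... the actual batch/background processing ...
}

// Option 1: keep the heartbeat but only do real work every N rounds. Note that the
// canister still gets scheduled every round (the heartbeat still executes); it just
// does very little most of the time.
#[ic_cdk::heartbeat]
fn heartbeat() {
    let n = ROUND_COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    });
    if n % 30 == 0 {
        do_background_work();
    }
}

// Option 2 (cheaper for the subnet): drop the heartbeat entirely and use a coarse
// periodic timer, so the canister only has something to execute once a minute.
#[ic_cdk::init]
fn init() {
    ic_cdk_timers::set_timer_interval(Duration::from_secs(60), do_background_work);
}
```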
I think discussing exactly how much throughput the subnet can handle is actually not all that relevant for the long term scalability story.
The cycles prices of using ICP should make sure that whenever a subnet is heavily loaded, it burns more cycles than node providers receive. Adjustments are being proposed to ensure this is always true (link). So in the long run, whenever the load on the existing subnets starts getting high, new subnets should be added, and it only adds to deflation, as every subnet burns more than it mints.
The caveat to this story now is that balancing load across subnets is difficult, so that is the main thing we want to address for long term scalability, as outlined in this post. But once that’s there:
1. Whenever we’d see increased latency on some subnets due to high load, but other subnets have less load, canisters would migrate to balance the load.
2. If all subnets are highly loaded, it means a lot of cycles are being burned, so ICP holders are happy, new subnets are added, and we go back to step 1.
So the challenges we’re seeing today are growing pains, but not fundamental issues that can’t be overcome.
For canisters with no compute allocation, yes. But simply deploying onto the same virtual machine as another 50k canisters should be a good indication that you will not be getting any strong guarantees.
That being said, I fully agree that the situation is still far from ideal. Which is why it’s being worked on.
We should define ‘highly loaded’ as any instance where latency exceeds 2 seconds; otherwise, we’re not delivering on the promised performance. If achieving this isn’t possible, Dfinity should be transparent and communicate that clearly.
Yes I agree that latency should only be high (on average over some time) on highly loaded subnets. I think setting that bar at 2 seconds is a bit ambitious in the short term, but in my view it should definitely be in the order of seconds, and I think that’s achievable. So phrased differently, in my view there should always be a subnet where you can expect latency in seconds, and otherwise new subnets should be added.
It is physically impossible to guarantee 2 second (i.e. 1-2 rounds) latency to every single zero compute allocation canister at once. Now or in the far future.
Unless you have a compute allocation of 100, 2 seconds is the good-weather latency, not a hard guarantee. Even with a compute allocation, you don’t get a latency guarantee for individual requests: if you have a single canister trying to handle thousands of updates per second, latency will shoot through the roof regardless. All you get is the guarantee that you will run every round.
Without that kind of guarantee, you run whenever it’s your turn to run given the load on the subnet (and later, on the IC as a whole).
What actually causes the high latency here? I mean which bottleneck are we hitting?
I suppose it is not the number of ingress messages. I suppose it is the execution rounds that are full? What kind of messages (ingress or heartbeat) are filling the execution rounds?
Maybe dynamic pricing is enough to fix it? I think it’s a better deterrent than raising prices to a higher fixed level. Could that be implemented easily by changing compute allocation? Or just by allowing one to say “I want the minimum compute allocation that lets me execute every round, depending on load”?
What’s the main reason why this is possible on AWS but not on ICP? I understand that decentralization comes with limitations; I’m just curious what the biggest blocker is.
It is indeed the full execution rounds. In some cases the instruction limit is hit (we have subnets that execute fewer than 100 updates per round on average because the updates themselves are quite heavy). And in other cases it’s simply the context switching (subnet fuqsr executes 600+ canisters per round, or about 1000 per second). So it’s not always the same.
Same with the types of messages causing the load: on fuqsr some 2/3rds of messages are heartbeats; on bkfrk 3/4 are timers; on lhg73 and k44fs it’s mostly updates and reply callbacks (although they’re likely also triggered by heartbeats and timers somewhere or other, since the load is pretty flat). There are also subnets with significant ingress message backlogs, even after the scheduler improvements: until yesterday, k44fs was seeing backlogs of up to 2.5k ingress messages, even though the scheduler managed to execute pretty much all canisters with messages every round (so it was likely one or a few backlogged canisters).
We’ve avoided dynamic pricing because we want canister developers to be able to predict their costs. You don’t want your canister running out of cycles overnight because the subnet it’s on is suddenly experiencing a lot of load, while a similar canister with similar load on the next subnet over pays next to nothing.
That doesn’t work. Either you actually reserve 100 compute allocation and then you always get to run every round. Or 50k canisters all pay for “execute every round priority” and none of them gets it once all of them are busy. I.e. AFAICT it’s either a guarantee or it’s best-effort; there’s no in-between.
In traditional AWS you get your own VM. So you never have to compete with other applications (except maybe for bandwidth).
In AWS Lambda (serverless computing, more similar in some ways to what we have with canisters) they can easily shift load around across servers. Partly because that’s all AWS Lambda was built to do (we spent a lot of time building up other parts of the stack) and partly because they have no data attached to their code. Canisters are code + data, AWS Lambda is just code (with the data all held in one central DB); it is a lot easier to replicate and “deploy” snippets of code across physical servers than it is to atomically move data+code across replicated virtual machines running on untrusted hardware.
Given a magical data store with arbitrarily high capacity both in terms of storage and throughput (e.g. your own subnet(s) to just host data) it is trivial to load balance frontends across any number of sometimes busy subnets.
So on the one hand it’s the fact that they can actually shift load “instantly” (partly because of their architecture). And on the other hand it’s just scale: they probably have hundreds of thousands of servers to balance load across; we have less than 50 subnets.
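To make the code-vs-data point a bit more concrete, here is a rough sketch of the kind of frontend that is trivial to replicate and load balance: it holds no data of its own and forwards everything to a dedicated data canister. The get_value method and the data canister's interface are made up for illustration, and this again assumes a pre-0.18 ic-cdk.

```rust
// Sketch of a stateless "frontend" canister: because it is code with (almost) no data,
// any number of copies can be installed on whichever subnets have spare capacity, all
// pointing at the same data canister. The data canister's `get_value` interface is
// made up for illustration.
use candid::Principal;
use std::cell::RefCell;

thread_local! {
    // The only state kept here: where the data lives. Set once at install time.
    static DATA_CANISTER: RefCell<Option<Principal>> = RefCell::new(None);
}

#[ic_cdk::init]
fn init(data_canister: Principal) {
    DATA_CANISTER.with(|d| *d.borrow_mut() = Some(data_canister));
}

#[ic_cdk::update]
async fn get_value(key: String) -> Option<String> {
    let data = DATA_CANISTER
        .with(|d| d.borrow().clone())
        .expect("frontend not initialized");
    // Forward the request; the frontend itself never accumulates state worth migrating.
    let (value,): (Option<String>,) = ic_cdk::api::call::call(data, "get_value", (key,))
        .await
        .expect("call to data canister failed");
    value
}
```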