I’m pretty convinced at this point that using fewer, beefier canisters is the most efficient way to scale within a single subnet, and on the Internet Computer in general.
First off, I have a huge amount of respect for the engineering work that’s gone into building dynamic massively multi-canister (MMC) architectures on ICP. A lot of engineering blood, sweat, and sleepless nights have gone into scaling apps across multiple subnets. However, as shown over the past few weeks, this architecture of spinning up tens of thousands of actively computing canisters on a single subnet inefficiently exhausts that subnet’s resources and does not scale well.
Spawning thousands of canisters that do roughly the same thing slows down checkpointing & state sync, and slows down execution for other canisters (OpenFPL and everyone else). Adding a timer or redundant computation to all of those canisters so that they perform the same compute at the same time is inefficient from the protocol’s perspective, and it is directly causing the slowdowns other ICP apps have seen these past few weeks. DFINITY engineering has been pretty clear about canisters-per-subnet and active-canisters-per-subnet limitations in working groups, ICP.Labs, Global R&Ds, and communications on the forums over the past year and a half, so it’s not like these limits are new.
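To make the timer point concrete, here’s a minimal (hypothetical) sketch of the per-canister pattern, assuming the ic-cdk and ic-cdk-timers crates; the names (`recompute_feed`, the 60s interval) are made up for illustration. Every spawned canister installs the same recurring job, so a subnet hosting tens of thousands of these canisters has to schedule tens of thousands of near-identical executions on the same cadence:

```rust
// Hypothetical per-user canister (names are illustrative only).
// Every one of the N spawned canisters installs the same recurring job, so
// the subnet must schedule N near-identical executions on the same cadence.
use std::time::Duration;

#[ic_cdk::init]
fn init() {
    // Each canister re-computes its own feed every 60s, whether or not the
    // user is active.
    ic_cdk_timers::set_timer_interval(Duration::from_secs(60), || {
        recompute_feed();
    });
}

fn recompute_feed() {
    // ... identical work repeated across tens of thousands of canisters ...
}
```

The same work done once in a single dispatcher canister (or a handful of shard canisters) touches the scheduler once per interval instead of once per user.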
From building CanDB back in early 2022, I’m a huge fan of the actor model and of canister service architectures that dynamically spin up thousands or millions of canisters. While I understand the composability benefits of a canister per user mentioned by @saikatdas0790 above, these MMC architectures are complicated to build and maintain, are several orders of magnitude more expensive in terms of cycles costs, and don’t scale well performance-wise within an ICP subnet. In fact, now that Rust has stable structures and Motoko has Enhanced Orthogonal Persistence on the way, I’d recommend that most devs start out with a single-canister architecture and avoid premature splitting/optimization unless absolutely necessary.
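As a rough illustration of that single-canister starting point (a sketch only, assuming the ic-cdk and ic-stable-structures crates; names like `PROFILES` and `set_profile` are hypothetical), per-user state can live in one stable map keyed by the caller’s principal instead of in a dedicated canister per user:

```rust
// Single "beefy" canister sketch: per-user data in a stable map rather than
// a canister per user. Assumes ic-cdk and ic-stable-structures.
use std::cell::RefCell;

use ic_stable_structures::memory_manager::{MemoryId, MemoryManager, VirtualMemory};
use ic_stable_structures::{DefaultMemoryImpl, StableBTreeMap};

type Memory = VirtualMemory<DefaultMemoryImpl>;

thread_local! {
    static MEMORY_MANAGER: RefCell<MemoryManager<DefaultMemoryImpl>> =
        RefCell::new(MemoryManager::init(DefaultMemoryImpl::default()));

    // One map holds every user's profile, keyed by principal text; it lives in
    // stable memory, so it survives upgrades without pre/post-upgrade hooks.
    static PROFILES: RefCell<StableBTreeMap<String, String, Memory>> = RefCell::new(
        StableBTreeMap::init(MEMORY_MANAGER.with(|m| m.borrow().get(MemoryId::new(0))))
    );
}

#[ic_cdk::update]
fn set_profile(data: String) {
    let user = ic_cdk::caller().to_text();
    PROFILES.with(|p| {
        p.borrow_mut().insert(user, data);
    });
}

#[ic_cdk::query]
fn get_profile() -> Option<String> {
    let user = ic_cdk::caller().to_text();
    PROFILES.with(|p| p.borrow().get(&user))
}
```

If that single canister ever becomes the bottleneck, you can still shard it later; the point is to avoid paying the MMC complexity and cycles cost up front.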
If you’re an app that has chosen the dynamic 100k to 1M+ canisters-per-app path, then the only option you have moving forward is to spill over onto other subnets. But this architecture is an incredibly inefficient use of resources, so instead of getting more out of the hardware and software you already have in a single subnet, it makes inefficient use of multiple subnets.
As this “1M+ canister architecture” spills over into other subnets, and since ICP subnets are shared resources, the architecture’s inefficiency ends up starving canisters on those subnets of compute and slowing down the rest of the apps on the Internet Computer. In a way, regardless of the application’s intentions, rapidly spinning up canisters on ICP becomes the quickest and cheapest way to DDoS a subnet.
DFINITY has made a number of performance and scalability optimizations over the past two years that now allow a larger number of canisters per subnet, but these optimizations assume that most of the canisters on a subnet are idle within a single round of execution. As a messaging app where most daily active users are not constantly sending messages, OpenChat is a perfect example of what these subnet improvements are optimized for. This is why OpenChat can have 91k canisters on a single subnet while maintaining decent performance.
However, in the Yral/Bob case, where all of the canisters regularly perform repeated computations (index/ledger canisters, timer-based algorithmic feed re-computation, etc.), these performance optimizations don’t work, and ~20k regularly active and computing canisters will fully utilize a subnet. At this level of compute, it’s more efficient for both the subnet and the app if the app condenses into just a few canisters (1-100) per subnet. This has the added benefit that the app can raise each individual canister’s compute allocation as needed, which would be difficult or cost-prohibitive with 1M+ canisters or an MMC architecture.
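For example, with only a handful of canisters it becomes affordable to buy each one a guaranteed share of the subnet’s compute. A hedged sketch, assuming ic-cdk’s management-canister bindings (module paths and settings fields vary a bit between ic-cdk versions); `reserve_compute` is a hypothetical helper:

```rust
// Hedged sketch: reserve a guaranteed slice of subnet compute for one of the
// few consolidated canisters, via the management canister's update_settings.
use candid::{Nat, Principal};
use ic_cdk::api::management_canister::main::{
    update_settings, CanisterSettings, UpdateSettingsArgument,
};

async fn reserve_compute(canister_id: Principal) -> Result<(), String> {
    update_settings(UpdateSettingsArgument {
        canister_id,
        settings: CanisterSettings {
            // compute_allocation is a percentage (0-100): guarantee this
            // canister up to 20% of a scheduler core each round.
            compute_allocation: Some(Nat::from(20u64)),
            ..Default::default()
        },
    })
    .await
    .map_err(|(code, msg)| format!("update_settings failed: {:?} {}", code, msg))
}
```

Compute allocation is paid for in cycles per canister per round, which is exactly why it only pencils out for a few canisters and not for a million of them.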
If you’re curious about canister-subnet limits and want to learn more, a few resources I recommend are:
- These forums (great search resource)
- Global R&D (highly recommend attending the weekly ones if you’re a developer)
- Performance and Scalability working group - meets once a month to discuss a variety of new feature & scalability related topics. I highly recommend attending if you want to learn about the latest protocol features and the best way to scale your app on the Internet Computer. They also record sessions, which you can watch after the fact!
Here’s a meeting recording from the Scalability and Performance working group this past July that dives into many of the soft & hard limits on canisters per subnet, while also touching on how many of these scalability assumptions don’t hold if most of the canisters on the subnet are constantly active & executing computations.
@jamesbeadle, from many of the comments in this thread, it also seems that there’s a lot of confusion around how to build and scale an app on ICP. And reasonably so: there are many different approaches people have taken (single canister, few canisters, MMC/canister per user, etc.) and no clear signal of which approaches scale best (everyone is biased towards their own solution).
Maybe something that would help at this point would be to create a space where ICP devs can receive architectural feedback on the design and scalability of their apps from DFINITY engineers. Ideally, DFINITY engineers can provide supporting charts/metrics/data and other developers can learn from these communications.