TL;DR
DFINITY’s suggested next steps to improve the handling of high load on certain subnets on mainnet:
- Continue focusing on replica improvements
- Remove heartbeats from SNS canisters
- Propose adjustments to cycles pricing following motion proposal 133388
- Propose changes to the target topology to add new subnets
In the mid-term, DFINITY will concentrate on enabling canister migration to facilitate scaling the ICP by adding new subnets and more effectively balancing the load across them.
Background: High mainnet load
The load on mainnet has increased significantly since mid-September and continued to grow rapidly over the following month. It is great to see more adoption, but this rapid growth also led to increased latency: on many subnets, the replicas could not keep up with all incoming messages, leading to high latencies and even ingress messages expiring before they could be processed. This was discussed in a separate forum topic.
There were two main causes that led to these subnets not handling the load well.
In ICP, reaching agreement on messages and the actual processing of those messages are separated. The blockchain decides which messages are accepted, and every time a new block is finalized, an “execution round” is triggered. Every execution round is limited in how much work (measured in “instructions”) it performs, such that it completes in roughly a second. That means that if a block contains many messages that require significant processing, not all of them can necessarily be processed in the next execution round, and some messages may wait in canister queues to be processed later. Every round, a scheduler component decides which of the messages waiting in canister queues are processed on the different processing threads.
The scheduler ensured fairness only in selecting which canister gets the first chance to run in a given round on a given thread. Being chosen as the first canister in a round happens infrequently when many canisters have messages to execute. For example, with 20k active canisters on a single subnet, 4 compute threads, and 1.5 blocks and execution rounds per second, every canister would be scheduled first only once every 20000/4/1.5 ≈ 3333 seconds, or roughly 55 minutes. Luckily, there is often time left in an execution round for other canisters to do work, but the scheduler did not factor that into future scheduling decisions, so this extra time in the round was not fairly distributed. With such a high number of active canisters, this led to many canisters being scheduled only very infrequently.
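To make the arithmetic explicit, the back-of-the-envelope calculation above can be written out as a tiny Rust snippet. The figures (20k active canisters, 4 threads, 1.5 rounds per second) are the illustrative numbers from this example, not protocol constants.

```rust
/// Worst-case interval (in seconds) between two rounds in which a given
/// canister gets the "scheduled first" slot, for the illustrative numbers
/// used in this example.
fn seconds_between_first_slots(active_canisters: f64, threads: f64, rounds_per_second: f64) -> f64 {
    // Each round, `threads` canisters are scheduled first, and there are
    // `rounds_per_second` rounds per second.
    active_canisters / (threads * rounds_per_second)
}

fn main() {
    let secs = seconds_between_first_slots(20_000.0, 4.0, 1.5);
    // Prints roughly "3333 s = 55.6 min".
    println!("{secs:.0} s = {:.1} min", secs / 60.0);
}
```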
This scheduler issue was addressed by changing the scheduling logic to also account for canisters that used “leftover computation” in an execution round even though they were not scheduled first. However, this change alone performed worse on the workload we are seeing on mainnet today, which brings us to the second problem.
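As an illustration only (this is not the replica’s actual scheduler; the names and credit units are invented for this sketch), one way to account for leftover computation is to keep a per-canister fairness credit that is charged for every execution in a round, not just for being scheduled first:

```rust
use std::collections::HashMap;

/// Hypothetical fairness state: higher credit means the canister has been
/// waiting longer for its share of computation.
#[derive(Default)]
struct FairnessState {
    credit: HashMap<u64, i64>, // canister id -> accumulated credit
}

impl FairnessState {
    /// Pick the execution order for a round: canisters with the highest
    /// credit (longest wait) go first.
    fn order_for_round(&mut self, active: &[u64]) -> Vec<u64> {
        // Every canister with pending work earns one credit per round.
        for &c in active {
            *self.credit.entry(c).or_insert(0) += 1;
        }
        let mut order = active.to_vec();
        order.sort_by_key(|c| std::cmp::Reverse(self.credit[c]));
        order
    }

    /// Charge every canister that actually executed in the round, not only
    /// the ones scheduled first, proportionally to the instructions it used.
    fn account_for_round(&mut self, executed: &HashMap<u64, u64>, instructions_per_credit: u64) {
        for (&c, &instructions) in executed {
            let charge = (instructions / instructions_per_credit.max(1)) as i64 + 1;
            *self.credit.entry(c).or_insert(0) -= charge;
        }
    }
}
```

The key point is the second method: canisters that do work using leftover round time also pay for it, so in later rounds they are deprioritized relative to canisters that have not run at all.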
Every canister is executed in a sandbox for security reasons, making sure canisters are isolated and cannot maliciously read data from other canisters, or bring down the replica if they find and exploit a weakness in the WebAssembly runtime. However, starting and stopping all these sandboxes is additional work. For this reason the replica maintains a cache of sandbox processes to help speed this up. While executing canisters, the replica might need to evict some of the older sandbox processes if there’s no room in the cache and bring up new ones. If the replica tries to do this too quickly, the system slows down due to thrashing. This limits how many distinct canisters can process messages in every execution round.
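For intuition about why the cache size matters, here is a deliberately simplified, hypothetical LRU-style cache of sandbox processes. The replica’s real cache is more sophisticated; SandboxCache and its fields are invented for this sketch.

```rust
use std::collections::VecDeque;

/// Hypothetical handle to a running sandbox process.
struct Sandbox {
    canister_id: u64,
}

/// A minimal LRU-style cache of sandbox processes. When the cache is full,
/// the least recently used sandbox is evicted (its process torn down) to
/// make room for a new one.
struct SandboxCache {
    capacity: usize,
    // Front = least recently used, back = most recently used.
    entries: VecDeque<Sandbox>,
}

impl SandboxCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: VecDeque::new() }
    }

    /// Get (or start) the sandbox for `canister_id`, evicting the least
    /// recently used one if the cache is full. Returns true if a new
    /// sandbox process had to be started.
    fn get_or_start(&mut self, canister_id: u64) -> bool {
        if let Some(pos) = self.entries.iter().position(|s| s.canister_id == canister_id) {
            // Cache hit: move to the back (most recently used); no new process.
            let sandbox = self.entries.remove(pos).unwrap();
            self.entries.push_back(sandbox);
            return false;
        }
        if self.entries.len() == self.capacity {
            // Cache full: evict the least recently used sandbox.
            self.entries.pop_front();
        }
        self.entries.push_back(Sandbox { canister_id });
        true
    }
}
```

With such a cache, once the number of distinct canisters executing per round exceeds the capacity, almost every execution misses and pays the process start-up and teardown cost. That is the thrashing described above, and raising the capacity is what avoids it.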
The load on mainnet consisted of a huge number of small heartbeat executions across many different canisters, which is exactly the kind of workload that is difficult for the replica to process.
The situation finally improved significantly on October 15th with a new replica version, which included the scheduler changes and an increase in the number of sandboxes the replica keeps cached. Since this version was rolled out, the situation on mainnet has drastically improved: we see very few ingress messages expire on all subnets, but latencies are still elevated on certain subnets.
Suggested short-term measures to reduce latency
DFINITY is working hard on bringing further improvements and considers this a top priority. We propose the following immediate next steps:
- Continue to focus on replica improvements that handle many active canisters better. Concretely, the plan is to further increase the number of canister sandboxes that can remain cached, so that the replica can better handle a large set of active canisters. Additional improvements in the scheduler will also be considered as a second step.
- A big part of the load originated from the many instances of the SNS, each of which contains multiple heartbeat canisters. Note that there are many more instances of the SNS canisters than just the ones that went through a decentralization sale. New versions of the SNS canisters have been created and are in the process of being adopted by the NNS; they no longer use heartbeats but timers, which significantly reduces the cycles consumption and system load incurred by these canisters (see the sketch after this list). DFINITY will encourage and support upgrading all SNS instances to these new versions.
- It was observed that this specific load pattern put a lot of load on the subnets but did not burn a large amount of cycles. A guiding principle should be that a subnet at capacity burns more cycles than node providers receive in rewards, as also brought forward in the adopted motion proposal 133388. DFINITY will propose a concrete change to certain cycles costs, with the aim of ensuring that every workload has a cycles cost in line with the load it causes on a subnet. We’ll share this on the forum later today. link
- DFINITY will propose updating the target topology to include more subnets and, if adopted, propose creating more subnets so that more compute capacity is added to ICP. We’ll discuss this in more detail on the forum in the coming days. link
Outlook: ICP scalability and load balancing
The above is mainly focused on ensuring that each subnet can process a lot of load, and that this load costs a proportional amount of cycles. However, this is not the core of ICP’s approach to scalability. ICP’s high-level approach to scaling can be summarized as follows:
- Every subnet has finite “replicated” capacity, but capacity can grow by adding subnets
- Load can be balanced over subnets
- A subnet’s query capacity can grow by adding more nodes to the subnet
In ICP today, the weakest link in this story is load balancing: some subnets were highly loaded while others were not, but the load could not easily be balanced over the subnets. There is basic support for splitting a subnet into two via a sequence of NNS proposals, which creates a new subnet that takes over half the load of an existing subnet. There are a few challenges with subnet splitting.
- Subnet splitting is driven by the NNS. This means that individual dapp controllers cannot make their own decision on what latency is acceptable.
- Another major challenge is that colocation of canisters matters. Two canisters on the same subnet can communicate much faster and with higher throughput than canisters on different subnets. Similarly, composite queries currently only work between canisters on the same subnet. This means it is critical that, in a subnet split, the right canisters remain together on the same subnet post-split. However, there is not complete freedom in defining how the canisters are split, because of how canisters are routed to subnets (see the sketch after this list), and it is difficult for NNS participants to know what a “good” split would be.
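For intuition about why a split cannot group canisters arbitrarily, here is a rough, hypothetical sketch of range-based routing. The integer ids, subnet names, and ranges are placeholders; real canister ids are principals, and the actual routing table is maintained in the NNS registry.

```rust
/// Simplified view of routing: each subnet is responsible for one or more
/// contiguous ranges of canister ids.
struct RoutingEntry {
    range: std::ops::RangeInclusive<u64>,
    subnet: &'static str,
}

/// Look up which subnet hosts a given canister id.
fn subnet_for(canister_id: u64, table: &[RoutingEntry]) -> Option<&'static str> {
    table
        .iter()
        .find(|e| e.range.contains(&canister_id))
        .map(|e| e.subnet)
}

fn main() {
    // Before a split: one subnet owns the whole range.
    let before = [RoutingEntry { range: 0..=9_999, subnet: "subnet-A" }];
    // After a split: the range is cut in two. Canisters keep their ids, so
    // which canisters end up together is constrained by where their ids fall
    // in the range, not by which dapp they belong to.
    let after = [
        RoutingEntry { range: 0..=4_999, subnet: "subnet-A" },
        RoutingEntry { range: 5_000..=9_999, subnet: "subnet-B" },
    ];
    assert_eq!(subnet_for(7_123, &before), Some("subnet-A"));
    assert_eq!(subnet_for(7_123, &after), Some("subnet-B"));
}
```

In this simplified picture, a split cuts id ranges, so which canisters stay together depends on where their ids fall rather than on which dapp they belong to, which is part of why defining a “good” split is hard.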
Things would be simpler if canister controllers could individually decide to migrate to another subnet. Today, ICP does not offer built-in support for canister migration, meaning you can only manually migrate your data to a new canister id on another subnet. Changing the canister id, however, can be impactful because, for example, it changes how others talk to your dapp and changes the threshold signing key your canister has.
Next steps: DFINITY plans to focus in the mid-term on supporting canister migration natively in the protocol. That means a canister controller could choose to migrate their canister to another subnet without changing the canister id. We believe this feature would unlock the full scalability of ICP, as it would provide a complete solution for balancing load over subnets. Every developer can make their individual subnet choice and ensure that canisters that benefit from being colocated remain on the same subnet. Canister migration would also enable better utilization of subnets. For example, compute-heavy dapps would likely move to subnets with little compute load, and storage-heavy dapps would migrate to subnets with lots of free storage. This would likely lead to every subnet hosting a mix of dapps that together require a mix of resources, increasing the overall work every subnet does and the cycles it burns. There are still big design challenges to overcome to enable canister migration, so we cannot give an accurate timeline yet, but this is what we’ll focus on, and we’ll make sure to share our progress.
Discussion
We look forward to hearing your thoughts, and we’re happy to answer any questions.