This post describes a plan that would let the internet-computer-protocol auto-scale subnets when they come under too much load.
A subnet can fill up at any time. New canisters can be created on it and/or new traffic can be driven to it until more work is requested of the subnet than it can handle. As of now, this can leave canisters non-responsive and effectively unusable because of something outside the canister-author’s control. When that happens, there must be a plan for what to do. Manually migrating data to a new canister on a new subnet is not a good plan: if the new subnet fills up shortly afterwards, it becomes a game of cat and mouse, and if the canister controls threshold keys or holds tokens in many accounts on many ledgers, migration is not even possible to attempt.
Subnet splitting has been hindered by the assumption that some canisters need to be on the same subnet as each other. This post is about clarifying and calibrating that assumption.
This post clarifies that such canisters need to be on the same subnet-type as each other, but do not need to be on the same subnet.
A single subnet does not scale, but a single subnet-type can scale (across an unlimited number of subnets).
The main idea is to get rid of any functionality between two canisters that relies on both canisters being on the same subnet. Canisters will not see specific subnet-ids, only subnet-types. Any functionality between two canisters that currently relies on both being on the same subnet will be either deprecated or made to work cross-subnet for subnets of the same subnet-type (it does not need to work across different subnet-types):
- Composite queries → deprecate, or make them work cross-subnet for subnets of the same subnet-type.
- Management canister:
  - create_canister method → deprecate in favor of the CMC’s create_canister method, or simply route the management canister’s create_canister method to the CMC’s create_canister method, passing the caller’s own subnet-type as the subnet-type parameter.
  - install_chunked_code → make it work cross-subnet for subnets of the same subnet-type.
- 10MiB message argument size limit on the same subnet → deprecate. Its main use case was installing Wasm modules bigger than 2MiB, and chunked Wasm installation now covers that, so it is no longer needed.
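To make subnet-type-only visibility concrete, here is a minimal Python model of placing a canister by subnet-type. Everything in it is hypothetical: `Subnet`, `create_canister`, and the policy of picking the subnet of the requested type with the most free canister ids are illustrations, not the actual interface of the CMC or the management canister. What it shows is that the caller names only a subnet-type and gets back a canister id plus that subnet-type, never a subnet-id.

```python
from dataclasses import dataclass, field


@dataclass
class Subnet:
    subnet_id: str                     # internal only: never returned to canisters
    subnet_type: str
    ranges: list[tuple[int, int]]      # half-open [start, end) canister-id ranges
    used: set[int] = field(default_factory=set)

    def free_ids(self) -> int:
        capacity = sum(end - start for start, end in self.ranges)
        return capacity - len(self.used)


def create_canister(subnets: list[Subnet], subnet_type: str) -> tuple[int, str]:
    """Place a new canister on some subnet of `subnet_type`; reveal only the type."""
    candidates = [s for s in subnets
                  if s.subnet_type == subnet_type and s.free_ids() > 0]
    if not candidates:
        raise RuntimeError(f"no capacity left for subnet type {subnet_type!r}")
    # Hypothetical placement policy: the subnet with the most free canister ids.
    target = max(candidates, key=Subnet.free_ids)
    for start, end in sorted(target.ranges):
        for cid in range(start, end):
            if cid not in target.used:
                target.used.add(cid)
                return cid, subnet_type
    raise AssertionError("unreachable: free_ids() > 0")


a = Subnet("subnet-A", "application", [(0, 5)], {0, 1, 2})
b = Subnet("subnet-B", "application", [(5, 10)])
print(create_canister([a, b], "application"))  # (5, 'application')
```

Because callers never learn which subnet they landed on, the protocol stays free to rearrange ranges among subnets of the same type later.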
Once that is done, the path to auto-scaling subnets is clear. Say we start with subnet-A, which has a single canister-range [0…10]. When subnet-A fills up and has too much load, the protocol can split subnet-A straight down the middle of its load: create a new subnet, subnet-B, with the same subnet-type, give subnet-B half of subnet-A’s load, and possibly assign additional ranges to both subnets if the leftover ranges are too small to leave room for future canisters. After the split we have the ranges subnet-A: [0…5, 10…15] and subnet-B: [5…10, 15…20].
Now say that after some time subnet-A fills up again. The protocol can do the same thing again, but this time it has two options. It can give one of subnet-A’s two existing ranges to a new subnet-C, or, if >75% of the load is localized in one of the two ranges (so that handing over either whole range would move far more or far less than half the load), it can split that range down the middle of its load. For example, if the load is dense within the 2…4 range, subnet-A splits that range at 3, giving subnet-A: [0…3, 10…15], subnet-C: [3…5, 20…25], and subnet-B unchanged at [5…10, 15…20]. As before, additional ranges can be assigned to subnet-A and subnet-C if their leftover ranges are too small to leave room for future canisters. The protocol can keep doing this whenever a subnet fills up, so as long as enough node-machines are available, it can auto-scale indefinitely.
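The split procedure can be sketched in Python. This is a minimal model under stated assumptions, not the protocol’s design: ranges are read as half-open intervals [start, end) of integer canister ids (so 0…5 and 5…10 do not overlap), load is a per-canister number, and `Allocator`, `range_load`, `load_midpoint`, `split_subnet`, and `FRESH_WIDTH` are hypothetical names. Two further choices are assumptions too: when no range is dense, whole ranges are handed over by a greedy heaviest-first rule, and a subnet receives a fresh range only when every range it holds already carries load (a crude proxy for "too small to leave room for future canisters").

```python
from dataclasses import dataclass

Range = tuple[int, int]  # half-open [start, end) interval of canister ids

DENSE_SHARE = 0.75  # ">75% of the load localized in one range", from the post
FRESH_WIDTH = 5     # hypothetical width of a newly allocated range


@dataclass
class Allocator:
    """Hands out never-used canister-id ranges (hypothetical global allocator)."""
    next_free: int

    def fresh(self) -> Range:
        r = (self.next_free, self.next_free + FRESH_WIDTH)
        self.next_free += FRESH_WIDTH
        return r


def range_load(rng: Range, load: dict[int, float]) -> float:
    start, end = rng
    return sum(v for cid, v in load.items() if start <= cid < end)


def load_midpoint(rng: Range, load: dict[int, float]) -> int:
    """Cut point just past the canister where cumulative load reaches half."""
    start, end = rng
    ids = sorted(cid for cid in load if start <= cid < end)
    half = sum(load[cid] for cid in ids) / 2
    acc = 0.0
    for cid in ids:
        acc += load[cid]
        if acc >= half:
            point = min(cid + 1, end - 1)  # keep both halves non-empty
            if point <= start:
                raise ValueError("range too narrow to split")
            return point
    raise ValueError("range carries no load")


def split_subnet(ranges: list[Range], load: dict[int, float],
                 alloc: Allocator) -> tuple[list[Range], list[Range]]:
    """Split one overloaded subnet; returns (ranges kept, ranges for the new subnet)."""
    loads = {r: range_load(r, load) for r in ranges}
    total = sum(loads.values())
    if total <= 0:
        raise ValueError("nothing to split: subnet carries no load")
    hottest = max(ranges, key=lambda r: loads[r])
    if loads[hottest] > DENSE_SHARE * total:
        # Load is localized in one range: cut that range down the middle of its load.
        cut = load_midpoint(hottest, load)
        keep = [r for r in ranges if r != hottest] + [(hottest[0], cut)]
        give = [(cut, hottest[1])]
    else:
        # Greedy heuristic (an assumption): move whole loaded ranges, heaviest first,
        # while the moved load stays at or below half of the total.
        keep, give, moved = [], [], 0.0
        for r in sorted(ranges, key=lambda r: loads[r], reverse=True):
            if loads[r] > 0 and moved + loads[r] <= total / 2:
                give.append(r)
                moved += loads[r]
            else:
                keep.append(r)
    # Hypothetical refill rule: a subnet whose ranges all carry load gets a fresh range.
    for side in (keep, give):
        if all(range_load(r, load) > 0 for r in side):
            side.append(alloc.fresh())
    return sorted(keep), sorted(give)


alloc = Allocator(next_free=10)
a, b = split_subnet([(0, 10)], {cid: 1 for cid in range(10)}, alloc)
a, c = split_subnet(a, {0: 1, 1: 1, 2: 8, 3: 8, 4: 1}, alloc)
print(a, b, c)  # [(0, 3), (10, 15)] [(5, 10), (15, 20)] [(3, 5), (20, 25)]
```

Fed the post’s scenario (uniform load on [0…10], then load concentrated on canisters 2 and 3), the model reproduces both splits from the example above.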
A nice property of these dynamics is that the more ranges a subnet has, the more likely it is that a load split can be done by giving one or more of its existing ranges to the new subnet and receiving the same number (or fewer) of fresh replacement ranges, without increasing the subnet’s total number of ranges. This is because the more ranges a subnet has, the smaller each range’s share of the subnet tends to be, and the lower the probability that >75% of the load is localized in one range. This means that the number of ranges per subnet will stabilize on average.
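The probability argument can be made quantitative under one simplifying assumption that the post does not make: if the loads of a subnet’s n ranges are independent and exponentially distributed, the load shares are uniform on the simplex, each share exceeds t with probability (1 − t)^(n−1), and for t ≥ 1/2 at most one range can exceed t, so P(some range holds more than t of the load) = n(1 − t)^(n−1). At t = 0.75 this is 0.5 for two ranges, 0.1875 for three, and about 3.8e-5 for ten. The Python sketch computes the closed form and checks it with a seeded Monte Carlo run; `p_dense_exact` and `p_dense_simulated` are illustrative names. Real load is often skewed toward a few hot canisters, so actual probabilities would be higher than this model suggests.

```python
import random


def p_dense_exact(n: int, t: float = 0.75) -> float:
    """P(one of n ranges holds more than t of the load), i.i.d. exponential loads."""
    # For t >= 1/2 the events "range i exceeds t" are disjoint, so their
    # probabilities (each (1 - t)^(n - 1)) simply add up.
    assert t >= 0.5, "closed form only holds for t >= 1/2"
    return n * (1 - t) ** (n - 1)


def p_dense_simulated(n: int, t: float = 0.75,
                      trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability under the same load model."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        loads = [rng.expovariate(1.0) for _ in range(n)]
        if max(loads) > t * sum(loads):
            hits += 1
    return hits / trials


for n in (1, 2, 3, 5, 10):
    print(n, round(p_dense_exact(n), 6), p_dense_simulated(n))
```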
Size of the routing table: for 1,000,000 subnets with an average of 10 ranges per subnet, taking 29 bytes (the maximum length of a principal) for each subnet-id and for each range bound (a start and an end per range), the routing table holds 29 × 1,000,000 + 29 × 2 × 10 × 1,000,000 = 609,000,000 bytes ≈ 609 MB.
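The arithmetic, and how a range-based routing table like this could be consulted, can be sketched in Python. `routing_table_bytes` and `RoutingTable` are hypothetical names; the 29-byte figure is the post’s, and the lookup assumes non-overlapping half-open ranges kept sorted by start, so finding a canister’s subnet is a binary search over the starts (using Python’s `bisect` module).

```python
import bisect

ID_BYTES = 29  # bytes per subnet-id and per range bound, as in the post


def routing_table_bytes(subnets: int, ranges_per_subnet: int,
                        id_bytes: int = ID_BYTES) -> int:
    # One subnet-id per subnet, plus two bounds (start, end) per range.
    return id_bytes * subnets + id_bytes * 2 * ranges_per_subnet * subnets


class RoutingTable:
    """Sorted, non-overlapping [start, end) ranges mapped to subnet-ids."""

    def __init__(self, entries: list[tuple[int, int, str]]):
        self.entries = sorted(entries)                 # (start, end, subnet_id)
        self.starts = [start for start, _, _ in self.entries]

    def route(self, canister_id: int) -> str:
        # Last range whose start is <= canister_id, then check its end.
        i = bisect.bisect_right(self.starts, canister_id) - 1
        if i >= 0:
            start, end, subnet = self.entries[i]
            if start <= canister_id < end:
                return subnet
        raise KeyError(canister_id)


print(routing_table_bytes(1_000_000, 10))  # 609000000
table = RoutingTable([(0, 3, "subnet-A"), (10, 15, "subnet-A"),
                      (3, 5, "subnet-C"), (20, 25, "subnet-C"),
                      (5, 10, "subnet-B"), (15, 20, "subnet-B")])
print(table.route(4))  # subnet-C
```

With ranges sorted, a lookup costs O(log R) for R ranges, so even at about 10 million ranges a route takes roughly two dozen comparisons.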
:Levi.