High User Traffic Incident Retrospective - Thursday September 2, 2021

Ignoring update calls for now, I would love to see the IC itself provide infinite scaling of query calls for canisters. Maxing out the cores on individual replicas will help, but could the IC manage backup read-only replicas that are ready to be deployed ad hoc into subnets experiencing spikes in query traffic? I imagine this would be relatively simple to do, at least theoretically.

As an example, imagine the ICPunks drop. Assuming there were 7 full replicas in the subnet, as query traffic began to approach certain limits, the subnet would request extra read replicas. These could be added relatively quickly, using the catch-up package functionality to bring each new replica up to the current state. It wouldn’t participate in consensus; it would be a read-only replica.

As traffic continued to increase, read-only replicas could continue to be added. The IC would have to maintain a pool of these replicas, always ready to be deployed where needed. Once traffic died down, the replicas would be returned to the pool. If the traffic never died down, perhaps the replica would become a permanent part of the subnet.
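To make the idea concrete, here is a minimal sketch of what that control loop might look like. Everything here is an illustrative assumption on my part (the class names, the QPS thresholds, the pool structure); none of it corresponds to actual IC components.

```python
# Hypothetical sketch of the proposed autoscaling loop. All names and
# thresholds are illustrative assumptions, not real IC internals.
from dataclasses import dataclass, field

SCALE_OUT_QPS = 8000   # assumed per-replica query load that triggers scale-out
SCALE_IN_QPS = 2000    # assumed low-water mark for returning read replicas

@dataclass
class Subnet:
    consensus_replicas: int = 7                         # fixed, participate in consensus
    read_replicas: list = field(default_factory=list)   # elastic, serve queries only

    def total_replicas(self):
        return self.consensus_replicas + len(self.read_replicas)

@dataclass
class ReplicaPool:
    idle: list = field(default_factory=list)  # standby replicas, ready to deploy

    def acquire(self):
        return self.idle.pop() if self.idle else None

    def release(self, replica):
        self.idle.append(replica)

def rebalance(subnet, pool, query_qps):
    """One tick of the control loop: add or return read-only replicas
    based on per-replica query load."""
    per_replica = query_qps / subnet.total_replicas()
    if per_replica > SCALE_OUT_QPS:
        replica = pool.acquire()
        if replica is not None:
            # A catch-up package would bring the replica up to the current
            # state quickly; it serves queries but never joins consensus.
            subnet.read_replicas.append(replica)
    elif per_replica < SCALE_IN_QPS and subnet.read_replicas:
        # Traffic died down: return the replica to the shared pool.
        pool.release(subnet.read_replicas.pop())
```

The point of the sketch is just that the decision logic is simple: queries are stateless reads, so the subnet can treat read replicas as fungible workers drawn from and returned to a shared pool.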

So subnets might have a fixed number of full consensus replicas, and some number of read-only replicas. This would not slow down consensus, but would scale out query calls infinitely without the developer needing to do anything fancy (even a single canister would automatically scale out queries).

Please consider this, I think it would be a powerful capability.
