Internet Computer Performance - Dec 1, 2021 Load testing

Summary

This post describes the DFINITY Foundation’s performance evaluation of the Internet Computer as of December 1, 2021. We will periodically update the numbers to reflect performance improvements realized over time.

Scalability of the Internet Computer comes from partitioning the IC into subnet blockchains. Every subnet blockchain can process update calls from ingress messages independently of other subnets. The IC can scale out by adding more subnets, at the cost of more network traffic (as applications may then need to communicate across subnets). In its current form, the IC should be able to scale out to hundreds of subnets.

Query calls are read-only calls that are processed locally on each node. Scalability comes from adding more nodes, either to an existing subnet (at the cost of making consensus, i.e. update calls, more expensive) or as a new subnet.
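
To make this scaling model concrete, here is a back-of-envelope sketch in Rust. The per-node and per-subnet rates are rough back-calculations from the aggregate figures reported below, not measured values.

```rust
// Back-of-envelope only: the per-node and per-subnet rates are assumptions,
// derived by dividing the reported aggregate throughput by the topology size.
fn main() {
    let total_nodes = 375u32;   // topology at the time of these tests
    let subnets = 27u32;        // 26 application subnets + the NNS subnet

    let per_node_qps = 670.0;   // assumed average query throughput per node
    let per_subnet_ups = 410.0; // assumed average update throughput per subnet

    // Queries are answered locally on each node, so aggregate query capacity
    // grows roughly with the total number of nodes.
    let query_capacity = total_nodes as f64 * per_node_qps;

    // Updates go through consensus once per subnet, so aggregate update
    // capacity grows roughly with the number of subnets.
    let update_capacity = subnets as f64 * per_subnet_ups;

    println!("~{query_capacity:.0} queries/s, ~{update_capacity:.0} updates/s");
}
```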

The performance and load testing have been done primarily by @stefan-kaestle.

Test Setup

We are running all of our experiments concurrently against all subnets other than the NNS and some of the most utilized application subnets, to avoid disturbing active IC users. We send load against those subnets directly and are not using boundary nodes for these experiments. Boundary nodes have additional rate limiting, which is currently set slightly more conservatively than what the IC can handle, so running against them is unsuitable for performance evaluation. We are targeting all nodes in every subnet concurrently, much the same as boundary nodes would do if we used them.
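
As a rough illustration of what targeting every node concurrently could look like, here is a minimal Rust/tokio sketch. The node URLs and the send_query function are placeholders; the actual load-generation tooling used for these tests is not shown in this post.

```rust
use std::time::Duration;

// Placeholder for whatever actually submits a request to one replica node;
// the real load generator is not described in this post.
async fn send_query(node_url: &str) -> Result<(), String> {
    let _ = node_url; // e.g. an HTTPS request to the node's public API endpoint
    tokio::time::sleep(Duration::from_millis(10)).await;
    Ok(())
}

#[tokio::main]
async fn main() {
    // Hypothetical replica endpoints, one per node in a 13-node subnet.
    let node_urls: Vec<String> =
        (1..=13).map(|i| format!("https://node-{i}.example:8080")).collect();

    // Fan the load out to every node of the subnet concurrently,
    // much like boundary nodes would spread incoming traffic.
    let handles: Vec<_> = node_urls
        .into_iter()
        .map(|url| tokio::spawn(async move { send_query(&url).await }))
        .collect();

    for h in handles {
        let _ = h.await;
    }
}
```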

We have installed one counter canister in every subnet. This counter canister is essentially a no-op canister. It only maintains a counter, which can be queried via a query call and incremented via an update call. The counter value does not use orthogonal persistence, so the overhead for the execution layer of the IC is minimal. Stressing the counter canister can be seen as a way to determine the system overhead or baseline performance.
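
For reference, a counter canister of this kind might look roughly as follows in Rust with the ic-cdk; the method names are illustrative and may differ from the exact canister used in these tests.

```rust
use ic_cdk_macros::{query, update};
use std::cell::Cell;

thread_local! {
    // Plain heap state, no orthogonal/stable persistence, so execution overhead is minimal.
    static COUNTER: Cell<u64> = Cell::new(0);
}

// Update call: goes through consensus and increments the counter.
#[update]
fn increment() -> u64 {
    COUNTER.with(|c| {
        let v = c.get() + 1;
        c.set(v);
        v
    })
}

// Query call: answered locally by each node, simply reads the counter.
#[query]
fn read() -> u64 {
    COUNTER.with(|c| c.get())
}
```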

Since nodes and subnets are being constantly added to the IC, it is worth describing the topology during these tests:

  • Nodes - 375
  • Subnets - 27
  • Largest subnet - 37 nodes
  • Smallest subnet - 13 nodes

Source: IC Dashboard

Measurements

Update calls

The Internet Computer (currently 26 application subnets plus the NNS subnet) can sustain more than 11’000 updates/second for a period of four minutes, with peaks over 11’500 updates/second.

The update calls we have been measuring here are triggered by ingress messages sent from outside the IC.

Query calls

Arguably more important are query calls, since they account for more than 90% of the traffic on the IC.

The Internet Computer can currently process up to 250’000 queries per second. During our experiments, we increase the load incrementally and run each load level for a period of 5 minutes.
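
Sketched as code, the ramp-up schedule looks like this; the step size is an assumption for illustration, and only the five-minute hold per load level comes from the methodology described above.

```rust
use std::time::Duration;

fn main() {
    // Assumed step size; the post only states that the load is increased
    // incrementally and that each level runs for five minutes.
    let step_qps: u32 = 25_000;
    let hold = Duration::from_secs(5 * 60);

    for level in 1..=10u32 {
        let target = level * step_qps;
        println!("target {target} queries/s for {hold:?}");
        // run_load(target, hold) would drive the actual traffic (omitted here).
    }
}
```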

Conclusion and next steps

The Internet Computer today already shows impressive performance. On top of that, it should be possible to further scale out the IC by:

  • More subnets: This will immediately increase the query and update throughput. While adding subnets might eventually lead to other scalability problems, the IC in its current shape should be able to support hundreds of subnets.
  • Performance improvements: Performance can also be improved by better single-machine, network, and consensus tuning. Increasing performance by at least an order of magnitude should be possible.


Got questions? Go ahead and ask. Performance is something that touches many layers of the IC, so I may have to pull in different folks (not just @stefan-kaestle, who did the load testing) depending on the questions.


Adding more nodes increases scalability, doesn’t it? Are the number of nodes and usage related as far as performance goes? If the number of nodes increases 10x while usage increases 5x, does that mean speed will increase further? If the number of subnets increases substantially, adding nodes does not have any impact on subnet performance unless some load is transferred to the added nodes. Is that correct, or am I missing something?

Hi there,

Yes, your observations are correct. Adding more nodes to the system helps to scale. It’s important to distinguish:

  • If nodes are added to existing subnetworks, the query performance of that subnetwork increases at the cost of higher network traffic due to more nodes having to reach consensus.
  • If nodes are added as a new subnetwork, update performance improves because there are more concurrent instances of consensus running.

As you can see, performance tuning can be quite tricky and is to some extent workload-specific. So far, the IC is very much dominated by query calls, so adding more nodes to existing subnetworks would help if application demand grows (and in this case, it’s true that if you double the number of nodes in subnetworks you could handle twice the number of query calls).

Likewise, if you wanted to double the capacity for update calls, you could double the number of subnetworks. This assumes that workloads are independent of each other. Otherwise, scaling out by adding more subnetworks would lead to more cross-subnet traffic and might therefore introduce other bottlenecks.

Automatic load balancing is something that the IC currently cannot do. As you observed, this is a crucial feature that we will have to add for the future growth of the IC, so that added resources can be leveraged.

On top of that, we also hope we can greatly (perhaps 10x) improve the performance of each individual node. That should allow us to significantly increase the query and update request rates without having to add extra nodes.


Wow, how would this be achieved?


Wow, how would this be achieved?

Mostly just from doing a couple of obvious optimizations that we know we can do but haven’t gotten around to actually applying.

For example, we can much more aggressively execute queries concurrently. At the moment, a lot of the CPUs are idle because we are too conservative in concurrently executing requests.
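
As an illustration of the kind of change meant here (not the replica’s actual code), query execution throttled by a concurrency cap could look like the sketch below; raising the cap keeps more CPU cores busy.

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Hypothetical stand-in for a canister query; real query execution is far more involved.
struct Query(u64);

async fn execute(q: Query) -> u64 {
    q.0 * 2 // placeholder work
}

// Execute queries under a concurrency cap. A conservative cap leaves CPU cores
// idle; a larger cap lets more queries run in parallel on the same machine.
async fn run_queries(queries: Vec<Query>, max_concurrency: usize) {
    let sem = Arc::new(Semaphore::new(max_concurrency));
    let mut handles = Vec::new();
    for q in queries {
        let permit = sem.clone().acquire_owned().await.unwrap();
        handles.push(tokio::spawn(async move {
            let _result = execute(q).await;
            drop(permit);
        }));
    }
    for h in handles {
        let _ = h.await;
    }
}

#[tokio::main]
async fn main() {
    run_queries((0..1_000u64).map(Query).collect(), 4).await;  // conservative cap
    run_queries((0..1_000u64).map(Query).collect(), 64).await; // more aggressive cap
}
```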


A reader may wonder… “why not just knock them out then?”

@stefan-kaestle would it be safe to say that most known (but not yet implemented) optimizations have not been done for any of the following reasons:

  • the people who could implement optimization X may be working on feature Y (e.g. ICP in canisters) which may have greater impact

  • the people who could implement optimization X may be working on debugging or fixing stability issues that came up as IC scales

  • there are optimizations X where there are very conservative design decisions (like the idle CPUs you mentioned), and loosening the restrictions is something that may get done over time, as the system is observed, with enough time and scenarios passed to make sure that no stability or security issues are opened up.

Would that be a fair way to characterize what has mostly jumped the priority queue of yet-to-be-implemented-but-known optimizations? Am I thinking about it in the right way? Or did I miss the mark?


I’d say it’s a combination of all three points that you have mentioned, @diegop.

I think it’s also that scalability is currently not really an issue. We are perfectly capable of handling the loads we currently see.
Because of that, there hasn’t really been a need for such optimizations yet. So it’s probably safer to ramp up slowly, but safely.


Makes sense. Thank you Stefan!

  • If nodes are added as a new subnetwork, update performance improves because there are more concurrent instances of consensus running.

Can you clarify this point? Not sure I understand.

If nodes are added as a new subnet, wouldn’t that not affect update calls to existing canisters, since those canisters live on existing subnets and not the new subnet?

If nodes are added as a new subnet, wouldn’t that not affect update calls to existing canisters, since those canisters live on existing subnets and not the new subnet?

It’s correct that this would not affect update calls to existing canisters. What we mean is that the total number of update calls per second we can handle across the IC would increase.

As long as we don’t have any applications that span multiple canisters on multiple subnetworks, this will already allow the IC to scale for quite a while.

Does this help to clarify?


Err, if you mean that new subnets help the IC handle more updates in aggregate, then that totally makes sense.

Hello everyone,

We are planning to do a community talk on this subject in January.
Is there anything that interests you in particular that we should be preparing for you?

We would probably show you a little bit about the methodology of how we measure performance, some of the reports we have recently been generating, and some performance trends.
Finally, we can also discuss some opportunities for performance improvements that we are aware of at this point but haven’t gotten around to actually doing.

Cheers,
Stefan


I’m wondering if any of those performance improvements have made it onto mainnet yet.

Also, have there been any studies about the update/query load that a single canister (or subnet) can sustain? Thanks.


Hi @jzxchiang,

We do frequent roll-outs, so all the performance improvements we have made already should have found their way to mainnet by now.

However, we have not achieved a 10x improvement yet. I guess the main reason for that is that we haven’t really had a need to focus on performance: current mainnet utilization is far from the maximum capacity in most cases, and our engineers are occupied with other features.

All the experiments we have been doing are against a single canister in each of the subnetworks already, so those numbers should be what you are looking for.

If you have ideas for further experiments and workloads, please let us know. We are always eager to learn about them.

Cheers,
Stefan
