Great questions @FranHefner and @gatsby_esp!
I discussed them with @dsarlis and @stefan-kaestle. Here is our combined response.
We will focus on query performance and throughput here because that was the main reason for the degraded user experience during the airdrop.
Q1: What is the performance limit in cases of high demand?
There are several factors that affect query performance and throughput:
- Is the query cacheable? A query is cacheable if its request headers and input parameters do not change frequently.
- How many instructions does the query execute?
- How much memory does the query read and write?
In the best case, the query is cacheable. It is then served directly by the boundary nodes, and the expected throughput is about 100K queries per second per boundary node. The total throughput scales linearly with the number of boundary nodes.
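For HTTP-served queries, one way a canister can signal cacheability is via response headers. Below is a minimal Rust sketch (assuming a recent ic-cdk version) of an `http_request` query that sets a `Cache-Control` header. The `HttpRequest`/`HttpResponse` types are defined locally just for this example, and whether and for how long the boundary nodes actually cache the response is an assumption about the gateway configuration, not a guarantee.

```rust
use candid::{CandidType, Deserialize};
use ic_cdk::query;

// Request/response types for the HTTP interface, defined here only for the sketch.
#[derive(CandidType, Deserialize)]
struct HttpRequest {
    method: String,
    url: String,
    headers: Vec<(String, String)>,
    body: Vec<u8>,
}

#[derive(CandidType, Deserialize)]
struct HttpResponse {
    status_code: u16,
    headers: Vec<(String, String)>,
    body: Vec<u8>,
}

#[query]
fn http_request(_req: HttpRequest) -> HttpResponse {
    HttpResponse {
        status_code: 200,
        // Hint that this response can be cached for 60 seconds
        // (assumption: the boundary nodes honor Cache-Control for this canister).
        headers: vec![(
            "Cache-Control".to_string(),
            "public, max-age=60".to_string(),
        )],
        body: b"static, cache-friendly content".to_vec(),
    }
}
```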
If the query is not cacheable, then its performance depends on how much work, in terms of executed instructions and accessed memory, it does when running. If the query is optimized and does very little work (e.g. executes a few thousand instructions and touches a few memory pages), then we can expect a throughput of ~3000 queries per second per node. On a subnet with 13 nodes, that translates to about 40K queries per second. This number could increase by 10x with more investment in core IC optimizations.
If the query is not well optimized and executes millions of instructions, then the query throughput decreases proportionally to the work performed. This is similar to query performance in traditional Web2 systems. That’s why it is important that developers stress test and optimize bottlenecks in their query code before a big launch and expected workload peaks.
One area for improvement here is tooling for monitoring and debugging canister performance, which can be developed both by the community and by DFINITY.
Q2: Is it something that developers can do, or is it exclusively up to DFINITY?
As explained above, both the query code of canisters and the IC code are critical for performance.
If the query does a lot of work (millions of instructions), then the highest impact is on the developer side. By optimizing the query to execute thousands instead of millions of instructions, the developer can increase the throughput by several orders of magnitude. Typical performance optimizations known from other systems also apply to the IC, for example: 1) avoid random memory accesses in favor of linear access and leverage locality; 2) cache and index data that is read frequently.
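As a rough illustration of point 2), here is a hedged Rust sketch (ic-cdk) contrasting a linear scan with an indexed lookup inside a query. The `Profile` data and the method names are hypothetical, and the write path that keeps the index up to date is omitted.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

use ic_cdk::query;

// Hypothetical record type for this sketch.
struct Profile {
    score: u64,
}

thread_local! {
    // Unindexed storage: profiles stored as (user_id, profile) pairs.
    static PROFILES: RefCell<Vec<(u64, Profile)>> = RefCell::new(Vec::new());
    // Index from user id to the position in PROFILES, maintained on writes (omitted here).
    static BY_ID: RefCell<BTreeMap<u64, usize>> = RefCell::new(BTreeMap::new());
}

// Slow variant: a linear scan reads memory proportional to the number of
// profiles, so the instruction count grows with the data set.
#[query]
fn score_scan(user_id: u64) -> Option<u64> {
    PROFILES.with(|p| {
        p.borrow()
            .iter()
            .find(|(id, _)| *id == user_id)
            .map(|(_, profile)| profile.score)
    })
}

// Fast variant: an indexed lookup executes a roughly constant number of
// instructions per query regardless of how many profiles are stored.
#[query]
fn score_indexed(user_id: u64) -> Option<u64> {
    BY_ID.with(|idx| {
        idx.borrow().get(&user_id).copied().and_then(|pos| {
            PROFILES.with(|p| p.borrow().get(pos).map(|(_, profile)| profile.score))
        })
    })
}
```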
On the IC side, we see potential for at least a 10x improvement in query performance with more investment in optimizations.
Q3: Is it only related to the nodes that process queries, and is it solved with more nodes?
Queries are handled by individual nodes locally and do not require communication with other nodes of the subnet. Therefore, adding more nodes improves query throughput proportionally. However, as mentioned in the previous answer, adding more resources might not be necessary if the software, both canister code and IC code, continues to be improved.
Q4: What is the growth scheme to avoid this type of performance degradation?
The best way to avoid performance degradation is to stress test and profile query performance to find and remove the bottlenecks:
- Make sure that most queries are cacheable.
- Measure the number of instructions executed by a query using `ic0.performance_counter()` (see the sketch after this list).
- Estimate the number of memory pages accessed by the query and leverage locality and linear access whenever possible.
- Consider more optimal data structures and other well-known optimization strategies such as indexing.
- If you want to grow beyond O(100K) queries per second, then consider scaling out to multiple subnets.
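As a concrete example of the instruction-counting bullet above, here is a minimal Rust sketch (assuming a recent ic-cdk, which exposes the counter as `ic_cdk::api::performance_counter`) that wraps a query's work with the counter so instruction costs can be recorded during stress tests. `expensive_work` is a hypothetical placeholder for the real query logic.

```rust
use ic_cdk::api::performance_counter;
use ic_cdk::query;

// Hypothetical placeholder for the query's real logic.
fn expensive_work() -> u64 {
    (0..10_000u64).sum()
}

#[query]
fn profiled_query() -> (u64, u64) {
    // Counter type 0 returns the instructions executed so far in this call.
    let before = performance_counter(0);
    let result = expensive_work();
    let after = performance_counter(0);
    // Return the result together with the instructions consumed by the work,
    // so a stress test can collect instruction counts per request.
    (result, after - before)
}
```

During a stress test, the second element of the returned tuple can be logged per request to spot queries that drift from thousands into millions of instructions.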
Q5: Is ICP ready for commercial application use?
We are not aware of any fundamental limitation that would prevent commercial applications from using the IC effectively and scaling to a large number of queries per second. As outlined in the answers above, and as with other systems, a combination of factors contributes to query throughput. An application that is designed to handle the expected load efficiently, by taking advantage of caching on the boundary nodes, keeping its queries doing as little work as possible, and potentially even using multiple canisters to serve queries and avoid memory bottlenecks on a single canister, should be able to scale to hundreds of thousands of queries per second, especially when also scaling out across subnets.