I saw a tweet from Dom that subnets can process up to 750 update calls per second. I assume he excluded the numbers from the NNS subnet, which does have its own privileges as the most important subnet of the protocol.
Hypothetical calculations
So assuming there are 20 subnets that each process update calls at 750 TPS, we get 15,000 TPS, which is a 30% increase over the Dec 2021 stress test.
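The arithmetic behind that estimate can be sketched as follows (the subnet count and per-subnet rate are the assumptions stated above, not measured protocol values):

```python
# Hypothetical aggregate throughput, using the figures assumed above.
SUBNETS = 20              # assumed number of update-processing subnets
UPDATES_PER_SUBNET = 750  # per-subnet update calls per second (from the tweet)

aggregate_tps = SUBNETS * UPDATES_PER_SUBNET
print(aggregate_tps)  # 15000
```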
Can someone please help me with the numbers? I do realize that there are different types of subnet, but it’s not clear to users like me, and the documentation is still being developed.
A brief overview would be very helpful. Thanks in advance.
The numbers I have seen in this thread are indeed close to what we have measured, either in mainnet itself or in equivalent testnets that we are running internally.
In our latest weekly performance runs we are up to about 900 updates/s (with ~6 s p90 latency per update, as perceived by the user between submission and execution). If we push beyond that, we see failures (e.g. a 26% failure rate at 1,200 updates/s) and higher latency (around 10 s).
I wouldn’t say there is one single big feature that by itself explains the improvement. Rather, I think it is the culmination of many smaller improvements we have made over time.
I wouldn’t call those theoretical numbers, as we have actually measured them with real workloads. Theoretically, the number of txns per second is probably much higher and will depend a lot on the configuration, e.g. the finalization rate and the number of messages we allow in a block. Likely, for this benchmark we would ultimately be network bound, so until we saturate the network, we should be able to increase the number of updates/s. We have not measured it with different network configurations yet, though.
Note that the actual number depends a lot on things like:
Geo replication: The more spread out the nodes are across the globe, the more conservative we have to be in terms of block rate and message count per block etc.
Application: If it’s a compute intensive application that we are benchmarking, the expected bottleneck would move from the network to the CPUs and the expected update rate would be lower.
The number of nodes in the subnetwork: the more nodes, the longer it takes to reach agreement, so the expected update rate decreases with increasing subnet size.
Subnet type and configuration
There is likely more
Finally, the replica software is still mostly developed for correctness, and not for performance. With time, we will hopefully be able to push the performance further.
EDIT: Just to emphasize. I’m referring here to the performance of a single subnetwork. For the absolute numbers, those have to be multiplied by the number of subnetworks.
That’s great news! Didn’t expect almost 2x improvements in such a short amount of time.
One thing I don’t understand though is how come other chains can achieve more than 1k tps despite having higher replication factor and non permissioned nodes? The average ICP subnet has 13 nodes and their specs are very high, so I assumed the TPS to be in the thousands already.
One thing I don’t understand though is how come other chains can achieve more than 1k tps despite having higher replication factor and non permissioned nodes?
Which blockchain are you referring to and do they count read-requests as transactions?
I’m specifically trying to avoid this word, since it’s a bit contentious and different people understand different things when they talk about transactions.
The IC can currently (and again, this will improve over time) support around 900 updates/s and more than 36k queries/s per subnet.
So with the current 32 application subnetworks that’s >28k updates/s + 1.2M queries/s.
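The aggregate numbers above follow directly from scaling the per-subnet figures linearly (a simplification; real-world aggregate throughput also depends on the factors listed earlier in the thread):

```python
# Per-subnet figures quoted above, scaled by the stated subnet count.
APP_SUBNETS = 32
UPDATES_PER_SUBNET = 900      # measured updates/s per subnet
QUERIES_PER_SUBNET = 36_000   # queries/s per subnet

print(APP_SUBNETS * UPDATES_PER_SUBNET)   # 28800 updates/s
print(APP_SUBNETS * QUERIES_PER_SUBNET)   # 1152000 queries/s
```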
AVAX claims a theoretical 4.5k TPS and registered 870 TPS 8 months ago. While that number is similar to the IC’s, consider that it was achieved with hundreds of nodes running on cheaper hardware (I’m not sure how they measure it, though). Their consensus model is quite interesting and seems to scale better than the IC’s as node count grows: nodes don’t have to query every other node in a subnet to reach consensus; instead, each node queries a randomized subset over multiple rounds, which apparently allows very quick finality and a higher replication factor.
AFAIK the highest registered was around 1k, which is still a feat considering Avax has hundreds of nodes vs the IC’s average of 13, and its nodes can be run on very cheap hardware.
If you don’t trust Dashboards, you can always run your own benchmark. If you run against mainnet, you will eventually be rate-limited by the boundary nodes.
You can also benchmark against the replica software running on your local machine (via dfx). You can check out Errors when stress testing locally for example.
Hi @Sormarler, we keep some internal records of those tests, but are not planning to release them externally, as this would mean quite significant extra overhead and we don’t currently have the engineering capacity for that.
Is this number capped by how fast a canister can process each message, or by the number of updates that can happen in a single round? E.g. if a message executes a function that sends 1 ICP to n canisters, would that result in higher TPS than n messages each sending 1 ICP to one canister?
It is my understanding that execution is usually the fastest step in the IC stack; message ordering and achieving finality are what take time, due to network latency.
Also, is time to finality influenced by how much of the state has changed since the previous round? E.g. does it take longer to reach finality on 900 updates vs. 0?
In the case of this benchmark, the bottleneck is most likely Consensus and we are reaching limits on the maximum number of ingress messages that can be stored in a single block.
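A rough way to see why the per-block ingress limit bounds throughput (both constants below are illustrative assumptions, not actual protocol parameters):

```python
# Back-of-envelope: sustained ingress throughput is bounded by
# (blocks finalized per second) x (ingress messages allowed per block).
# Both values are illustrative assumptions, not real IC constants.
blocks_per_second = 1.0        # assumed finalization rate
max_ingress_per_block = 1000   # assumed per-block ingress message limit

max_updates_per_second = blocks_per_second * max_ingress_per_block
print(max_updates_per_second)  # 1000.0
```

Under this simple model, raising either the finalization rate or the per-block message limit raises the ceiling, which is consistent with the earlier remark that the achievable rate depends a lot on configuration.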
There are also limits in the execution environment (and elsewhere). For very compute-heavy canisters, you can easily move the bottleneck to the execution environment, which can (currently) only execute one update call per canister at a time.
If you do messaging to other canisters, you could of course also hit bottlenecks there.
It entirely depends on the messages you send and the complexity of the canister code that you want to run.
Sorry if this isn’t very concrete, but it really depends.
re finality:
I don’t think the amount of state should have a large impact on finality, since more changes just make calculating the checksum a bit more expensive, and that shouldn’t dominate the cost of reaching finality. I’m not an expert here, though.