TL;DR
The Crypto Team at DFINITY recently finished an effort on improving the performance of multiple components of the IC in order to deliver a smooth user experience during events with a high user activity such as decentralization swap. In this post, I will focus on an important part of this effort – an efficiency improvement to the canister signature verification by caching those signatures, which significantly increased the message rate that a subnet can process and contributed to the very smooth OpenChat decentralization swap experience.
Motivation: Improving SNS-1 Efficiency Bottlenecks
Last November DFINITY conducted the SNS-1 swap as a test run to reveal potential efficiency bottlenecks before the first real swap. Although high participation is something we generally expect during such an event, the exact and peak numbers as well as the concrete performance implications are often not 100% clear.
During the SNS-1, we observed high peaks of user activity, especially at the beginning of the swap, with a maximum of ~300 user requests per second of which up to ~50 could be processed per second due to subnet performance limits. The main bottleneck was the BLS signature verification for the requests that came from the II-authenticated users. Overall, it took a significant amount of resources and slowed down the finalization rate because the replica could not keep up with the many canister signature verifications.
Background: User Authentication Costs for a Replica
Improvements to canister signature verification matters because they affect Internet Identity (II). II is a very common standard for user authentication on the IC. It is used in the majority of the most popular canisters such as the NNS frontend. Its main function is to improve the users’ privacy and enhance security of the users’ cryptographic keys. To understand the value of canister signature verifications, it is worth explaining the basics of how II works:
- To register with the II, a user creates an account which is identified by an identity anchor, a sequential number assigned by the II canister. Then, the user can register a device to its account by uploading a cryptographic public key generated by the device. One user can register multiple devices with the same identity anchor. For every service the user intends to log in, the II canister generates for them a different pseudonym.
- When a user wants to log in with a service offering II authentication, they are first redirected to the II canister. The user authenticates with the II canister using one of its devices, and generates a temporary session signing keypair. The user then requests the II canister to authorize the session public key to log in with the desired service. The II canister does so by issuing a so-called canister signature, which serves as a certificate delegating the temporary session key to authenticate on behalf of the user’s pseudonym for that service.
- Once the user has obtained the delegation from the II canister, they can interact with the service by signing their request or query using the temporary session key and attaching to it the delegation.
Efficiency: Whereas the validity of the delegation is rather costly to verify, the validity of the request or query can be verified very efficiently due to the more efficient signature algorithms that can be used for session keys.
Security: The service only ever sees the session key pair and the persistent pseudonym, while their long term public key and identity anchor is only presented to the II canister.
The II heavily relies on canister signatures, which are a signature algorithm developed for the IC. They are a nice and structured way how any canister can certify that a piece of data comes from it.
The actual signature part of a canister signature is computed by the subnet hosting the canister, and it consists of a single threshold BLS signature. These signatures are computed by each subnet after every execution round to certify some parts of the subnet state. Every canister can request the subnet to include a single value in its certified state, and get back the resulting signature. In this way, all requests coming from different canisters in the same round are certified with the same BLS signature.
To sign one or more messages, a canister uses a particular data structure called Merkle tree to store one or more messages that it wants to sign. A Merkle tree offers two important properties for this task: 1) it can be expressed as just one hash value and 2) it allows to verify that a partial tree (e.g., one signed message) belongs to the whole tree without knowing the complete tree. Hence, a user does not need to know all messages that were signed by the canister in the batch to verify a signature.
Going a step back to how this is connected to the II, the II canister puts one or more delegations into a Merkle tree and lets the subnet sign the Merkle tree hash in its next block as part of the subnet’s state, being itself represented as a Merkle tree. Thus, the delegation that a user sends in each request carries a subnet’s BLS signature that needs to be verified by the replica to make sure that the session key is valid for the user. This makes the verification of the user request more expensive, since the BLS signature can cost up to 2 orders of magnitude more to verify than other standard signature schemes, such as Ed25519 and ECDSA.
However, the subnet’s BLS signature is not the only BLS signature that needs to be verified for the II-based authentication. The root of trust of the IC is the threshold BLS key of the NNS subnet. This means users only need to know this single public key to verify any information coming from the IC, e.g. they don’t need to know each of the subnets’ public keys. To make this possible, the NNS subnet certifies which subnets exist in the IC, what is their public key, and which canister ID ranges they can host. We call this a “subnet delegation” and it consists again of a BLS threshold signature. Such subnet delegation is also included in the delegation obtained by the users when they authenticate with II.
In summary, to verify a session key of a user authenticated via the II, a replica needs to verify 2 BLS signatures.
Optimization: Cache for Canister Signatures
Although a replica needs to verify many BLS signatures for verifying the authenticity of the incoming user requests, a user reuses the same delegation for the whole duration of their session. This means that the same BLS signatures are eventually verified multiple times during the user’s session. Furthermore, the same subnet delegation may be shared by multiple users and the II canister can certify multiple session keys in one batch, which is signed by only one BLS signature. The fact that we are often verifying the same BLS signatures suggested that caching those verifications would improve the performance significantly.
In total, the II subnet produces ~45 BLS signatures per minute and there is one persistent subnet delegation. A session for a session key usually lasts 30 minutes, resulting in at most 45 * 30 + 1 = 1351 distinct signatures per 30 minutes. But even If a session, for some reason, would be set up to last a day, we would require sufficient space in the cache for 1350 * 2 * 24 + 1 = 64801 distinct signatures.
So we decided to introduce a cache for BLS signatures of size 100k to fit, even in the worst case, all signatures actively used in delegations. Our implementation keeps the most recently used entries and stores only a hash of the inputs (signature, message, public key), requiring 65 bytes per element. With this, the fully occupied cache would require only 6.5 MB of RAM. Our implementation allocates a static cache for BLS signatures low in the API stack of the crypto component, limiting the refactoring required to introduce the cache. The runtime improvement of a cache lookup compared to the BLS signature verification is dramatic, the latter is faster by a factor of 4000 (~0.5us vs. ~2ms, respectively).
We benchmarked the canister signature verification in a benchmark environment that is close to the real setup in production and got the following graphs and numbers.
Without caching:
With caching:
In the figures, the y axis shows the total number of verified signatures, whereas the color corresponds to the verification run time, e.g., the verifications colored in yellow took from 0.0001 to 0.0002 seconds. The proportion of a color in a bar corresponds to the number of verifications (each bar is effectively a pie chart).
Two orders of magnitude improvement (~10ms before vs ~0.1ms now) on our testnet and a much higher processing rate!
We also benchmarked if we can now handle the maximum number of requests per second that we have seen during the SNS-1, i.e., 300 requests per second. In the graph below, the middle subgraph uses the users’ real identities to sign requests using Ed25519. This scheme has the fastest verification among the signature algorithms currently supported by the IC and it requires ~50us for a verification and thus is our bottom-line for what we approximately want to achieve.
In the figures, the dropped requests are in red and the accepted requests are in green.
As can be seen from the graphs, we did not observe any significant difference between both graphs on the right in terms of the number of dropped user requests. So we can conclude that our minimum performance requirements are already satisfied with caching only and we do not require further optimizations for canister signature verification right now.
Beyond the 300 requests per second, we further analyzed how many user requests containing two BLS signatures one subnet can currently process with caching enabled. The result is ~500 requests per second with very few requests being dropped which is much higher than the largest peaks in the SNS-1 and is an improvement by a factor of ~11 compared to what we could process during the SNS-1.
Further Optimizations
Although caching is fully sufficient for our current needs, we also evaluated other possible optimizations for the case that the need will arise in the future. We evaluated two further optimizations.
- Batch multiple verifications which simplifies the verification mathematically.
- Use multiple threads to parallelize verification of multiple signatures.
We have implemented a proof of concept for batching and parallelization, which showed further promising improvements. For a batch of 128 signatures, the batching improves the verification of BLS signatures by a factor of 1.48. For a batch of 128 signatures with the same signed message, the batching is 20.9x faster, and for the same public key, it is 34.8x faster. We also applied parallelization on top of batching and found out that, depending on the batch size, it improves the efficiency by further 3x–7x with 8 threads.
So we can say with certainty that we can further significantly improve our throughput of BLS signature verifications if the need arises.
Real Swap Results
After all the work on optimizations to tackle the bottlenecks encountered in the SNS-1, on Mar 3 the OpenChat swap took place. Anyone who has participated in the swap can tell from their experience that the swap went extremely smoothly. We have observed a roughly similar level of activity compared with the SNS-1. However, note that the activity numbers for both swaps are not straightforward to compare after the many optimizations by different DFINITY teams to several parts of the IC stack. What we can tell for sure though is that from the perspective of the IC’s “health” metrics such as the finalization rate or the number of dropped requests, the OC swap was no different from the normal IC operation with a tiny number of failures.
Have any questions? Please ask below!