Thanks for the response. I agree. This proposal should be rejected and re-submitted, so that we do not set a precedent of voting on proposals that have misleading summaries, intentional or not.
Before answering your questions, I think it’s worth giving some context on what the notarization delay is, why it is needed, and how we choose a correct value for it.
What is a notarization delay?
When a node receives a block proposal, it verifies the block content and waits for a short duration before notarizing (signing) the block and sending its notarization to all its peers in the subnet. The time a node waits before sending the notarization share is what we call the notarization delay (ND).
For a block to be finalized, a block maker needs to send a block proposal to its peer nodes in the subnet, and at least 2/3 * subnet_size + 1 of the nodes must notarize the block and send their notarization shares to the other nodes.
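As a rough illustration of the threshold above, here is a minimal Python sketch. It assumes the expression 2/3 * subnet_size + 1 is evaluated with integer (floor) division, which is an interpretation and not something stated in this post. The function name and the subnet sizes are hypothetical, chosen only for the example.

```python
def notarization_threshold(subnet_size: int) -> int:
    """Number of notarization shares needed, reading the formula
    '2/3 * subnet_size + 1' with floor division (an assumption)."""
    if subnet_size < 1:
        raise ValueError("subnet_size must be positive")
    return (2 * subnet_size) // 3 + 1


# Illustrative subnet sizes only.
for n in (13, 28, 40):
    print(f"subnet_size={n}: {notarization_threshold(n)} notarization shares")
```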
The ND serves as a throttle on the maximum finalization rate of a subnet. That is, a block can never be finalized faster than the ND, as the ND is the minimum time it takes for nodes to share a notarization of a block after they receive a block proposal. So the finalization rate in a subnet will always be less than 1 block / ND, in this case 1 block / 0.3s => 3.33 blocks/s.
In practice, the block rate is lower than the theoretical maximum, as there are overheads in processing a block, the execution time of canister calls, networking latencies between peers, etc.
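The upper bound on the block rate can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name is hypothetical, and the result is the theoretical ceiling, not the rate a real subnet reaches.

```python
def max_finalization_rate(nd_seconds: float) -> float:
    """Theoretical upper bound on blocks per second for a given
    notarization delay (ND), ignoring all other overheads."""
    if nd_seconds <= 0:
        raise ValueError("nd_seconds must be positive")
    return 1.0 / nd_seconds


rate = max_finalization_rate(0.3)
print(f"{rate:.2f} blocks/s")  # 3.33 blocks/s
```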
Why do we have a notarization delay?
A node can fall behind the rest of the nodes in a subnet, for example, due to networking issues, crashing and restarting, or newly joining a subnet. When a node is behind, it needs to catch up with the rest of the subnet to participate in the consensus (block proposals) of the subnet. To catch up, the node typically downloads the state of the subnet at some checkpoint block height and must replay the missing blocks at a higher rate than the subnet is finalizing new blocks. If it cannot replay blocks faster than the rest of the subnet finalizes them, it will never catch up, as it will always lag behind the rest of the subnet.
To make it possible for nodes to catch up if they are behind, we use the ND as a throttle for the rest of the subnet to slow down the finalization rate, such that the node that is behind can replay blocks at a faster rate than the subnet is finalizing blocks.
How much we throttle a subnet’s finalization rate (the value of the ND) is independent of the subnet size. The ND is there to ensure that all nodes can participate in consensus and block making of a subnet, and that a node that falls behind can replay blocks at a faster rate than the subnet finalizes new ones.
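The catch-up condition described above can be sketched as a simple rate model: a lagging node closes the gap at the difference between its replay rate and the subnet's finalization rate. The function name and all numbers below (1000 blocks behind, 5 blocks/s replay rate) are hypothetical values for illustration, not measurements.

```python
import math


def catch_up_time(blocks_behind: float, replay_rate: float,
                  finalization_rate: float) -> float:
    """Seconds until a lagging node reaches the tip of the chain.

    Rates are in blocks per second. Returns math.inf when the node
    cannot replay faster than the subnet finalizes new blocks.
    """
    if replay_rate <= finalization_rate:
        return math.inf
    return blocks_behind / (replay_rate - finalization_rate)


subnet_rate = 1 / 0.3  # upper bound on finalization rate with a 0.3s ND

# A node replaying faster than the subnet finalizes eventually catches up.
print(round(catch_up_time(1000, 5.0, subnet_rate)))  # 600

# A node replaying slower than the subnet finalizes never catches up.
print(catch_up_time(1000, 3.0, subnet_rate))  # inf
```

A larger ND lowers the subnet's finalization rate, which widens the gap between the two rates and shortens the catch-up time in this model.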
What factors do we need to consider for the notarization delay?
Available networking bandwidth on nodes
A node that is behind needs to be able to download old blocks and replay them at a faster rate than the rest of the subnet finalizes new ones. This means the time the catching-up node spends per block must be less than the time the rest of the subnet spends per block:
block_download_time + block_processing_time < ND + block_processing_time
which simplifies to:
block_download_time < ND
In the worst case scenario, where we have full blocks, each block has a max size of 4MB. A node must also have a minimum of 300 Mb/s available bandwidth to meet the IC spec. Assume 200Mb/s is available for consensus.
block_size / bandwidth < ND
4MB / 200Mb/s < ND
32Mb / 200Mb/s < ND
0.16s < ND
Thus, from a networking perspective, we need a notarization delay of at least 160ms for a node to be able to catch up when replaying blocks in a subnet that is under full load.
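The bandwidth calculation above can be reproduced with a short Python sketch. The function name is hypothetical; the 4MB block size and the 200Mb/s share of bandwidth are the figures assumed in this post.

```python
def min_nd_for_download(block_size_megabytes: float,
                        bandwidth_megabits_per_s: float) -> float:
    """Lower bound on the ND (in seconds) so that one full block can be
    downloaded within a single notarization delay."""
    if bandwidth_megabits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    block_size_megabits = block_size_megabytes * 8  # 1 byte = 8 bits
    return block_size_megabits / bandwidth_megabits_per_s


bound = min_nd_for_download(4, 200)
print(f"{bound * 1000:.0f} ms")  # 160 ms
print(bound < 0.3)               # True: a 300ms ND satisfies the bound
```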
Overhead per block
There is also an unavoidable overhead of processing blocks and executing them once they are finalized. Once a block with messages is notarized, the node will execute the messages in that block and certify it in parallel to making new blocks. The high-level idea is that if the execution and certification steps take too long, then the block rate will be bottlenecked by execution and certification.
This is something we have observed in busier subnets, such as the OpenChat subnets, meaning we need a notarization delay, or a block rate throttler, that ensures the nodes that are ahead spend more time making and notarizing blocks than the nodes that are behind need to execute and certify those blocks.
From the production metrics we have about these overheads, we have deduced that a 300ms ND is enough.
What testing and experiments have been conducted?
For all of our experiments, we have simulated Round Trip Times (RTT), bandwidth, and packet loss to mimic the network behavior of the nodes that we see on the Internet Identity subnet. Check out simulate_network.rs in the IC repo for the source code on how we set up these simulations with the traffic control (tc) Linux utility.
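To give a feel for what such a simulation involves, here is a hedged Python sketch that only builds a tc netem command line; it does not run it. This is not the code from simulate_network.rs. The function name, the interface name, and the RTT, loss, and rate values are all hypothetical, and halving the RTT into a one-way delay assumes the same rule is applied on both peers.

```python
def netem_command(interface: str, rtt_ms: float, loss_percent: float,
                  rate_mbit: float) -> list[str]:
    """Build (but do not execute) a tc netem command that shapes egress
    traffic on one interface. All parameter values are illustrative."""
    one_way_delay_ms = rtt_ms / 2  # assumes the peer applies the same delay
    return [
        "tc", "qdisc", "add", "dev", interface, "root", "netem",
        "delay", f"{one_way_delay_ms:g}ms",
        "loss", f"{loss_percent:g}%",
        "rate", f"{rate_mbit:g}mbit",
    ]


print(" ".join(netem_command("eth0", rtt_ms=100, loss_percent=0.5, rate_mbit=300)))
# tc qdisc add dev eth0 root netem delay 50ms loss 0.5% rate 300mbit
```

Running the resulting command requires root privileges on a Linux host.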
We have mainly stress tested the simulated subnet by flooding it with a large number of ingress messages, installing many canisters to increase load, and killing nodes to verify that nodes can catch up when joining the subnet.
Our tests show that the change is perfectly fine regardless of the subnet type and subnet size.