Hi,
A few months ago, the IC community voted to replace the P2P layer for the state sync protocol. The new P2P layer for state sync uses QUIC as the transport protocol. It improves the performance and the security of the IC stack, and reduces code complexity. It has since been deployed to all subnets, and is operating flawlessly.
We would now like to propose a new P2P layer for the consensus protocol and for other consensus-related protocols such as the DKG, Ingress, HTTPS outcalls, etc.
The new P2P layer we propose for these clients has several major improvements over the existing P2P layer:
- It improves latency for artifact delivery when artifacts are small enough (instead of using adverts, small artifacts are pushed).
- It uses highly concurrent code, integrated with the new QUIC transport that was introduced with the P2P layer for state sync. This enables faster and more efficient processing of messages, and eliminates the risks of queuing delays and head-of-line blocking that were present in the existing P2P layer.
The new P2P layer uses a novel data structure we call a slot table, which tracks the active artifacts for each client and synchronizes the content of each node's artifact pool with its peers. On the send side, the node maintains this slot table, in which each artifact is assigned a slot number out of a bounded pool of slots. The consensus protocol guarantees that it will not need more than a constant number of such slots. Then, for each peer, a set of async tasks is spawned to push the content of the slot table, and any subsequent updates, to that peer. If a peer is slow, it will receive these updates more slowly, but it will not slow down the sender or the other peers.
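To make the send side more concrete, here is a minimal sketch of a slot table in Python. It is not the actual implementation: the class and method names (`SlotTable`, `insert`, `remove`, `pending_for`) are hypothetical, and it assumes the version attached to each update is a single counter that is incremented on every insertion. Each peer keeps its own cursor (the last commit id it was sent), so a slow peer simply lags behind without holding up anyone else.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SlotUpdate:
    slot: int         # slot number the artifact occupies
    commit_id: int    # version of the table when this update was made (assumed counter)
    artifact_id: str  # hypothetical artifact identifier


class SlotTable:
    """Send-side table mapping active artifacts to a bounded pool of slots."""

    def __init__(self, max_slots: int) -> None:
        # Free slots are kept in a stack; pop() hands out slot 0 first.
        self.free: List[int] = list(range(max_slots - 1, -1, -1))
        self.slot_of: Dict[str, int] = {}
        self.updates: Dict[int, SlotUpdate] = {}  # latest update per slot
        self.commit_id = 0

    def insert(self, artifact_id: str) -> SlotUpdate:
        """Assign a free slot to a new artifact and record the update."""
        if artifact_id in self.slot_of:
            return self.updates[self.slot_of[artifact_id]]
        if not self.free:
            raise RuntimeError("slot pool exhausted")
        slot = self.free.pop()
        self.commit_id += 1
        update = SlotUpdate(slot, self.commit_id, artifact_id)
        self.slot_of[artifact_id] = slot
        self.updates[slot] = update
        return update

    def remove(self, artifact_id: str) -> None:
        """Release the slot of an artifact that is no longer active."""
        slot = self.slot_of.pop(artifact_id)
        del self.updates[slot]
        self.free.append(slot)

    def pending_for(self, last_sent_commit: int) -> List[SlotUpdate]:
        """Updates a peer still needs, given the last commit_id it was sent."""
        newer = (u for u in self.updates.values() if u.commit_id > last_sent_commit)
        return sorted(newer, key=lambda u: u.commit_id)
```

In this sketch, a per-peer task would repeatedly call `pending_for` with that peer's cursor and send each returned update; because slots are reused, the table size stays bounded no matter how many artifacts pass through it.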
Each update is sent as a new QUIC stream, and is handled asynchronously by the receiving peer. On the receive side, peers resolve any conflicting updates using a versioning field and a connection tracking field that are included with each update (`commit_id` and `connection_id`, respectively). Since the receive side is handled asynchronously, there is no risk of head-of-line blocking in processing incoming updates and artifacts.
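As an illustration of how the receive side could resolve conflicts when updates arrive out of order, here is a small Python sketch. The ordering rule is an assumption made for the example, not something stated above: an update from a newer connection supersedes one from an older connection, and within the same connection the higher `commit_id` wins. The names `IncomingUpdate` and `PeerSlots` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class IncomingUpdate:
    slot: int
    commit_id: int      # versioning field carried by the update
    connection_id: int  # identifies the connection the update was sent on
    artifact_id: str    # hypothetical artifact identifier


class PeerSlots:
    """Receive-side view of one peer's slot table."""

    def __init__(self) -> None:
        self.slots: Dict[int, IncomingUpdate] = {}

    def apply(self, update: IncomingUpdate) -> bool:
        """Apply an update unless a newer one is already stored for its slot.

        Assumed rule: compare (connection_id, commit_id) lexicographically.
        Returns True if the update was accepted, False if it was stale.
        """
        current = self.slots.get(update.slot)
        if current is not None:
            new_key = (update.connection_id, update.commit_id)
            old_key = (current.connection_id, current.commit_id)
            if new_key <= old_key:
                return False
        self.slots[update.slot] = update
        return True
```

Because each update carries everything needed to decide whether it is stale, streams can be processed in whatever order they complete, which is what lets the receiver avoid head-of-line blocking.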
The new design of the P2P layer makes it easy to push small artifacts directly and skip the advert-request-artifact paradigm for them. The sender pushes an artifact if its size is below a certain threshold (currently, 1KB). This is expected to improve latency for many clients, since most of their artifacts are small.
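The push decision can be sketched in a few lines of Python. This is only an illustration: the function name `build_message`, the message layout, and the reading of "1KB" as 1024 bytes are all assumptions made for the example.

```python
PUSH_THRESHOLD_BYTES = 1024  # assumption: "1KB" is taken to mean 1024 bytes


def build_message(artifact_id: str, payload: bytes,
                  threshold: int = PUSH_THRESHOLD_BYTES) -> dict:
    """Decide on the sender whether to push the artifact or only advertise it."""
    if len(payload) < threshold:
        # Small artifact: include the payload so the peer needs no extra round trip.
        return {"kind": "push", "id": artifact_id, "payload": payload}
    # Large artifact: send only an advert; the peer requests the payload if needed.
    return {"kind": "advert", "id": artifact_id, "size": len(payload)}
```

For example, `build_message("share-1", b"x" * 200)` yields a push message, while a 4096-byte payload yields an advert that carries only the identifier and the size.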
We propose to start the rollout of this new P2P layer by enabling it only for HTTPS outcalls, which is a relatively separate client. While we have tested it thoroughly, we believe that starting with a client that is not part of the core consensus protocol is safer, while still letting us monitor the new layer on mainnet and make sure the transition is smooth.
In the coming months, we will propose migrating more and more clients to the new P2P protocol, until we eventually deprecate the old P2P protocol and all clients use the new P2P and transport layers (at which point the entire P2P layer will no longer use TCP).
As always, let me know if anything is unclear or if you have any further questions.
Thanks,
Yotam