Problem statement
All replicas on a subnet gossip message to each other as part of creating the blockchain and reaching consensus. This becomes more work the larger the subnet is. We want to support much larger subnets than the internet computer uses today, and as part of experimenting with that, we observed that this subnet gossip becomes a bottleneck.
The current gossip mechanism works as follows for a subnet of size N:
- Node A may have a message it wants to gossip to all its peers. It advertises the message to all the (N -1 ) peers in the subnet, which download the advertised message from Node_A. This requires O(N) messages.
- After downloading and validating, the peers in turn advertise the message to all their (N - 1) peers. This adds another O(N * N) messages. Thus an increase in subnet size results in a quadratic growth of message volume.
Proposed change
This second step ensures that all replicas will obtain all valid messages. Currently, this gossip mechanism is used for all different types of messages, e.g., consensus blocks, notarization signatures, and ingress messages. However, this reliability of the broadcast is not equally important for the different artifact types.
- For consensus blocks, it is very important that all replicas see the same blocks, and the “re-broadcast” of step 2 is required.
- For ingress messages, however, any gossip is purely an optimization: it ensures that many replicas have the ingress message, such that whichever replica is the next block maker can include the ingress message, ensuring low latency.
Since ingress messages account for the majority of the message on a subnet under load, and there is no direct security reason to reliably gossip these artifacts, we propose to change how gossip works for ingress messages. More precisely, we propose to omit step 2 for ingress messages. This means that whichever replica receives a user’s ingress message first will advertise it to all peers, but the other peers will not re-broadcast upon receiving the ingress message.
Performance gains
The following charts compare the key metrics under load, before and after the change. These benchmark load tests were performed on a 56 node subnet, with the update rate of 300 requests per second (RPS). The target finalization rate for this subnet (under no load) was 0.35
- Finalization rate, achieved RPS
Baseline | With Improvements | |
---|---|---|
Finalization Rate | Drops from 0.35 to 0.26 | Drops from 0.35 to 0.32 |
Achieved rate (RPS) | 250 / 300 | 295 / 300 |
As the table summarizes, we were able to achieve close to the requested 300 RPS with little degradation in the finalization rate. Also, these key metrics can be seen to be choppy before the improvements. After the proposed change, these hold steady with little variance.
- Advert volume
This chart illustrates the issue of quadratic growth in advertisement messages more clearly.
- Before the change: we were exchanging about 18K adverts/sec, under the requested load of 300 RPS. Bulk of these result from the re-broadcast(step 2) described earlier.
- After the change: we only exchange about 1500 adverts/sec on the same 56 node subnet. This directly results in the performance gains
Status
The proposed changes have been merged, but not yet enabled. We plan to make a series of proposals to enable these improvements on the existing subnets