TL;DR
The Internet Computer Protocol uses Trustworthy Metrics to evaluate node performance based on block creation success. It is proposed to penalize underperforming nodes by linking their rewards to the Trustworthy Metrics, focusing on nodes with very high failure rates. It is suggested to roll out this feature in phases, beginning with a Proof of Concept to collect feedback from stakeholders, before the actual implementation of the reward reduction within the Network Nervous System.
Background & Goal
Earlier this year, the ICP protocol introduced “Trustworthy Metrics for Useful Work” to enhance transparency in node performance. These metrics expose information on which nodes have succeeded or failed in the role of block maker. For more technical details on the design and tooling, please see here.
Building on this foundational work, it is suggested to link node rewards with node performance, with the following aims:
- Node rewards should depend on node performance:
  - Healthy nodes, which are available almost all the time, should receive full rewards.
  - Under-performing nodes should be penalized.
- Avoid side effects on incentives (e.g. misreporting to harm other nodes):
  - It should be difficult for other nodes to influence the rewards of a particular node, and a node should not directly profit from the downtime of other nodes.
Trustworthy Metrics
To measure node performance, the Trustworthy Metrics evaluate how often a node contributes to protocol operations. This works as follows: In each protocol round, a node is selected to be the block maker based on the random beacon. For a node to successfully make a block, it must be up-to-date and fast enough, necessitating robust network connectivity along with adequate computing and storage resources. The Trustworthy Metrics count the number of blocks a node was tasked to make versus how many it successfully made, providing a reliable indicator of a node’s contributions to the protocol.
The metrics are deemed trustworthy because they are generated and served directly by the Internet Computer, without any intermediaries. This design prevents any single node from misrepresenting its operational status.
The block maker success rate over a specific period (e.g. a month) is defined as the ratio of the number of blocks a node successfully made to the number it was supposed to make. Conversely, the block maker failure rate is calculated as 1 - block maker success rate.
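As a small sketch, the two rates defined above can be computed from the raw block counts. The function name and signature are illustrative only, not part of the actual NNS implementation:

```python
def block_maker_rates(blocks_made: int, blocks_assigned: int) -> tuple[float, float]:
    """Return (success_rate, failure_rate) for one reward period.

    Hypothetical helper: success rate is blocks successfully made divided
    by blocks the node was supposed to make; failure rate is its complement.
    """
    if blocks_assigned == 0:
        # Metrics exist only for nodes that were actually selected as
        # block maker during the period (i.e. assigned to a subnet).
        raise ValueError("node was not assigned any blocks in this period")
    success = blocks_made / blocks_assigned
    return success, 1.0 - success
```

For example, a node that made 950 of 1000 assigned blocks has a success rate of 0.95 and a failure rate of 0.05.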
Initial Analysis of Node Performance Metrics
In this section, we conduct an initial analysis of the Trustworthy Metrics, which have been collected from February 20 to April 18, 2024. Our aim is to explore potential regional differences in node performance that could stem from variations in network connectivity.
The graph below represents the average failure rate of nodes grouped by their respective data centers, sorted in descending order of performance. Note that not all data centers with low failure rates are shown.
From the graph, it is evident that most data centers exhibit very low failure rates and thus high reliability. However, a few data centers have significantly higher failure rates. Interestingly, there appears to be no consistent pattern linking failure rates to geographic locations, as data centers within the same city can perform quite differently.
Suggested Approach for Node Reward Reduction
The overall idea is to define a simple and transparent approach, linking node reward reductions to the Trustworthy Metrics in a straightforward way. Considering that the vast majority of nodes exhibit maximum failure rates between 0-5%, our focus will initially be on outlier nodes with failure rates exceeding 10%. This threshold may be tightened in the future.
In more detail, this would work as follows: For every reward period of one month, the NNS determines for every node the average block maker failure rate during that period. Based on this, a performance reduction score from 0 to 100% will be applied to the monthly rewards of that node. The reduction score could be determined by a piecewise linear function connecting configurable points, shown in red in the graph below.
Example calculation:
- A node with a failure rate up to 10% will incur no penalty.
- A node with a failure rate of 60% will receive an 80% reduction in rewards.
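The piecewise linear mapping described above can be sketched as follows. The concrete control points are assumptions chosen to match the two worked examples in the text (no penalty up to a 10% failure rate, 80% reduction at 60%); the endpoint at a 100% failure rate is an additional assumption, and the actual configurable points would be set by the NNS:

```python
# Assumed control points (failure_rate, reduction_score); the (0.1, 0.0)
# and (0.6, 0.8) points match the worked examples in the text, while the
# (1.0, 1.0) endpoint is an illustrative assumption.
POINTS = [(0.0, 0.0), (0.1, 0.0), (0.6, 0.8), (1.0, 1.0)]

def reduction_score(failure_rate: float, points=POINTS) -> float:
    """Linearly interpolate the reward reduction between control points."""
    fr = min(max(failure_rate, 0.0), 1.0)  # clamp to [0, 1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if fr <= x1:
            return y0 + (y1 - y0) * (fr - x0) / (x1 - x0)
    return points[-1][1]
```

Under these assumed points, `reduction_score(0.05)` yields 0.0 (full rewards) and `reduction_score(0.6)` yields 0.8 (an 80% reduction), as in the examples above.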
Trustworthy Metrics can only be collected for nodes that are assigned to subnets. As a result, it is only possible to calculate the reduction score for nodes while they are part of a subnet. In a next step (still as part of this proposal), it needs to be defined how to handle rewards for nodes that are not assigned to a subnet.
Next steps
It is suggested to follow a phased approach when introducing the new node reward reduction:
Phase 1: Proof of Concept, Refinement and Calibration
- The objective of this initial phase is to make all stakeholders familiar with the methodology and data underpinning the proposed approach.
- To facilitate this understanding, a Proof of Concept has been implemented by @pietrodimarco, accessible [here]. This tool allows stakeholders to analyze the Trustworthy Metrics of nodes over time and observe the resulting node reward reduction scores.
- We encourage community members to actively engage with this tool and provide feedback on its functionality.
- During this phase, node rewards will not yet be impacted.
- Towards the end of this phase, we plan to submit a motion proposal to the Network Nervous System to secure agreement on the approach.
Phase 2: Reward Reduction Implementation
In the subsequent phase, the focus will shift to implementing the automatic adjustment of rewards based on node performance directly within the NNS. The specifics of this implementation will be detailed in an additional design step.