Trustworthy Node Metrics for useful work

Greetings Internet Computer Community and Node Providers :globe_with_meridians:

We hope this message finds you well. We would like to share a proposal aimed at providing trustworthy metrics directly on the Internet Computer. Your feedback is invaluable as we consider implementing these changes.

Proposal Overview🎯

The core idea is to introduce Trustworthy Node Metrics that provide greater visibility into node performance. These metrics could potentially be used to influence node rewards in the future.

Architectural Changes :hammer_and_wrench:

Consensus Layer:

  • The Consensus layer will now expose information on which nodes have succeeded or failed in the role of block makers. This data will be sent to the Message Routing layer via an extension to the existing Batch data structure. Notably, no special handling is required for empty blocks.

Message Routing Layer:

  • The MR layer will integrate this information into the replicated state. Specifically, it will map each node ID to a counter of successes and failures over the last 60 days. A double-ended queue will be used for efficient storage. Importantly, the data will be queryable within specific date ranges.

Management Canister:

  • A new function will be introduced for fetching these metrics. This function will allow users to pull data from the replicated state across different subnets. Moreover, users will have the ability to specify the date range for the metrics they wish to query.

A draft specification for these changes is available for review here.

Additional Points :memo:

  • A future feature could adjust node rewards based on these metrics, such that misbehaving nodes get penalties. This feature, however, will not cause any changes in the node rewards.
  • Details on changes in the replicated state will be clarified during the implementation phase.
  • Open-source (CLI) tools will be developed to facilitate in-depth analysis of these metrics.

Fees for Metrics Retrieval :money_with_wings:

In order to prevent DOS attacks, a nominal fee will be charged for each metrics retrieval operation, which will be processed through the wallet canister. We would like to hear your thoughts on this matter.

Beside the CLI tools, follow-up work will expose these metrics on the Public Dashboard and on a canister, which will be open to the public for querying.

Data Storage Consideration :thinking:

  • The addition of these metrics will require approximately 50 KB of extra storage per subnet, which is considered to be minimal.

Your Feedback is Welcome :mega:

We are particularly interested in your thoughts on the proposed architectural changes and the proposed charges for metrics retrieval.

Next Steps :clapper:

Once we’ve gathered your feedback and received approval, we’ll start implementing these changes.


Bookmarked and will come back to this after I’m done with homework lol but any tools to help empower users to understand metrics are appreciated

I wrote and published docs for this feature:


There is even an example for a Jupyter notebook in which the metrics are analyzed:


Hi everyone

In the past few months we’ve been working on the implementation of the functionality outlined above. DFINITY teams from almost all layers (consensus, message routing, execution, DRE, security, SDK) have been involved in this project. We’re very happy to announce that we’re very close to completing this effort.

The management canister now offers a node_metrics_history endpoint (see Interface Spec entry). In the DRE tooling linked to in the post by Sasha, you can find examples of how to use this new feature.
More cdk and agent support will be added in the next few weeks.

The functionality is described in more detail in the following blog post.

Should you have any questions or feedback (do you understand the metrics, does the history endpoint make sense, which other metrics could be interesting), let us know!
If there are many questions, we can also organize an interactive session or walkthrough.