Hello!
TL;DR: We’re excited to announce that, after the latest HostOS upgrade, the Internet Computer (IC) is officially publishing node metrics to the world. Alongside this development, we have introduced an observability stack designed to help you collect, inspect and analyze these metrics.
This milestone allows the ICP community to overcome a fundamental obstacle that has existed since the inception of the ICP. Previously, it was challenging for node providers and operators to gain insight and effectively diagnose issues affecting nodes in the IC. While DFINITY has in the past proactively monitored the health of ICP nodes, this ultimately centralizes the monitoring process, increasing costs and time-to-resolution for emergent issues.
What’s new: As of late last week, a limited and carefully vetted set of ICP node metrics are now publicly available. These metrics include some information on hardware status, power consumption, networking performance, resource utilization, and the version of the GuestOS running on each node. More metrics will be added in the future, based on the needs of node providers, the insights of stakeholders, and security reviews.
Public node metrics ensure that more nodes stay healthy, that problems get resolved quicker, and they provide transparency on power usage, leading to an overall more stable and high-performing Internet Computer Protocol network.
It’s clear to us that mere metrics availability isn’t enough. To that effect, we are also releasing a comprehensive self-service stack that anyone can run, to collect, analyze, and derive insights from these metrics. The stack allows people interested in the ICP to analyze node performance, node providers to monitor the health of their investment, and operators to detect malfunctions via custom alerts.
How we did it: We created and bundled a secure, minimal component in the IC-OS called metrics-proxy (more below), which filters and shares essential metrics without compromising security. Additionally, we made an open-source tool, the IC observability stack, available for anyone to use and analyze these metrics.
Why is this important for the IC Community: This new development ensures that more nodes remain healthy, facilitates detection and resolution of incidents, and also allows the public to directly access data on IC’s power consumption and other information of public interest. By making these metrics accessible, we aim to foster a more transparent, efficient, performant, and robust ICP network for all users and providers.
How everything plays together:
-
We developed a custom IC OS component called metrics-proxy. This minimal, audited component filters metrics and reduces resolution to prevent timing attacks, ensuring critical ICP behavior remains secure.
-
We included this component in our IC OS releases, configured in such a way that timing attacks cannot be used to leverage sensitive information out of the ICP. Firewall rules permit limited access to the component by the public, also mitigating the risk of attackers exclusively hogging access to the service.
-
We also created and shared a separate software project, named the IC observability stack, to help users, operators and community members scrape and query public IC metrics; the stack contains intelligence to select nodes and query these metrics endpoints. With this software, it’s easy for anyone to bring up your own IC observability stack locally, and therefore to keep an eye on the IC nodes’ health.
A typical node exporter dashboard view from the IC observability stack deployment
Cumulative power usage of all nodes provided by DFINITY
Where to obtain the metrics: The metrics are directly available on the public IPv6 address of each up-to-date HostOS node through HTTPS port 42372 at the following paths:
-
/metrics/hostos_node_exporter
for node exporter metrics -
/metrics/guestos_replica
for the GuestOS version running on the node
The most straightforward way to obtain, aggregate and visualize metrics is through the IC observability stack. The stack eases complications such as having to find out IC node IP addresses, TCP ports and URL paths.
However, manually obtaining metrics of a node can also be done from a browser:
-
Get the IP address of a node from the public dashboard (e.g.).
-
Change the 5th octet from
6801
to6800
to get the HostOS address. The address shown in the public dashboard is GuestOS. For instance,
2602:294:0:a05:6801:83ff:fee2:414
should be changed into
2602:294:0:a05:6800:83ff:fee2:414
-
Prepare the URL to paste it in your browser: https://[2602:294:0:a05:6800:83ff:fee2:414b]:42372/metrics/hostos_node_exporter
You will have to accept an invalid certificate error when accessing it through your browser.
Here is a Github gist that illustrates fetching and processing of the latest metric values from a Python script: https://gist.github.com/sasa-tomic/04d9217e5b8847caa0566b619495bb02
Keep in mind that in all cases you need to have IPv6 address and IPv6 connectivity on your computer, to fetch metrics directly from the nodes.
Coda
Thank you for your continued constructive feedback and support. Your contributions are essential for the ICP success!
For more detailed information, technical documentation, and to access the IC observability stack, please visit our stack’s Github repository.