IC topology series - Node diversification part I

TL;DR

The Internet Computer (IC) operates on physical node machines, commonly referred to as “nodes”. To achieve robust decentralization, these nodes are operated by different node providers, in multiple data centers, by different data center providers, and in many countries. At present:

  • Decentralization is well-achieved in “data center” and “data center provider” characteristics.
  • While “node provider” decentralization is nearing optimal, some providers operate too many nodes relative to available subnets.
  • There is an overrepresentation of nodes in the US and Switzerland when considering the “country” characteristic.

This post offers a preliminary assessment of node decentralization focusing on individual node characteristics. Subsequent posts will explore more comprehensive metrics and models for assessing node decentralization, along with suggested strategies for future node onboarding and offboarding decisions.

Problem statement & goal

The IC runs on physical node machines. By engaging with numerous, independent node providers, the IC ensures robust decentralization. Such an approach mitigates risks associated with single points of failure, centralized control, and potential censorship.

While a broad diversification of nodes enhances decentralization, it is vital to recognize the associated costs. Node providers face both investment costs for hardware procurement and operational costs related to operating their node machines (in particular rental of data center rackspace and bandwidth capacity) and ongoing maintenance (in particular replacement of failed hardware parts). In return for their contributions, the IC compensates them with node provider rewards, which are minted as ICP tokens every month.

In order to strike the right balance between network size & cost and the degree of decentralization, the IC needs to tackle two aspects

  1. Establishing a Target Topology: This involves defining the number of subnets and their respective sizes, aligning with anticipated future demand.
  2. Optimization: Given a target topology, optimize between node rewards (onboarding of additional new nodes and rewards for existing nodes) and decentralization.

In this blog series, we will delve deeper into the optimization aspect, addressing the following key questions along the way:

  • Current Status: Where do we currently stand in terms of node decentralization?
  • Metrics: Which metrics effectively capture decentralization across various characteristics?
  • Models: Develop models optimizing between node rewards and decentralization.
  • Operational strategy: With an established topology and set decentralization goals, how can we apply these models for future node onboarding and offboarding decisions?

In this first post, we will provide background knowledge and offer an initial evaluation of node decentralization. Our subsequent post will cover a detailed exploration of the metrics and models vital to understanding node decentralization.

Background

Nodes

The Internet Computer runs on physical node machines, the foundational building blocks of the IC. These machines execute the protocol and manage user data and computation.

These node machines are high specification servers, standardized for optimal performance. They are distributed across independent data centers worldwide. Currently, the IC operates on more than 1200 physical nodes.

Node providers

A node provider is an entity that operates nodes within the IC network. Node providers are responsible for the physical hardware, connectivity, and overall maintenance of the nodes they operate.

To safeguard the network’s decentralization, the onboarding of node providers is regulated by the NNS, the network’s governing DAO. To become a node provider, one must submit a proposal accompanied with a self-declaration document that states provision of node machines and proof of identity. Subsequently, the community evaluates and votes on the onboarding proposal.

The node matrix below presents a snapshot of all nodes contributing to the IC as of September 7th, 23, grouped row-wise by their node provider, in ascending order from top to the bottom. For providers with over 50 nodes, only the first 50 are displayed, with the actual count indicated to the right of the bar. Nodes that are maintained by specific service providers (henceforth referred to as “Aggregators”) and not directly by the node providers, are counted under rows specific to each Aggregator.

Node provider rewards

Node providers are incentivized to operate nodes through economic rewards. Specifically, they earn ICP tokens in return for their participation in the network.

Given that node providers incur costs in fiat currencies (both for investment and operations), the reward structure is pegged to a basket of fiat currencies called XDR (Special Drawing Rights). This works as follows:

  • Rewards are initially determined in terms of XDR derived from three primary factors:
    • The generation of hardware (Gen 1 or Gen 2).
    • The geographic location of the node and corresponding operating costs.
    • The total number of nodes being operated by the provider.
  • Once the XDR-based rewards are determined, they are converted to ICP tokens using a 30-day average ICP/XDR exchange rate.
  • These rewards are then disbursed to node providers on a monthly basis.

For a more comprehensive breakdown on node provider rewards, see here.

Subnets

A subnet is a collection of nodes that run their own instance of the consensus algorithm to produce a subnet blockchain that interacts with other subnets of the IC. Subnets play a pivotal role in striking the balance between scalability and security:

  • Scalability: Not all nodes can accommodate every application.
  • Security: Applications must run on enough nodes to guarantee data integrity and ensure uninterrupted uptime, even in the face of individual node failures or malicious activity.

The IC hosts various subnet types, each distinguished by properties (notably, the level of decentralization) making them suitable for specific applications:

  • System subnets: These subnets are reserved for canisters that are an integral part of the IC protocol. Canisters on these subnets do not pay cycles. Only the NNS can deploy canisters on those subnets. Examples: NNS subnet (tdb26), currently consisting of 40 nodes, II subnet (uzr34) currently consisting of 28 nodes.
  • Application subnets: These subnets are user-accessible for canister deployment. Typically, they encompass 13 nodes, and canisters here expend cycles. If users do not specify requirements, the system randomly selects an application subnet for canister creation.
  • Further special subnets types: Beyond the basic system and application subnets, the IC also has specialized subnets tailored for certain dapps. These can offer enhanced features like increased replication. Examples: Fiduciary subnet (pzp6e) is an enlarged application subnet with 28 nodes. The SNS application subnet (x33ed), catering to SNS DAOs, has 34 nodes.

The following subnet matrix shows the number of required node slots of the current subnets on the IC, sorted by size from left to right. The reason for the minimalistic presentation will become clear in the next section. Please note that the subnet matrix could also be generated with respect to a target topology (number of subnets and their respective sizes) instead of the current topology.

Initial assessment of node decentralization

We are now in a position to provide a preliminary evaluation of node decentralization on the IC, focusing on four distinct characteristics: node provider, data center, data center provider, and country.

For optimal decentralization, each subnet would only have unique representations of each characteristic. For example, within any given subnet, each node provider should be represented only once.

To visualize this, consider a matrix, we have termed the “node topology matrix”, where:

  • Tiles in rows represent nodes sharing a certain characteristic (e.g., nodes from the same provider).
  • Crosses in columns signify the required slots within a subnet.
  • A cross overlaying a tile indicates that a node (from that row’s characteristic) is mapped to the subnet denoted by that column.

Under this setup, every node with a specific characteristic can only be mapped once to each subnet. We require that rows are arranged with characteristics in ascending order from top to bottom, while subnet columns are arranged in descending order from left to right.

Tying back to the preceding section, this matrix emerges by superimposing the subnet matrix over the node matrix. This will be illustrated in the subsequent sections.

Decentralization of node providers

The matrix depicted below showcases the node topology for the “node provider” characteristic.

From the matrix, we can infer that achieving near-optimal decentralization is plausible. Notably, only 9 subnet slots, visible as crosses not overlaying tiles, remain vacant in the two most substantial subnets to the left, when mandating a unique node provider representation in each subnet.

Additionally, it becomes clear that several node providers have a disproportionately high number of nodes compared to the total subnets available. For achieving optimal diversity across node providers, providers should not operate more nodes than the number of (planned) subnets. Otherwise, the usage of surplus nodes would inevitably breach the requirement for unique provider representation in each subnet.

Decentralization of data centers

The matrix below illustrates the node topology specific to the “data center” characteristic.

From this visualization, it is evident that, in terms of the “data center” characteristic, the IC has achieved full decentralization. Notably, even the largest subnet, comprising 40 nodes, can be filled accommodating each data center only once.

It is worth pointing out that the crosses in the bottom right, which do not overlay tiles, do not present a decentralization concern. These available slots can readily be populated using unassigned nodes (tiles without crosses) from the upper rows without compromising the decentralization objective.

Decentralization of data center providers

Displayed next is the matrix showing the node topology for the “data center provider” characteristic.

Similar observations as for the “data center” characteristic are applicable here. The IC is already well diversified when considering the data center providers.

Decentralization of countries

The matrix below provides insight into the node topology based on the “country” characteristic.

A stark contrast is evident when compared to previous characteristics.

Currently, nodes are placed across only 15 different countries. Given this limitation, it is infeasible to fill the large subnets on the left while insisting each country is represented only once within a subnet. This challenge is highlighted by the 50+ crosses in the top left that do not overlay tiles.

Even among some of the application subnets with 13 nodes, there is a scarcity of diverse nodes. This shortfall is further emphasized by the 50+ crosses in the middle that again, do not overlay tiles.

To address this, we could moderate our decentralization standards, proposing that a specific country should, at most, be represented twice (instead of once) within a subnet. Implementing this modification would produce the subsequent matrix, where we permit double rows for each country.

The softer target appears more attainable. Only 13 subnet slots in the top left, again illustrated as crosses not overlaying tiles, remain unoccupied under this criterion. The crosses in the middle that do not overlay tiles do not present a decentralization concern. They can be populated using unassigned nodes (tiles without crosses) from the upper rows, without compromising the decentralization objective.

Furthermore, it is clear from that picture that even under this weaker decentralization target, we have far too many nodes in the US and Switzerland.

Preliminary conclusion

Based on the presented analysis, the following summary observations can be drawn regarding decentralization across different node characteristics on the IC:

  1. For the characteristics “data center” and “data center provider,” the nodes on the IC show robust diversification.
  2. Regarding the “node provider” characteristic, we are close to realizing near-optimal decentralization. For optimal diversification, it is essential that providers do not operate more nodes than the total number of (planned) subnets. Presently, a few providers exceed this optimal node count.
  3. Concerning the “country” characteristic, our decentralization aims may need modification. Instead of ensuring each country is represented only once in a subnet, we might consider allowing a country to be represented up to twice. Even under this weaker target, we observe that there is an overrepresentation of nodes from the US and Switzerland.

This blog post has concentrated on assessing individual node characteristics. The subsequent post will introduce a formal method for measuring decentralization and models that evaluate the combined diversification of these characteristics. Additionally, we will provide suggestions for an operational strategy on how these models can be applied to future node onboarding and offboarding decisions.

25 Likes

Great job :kissing_heart: :kissing_heart: :kissing_heart:
You’ve worked hard!

2 Likes

Excellent work putting this together to give everyone a visual of the decentralization effort. Like the idea of the soft approach toward countries as this could always be reassessed at a later date and brought down to one once utilization increases. Looking forward to the next post :+1:

2 Likes

@bjoern I had a thought and just curious if its possible. Since some NP’s have an overabundance of nodes is it possible a process could be put in place where some of these excess nodes could be sold/auctioned to upcoming NP’s? This would let our New NP’s that we need to complete optimal decentralization be onboarded faster and at the same time the number of nodes in the system would not increase as the process is finished.

Utilizing this option may also allow us to go ahead and target 1 country/node rather than the soft target since we would not be adding more nodes to onboard.

3 Likes

Thank you @Yeenoghu, yes I agree. It would be definitely be helpful for decentralization if new node providers buy nodes from those node providers who own a very high number. I assume that interested new & existing node providers could get in touch and review this possibility.

In addition, the overall community would need to agree on a target topology (suggested node count and degree of node decentralization). I will share some thoughts in that regard, in the follow-up post (hopefully by end of this week).

6 Likes

hi bjornek

Double confirm:
Now the node provider remuneration is paid by $ICP, right?
but the currency unit in Node Provider Remuneration - Internet Computer Wiki is USDT/USDC?

Thanks

1 Like

Node provider rewards a determined in units of XDR, but then converted and payed out in ICP (this was always the case).

2 Likes

Thanks. Got it.
Where can I get a detailed and accurate information about XDR?
From what I checked, it is equivalent to USDT/USDC, right?

1 Like

No, it is not equivalent to USD stable coins, but somewhat similar. XDR is also called SDR, and you can read up on it e.g. here

Great and intuitive explanation :slight_smile:

Would be interesting to extend this further and analyze the relationships between the node providers. In theory, two nodes from different providers, in different datacenters and countries seem like perfect candidates to be included in the same subnet. In practice, these nodes might be controlled by two parties which might easily collude.

I’m aware that this might be difficult to quantitively evaluate but I think is ultimately what counts as the other metrics seem good only in theory. Is it a concern in your opinion? Do you have any ideas how to approach this?

Another question related to this topic: are there any plans for all the nodes that are not currently part of any subnet? I guess it’s great to have them there to accommodate future growth but it would be nice to make use of them already. Turning them in API Boundary Nodes controlled by the NNS would help but it seems there are too many idle ones even for that.

Thanks for the great work!

2 Likes

hello there
I just used optimization tool to run out a topology for countries

but it is different from the example in this article.

Does it mean there is no spot for new nodes in HongKong now?

Thanks!

oh, I got two kinds node topology matrix by country

from double row node topology matrix by country, it can be added with new nodes in Hongkong, but ObjectiveValue is 0 after running the command.
We found the data in the two sheets remarked in red as below are different, that is why objectiveValue is 0?

and also in sheet “node_pipeline.csv”, hk data is incomplete as you can see in below screenshot.

not sure what happened.

How to go ahead to verify if we can add new nodes in Hongkong?

Need your quick and great support, Thanks.

Lisa

Hi @Lisa_Crato,

as noted here, the Internet Computer has reached its current target topology. So right now, there is no need to add further nodes.

Happy New Year, Bjoernek

You mean no need to add any new node in any region?

Currently any new node provider/data center/node operator proposal will be rejected?

Thanks!

You might mean in the current node pipleline, no more node machines will be accepted in the existing regions.
how about enhancing the decentralization by adding more potential countries in future pipeline?
Is there any SOP or required docs to propose new country?

Thanks

Yes correct, the IC has reached its current approved target topology and thus for the time being no new nodes are required. This can change in the future (in case that revised targets are specified).

Hi @Lisa_Crato as Bjoern indicates, the IC has reached its target topology so current no new node machines are needed in existing or new countries and data centers, also not in Hong Kong. This may however change in the future and does not mean that you cannot onboard any node machines in the future. With the IC network continuing to grow the community might decide to update the target topology and allow for more node machines to be onboarded.

Thanks for the comment on the node_pipeline file. The data center for web3game as mentioned in the pipeline should be hk4 and not hk2. A PR is already submitted to update this, but when running the optimization tool it does not show any change in ObjectiveValue, i.e. target topology is still reached.

thanks, SvenF
Well noted now.

But I also noticed there is a voice in the community that some DC has too many IC nodes and it can be more decentralized to give more DCs in one region for a chance to enhance decentralization.
Would that be possbile to have a plan recently?

Lisa

Hi @Lisa_Crato, actually with a subnet limit of 1 for data centers and data center operators, there is only one node in one data center for each subnet, so maximum decentralization is reached from that perspective. What can be seen from the second series post however is that there are currently excess nodes available compared to the target topology. With the further growth of the IC network the coming year, plans can be made and discussed with the community how to address this.

Understood.
But as an early builder (though no active node on board), we do want to continue being part of ICP ecosystem.
The further growth of the IC network means application ecosystem or any specific condtidions?
I do want to discuss these topics in the community, but seems not so active because the topics focused by existing node providers and potential node providers are different.
Do you mean I have to run the topology script every month to check?

thanks, Lisa