Increasing DFINITY Node Count and NNS Topology Exception

In recent motion proposals it was agreed that DFINITY’s nodes should be treated equally to those of other Node Providers. In line with that decision, DFINITY reduced the number of data centers where it operates nodes to Stockholm (SH1) and Zürich (ZH2) and subsequently sold the nodes from BO1 and MR1 to new Node Providers (see this discussion).

However, some important operational challenges and likely future requirements were overlooked during this process. Currently, DFINITY operates a total of 42 nodes spread over 37 IC subnets. The current subnet recovery process expects DFINITY to have one node per subnet during recovery. While an additional node can be added during recovery, doing so would significantly extend recovery time and increase operational complexity. In recognition of these challenges—and given the exceptional importance of swift recovery—an extra three nodes are dedicated to the NNS subnet for recovery resilience.

This configuration effectively uses 39 nodes, leaving only 3 spares for all operational and redundancy needs. With one node in Stockholm currently showing degradation (and thus not being immediately usable), we effectively have only 2 healthy spare nodes. Moreover, the ongoing redeployment of the Zürich (ZH2) nodes using the HSM-less process requires that these nodes be removed from an active subnet and replaced by spare ones, further straining our spare capacity.
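
To make the node budget explicit, here is a back-of-the-envelope sketch using the figures quoted above. Read literally, 37 subnets plus three extra NNS nodes would come to 40. The quoted 39-in-use and 3-spare figures work out if the NNS is one of the 37 subnets and carries three DFINITY nodes in total, which also matches the 3-node NNS exception proposed below. The sketch uses that reading as an assumption. The function and parameter names are illustrative only.

```python
def node_budget(total_nodes: int, subnets: int, nns_dfinity_nodes: int,
                degraded_nodes: int) -> dict:
    """Split DFINITY's nodes into in-use, spare, and healthy-spare counts."""
    other_subnets = subnets - 1               # each non-NNS subnet holds 1 DFINITY node
    in_use = other_subnets + nns_dfinity_nodes
    spares = total_nodes - in_use
    return {
        "in_use": in_use,
        "spares": spares,
        "healthy_spares": spares - degraded_nodes,
    }

# Figures from the post; nns_dfinity_nodes=3 is the assumed reading described above.
print(node_budget(total_nodes=42, subnets=37, nns_dfinity_nodes=3, degraded_nodes=1))
# {'in_use': 39, 'spares': 3, 'healthy_spares': 2}
```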

Additionally, as outlined in motion proposal 133841, there are plans to add up to 20 more application subnets, depending on IC growth. This expansion will not be feasible without more spare nodes.

To address these issues, we propose:

  1. Allowing DFINITY to operate more than the current 42 nodes. In particular, we would like to start a discussion about expanding DFINITY's node allowance by 14 to 28 nodes. Considering motion proposal 133841, we believe it would be prudent to allow 28 more nodes, which would leave us with approximately 10 nodes for maintenance. If the community prefers a lower number, we would be able to manage with 14 more nodes, at least in the near term, and we can revisit further expansions when needed.
  2. Allowing an exceptional case for the NNS subnet whereby DFINITY may have 3 nodes that do not need to conform to the standard topology restrictions (which typically allow at most 1 node per data center owner, per data center, and per node provider). This exception has raised several questions in community discussions (see this thread), and it would benefit from a clear, official reference for subnet membership change proposals for the NNS subnet.
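
Putting the numbers together shows why 28 leaves roughly 10 nodes for maintenance while 14 only covers the near term. This is a minimal sketch under stated assumptions: it starts from the 42 total / 39 in-use figures above, keeps the degraded Stockholm node out of service, treats the "up to 20" new subnets from motion proposal 133841 as the worst case, and assumes each new subnet needs one DFINITY node for recovery, like the existing ones. The function name is hypothetical.

```python
def healthy_spares(extra_allowance: int, new_subnets: int, current_total: int = 42,
                   in_use: int = 39, degraded: int = 1) -> int:
    """Healthy spare DFINITY nodes after an allowance increase and new subnets.

    Assumes every new subnet needs one DFINITY node, like the existing ones.
    """
    total = current_total + extra_allowance
    needed = in_use + new_subnets
    return total - needed - degraded

for extra, new in [(28, 20), (14, 0), (14, 20)]:
    print(f"+{extra} nodes, {new} new subnets -> {healthy_spares(extra, new)} healthy spares")
# +28 nodes, 20 new subnets -> 10 healthy spares
# +14 nodes, 0 new subnets -> 16 healthy spares
# +14 nodes, 20 new subnets -> -4 healthy spares
```

The negative result in the last case is the gap that would have to be revisited if all 20 subnets were added with only 14 extra nodes.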

I invite all community members to discuss these points and share feedback on the proposed adjustments. Your input is essential to ensuring that our operational capacity meets both current and future network requirements.

I have no concerns about the proposed additional nodes. I would prefer that DFINITY be allowed the additional 28-node allocation, so we don't have to micromanage this decision in the future and so there are plenty of nodes for maintenance. I also have no concerns about making a topology exception for DFINITY. In my opinion, these deviations from the NNS-approved node count and topology would be in the long-term best interest of the Internet Computer.

If the community can have any input on the remuneration, then I would like the extra remuneration DFINITY would receive from these 28 extra nodes to be allocated directly to funding more known neurons that provide technical reviews in the Grants for Voting Neurons program. This means adding more individuals and/or teams to review the IC-OS Version Election, Protocol Canister Management, Subnet Management, Node Admin, and Participant Management proposal topics. At this time, DFINITY only offers 2 grants per proposal topic. I would love to see 1-2 more teams per proposal topic, or however many the rewards for 28 nodes could fund.

@cryptoschindler @marc0olo

Is that because you need access to the latest CUP to initiate the recovery? Are there any initiatives to eventually lift this dependency on the foundation?

I would fully support that! We need more (funded) community involvement!
cc @katiep and @lara as well, since you are somewhat involved in the community grants.

@Zane Yes to the first question: recovery involves reading the state from all nodes and comparing it to ensure nothing malicious is happening on any one of them, and then writing the recovery CUP to one node. And yes to the second question as well: the Consensus team is actively looking into ways to improve and automate recovery.
However, recovery is a very hard problem in general, and it is difficult to fully automate a recovery that handles ANY case that may happen. So improvements will have to come for particular cases first.
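
The comparison step described above can be sketched roughly as follows: collect a state hash from every node, accept the value that a supermajority agrees on, and flag the outliers before any recovery CUP is written. This is an illustration, not the actual recovery tooling. The function name, the use of SHA-256 hex digests, and the more-than-two-thirds threshold (chosen to mirror the f < n/3 fault assumption of IC consensus) are all assumptions.

```python
import hashlib
from collections import Counter

def find_divergent_nodes(state_hashes: dict[str, str]) -> tuple[str, list[str]]:
    """Return the supermajority state hash and the nodes that disagree with it.

    Raises ValueError if no hash is reported by more than two thirds of nodes.
    """
    if not state_hashes:
        raise ValueError("no node states to compare")
    majority_hash, votes = Counter(state_hashes.values()).most_common(1)[0]
    if 3 * votes <= 2 * len(state_hashes):
        raise ValueError("no supermajority; manual investigation needed")
    divergent = sorted(node for node, h in state_hashes.items() if h != majority_hash)
    return majority_hash, divergent

# Hypothetical example: five nodes, one of which reports a tampered state.
honest = hashlib.sha256(b"state at height 100").hexdigest()
hashes = {f"node-{i}": honest for i in range(4)}
hashes["node-4"] = hashlib.sha256(b"tampered state").hexdigest()
print(find_divergent_nodes(hashes)[1])  # ['node-4']
```

Refusing to pick a winner without a supermajority keeps the decision with a human when the nodes are split, which matches the point above that fully automated recovery for every case is hard.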

The NNS proposal has been submitted. Thanks for the useful feedback in this thread!

https://dashboard.internetcomputer.org/proposal/135700

You may notice in the proposal summary that we propose not receiving rewards for the added nodes. This was suggested by the DFINITY leadership to maintain fairness to other NPs.
This way DFINITY still gets the same node rewards as all other NPs, and we still get to have more nodes in case they are needed for subnet recoveries.

This would be implemented by assigning the type1 (or a similar) node reward type to these nodes, since type1 nodes already receive zero rewards.
It will be possible to check and confirm the reward configuration at any time with the dre registry or ic-admin tools. And, of course, anyone can keep track of all NNS proposals to confirm that none of them change the reward configuration for these nodes.
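
Once node records have been exported, the zero-reward check boils down to a filter: every node of the new node operator should carry a zero-reward type. A minimal sketch, assuming a hypothetical JSON export; the field names (node_id, node_operator_id, node_reward_type), the operator IDs, and the records themselves are invented for illustration, and real dre registry or ic-admin output will be shaped differently. Only the fact that type1 earns zero rewards comes from the post.

```python
import json

# Per the post, type1 nodes already receive zero rewards.
ZERO_REWARD_TYPES = {"type1"}

def rewarded_nodes(node_records: list[dict], operator_id: str) -> list[str]:
    """Return the IDs of an operator's nodes whose reward type is not zero-reward."""
    return sorted(
        record["node_id"]
        for record in node_records
        if record["node_operator_id"] == operator_id
        and record["node_reward_type"] not in ZERO_REWARD_TYPES
    )

# Hypothetical export; field names and values are placeholders.
dump = json.loads("""[
  {"node_id": "node-a", "node_operator_id": "op-se1", "node_reward_type": "type1"},
  {"node_id": "node-b", "node_operator_id": "op-se1", "node_reward_type": "type1"},
  {"node_id": "node-c", "node_operator_id": "op-other", "node_reward_type": "type3"}
]""")
print(rewarded_nodes(dump, "op-se1"))  # []
```

An empty list means none of that operator's nodes would earn rewards.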

I was a bit hesitant about this proposal at first, but I think Dfinity not getting rewards for these additional nodes is the perfect solution. :+1:

Thanks @Sat, sounds good! I’ve voted to adopt.

Hopefully this will make it less likely that we see repeats of proposal 135540. Presumably once CUPs are verifiable, other NPs may be able to take part in subnet recovery (reducing the need for this DFINITY NP business rule)?

I guess this proposal will now serve as the latest IC Target Topology reference. I diffed the tables (the current reference being this one):

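Changed values are shown as old → new.

| Subnet Type | # Subnets | # Nodes in Subnet | Total Nodes | SEV | Subnet Limit (NP, DC, DC Provider) | Subnet Limit (Country) |
|---|---|---|---|---|---|---|
| NNS | 1 | 43 | 43 | no | 1* (with exception for DFINITY nodes, see prior discussion) | 3 |
| SNS | 1 | 34 | 34 | no | 1 | 3 |
| Fiduciary | 1 | 34 | 34 | no | 1 | 3 |
| Internet Identity | 1 | 34 | 34 | yes | 1 | 3 |
| ECDSA Signing | 1 | 28 | 28 | yes | 1 | 3 |
| ECDSA Backup | 1 | 28 | 28 | yes | 1 | 3 |
| Bitcoin Canister | 1 | 13 | 13 | no | 1 | 2 |
| European Subnet | 1 | 13 | 13 | yes | 1 | 2 |
| Swiss Subnet | 1 | 13 | 13 | yes | 1 | 13 |
| Application Subnet | 31 → 51 | 13 | 403 → 663 | no | 1 | 2 |
| Reserve Nodes Gen1 | | | 100 | | | |
| Reserve Nodes Gen2 | | | 20 | | | |
| Total | | | 763 → 1023 | | | |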
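
The per-subnet limits in the table can be read as a simple check over a subnet's members. The sketch below is a simplified, hypothetical model, not the DRE's actual topology checker. Any group of nodes sharing a node provider, data center, or data center owner may hold at most 1 node, except that a group made up only of the exempt provider's nodes may hold up to 3, mirroring the proposed NNS exception for DFINITY. The country limit always applies. The Node fields, the function name, and the example node IDs and DC owners are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    node_id: str
    node_provider: str
    data_center: str
    dc_owner: str
    country: str

def topology_violations(nodes, attr_limit=1, country_limit=3,
                        exempt_provider=None, exempt_limit=1):
    """Return (attribute, value, count) tuples for every limit that is exceeded.

    A group consisting only of exempt_provider's nodes may hold up to
    exempt_limit nodes (a simplified model of the NNS exception); any other
    group is held to attr_limit. Country limits always apply.
    """
    violations = []
    for attr in ("node_provider", "data_center", "dc_owner"):
        groups = defaultdict(list)
        for node in nodes:
            groups[getattr(node, attr)].append(node)
        for value, members in groups.items():
            only_exempt = all(m.node_provider == exempt_provider for m in members)
            limit = exempt_limit if only_exempt else attr_limit
            if len(members) > limit:
                violations.append((attr, value, len(members)))
    per_country = defaultdict(int)
    for node in nodes:
        per_country[node.country] += 1
    for country, count in per_country.items():
        if count > country_limit:
            violations.append(("country", country, count))
    return violations

# Hypothetical NNS fragment: three DFINITY nodes plus one other provider.
nns = [
    Node("d1", "DFINITY", "sh1", "owner-a", "SE"),
    Node("d2", "DFINITY", "sh1", "owner-a", "SE"),
    Node("d3", "DFINITY", "zh2", "owner-b", "CH"),
    Node("x1", "NP-X", "fr1", "owner-c", "FR"),
]
print(topology_violations(nns, exempt_provider="DFINITY", exempt_limit=3))  # []
print(topology_violations(nns))  # DFINITY, sh1 and owner-a each exceed the limit of 1
```
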

I think that's a separate topic. Subnet recovery has to be executed very quickly and by people who know what they are doing. Getting hold of different node providers could be tough: people are in different time zones and are not constantly available anyway. Plus, many node providers are not that technically inclined. As I recall, there was a theoretical exercise about this with a crashed subnet some time ago. I'm not sure how long it took for all node providers to execute those recovery instructions (it was far from fast), or whether they all did it in the end.

I agree this is a sensible and reasonable approach to handling subnet recovery coordination. I support implementing this solution.

My understanding is that this is the reason DFINITY always has to submit a proposal to enable DFINITY engineers to SSH into the subnet nodes to facilitate recovery, because no other NP could do it themselves, given that:

  • Few (perhaps none) have the know-how
  • Even if they did, their actions wouldn’t be verifiable

In my opinion, these are problems that need solving longer term (there’s already a plan for the second point above). I’d expect that addressing the first point is on the agenda too (longer-term).

Proposal #136084 — Zack | CodeGov

Vote: Adopted
Reason:
In line with the forum post above, Increasing DFINITY Node Count and NNS Topology Exception,
Dfinity adds Evocative Data Centers Seattle as data center se1 in North America, US, Washington.
According to its website, the data center is located on the 2nd floor of the KOMO Plaza building.
Other info:
The proposer is Neuron 77.

About CodeGov

CodeGov has a team of developers who review and vote independently on the following proposal topics: IC-OS Version Election, Protocol Canister Management, Subnet Management, Node Admin, and Participant Management. The CodeGov NNS known neuron is configured to follow our reviewers on these technical topics. We also have a group of Followees who vote independently on the Governance and the SNS & Neurons' Fund topics. We strive to be a credible and reliable Followee option that votes on every proposal and every proposal topic in the NNS. We also support decentralization of SNS projects such as WaterNeuron, KongSwap, and Alice with a known neuron and credible Followees.

Learn more about CodeGov and its mission at codegov.org.

Hi @sat. Could you please give us a bit more context for proposal 136084? The proposal links to this whole thread rather than a specific post. I presume it all ties in, but if you could add some confirmation within the thread, that should give sufficient information to support the proposal.

Sure, absolutely @timk11. So the high-level plan as described in the first post of the thread is that DFINITY adds 28 more nodes, without getting any rewards for them, as described in this recently adopted motion proposal and discussed and supported by the community in the forum thread.

DFINITY already had some Gen1 nodes in Seattle, in a DC that was built with regular IC usage in mind but was never actually used for Mainnet, since there was no need for more nodes. So these nodes were used for (internal) testnets and testing.
Platform operations and a few other engineering teams are now actively working on moving these services out of the Seattle DC (se1), so that these servers and this DC are fully freed up and fully dedicated to the IC Mainnet.
The next step will be to submit a proposal adding a node operator record with an allowance of 28 nodes in this DC (without rewards, of course). After that, we can onboard the nodes and let them join the Mainnet.
I hope this helps. Let me know if you have any further questions, Tim.

Regarding decentralization, I believe this will be an improvement, since we (DFINITY) will also have a DC in the US, not only in the EU.

Hi @sat, Dfinity actually did have nodes in the US until very recently, in 4 different data centers if I recall correctly, before they were sold in the silent auction. So just curious: does Dfinity have exactly 28 more nodes? With Gen1 specs? Or do you have some and need to purchase more to get to 28? If Dfinity is buying new hardware, is it Gen2 spec? Or are you already testing future Gen3 specs?

That's correct @snoopy. For fairness to other node providers, DFINITY reduced the number of nodes to 42, but then we realized that's not enough. It was a mistake we made, and no one noticed in time. But we might well have taken the same path anyway to ensure fairness.

We do have a lot of Gen1 nodes that we use internally for R&D and that aren't hard to repurpose. So these Seattle nodes are completely regular Gen1 nodes, like the ones in the Stockholm DC.

Proposal 136084 – Louise | Aviate Labs
Vote: ADOPT
Review:
The proposal adds a data center, SE1 (Evocative Data Centers), to the network.

  • The coordinates in the payload point to KOMO Plaza. According to the data center's website, this is the building in which the data center is located, so the coordinates accurately point to the data center location :white_check_mark:
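
The coordinate check in this review can be sketched as a great-circle distance test between the payload coordinates and the data center's published location. A minimal sketch using the haversine formula; the function names and the 0.5 km tolerance are arbitrary assumptions, and the reviewer supplies both coordinate pairs (none are given here).

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def coordinates_match(payload: tuple[float, float], published: tuple[float, float],
                      tolerance_km: float = 0.5) -> bool:
    """True if the payload coordinates lie within tolerance_km of the published location."""
    return haversine_km(*payload, *published) <= tolerance_km
```

A tolerance of a few hundred metres absorbs rounding in the payload while still catching coordinates that point to the wrong building or city.
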
About Aviate Labs

Aviate Labs is a team that has been dedicated to supporting node providers since 2020. Our mission is to make high-performance infrastructure management on the Internet Computer (ICP) as seamless as possible, while adhering to the principles of decentralization.

We are known for our contributions to the ecosystem, including the go-agent and developer work packages on GitHub, as well as the Node Monitor tool, which alerts Node Providers as soon as any of their nodes go down.

In the NNS, Louise reviews and votes independently on ‘Node Admin’ and ‘Participant Management’ proposals on behalf of the Aviate Labs Neuron.

The Aviate Labs known neuron is configured to follow Louise for these topics and other trusted entities for broader proposals. We strive to be a credible and reliable Followee, committed to voting on every proposal and supporting decentralization within the ICP ecosystem.

Proposal 136084 | Tim - CodeGov

Vote: Adopt

This proposal adds a new data centre in Seattle, US, as per the background given in this post and website details here. The location of this data centre matches the co-ordinates provided in the proposal.

Proposal 136084 – LaCosta | CodeGov

Vote: ADOPT

The proposal adds a new data center, se1, in Seattle, North America, in line with this post.
The coordinates on the Evocative website match the proposal payload.
