Adjustment of IC Target Topology to Increase Subnet Size of Fiduciary and II subnets

TL;DR

The IC Target Topology sets targets for the number of Gen1 and Gen2 nodes per subnet and the decentralization coefficients (Nakamato coefficients) per subnet. It is a model used in the Internet Computer network to optimize the balance between node rewards and decentralization, and is not fixed and is voted in by the community as the target to be achieved within a certain timeframe. The latest IC Target Topology can be found here.

It is proposed to update the IC target topology in order to increase the subnet size of the Fiduciary subnet and II subnet. The main reason for this is to increase the security of the IC since larger subnets require more node machines/node providers colluding with each other in order to take over these subnets. Given that the IC network currently has sufficient spare node machines, no additional node providers or node machines need to be onboarded. The intention is to gradually increase the size of these Fiduciary subnet and II subnet over time, and at a certain (to be determined) time set up specific production and backup subnets for storing the tECDSA signatures.

In addition to increasing the subnet size of the II and Fiduciary subnets, it is proposed to update the country limit for the swiss subnet to 13, which was erroneously set to 2 in the currently approved IC Target Topology table.

Background

The uzr34 and pzp6e subnets are subnets with 28 active node machines. These subnets have more active node machines than an application subnet (that has 13 active node machines) since the uzr34 and pzp6e support important functionality for the IC:

  • The uzr34 subnet runs the Internet Identity, Cycles Ledger, Exchange Rate canister, and XRC dashboard, and stores the backup tECDSA signing keys.
  • The pzp6e subnet acts as a Fiduciary subnet and stores the tECDSA signing keys.

With 28 nodes, | 28 / 3 | + 1 or 10 nodes need to be malicious in order to take control of these subnets. By adding more node machines to these subnets, more node machines/node providers need to collude to take over the subnet. Hence, adding more node machines to these subnets will add to the security of these subnets.

It is proposed to improve the security of these subnets by adding more node machines. This will mean an update of the IC target topology that was previously approved on 12th November 23. As there are currently more node machines in the IC network than required to meet the decentralization targets set in the target topology, no new node machines will need to be onboarded, and spare node machines can be used to increase the size of the uzr34 and pzp6e subnets.

Roll-out plan

With the currently available nodes, the uzr34 and pzp64 subnets can be increased to 34 node machines and still meet decentralization targets.

For the long term, the following roll-out plan is intended:

  1. Further increase the subnet size of uzr34 and pzp64 with 3 nodes every time when spare nodes are available. The reason for adding nodes in multiples of 3 is that each three nodes will result in one more additional node machine having to act maliciously to take over the subnet.
  2. As a final step in the roll-out plan, separate ECDSA signing and ECDSA backup subnets will be created, provided the load on the subnets requires to have separate subnets for ECDSA, and sufficient spare node machines are available to meet the decentralization coefficients.

Hence, although the target subnet size of uzr34 and pzp64 are increased, this will be planned for gradually. This proposal is not intended to propose that new node providers or new node machines will be onboarded.


Figure 1 - node allocation to subnets after changing subnet size of uzr34 and pzp64 to 31 node machines

Updated Target Topology

The below table shows the proposed changes in the target topology. The columns in grey show the changes in target topology as compared to the previously approved target topology.

*Correction on previously published target topology, as Swiss subnet will only have nodes located in Switzerland

In addition to increasing the subnet size of the II and Fiduciary subnets, it is proposed to update the country limit for the Swiss Subnet to 13, which was erroneously set to 2 in the currently approved IC Target Topology table. Note that there currently is no Swiss subnet yet implemented.

Motion Proposal

As always, feedback from the community is very much appreciated on this proposal. Depending on the feedback, the intention is to formally submit a motion proposal for voting in the course of next week.

12 Likes

This seems like a very important change @SvenF. Thanks for sharing.

My question is related to how much additional work you think it will take for people to review these additional proposals. I know the current IC Target Topology was implemented within the last year and it was a major change. There was a significant increase in the flux of Participant Management, Node Admin, and Subnet Management proposals during the implementation of that change. Presumably that means it took a lot more time for DFINITY to review and approve those proposals.

I’m asking because of the current Grants for Known Neurons program that DFINITY is offering to encourage more active participation in technical reviews and independent voting by community members. The topics covered by this program include Subnet Management, which is likely going to be most heavily impacted by this proposed IC Target Topology change. Based on the current rate of Subnet Management proposals we are seeing, I believe the grant that is offered for this topic is already too low compared to the work required. Hence, if this new IC Target Topology proposal has a significant increase in proposals, then it could put more work on those reviewers. After the romance wears off from being chosen by the NNS to perform these reviews and the successful candidates get into the routine state of the job, I’m worried that they may quickly find themselves regretting the commitment. Do you think this could be an issue? Or do you think there will be a negligible impact on workload to review the proposals that result from this change? I’m wondering if the grant that is offered for this topic should be increased. I’d love to know what you think @SvenF, @lara, and @cryptoschindler.

3 Likes

Hi @wpb, having a the subnet size for two subnets increased will not significantly change the number of proposals for subnet management. After the motion proposal being approved, there will be a proposal to add additional nodes for each of these two subnets. And just like for any subnet, if there is a node machine that needs to be swapped (for being unhealthy or optimizing decentralization) there will be another proposal, but as you have probably seen these do not happen very often as node machines are pretty reliable. You can see this from the number of subnet proposals every week.

4 Likes

Thanks for this announcement @SvenF, and for the clear explanation.

This sounds good, avoiding the need for downtime of important services when there’s a need to reshare keys between subnets (e.g. Generation of Schnorr production keys and II subnet downtime - Developers - Internet Computer Developer Forum (dfinity.org)).

Are there any plans for what other responsibilities these new subnets may have (presumably the backup subnet won’t house any other critical services)?

What sort of capacity utilisation does this account for? i.e. It’s possible to meet decentralisation targets under ideal circumstances, but presumably there will continue to be a need for a level of redundancy (which allow decentralisation targets to be met even when a certain percentage of allocated nodes go down or become degraded). Or is the idea that nodes can just be reclaimed from the larger subnets if needed (for use by a struggling subnet), rather than sitting idle in the meantime as they often are now?

2 Likes

Looks great, and makes total sense.

What do you think the max number of nodes for these high security subnets could be? Potentially 100 or more in the future?

4 Likes

Tahnks @lorimer, there are no other plans yet but as these subnets are intended to be more secure by having more nodes, I could imagine there would be plans in the future to store additional keys or run other critical dapps from these subnets.

What I intended to say was that there still are sufficient spare nodes - so nodes that are not in a subnet yet - that can be added to increase the II and Fiduciary subnet. The IC currently has more spare nodes than was calculated in the Target Topology, i.e. more spare nodes than the 120 spare nodes that are shown in the table above. But the idea is not to move nodes from for example an active application subnet to a struggling subnet; the spare nodes are intended to be swapped into any subnet if it has one or more dead node machines.

1 Like

Hi @dfisher, the target for the future is 41 node machines, but there are some limits to the registry size that have to be taken into account.

2 Likes

Thanks for this info @SvenF

Would there also be plans to implement a key resharing mechanism that doesn’t require downtime (halting the subnet)? It seems unfortunate that critical services need to be halted for a number of minutes while keys are being reshared if they reside on the key backup subnet. Why not reserve the backup subnet for non-critical services (other than holding backup keys)?

What does the SEV column indicate? That SEV is planned for a subnet?

If so, I’m concerned. Shouldn’t all subnets eventually be SEV? I’m especially concerned at the moment for the threshold key subnets.

2 Likes

Hi @lastmjs, the SEV column indicates whether the nodes in that particular subnet need to be SEV enabled. Currently, the Gen2 node machines support SEV but the Gen1 node machines do not support this. Hence the most criticial subnets (e.g. treshhold key subnets) are marked as SEV; these subnets should only contain Gen2 node machines. The other subnets not marked as SEV may contain either Gen1 node machines or Gen2 node machines. For the long term, it is the intention that all nodes will be SEV nodes, but this will require gradual replacement of the Gen1 node machines by a future new node machines specification.

3 Likes

Thanks for the explanation. I was concerned because the fiduciary subnet is marked as not having SEV, which it says is the subnet used for threshold signing today.

1 Like

A Nakamoto coefficient of 3 is required for a 13 node subnet in order to meet the requirements of the target topology (and higher for larger subnets).

Please let me know what I’m missing :slight_smile: I hope you can understand my confusion.

I am not aware of concrete plans yet on this. As these operations seems to be very infrequent, the impact may not be significant. In my opinion it is more important to make progress with the verifiability of the resharing proposals, and try to reduce the downtime caused by the resharing.

If I read the topology adjustement correctly, the suggestion is to eventually have 2 new signing/backup subnets. In this specific case the keys could be reshared directly during the creation of those subnets, so that no downtime would be needed to back them up again.

Why not reserve the backup subnet for non-critical services (other than holding backup keys)?

I think this could be a possibility going forward.

2 Likes

All, just a note that the motion proposal is submitted, you can find it here. Please feel free to vote on the proposal and, of course, post any feedback you might have in this forum thread.

3 Likes

I’d like to raise concerns about this motion proposal, and explain the reason why I’ve rejected it.

The IC is currently in a state where it’s not meeting the existing Target IC Topology (and hasn’t been meeting it for a while). This proposal makes the target topology even more stringent and harder to achieve. I can’t wrap my head around why this has been proposed without first demonstrating that the existing target can be met (it’s currently not).

Here’s one of numerous ‘Change Subnet Membership’ reviews that I’ve just posted that highlights the existing problem →

These sorts of concerns had already been raised before this motion proposal was submitted. The discussion can be found on this thread → Subnet Management - 4zbus (Application) - Developers - Internet Computer Developer Forum (dfinity.org)


Before a motion proposal like this should be capable of passing, it needs to be demonstrated that the IC can achieve it’s current target (it hasn’t been achieving it for quite a while). That’s my stance (respectfully :slightly_smiling_face:)

1 Like

Hey @Lorimer assuming folks agree this is a problem that needs to be addressed asap, what do you think is a reasonable timeframe for aligning the node distribution and subnets with the Target IC Topology? It can’t happen over night, right?

I seem to recall that the current Target IC Topology was approved mid to late last year and it took about half a year to reorganize all the nodes and subnets according to the new topology. If you are concerned about major misalignment with the approved topology today, then surely it will take some time to reorganize again.

Also, if the title is Target IC Topology, then does the language of the approved topology leave room for deviation from the target? In your opinion, are there enough nodes in standby that it would be possible to fully align with the approved topology?

I certainly want to learn more from DFINITY on these issues, but I’m also curious if you have given any thought and analysis to the big picture for how the nodes and subnets could be configured to comply with the target topology. Is it sufficient to evaluate individual proposals and whether they improve, make worse, or have a neutral effect on the topology? Or does the big picture make the practical execution of the topology more challenging?

1 Like

In my opinion this proposal is pointless if it’s not achievable (and would also be wrong/misleading regarding some of it’s statements). From my perspective, one of two things appears to be true:

  • The current target topology is unachievable (and therefore so is the proposed, more ambitious one), or
  • The community has been asleep at the wheel proposing and passing proposals that have allowed subnets to deviate further and further from their target topology

I would expect that the current target topology probably is achievable, but can’t be sure. Ultimately, the fact that the IC is far away from it’s target on some subnets (such as the NNS) indicates that there’s a problem that needs solving first before it makes any sense to propose an even more ambitious topology.

I’d planned to take a look at this today, but there was a flood of distracting proposals that I reviewed instead. Based on my understanding of the existing tooling, I wouldn’t expect that this can be solved overnight and would take some time to plan and execute.

This sort of problem is known and an NP-complete problem. Solving this challenge for a single subnet is relatively easy (my understanding is that this is what the existing tooling does - it’s what it means to have a greedy approach). As soon as you take into consideration more than one subnet, each solution for one subnet limits the options for another. As the number of subnets grows, so does the complexity of the problem (exponentially). Relatively optimal solutions for these sorts of problems are generally found heuristically, by taking a ‘big picture’ approach and trying out numerous global solutions (for all subnets at once) while using a range of techniques to balance exploitation and exploration.

Attempting to iteratively improve the topology step by step, bit by bit, subnet by subnet, is liable to falling into what are described as local optima. These are situations where it’s very hard to find a way to improve the solution in one or two steps (and instead requires numerous steps to be taken all at once to escape that local optima).

I have experience with (and a keen interest in) these sorts of problems, and would be happy to work on a tool for tackling the topology optimisation problem if the foundation and/or wider community would find this helpful. It’s not something I could churn out overnight though, and would take a fair amount of work to produce a robust tool that’s performant, maintainable, and flexible (suitable for solving for arbitrary targets that may be proposed in the future).

1 Like

The point could be that the topology is the ideal that we should work toward, but with the understanding that there may be limitations. For example, there is a weighting system applied to the node provider rewards that is based on geographic location. The locations where there is an over abundance of node providers and/or nodes get paid with a lower weighting factor and the locations where there is a scarcity of node providers and/or nodes get paid with a much higher weighting factor. This is intended to incentivize more diversity of node and node provider locations. If the existing topology deviates from the ideal, then there are likely practical reasons for the deviation. Hence, the target topology could be the intended design, but there may be limitations on available nodes in suitable locations. I’m trying to understand if this misalignment with the target topology is a black and white situation in which all proposals that don’t align with the approved topology need to be rejected, or if there is more to consider regarding the big picture.

I wouldn’t say that the community has been asleep at the wheel. The community has never had a reason or an incentive to get involved in the technical evaluation of these proposals until now (which is new and still underfunded). It has been DFINITY alone that has approved the current topology and I’m confident that they have been making decisions that they believe are in the best interest of the protocol. It’s just that now we have additional sets of eyes that are starting to look at these proposals and get up to speed on where we are and how we got here and there is bound to be differing opinions on what is right. These are very interesting conversations and I look forward to learning more.

OK, this makes sense. Assuming DFINITY doesn’t have an alternate explanation and agrees that the topology is complex and needs an overhaul, then is it the right answer to reject every proposal (or a large portion of them) until this happens? Would it make more sense to recognize there needs to be a hefty re-optimization effort and that there will be some proposals that need to pass in spite of the fact that they are neutral or negative relative to the target topology?

By the way, part of the reason I am asking is that I just set the CodeGov neuron to follow you since you have indicated that you are already committed to reviewing the Subnet Management proposal topic and you won one of the grants to be a reviewer for the Subnet Management proposal topic. The CodeGov team is not ready to begin our formal reviews since the grant program has not started and our current funding does not support the Subnet Management proposal topic, but the CodeGov commitment is to seek skilled reviewers for every proposal topic and follow them if they are reliable. Hence, it makes sense for CodeGov to follow you at this time. I’m just trying to understand if the majority of all proposals for this topic will end up getting rejected due to your concerns with meeting the target topology or if there is a need to step back and take a look at the bigger picture. I suspect you are already thinking along these lines and I’m sure you need additional feedback from DFINITY to better understand the right answer. I support your interest in pushing us in the right direction if we are not already heading in that direction.

1 Like

Thanks Wenzel. The topology target (and whether it can be met) is what’s used for determining if the IC has enough node machines and node providers onboarded. The new IC Target Topology is based on the premise that this is currently the case, and that it would still be the case with the more ambitious topology. I see no evidence of this, and only strong indicators to the contrary.

Ongoing dialog with DFINITY has fed into my point of view on this, and I was surprised to see this motion proposal submitted while those concerns and dialogue was ongoing.

Yes, the practicality of having enough nodes onboarded (which is a problem that needs addressing if target can’t be met, particularly if that target is proposed to be made more stringent). I don’t expect any other reason to be the case (other than accidental oversights), particularly given that the NNS subnet is currently one of the worst offenders (where security and robustness is paramount), and given the sorts of responses I’ve read recently from DFINITY. If this were the case, any deviation on these terms should be clearly communicated in whichever proposals are used to enact that deviation. It should be made clear that the community is being asked to trade-off security and robustness against some other metric (effectively overturning aspects of the existing IC Target Topology motion proposal, and even more so if this one passes).

The only practical explanation I’ve seen for why some subnets might deviate from the formal target is this →

This spells out that there are not enough spare nodes to reliably meet targets for all subnets on the network - which is in stark contrast to the premise that the current motion proposal rests upon.

No, to be clear, the recent rejections are due to payload error and inappropriate NNS function to be actioned.

I accept any proposal that claims to improve decentralisation and does so (if there are not other significant issues with the proposal). I hope this clarifies things.

Thanks for following :slightly_smiling_face: :+1:

2 Likes

@SvenF and/or @sat, are you able to provide some clarity? Some voters are holding off to see what DFINITY has to say about this first. :slight_smile: