Shuffling node memberships of subnets: an exploratory conversation

I think both of you are right, but shuffling probably makes things much worse; see my latest posts in the other thread. Your attack vector could be fixed in another way.

How so? I think your dice-roll model is a little too simplistic. If you use the same numbers but don’t have shuffling, an attack becomes pretty much a foregone conclusion in all 3 cases. That is, if you assume the nodes of a bad actor have an inherent affinity (they want to cluster together).

As subnets scale to infinity, everyone on Earth becomes a provider, leading to subnets dominated by family or friends. This is basically the shuffling case.

No, that’s what happens if you don’t shuffle and if there’s a tendency for family or friends to want to reside on the same subnet (i.e. if they’re bad actors).

The smallest selection pressure for clustering Sybil nodes within a subnet eventually wins out if there isn’t an intentional opposing force that keeps mixing subnets up.
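To make that concrete, here’s a toy Monte Carlo sketch (my own illustration, nothing from the IC codebase). It deliberately exaggerates the pressure: colluding nodes never voluntarily churn out of a subnet, honest nodes fail at random, and replacements are drawn from a pool that is 10% colluding. Without an opposing force the subnet ratchets towards capture; with uniform random shuffling the composition hovers around the pool’s base rate:

```python
import random

def simulate(days: int, subnet_size: int = 13, adversary_share: float = 0.10,
             shuffle: bool = False, seed: int = 0) -> int:
    """Toy model of subnet membership drift. True = colluding node, False = honest.
    Stark assumption: colluding nodes never voluntarily leave, honest nodes churn at
    random, and replacements come from a pool that is `adversary_share` colluding."""
    rng = random.Random(seed)
    subnet = [rng.random() < adversary_share for _ in range(subnet_size)]
    for _ in range(days):
        if shuffle:
            idx = rng.randrange(subnet_size)             # shuffling: anyone can be rotated out
        else:
            honest = [i for i, bad in enumerate(subnet) if not bad]
            if not honest:
                break                                    # subnet fully captured
            idx = rng.choice(honest)                     # no shuffling: only honest nodes churn
        subnet[idx] = rng.random() < adversary_share     # replacement drawn from the spare pool
    return sum(subnet)

for shuffle in (False, True):
    results = [simulate(days=3650, shuffle=shuffle, seed=s) for s in range(200)]
    print(f"shuffle={shuffle}: mean colluding nodes after ~10 years "
          f"= {sum(results) / len(results):.1f} / 13")
```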

Theoretically I agree that there is an attack vector. Some questions I would be interested in:

  • How often does it happen that a node degrades and needs to be replaced? We must have some data from the past.
  • When a node is replaced, isn’t it replaced by a different node from the same provider?
  • If it has to be replaced by a node from a different provider, how is the new provider chosen? Randomly?
  • How is it decided which providers have spare nodes? Can a provider do anything to increase his probability of having spare nodes, and thereby increase his chances of being selected as the replacement for a degraded node?
3 Likes

I agree with this, but do you keep shuffling indefinitely? Doesn’t it eventually revert to the problematic subnet distribution (or the one you’re trying to avoid)?

But indeed, we probably need to add noise at each selection step.

From memory, there is a paper by Censor-Hillel that addresses a problem which maps plausibly onto this one: how to make asynchronous computations robust to noise (where “noise” could also be understood as bad actors corrupting messages).

The paper provides a distributed coding scheme in which the network topology is not known in advance, other than how many nodes are in the network. That is, knowledge of the topology is local: each node only knows the identities of its own neighbours, nothing more.

Again, a similar situation: you know how many nodes are in the network (easy) but not necessarily which ones are honest (potentially difficult), so the topology is known in one sense but not fully in another.

There might be something there worth looking at.

Don’t have the reference with me ATM but I think her paper is publicly available.

Yeah, I think continuous randomised shuffling (though restricted by deterministic constraints, ensuring each selected combination conforms to the IC Target Topology) is the only way to ensure that security improves as the IC scales (in terms of collusion resistance - which is the big issue).

Shuffling pushes this eventuality out over a huge time horizon, while also ensuring that it only occurs for a brief period of time. Suppose a single entity owns 10% of all nodes across the entire IC network, that every node they own is under a separate fraudulent identity (or belongs to an entity willing to collude with them), and that they’re strategically located in different countries and data centres, etc. Even then, the probability that a 13-node subnet will be in a state that facilitates collusion (to the point of controlling the subnet) after a shuffle step is 0.1^9. If a shuffle step occurs daily, it would take many, many thousands of years for a 24-hour collusion window to plausibly open up (by my back-of-the-envelope calculations). Also, shuffling doesn’t need to be a silver bullet; it just needs to be better than not shuffling.
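For anyone who wants to poke at those numbers, here’s a rough sanity check in Python. It assumes each of the 13 slots is filled independently from a pool in which 10% of nodes would collude, ignores the Target Topology constraints (which only make collusion harder), and takes 9 of 13 as the threshold for controlling the subnet. The exact figure differs from 0.1^9, but it lands in the same “thousands of years” ballpark:

```python
from math import comb

def collusion_probability(subnet_size: int, threshold: int, adversary_share: float) -> float:
    """P(at least `threshold` of `subnet_size` independently drawn nodes are colluding)."""
    return sum(
        comb(subnet_size, k) * adversary_share**k * (1 - adversary_share)**(subnet_size - k)
        for k in range(threshold, subnet_size + 1)
    )

p = collusion_probability(13, 9, 0.10)        # 13-node subnet, 9 nodes needed to control it
print(f"P(collusion-capable subnet after a shuffle step) ~ {p:.1e}")
print(f"Expected wait at one shuffle step per day ~ {1 / p / 365:.0f} years")
```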

There’s an interesting discussion about this happening in the WaterNeuron community. @EpicICP just pointed me to a podcast that talks about this (it makes interesting listening).

4 Likes
  • It’s not infrequent for nodes to go offline or become degraded. Most weeks there are a number of proposals to replace these sorts of nodes with healthy ones (for numerous subnets).
  • There are no constraints that enforce a node being replaced by another node from the same node provider, and that can’t always be expected to be possible, I’m afraid. The recent situation with DFINITY-controlled node scarcity is an example. The selection simply needs to satisfy the IC Target Topology (but even that’s not always possible, which is also demonstrated by the example linked above).
  • Anyone can submit a proposal, so secondary selection criteria (and/or any randomisation) are not controlled or guaranteed.
  • Numerous factors can affect the chances of an NP having spare nodes. They can influence this themselves by intentionally degrading their nodes to encourage them to be removed from a subnet (at which point they become available for other subnets, as long as they’re no longer degraded).
2 Likes

Oh, I see. That’s great news! But I think the problem now is that when you increase the node share to 20–30%, you get a colluding subnet within a matter of days, as the result is very sensitive to that percentage. (According to Grok :sweat_smile:)

You need to be comparing this to the situation without shuffling. This is about reducing the odds of collusion as more nodes are onboarded rather than increasing the odds.

In any case, at a certain scale it becomes unreasonable to imagine that 30% of the nodes in existence each belong to a distinct fraudulent identity that is ready to collude with every other one of the 30% of fraudulently owned nodes.

Also, for larger subnets such as the system subnets, the odds are even better than the ones quoted above (even if you assume 30% of the nodes across the IC are ready to collude). The security that shuffling offers is similarly very sensitive (in a good way) to the number of nodes that make up the failure domain. If, with each shuffle step, a node is randomly chosen from a pool where a minority (e.g. 30%) would collude with each other, every additional node in the failure domain makes an exponential difference to collusion resistance. 27 nodes would need to collude to control a 40-node subnet (such as the NNS). The SNS subnet is also great in this respect (along with numerous other particularly important subnets).

Anyone who’s concerned can deploy their canisters to the fiduciary subnet (34 nodes). At the end of the day users (and devs) can vote with their feet. If the demand for larger subnets increases, not a problem, more larger subnets will emerge.
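To see how sharply those odds move with both the colluding share and the subnet size, here’s the same back-of-the-envelope model extended (same independence caveat as before, ignoring topology constraints; “control” is taken as just over two thirds of the subnet):

```python
from math import comb

def collusion_probability(n: int, threshold: int, share: float) -> float:
    """P(at least `threshold` of n independently drawn nodes are colluding)."""
    return sum(comb(n, k) * share**k * (1 - share)**(n - k) for k in range(threshold, n + 1))

for share in (0.10, 0.20, 0.30):
    for n in (13, 34, 40):                  # app subnet, fiduciary subnet, NNS-sized subnet
        t = 2 * n // 3 + 1                  # smallest node count that controls the subnet
        p = collusion_probability(n, t, share)
        print(f"{share:.0%} colluding pool, {n:2d}-node subnet (needs {t}): P ~ {p:.1e}")
```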


But we need node shuffling for any of this to be true :slightly_smiling_face: At least, this is my contention. If my calculations are off, please join in the debate and let me know.

2 Likes

I think your calculations are good. I would love to think that we will implement node shuffling, but in reality, I feel it would be quite an expensive operation. Another concern is that the percentage could be higher than 10%, and together with other constraints, the probability of collusion might end up being much higher in practice. But yeah, it would be awesome to implement node shuffling. If that’s not possible, I think giving more control to dapps (like many other blockchains do) could be a solution; or perhaps a mix of everything.

1 Like

Expensive in terms of NNS voter attention, or in other ways? Subnet membership changes currently require an NNS proposal, but that won’t always be the case. Eventually a system canister will be sampling node health stats and performing the decentralisation calculations needed to identify replacement candidates, all on chain with no need for a proposal (other than the one that updates the canister and specifies the algorithm used).

This is the point where I believe it would be fairly easy to just ‘switch on’ node shuffling.
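As a rough picture of what “switching it on” could look like, here’s an illustrative sketch (Python pseudologic only; none of these names correspond to real NNS or registry interfaces, and `satisfies_target_topology` stands in for whatever decentralisation constraints the canister enforces):

```python
import random

def shuffle_step(subnet_nodes: list, spare_pool: list,
                 satisfies_target_topology, rng: random.Random) -> list:
    """One automated shuffle step: swap a randomly chosen member for a random spare node,
    considering only swaps that still satisfy the IC Target Topology constraints."""
    candidates = []
    for outgoing in subnet_nodes:
        for incoming in spare_pool:
            proposed = [n for n in subnet_nodes if n != outgoing] + [incoming]
            if satisfies_target_topology(proposed):
                candidates.append((outgoing, incoming))
    if not candidates:
        return subnet_nodes                      # no constraint-respecting swap available today
    outgoing, incoming = rng.choice(candidates)  # the randomness is the anti-clustering force
    return [n for n in subnet_nodes if n != outgoing] + [incoming]

# e.g. shuffle_step(current_members, healthy_spares, topology_check, random.SystemRandom())
```

The key design choice is that the outgoing node is chosen at random rather than only when it degrades, which is exactly the opposing force against the slow clustering drift discussed earlier.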

@Sat, how long do you think it might be until subnet membership changes no longer necessitate a dedicated NNS proposal? Also do you have thoughts about the utility of automatic, randomised node shuffling?

I was thinking in terms of computational resources, network messages, and so on. But I guess if you only change one node on each subnet per day, it sounds more doable.

1 Like