What happens if 2/3 of a subnet's nodes decide to fork?

jzxchiang · March 12, 2022, 8:00pm

I have a basic (probably naive) conceptual question:

What’s to prevent 2/3 of the nodes in a subnet from colluding to fork off their own chain (with malicious consequences for canisters)? Since subnet node provider information is public, they could probably set up a Telegram chat to coordinate.

Follow-up questions / thoughts:

Can the NNS “master chain” do anything about this, or would it not even be able to detect such a fork?
Perhaps node shuffling is one solution?
Is this a situation where we should let more users participate in validation (but not in consensus)?
Can ZK SNARKS help here?

(I thought of this when I read this post by Vitalik.)

cryptodriver · January 18, 2023, 8:44am

I have the same question. Can anybody tell me?

free · January 18, 2023, 11:03am

Not much, really, if you assume that 9 (of 13) independent entities can actually agree on anything and can coordinate their attack to that extent. Or, if we’re talking about a fiduciary subnet, some 20 out of 30 (I don’t remember the actual number of replicas on a fiduciary subnet, but it’s somewhere around 30).
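For reference, the 9-of-13 figure is what the usual Byzantine fault tolerance arithmetic gives. This is a minimal sketch, assuming the standard bound n >= 3f + 1 and a quorum of n - f; the function names are made up for illustration and this is not replica code.

```python
def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1 (standard BFT assumption)."""
    return (n - 1) // 3


def colluders_needed(n: int) -> int:
    """Nodes that must collude to form a quorum of n - f on their own."""
    return n - max_faulty(n)


for n in (13, 30):
    # Prints subnet size, tolerated faulty nodes, and colluders needed.
    print(n, max_faulty(n), colluders_needed(n))
```

For 13 nodes this gives f = 4 and 9 colluders. For a hypothetical 30-node subnet the same formula gives 21, which is in line with the rough "some 20 out of 30" above.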

It is possible to detect such a fork as long as at least one honest replica is still part of the subnet. Such a replica would diverge from the rest of the subnet and would report such a divergence. Currently via Prometheus metrics (and preserving the diverged state for debugging), but there’s no reason why (with the requisite amount of time and effort) this could not be reported to a canister or use some other less centralized monitoring and audit mechanism.

The NNS would not be able to do anything about this directly/automatically, because it’s hard to say why one or more replicas diverged. For the record, this has happened (a small handful of times) since Genesis. It was investigated every single time. And every single time it was determined to be caused by bugs (e.g. inconsistent ordering of ingress messages with the same timeout across replicas, resulting in different state hashes).

So even if you had the full (recent) history of the subnet and were able to replay it automatically, you couldn’t decide only based on the outcome whether the divergent nodes were malicious; the divergent nodes were honest and the majority of the subnet was malicious; or (most likely) you’re dealing with a non-determinism bug.
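To make the detection point concrete, here is a rough sketch of what comparing state hashes can and cannot tell you. The report format (replica id mapped to a state hash for one height) and all names are assumptions for illustration, not the actual replica or metrics interface.

```python
from collections import defaultdict


def group_by_state_hash(reports: dict[str, str]) -> dict[str, list[str]]:
    """Group replica ids by the state hash each one reported for a height."""
    groups: dict[str, list[str]] = defaultdict(list)
    for replica, state_hash in reports.items():
        groups[state_hash].append(replica)
    return dict(groups)


def has_diverged(reports: dict[str, str]) -> bool:
    """True if the replicas did not all report the same state hash."""
    return len(set(reports.values())) > 1


# Hypothetical 13-node subnet where one replica disagrees with the rest.
reports = {f"replica-{i}": "hash-A" for i in range(12)}
reports["replica-12"] = "hash-B"

print(has_diverged(reports))
for state_hash, members in group_by_state_hash(reports).items():
    # The group sizes show a split exists; they do not show which side
    # (if any) is malicious, or whether a non-determinism bug caused it.
    print(state_hash, len(members))
```

The output only says that a split exists and how large each side is. Deciding whether the minority, the majority, or a bug is to blame still needs a human investigation, as described above.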

Also, unless you have one single monolithic blockchain (subnet) as e.g. Ethereum does, if a subnet behaves maliciously for even one round, it’s quite possible you won’t be able to undo the results of said malicious behavior. E.g. imagine you have a canister that holds Bitcoin; if the subnet it runs on behaves maliciously, it could presumably request an honest Bitcoin subnet/canister to transfer all its Bitcoin to such and such an address; even if you later realize this was malicious behavior, there’s no way to return the Bitcoin. The same is true for canisters holding/managing ICP or NFTs or whatever, running on non-malicious subnets.

[Have to drop out for lunch, will continue in a while.]

free · January 18, 2023, 12:22pm

Oh, before I forget, I’d like to make a disclaimer: anything that isn’t a statement of fact (which is usually my possibly limited understanding of how the protocol works), is my opinion as a software engineer; please don’t quote it as DFINITY’s official position.

On node shuffling: I don’t think so. I guess there are two possible ways of putting such an attack into practice: scout out a specific subnet that you want to hijack and (assuming no node shuffling) start tracking down and bribing/blackmailing/convincing the respective node operators; or (assuming node shuffling) put together a small group of malicious node operators (maybe by financing them, maybe by meeting them on black hat forums) and wait until some subnet consists of a majority of nodes belonging to said node operators. The latter sounds more likely to work, with a lot less risk (you’re less likely to run into someone that will react negatively and noisily to your bribe/blackmail/whatever).

On letting more users participate in validation: it depends. But my gut feeling is also “no”. Node operators are very much independent validators, i.e. users. Beyond the fact that they need to have a lot more capital to invest than your average blockchain validator (although you can always put together a group/DAO/whatnot), they are just independent third parties with a shared interest in the network’s well-being. (Also, in order to validate a subnet blockchain, you would need equivalently powerful hardware – 512 GB of RAM, data center level SSDs. But let’s ignore that for now.)

And unless I run such a validator node myself, why would I trust a random user (which may pack up and head for greener pastures at any time) more than a random node operator?

Finally, I believe there is actual value in the privacy afforded by a subnet’s blockchain (containing all inputs to the virtual machine represented by the subnet). E.g. I don’t want my boss/wife/you to be able to trivially look up all my interactions with a canister. And allowing anonymous validators would require publishing (at least the recent) blockchain history. So something like this would only work for so-called “public subnets”.

I do believe there are solutions to the problem, though. As usual, given enough time/effort investment. E.g., as touched on above, a way for replicas to publicly report any divergence they experience. Or an audit log self-signed by each replica, listing which blocks it has actually executed and which it has skipped (and instead relied on state sync to catch up). Something like this could be used as a good indication (e.g. if 4 of 13 nodes all went “blind” at the same time and were forced to skip ahead) that something fishy may be going on. Or simply, if I happen to know/trust node operator X for any reason, I can choose to follow their audit reports with increased attention.
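The "went blind at the same time" check could look roughly like the following. This is a sketch only: the audit log format (a set of skipped block heights per replica), the replica names and the threshold of 4 are all assumptions for illustration, and verifying the signatures on the logs is left out.

```python
from collections import Counter


def suspicious_heights(
    skipped_by_replica: dict[str, set[int]], threshold: int
) -> list[int]:
    """Heights that at least `threshold` replicas skipped via state sync."""
    counts = Counter(
        height
        for heights in skipped_by_replica.values()
        for height in heights
    )
    return sorted(h for h, c in counts.items() if c >= threshold)


# Hypothetical 13-node subnet: four replicas skipped heights 100-102
# together, and one other replica skipped an unrelated height.
audit_logs: dict[str, set[int]] = {f"replica-{i}": set() for i in range(13)}
for i in range(4):
    audit_logs[f"replica-{i}"] = {100, 101, 102}
audit_logs["replica-7"] = {250}

print(suspicious_heights(audit_logs, threshold=4))
```

A hit here is only an indication that something may be worth a closer look; replicas can fall behind and state sync for entirely innocent reasons.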

Finally, one could also set up a handful of nodes on a subnet (maybe each subnet, maybe those of “interest”) to never use state sync and instead always execute the full blockchain, even if they are arbitrarily far behind. Such a node (again, if trusted more than other nodes for whatever reason) could be used as an independent auditor: assuming it is honest, it could certify that no tampering has occurred, since it wouldn’t be susceptible to the attack described above (where a supermajority of malicious nodes temporarily cut out the honest nodes while tampering with the state of the subnet, forcing them to later state sync the tampered state without noticing anything untoward).

On ZK-SNARKs: I doubt it. I’m far from an expert in cryptography, but as I understand it ZK proofs impose huge overhead (as in, way more than the useful work being done). And they cannot be (easily) applied to general purpose computation. I.e. you would need a special runtime (and possibly have to deal with specific limitations) in order to use them.

lastmjs · January 18, 2023, 5:00pm

I would like to dive into specifics but don’t want to take the time right now, so just want to leave an opinion. I disagree with many of your points. I think a combination of node shuffling, more validation instead of consensus (well, consensus on fewer parts of the stack, just message ordering), and validity proofs would help the security situation on the IC, perhaps drastically.

My main proposal for how something like this would look (at least with the validity proofs) can be found in this tweet thread: https://twitter.com/lastmjs/status/1611555103118344192

free · January 18, 2023, 6:38pm

Fair enough, but your design is predicated on “the existence of a performant zkWasm”.

As said, I have virtually zero experience with cryptography (beyond understanding of the basic concepts), but the few responses I’ve seen from actual cryptography researchers to similar suggestions seem to say that it would be very expensive to do a ZK proof; and that it may be entirely impractical to do it for arbitrary code.

If it turns out to be both cheap and fast to produce ZK proofs on the side, then sure, we can have our cake and eat it too. But I’d like to see some indication that this zkWasm is actually possible (and preferably implemented, at least as a prototype) before we go ahead and redesign everything from the ground up.

lastmjs · January 18, 2023, 7:14pm

Yes I agree, I’m assuming it’s possible.

There is a zkWasm early prototype: GitHub - DelphinusLab/zkWasm

Tbd · January 18, 2023, 8:15pm

How about doing fraud proofs until ZK is ready? Arbitrum is actually doing it: GitHub - OffchainLabs/nitro: Nitro goes vroom and fixes everything

Nitro runs Geth at layer 2 on top of Ethereum, and can prove fraud over the core engine of Geth compiled to WASM.

Maxfinity · January 24, 2023, 6:37am

I think fraud proofs are more feasible than ZK proofs, especially when combined with data availability proofs, since these become one-shot.
