Is it true that only 13 nodes in one subnet is enough?

cryptodriver · June 24, 2024, 1:07pm

The answer is NO.

But @dfinity and the team think it is enough.

Is this alien tech?

@dom is always saying that ICP never stops. The fact is that this is a lie.

lastmjs · June 24, 2024, 1:18pm

I am very interested to read the post mortem on this one.

I also think deep analysis of subnet size would be very beneficial, as it seems somewhat nebulous and unclear what the appropriate size of a subnet should be for a given use case.

And I agree that saying that ICP has never gone down or never goes down might not be very accurate…I’m not sure what has been meant by that in the past, as individual subnets can and do go down or degrade in performance.

cryptodriver · June 24, 2024, 1:28pm

As far as I can remember, this shouldn’t be the first time.
I hope ICP fundamentally solves the rigid, fixed-node subnet architecture.

What do you think about the rigid, fixed-node subnet architecture?

bjoern · June 24, 2024, 1:37pm

The subnet stalled due to a software bug in the replica implementation. A fix has been prepared, is currently being tested, and should be rolled out to the subnet next. More details will be provided in a post mortem in the next few days; right now the priority is on recovering the affected subnet.

cryptodriver · June 24, 2024, 1:40pm

I’m glad to see that the team responded promptly.
However, this does not fundamentally solve the problem.
The subnet may still go down for other reasons next time.

Vivienne · June 24, 2024, 1:40pm

And to expand a little bit: since the problem is in the replica implementation (rather than a number of nodes spontaneously going offline or becoming malicious), having more nodes in the subnet would not help, since all replicas (not sure about ‘all’, but certainly too many to make progress) run into the same bug.
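
To make that concrete, here is a minimal sketch. It assumes the usual Byzantine fault tolerance bound, where a subnet of n nodes keeps making progress as long as fewer than one third of them are faulty. The function names are made up for illustration and are not part of the replica.

```python
def max_faulty(n: int) -> int:
    """Largest f with 3 * f < n (assumed BFT bound: fewer than a third faulty)."""
    return (n - 1) // 3


def can_make_progress(n: int, failed: int) -> bool:
    """True if the number of failed nodes stays within the assumed bound."""
    return failed <= max_faulty(n)


if __name__ == "__main__":
    for n in (13, 28, 40):
        # Case 1: a few nodes fail independently (offline or malicious).
        independent = can_make_progress(n, failed=3)
        # Case 2: a software bug hits every replica, since all run the same version.
        shared_bug = can_make_progress(n, failed=n)
        print(f"n={n} tolerates={max_faulty(n)} "
              f"three_down_ok={independent} shared_bug_ok={shared_bug}")
```

Under that assumption a 13-node subnet tolerates 4 faulty nodes, so a few independent failures are covered. If every replica hits the same bug, the failed count equals n and no subnet size satisfies the bound.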

cryptodriver · June 24, 2024, 1:43pm

Is it possible to abandon the rigid structure of fixed nodes?
I am not talking about the number of nodes, but whether the protocol can detect down nodes and re-introduce healthy nodes to form a subnet?
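
Roughly what I have in mind, as a toy sketch only. The node IDs, the health set, the spare pool, and `heal_subnet` are all hypothetical; this is not an existing ICP or NNS API, and it says nothing about how membership changes would be authorized.

```python
from typing import Iterable, List, Set


def heal_subnet(members: List[str], healthy: Set[str], spares: Iterable[str]) -> List[str]:
    """Swap unhealthy members for healthy spares, keeping the subnet size fixed."""
    # Only spares that are healthy and not already in the subnet are usable.
    pool = [s for s in spares if s in healthy and s not in members]
    new_members = []
    for node in members:
        if node in healthy:
            new_members.append(node)
        elif pool:
            new_members.append(pool.pop(0))  # replace the down node
        else:
            new_members.append(node)  # no spare left; keep the slot as-is
    return new_members


if __name__ == "__main__":
    members = ["n1", "n2", "n3", "n4"]
    healthy = {"n1", "n3", "n4", "s1"}
    print(heal_subnet(members, healthy, spares=["s1", "s2"]))
```

Here `n2` is down and gets replaced by the healthy spare `s1`, while `s2` is skipped because it is not reported healthy.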

diegop · June 24, 2024, 3:14pm

I agree with your high-level intent and I think it makes sense.

While I disagree with your proposed implementation, I agree with the high-level goals of reliability and improved self-healing. I believe most people would support these goals. I suspect that knowing others share your high-level intentions was the main point you wanted to convey.

I do have some bias:

Almost all (if not all) incidents come from a bug being introduced via a proposal to a subnet. Note that a proposal is VOTED on, so once a proposal is blessed, it gets deployed to all the nodes in a subnet as the new blessed version.
Subnet size would not help with this.
I agree with the high-level intent though. The goal is reliability. Are there any historical incidents that could have been self-detected and self-fixed by the subnet (or NNS) itself without human interaction? Are automatic rollbacks helpful? Would that be possible? If it is possible, would it have been wise? All great questions I do not know the answer to.

cryptodriver · June 24, 2024, 11:52pm

After a problem occurs, it is important to find the direct cause and solve it.

However, if you do not prevent it from happening again, the same problem will happen again.

As for this problem, has it been solved fundamentally?

Absolutely not! Will it happen again? Of course!

cryptodriver · September 18, 2024, 4:45am

YES, as predicted, it happened again.

cryptodriver · September 18, 2024, 4:47am

You can’t imagine that 3 nodes out of a 13-node subnet are down.

This happened in ICP.

bjoern · September 18, 2024, 7:16am

Just to be clear: this had nothing to do with the size of the subnet. It was a bug in the implementation of the protocol. The post mortem report will explain the exact cause.
