Subnet Management - uzr34 (II)

Lorimer · September 27, 2024, 5:36pm

Proposal 133149

1 degraded node replaced with another Sri Lanka node. Nice and simple, looks good. I’ve adopted.

Decentralisation Stats

Subnet node distance stats (distance between any 2 nodes in the subnet) →

	Smallest Distance	Average Distance	Largest Distance
EXISTING	0 km	7898.216 km	19448.574 km
PROPOSED	0 km	7898.216 km	19448.574 km
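
For readers who want to reproduce figures like these, here is a minimal sketch of pairwise distance statistics. The tool that produced the table is not shown in this thread, so the haversine formula, the Earth radius constant, and the function names below are assumptions rather than the actual method.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_stats(points):
    """Return (smallest, average, largest) distance over all unordered node pairs."""
    dists = [haversine_km(a, b) for a, b in combinations(points, 2)]
    return min(dists), sum(dists) / len(dists), max(dists)
```

Two nodes that share the same coordinates give a smallest distance of 0 km under this definition.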

Subnet characteristic counts →

	Continents	Countries	Data Centers	Owners	Node Providers
EXISTING	5	24	31	31	31
PROPOSED	5	24	31	31	31

Largest number of nodes with the same characteristic (e.g. continent, country, data center, etc.) →

	Continent	Country	Data Center	Owner	Node Provider
EXISTING	10	3	1	1	1
PROPOSED	10	3	1	1	1
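
The two tables above can be derived from per-node records by counting distinct values and the largest group for each characteristic. The sketch below uses only a handful of rows copied from the node reference table later in this post. The field names and the `characteristic_stats` helper are hypothetical, not part of any IC tooling.

```python
from collections import Counter

# A few rows from the node table in this post: (continent, country, data center).
nodes = [
    ("Asia", "India", "Navi Mumbai 1 (nm1)"),
    ("Asia", "India", "New Delhi 1 (nd1)"),
    ("Asia", "India", "Panvel 2 (pl2)"),
    ("Asia", "Japan", "Tokyo (ty1)"),
    ("Europe", "Sweden", "Stockholm 1 (sh1)"),
]
FIELDS = ("continent", "country", "data_center")  # hypothetical field names


def characteristic_stats(rows):
    """For each field, count the distinct values and the size of the largest group."""
    stats = {}
    for i, field in enumerate(FIELDS):
        counts = Counter(row[i] for row in rows)
        stats[field] = {
            "distinct": len(counts),
            "largest_group": max(counts.values()),
        }
    return stats


stats = characteristic_stats(nodes)
```

On this five-row sample, the country entry reports 3 distinct countries with a largest group of 3 (India), which matches the per-country maximum shown for the full subnet.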

See here for acceptable limits → Motion 132136

The above subnet information is illustrated below, followed by a node reference table:

Map Description

Red marker represents a removed node (transparent center for overlap visibility)
Green marker represents an added node
Blue marker represents an unchanged node
Highlighted patches represent the country the above nodes sit within (red if the country is removed, green if added, otherwise grey)
Light grey markers with yellow borders are examples of unassigned nodes that would be viable candidates for joining the subnet according to formal decentralisation coefficients (so this proposal can be viewed in the context of alternative solutions that are not being used)

Table

	Continent	Country	Data Center	Owner	Node Provider	Node	Status
`---`	~~Asia~~	~~Sri Lanka~~	~~Colombo 1 (cm1)~~	~~OrionStellar~~	~~Geodd Pvt Ltd~~	~~suty3-goyd2-t6ngb-dsgzv-vkvt6-w2x4t-lioxl-2xsaa-ztaly-y2ov2-hae~~	~~DEGRADED~~
`+++`	Asia	Sri Lanka	Colombo 1 (cm1)	OrionStellar	Geodd Pvt Ltd	fhg3q-muslh-pp7ur-hcivl-q3kof-mju7a-kjyyf-hjifg-nsa35-nqjv3-7qe	UNASSIGNED
	Americas	Argentina	CABA 1 (ar1)	SyT - Servicios y Telecomunicaciones S.A.	Mariano Stoll	v27at-hedf7-4a2my-tboq6-escdm-77krt-2qfuq-zjptf-z2sbk-vd7zs-xae	UP
	Oceania	Australia	Queensland 1 (sc1)	NEXTDC	ANYPOINT PTY LTD	zjiki-pzvnv-m4rnn-fodt3-poon4-uldx7-d3wkq-gptsu-g2mjs-boi35-3qe	UP
	Europe	Belgium	Antwerp (an1)	Datacenter United	Allusion	xu2zg-nns7x-z67l6-foa5w-yc4ek-addku-g2hqg-c3jdg-wabdu-5ouaw-yae	UP
	Americas	Canada	Fremont (fm1)	Hurricane Electric	Boolean Bit, LLC	z4jw5-v4ee6-aa7gr-5axkc-4ocjy-v5vv5-inwc6-ma4hw-mb7jv-3skxy-eqe	UP
	Americas	Canada	Toronto (to1)	Cyxtera	Blockchain Development Labs	x3cey-uerdd-53a7n-d45e2-gjsnd-airg5-nrs5i-4xujk-c5ynl-4pie6-yqe	UP
	Europe	Switzerland	Zurich 3 (zh3)	Nine.Ch	Tomahawk.vc	v2pkj-vpsow-fp24q-zqwfj-p3nek-m52xz-oz6ra-blmra-73voj-jvwb5-gae	UP
	Asia	China	HongKong 1 (hk1)	Unicom	Pindar Technology Limited	w4ri3-ytfnq-jg3z3-qseka-se4xe-b2fl2-km766-ruzwd-riw72-6bifs-4ae	UP
	Americas	Costa Rica	Bogota 1 (bg1)	EdgeUno	Geeta Kalwani	aajth-ndp7x-ro5ok-yikyd-4i7xn-5k5ki-e3d37-hh4gn-s4opz-bnzxf-4qe	UP
	Europe	Czechia	South Moravian Region 1 (bn1)	Master Internet	Maksym Ishchenko	zgtrt-4vlgr-pbytl-t2yqq-qf4nk-wyoos-vrfpu-hxqcw-tcnfu-73kjb-pae	UP
	Europe	Spain	Madrid 1 (ma1)	Ginernet	Ivanov Oleksandr	yi6r6-u4kax-jphcr-jcqr5-t3zpm-gmp3b-2hiew-iinpf-sgjos-eabha-aqe	UP
	Europe	Estonia	Tallinn 2 (ta2)	Telia DC	Artem Horodyskyi	zgeaf-fcq4e-fcnht-g7mpg-sb7ff-r6awk-zvkwp-gkloc-rr6jl-ghsse-mqe	UP
	Europe	France	Paris 1 (pr1)	Celeste	Carbon Twelve	atjbz-kcjz7-y4mgn-t5wqp-3emfk-6mtlx-ln5i7-4pixf-ocjgh-hfu77-bqe	UP
	Asia	Georgia	Tbilisi 1 (tb1)	Cloud9	George Bassadone	uouxk-c246i-dgxzd-ql3a5-koofn-mclrv-toplo-bg76d-l4dzk-ngb3c-nae	UP
	Europe	Croatia	Zagreb 1 (zg1)	Anonstake	Anonstake	q3vac-kcwo2-ruiht-nflb7-ifoev-vkjcw-quybi-ugvgn-pqfwp-jntxi-dqe	UP
	Asia	India	Navi Mumbai 1 (nm1)	Rivram	Rivram Inc	ecxbl-3dp33-mpskv-yvs6f-674ct-tpr4d-dhzdg-jfgf4-gzhny-n6zzx-lqe	UP
	Asia	India	New Delhi 1 (nd1)	Marvelous Web3 DC	Marvelous Web3	67t6p-i4h3c-msv6p-kmbmm-rr6gj-z3nix-d6lo2-mq3q4-3h6rb-lwkbc-lae	UP
	Asia	India	Panvel 2 (pl2)	Yotta	Krishna Enterprises	dyycg-wc45f-jwks2-abddo-m3n5r-o5kxy-g5xhm-7ve4u-3tlk7-i7xec-oae	UP
	Asia	Israel	Tel Aviv 1 (tv1)	Interhost	GeoNodes LLC	rfkza-27bii-6jan4-u4zll-lkvmz-snmao-irmlj-arpdd-kyxrg-xnq3a-7ae	UP
	Asia	Japan	Tokyo (ty1)	Equinix	Starbase	go5zz-xs6yg-mylwl-v7uob-7bg4b-wjzhe-vmrwe-uy7mz-ckaz2-idm33-rqe	UP
	Asia	Korea (the Republic of)	Seoul 2 (kr2)	Gasan	Web3game	wjwzb-q3ogf-fi3po-kf6y6-wzuuj-3ac3m-kjvab-fufsm-z2skq-kthkx-xae	UP
	Europe	Poland	Warszawa 2 (wa2)	Central Tower DC	Bohatyrov Volodymyr	mswad-oq7wj-5r4yy-b5qoy-cmv7z-wzfb3-ktn6l-rcnrz-mni2f-lsys6-wqe	UP
	Asia	Singapore	Singapore 2 (sg2)	Telin	OneSixtyTwo Digital Capital	qp3lh-25yxy-dlk4t-ay73d-frr4t-3kmi5-35kqg-3vvbq-26qhh-6xrdr-oqe	UP
	Europe	Slovenia	Ljubljana (lj1)	Posita.si	Fractal Labs AG	6adxp-p7u63-xsdtk-lo6oc-vpqmi-44hgt-yv652-cbm5p-mssge-wsrz6-oqe	UP
	Europe	Sweden	Stockholm 1 (sh1)	Digital Realty	DFINITY Operations SA	vgfnl-4phvh-44pk3-yshmp-ckwz3-qnzob-l5wnj-pqn2j-vv5jh-3oewk-xqe	UP
	Americas	United States of America (the)	Chicago 3 (ch3)	CyrusOne	MI Servers	gtfa3-saq3t-ymlel-lsf6d-ans7b-cr45x-xg5np-xbxyt-nxfrt-iynyy-5qe	UP
	Americas	United States of America (the)	Orlando (or1)	Datasite	Giant Leaf, LLC	z5a4h-43szy-vvp4j-xorii-l6yma-4iyzt-7o3ry-frvqe-azkit-5iag2-rqe	UP
	Americas	United States of America (the)	Panama City 1 (pc1)	Navegalo	Bianca-Martina Rohner	y7bml-csbq7-euzyf-njmvm-qfftp-iy7lc-wisaq-jlmul-sdo7p-7lkx4-3ae	UP
	Africa	South Africa	Cape Town 2 (ct2)	Teraco	Kontrapunt (Pty) Ltd	kgo2t-vidyw-yw2g5-pqwrt-nr227-rbq2o-pog27-zarc2-dfrlw-vvjge-4qe	UP
	Africa	South Africa	Gauteng 2 (jb2)	Africa Data Centres	Karel Frank	xav3a-kdo3a-2rgbg-o6vnk-clat5-bcc7w-vmnej-z55rx-mfx26-7xugo-tqe	UP
	Africa	South Africa	Gauteng 3 (jb3)	Xneelo	Wolkboer (Pty) Ltd	kwryq-ezysk-c4ono-aet7a-hh6h5-4o3bb-a33et-ef4g5-42tot-zaek6-fae	UP

Known Neurons to follow if you're too busy to keep on top of things like this

If you found this analysis helpful and would like to follow the vote of the LORIMER known neuron in the future, consider configuring LORIMER as a followee for the Subnet Management topic.

Other good neurons to follow:

Synapse (follows the LORIMER and CodeGov known neurons for Subnet Management, and is a generally well informed known neuron to follow on numerous other topics)
CodeGov (actively reviews and votes on Subnet Management proposals, and is well informed on numerous other technical topics)
WaterNeuron (the WaterNeuron DAO frequently discusses proposals like this in order to vote responsibly based on DAO consensus)

timk11 · September 28, 2024, 3:22am

Voted to adopt Proposal 133149.

This proposal replaces a node in subnet uzr34: suty3, which appears as “Status: Degraded” in the IC dashboard, with another node from the same node provider and data centre, thereby having no impact on Nakamoto coefficients or target topology parameters.

ZackDS · September 30, 2024, 9:00am

Voted to adopt with nothing to object.

LaCosta · October 1, 2024, 12:17pm

Voted to adopt proposal 133149. Replaces a degraded node without changing the Nakamoto Coefficient.

dsharifi · October 4, 2024, 11:22am

DFINITY will submit an NNS proposal today to reduce the notarization delay on the Internet Identity subnet, uzr34, similar to what has happened on other subnets in recent weeks (you can find all details in this forum thread).

After the successful rollout of the Application subnets, we propose a gradual rollout for the System subnets, starting with the Internet Identity subnet.

Lorimer · October 4, 2024, 11:49am

Thanks for this announcement @dsharifi, it seems to have aligned perfectly with my lunch break

Are you able to elaborate on the choice of subnet? The SNS subnet is technically an application subnet (albeit a special one). It has 34 nodes. I’d expect this (or maybe the fiduciary subnet) to be a safer choice for starting the production changes on the large subnets. If something unexpected goes wrong in production, the II subnet would probably have one of the largest blast radii, wouldn’t it?

At the moment this change has only been deployed to 13 node subnets (all of them).

ZackDS · October 4, 2024, 12:11pm

Voted to adopt proposal 133307. The subnet id and the delay are correct. I don’t think that having a larger number of nodes per subnet would be an issue here; there are other limits in place that protect the subnet.

Lorimer · October 4, 2024, 12:20pm

Larger subnets take longer to disseminate artifacts. That’s why this subnet currently has a delay of 1000ms instead of 600ms.

@dsharifi, is setting this subnet to 300ms intentional? Could you elaborate? Perhaps this relates to the optimisation in last week’s IC OS proposal making the delay adaptive based on network conditions?

ZackDS · October 4, 2024, 12:35pm

You mean the dynamic delay? Also, would you have been more comfortable with a reduction to only 600ms? I guess we will have to wait for an official answer.

timk11 · October 4, 2024, 2:07pm

Also I see from using the ic-admin tool that the current notarisation delay is 1000ms, not 600ms as the proposal says. I presume this might have been a typo but is there a case for lowering it to 600ms first? Is there some testing to help guide this decision?

@dsharifi @LaCosta

dsharifi · October 4, 2024, 2:46pm

Hi,

Indeed, the current notarization delay is 1000ms on the Internet Identity subnet and all other System subnets. I forgot to change and update our proposal template summary description to 1000ms when submitting the proposal. So yes, the 600ms in the summary is an error/typo.

Yes, we have done extensive benchmarks with testnets with 40 and 31 nodes that indicate it is safe to lower the notarization delay down to 300ms. In the benchmarks we also simulate RTT, packet loss, and bandwidth to mimic the production topology based on our metrics. The benchmarks are the same ones we did with the Application testnets. The benchmarks involve:

Stress testing the subnet with a high load of Ingress messages, filling every execution round.
Killing nodes on the subnet for an extended duration, then restarting them to verify that they can state sync, catch up, and participate in block-making.

ZackDS · October 4, 2024, 2:49pm

Thanks for clearing this.

Lorimer · October 4, 2024, 5:06pm

Thanks for the extra explanation @dsharifi. I’m afraid I have to reject the proposal as it fundamentally misrepresents itself to voters, which needs to be a no no (even if it’s by accident) to avoid building up potentially dangerous precedent (and promoting bad voting culture).

Regarding the change itself (aside from the wording), could you clarify why the same delay is being used for subnets that are more than twice the size of smaller subnets using that delay? My understanding is that finalisation rates are dependent on the size of the subnet. Are you planning to reduce the delay for 13 node subnets even further?

Could you also comment on the choice of subnet? Is the II subnet considered lower risk than the SNS or Fiduciary subnets? This is the first time such a change is being rolled out to a large subnet (and the magnitude of the difference compared to the existing configuration is significantly greater).

timk11 · October 5, 2024, 10:21am

I’ve voted to reject proposal 133307. As clarified by @dsharifi above (and thank you for explaining!) the current notarisation delay was mistakenly given as 600ms in the proposal instead of 1000ms. This is a key detail as it means that the magnitude of the change is greater than what voters may be given to understand. The description of the testing is very reassuring. However, from looking through recent Subnet Management proposals I’ve noticed that a number of nodes have had a sharp decrease in performance and have been listed for removal (from a subnet) following the decrease in notarisation delay for their respective subnets. Is this a valid concern and perhaps a greater risk for the larger subnets? I’m leaning towards favouring a smaller decrease at first, perhaps to 600ms, but I’m very open to being persuaded otherwise.

LaCosta · October 5, 2024, 3:22pm

Voted to reject proposal 133307. Thanks for the thorough explanation of the proposal, but I have to agree that even if it is just a simple mistake regarding the previous notarization delay mentioned in the proposal, it might still mislead people who would otherwise have formed a different opinion, as @timk11 and @Lorimer pointed out. I also think that more information should be provided up front, especially when scaling these proposals to bigger and more relevant subnets, and I would like to hear more about how the stress testing on these subnets works. For example, do you simulate the distances between nodes so that the testnet behaves similarly to the targeted subnet? Is there a way to verify the performance of those testnets?

dsharifi · October 7, 2024, 1:49pm

Thanks for the response. I agree. This proposal should be rejected and re-submitted to not set a precedent where we vote on proposals that have misleading summaries, intentional or not.

Before answering your questions, I think it’s worth giving some context on what the notarization delay is, why it is needed, and how we choose a correct value for it.

What is a notarization delay?

When a node receives a block proposal, it verifies the block content and waits for a short duration before notarizing (signing) the block and sending its notarization to all its peers in the subnet. The time a node waits before sending the notarization share is what we call the notarization delay (ND).

For a block to be finalized, a block maker needs to send a block proposal to its peer nodes in the subnet, and at least 2/3 * subnet_size + 1 of the nodes must notarize the block and send their notarization shares to the other nodes.
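
To make the threshold concrete, here is a small sketch that evaluates it for subnet sizes mentioned in this thread. The post does not say how the fraction is rounded, so reading the expression as floor(2n/3) + 1 is an assumption, and `notarization_threshold` is a hypothetical helper name.

```python
def notarization_threshold(subnet_size: int) -> int:
    """One integer reading of "2/3 * subnet_size + 1": floor(2n/3) + 1 (assumed rounding)."""
    return (2 * subnet_size) // 3 + 1


# Subnet sizes that appear in this thread: 13-node application subnets,
# the 31-node uzr34 subnet, the 34-node SNS subnet, and a 40-node testnet.
thresholds = {n: notarization_threshold(n) for n in (13, 31, 34, 40)}
```

Under this reading, a 13-node subnet needs 9 notarization shares and a 31-node subnet needs 21.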

The ND serves as a throttle for the maximum finalization rate of a subnet. I.e. a block can never be finalized faster than the ND, as the ND is the minimum time it takes for nodes to share a notarization of a block when they receive a block proposal. So the finalization rate in a subnet will always be less than 1 block / ND, in this case, more specifically 1 block / 0.3s => 3.33 block/s.

In practice, the block rate is lower than the theoretical maximum, as there are overheads in processing a block, the execution time of canister calls, networking latencies between peers, etc.
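
The upper bound described above is just the reciprocal of the ND. A minimal sketch comparing the three delay values discussed in this thread (`max_block_rate` is an illustrative name, not an IC API):

```python
def max_block_rate(nd_ms: float) -> float:
    """Theoretical upper bound on finalized blocks per second for a given ND in milliseconds."""
    return 1000.0 / nd_ms


# Delay values discussed in this thread.
rates = {nd: max_block_rate(nd) for nd in (1000, 600, 300)}
```

Going from 1000ms to 300ms raises the theoretical ceiling from 1 block/s to about 3.33 blocks/s. As noted above, observed rates will be lower than this bound.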

Why do we have a notarization delay?

A node can fall behind the rest of the nodes in a subnet, for example, due to networking issues, crashing and restarting, or newly joining a subnet. When a node is behind it needs to catch up with the rest of the subnet to participate in the consensus (block proposals) of the subnet. For a node to catch up, the node typically downloads the state of the subnet at some checkpoint block height and must replay the missing blocks at a higher rate than the subnet is finalizing blocks. However, if the node that is behind is unable to replay blocks at a higher rate than the rest of the subnet is finalizing blocks, then the node that is behind will never be able to catch up, as it will always lag behind the rest of the subnet.

To make it possible for nodes to catch up if they are behind, we use the ND as a throttle for the rest of the subnet to slow down the finalization rate, such that the node that is behind can replay blocks at a faster rate than the subnet is finalizing blocks.

How much we throttle a subnet’s finalization rate (the value of the ND) is independent of the subnet size. The ND is there to ensure that all nodes can participate in consensus and block making of a subnet, and that a node that falls behind can replay blocks at a faster rate than the subnet finalizes new ones.
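
The catch-up argument can be illustrated with a simple rate model: the gap only shrinks when the replay rate exceeds the finalization rate. This is a simplified sketch that assumes constant rates. The function name and the numbers in the example are hypothetical and are not measurements from any subnet.

```python
from typing import Optional


def catch_up_seconds(backlog_blocks: float,
                     replay_rate: float,
                     finalization_rate: float) -> Optional[float]:
    """Seconds until a lagging node closes its backlog, or None if it never does.

    Rates are in blocks per second and are assumed constant (a simplification).
    """
    if replay_rate <= finalization_rate:
        return None  # the gap stays constant or grows, so the node never catches up
    return backlog_blocks / (replay_rate - finalization_rate)


# Hypothetical numbers: 100 blocks behind, replaying 4 blocks/s
# while the subnet finalizes 3 blocks/s.
example = catch_up_seconds(100, 4.0, 3.0)
stuck = catch_up_seconds(100, 3.0, 3.0)
```

In this toy model, throttling the subnet (a larger ND) lowers `finalization_rate`, which is what lets the lagging node close the gap.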

What factors do we need to consider for the notarisation delay?

Available networking bandwidth on nodes

A node that is behind needs to be able to download old blocks and replay them at a faster rate than the rest of the subnet finalizes new ones. This means that the time the catching-up node spends per block must be less than the time the rest of the subnet spends per block:
block_download_time + block_processing_time < ND + block_processing_time

Cancelling block_processing_time on both sides gives:

block_download_time < ND

In the worst case scenario, where we have full blocks, each block has a max size of 4MB. A node must also have a minimum of 300 Mb/s available bandwidth to meet the IC spec. Assume 200Mb/s is available for consensus.

block_size / bandwidth < ND

4MB / 200Mb/s < ND

32Mb / 200Mb/s < ND

0.16s < ND

Thus, from a networking perspective, we need a notarisation delay of at least 160ms in order for a node to catch up in a scenario where we are replaying blocks in a subnet that is under full load.
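
The arithmetic above can be checked with a few lines of code. This sketch simply restates the calculation with the figures from the post (4MB blocks, 200Mb/s assumed available for consensus). `min_nd_seconds` is an illustrative helper name.

```python
def min_nd_seconds(block_size_mbytes: float, bandwidth_mbits_per_s: float) -> float:
    """Lower bound on the ND: time to download one full block at the given bandwidth."""
    block_size_mbits = block_size_mbytes * 8  # megabytes -> megabits
    return block_size_mbits / bandwidth_mbits_per_s


# Figures from the post: 4MB maximum block size, 200Mb/s available for consensus.
bound = min_nd_seconds(4, 200)
```

With these inputs the bound is 0.16s, so a 300ms ND leaves roughly 140ms of headroom for the per-block overheads.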

Overhead per block.

There is also an unavoidable overhead of processing blocks and executing them once they are finalized. Once a block with messages is notarized, the node will execute the messages in that block and certify it in parallel to making new blocks. The high level idea is that if the execution and certification steps take too long, then the bottleneck for the block rate will be determined by execution and certification.

This is something we have observed in busier subnets, such as the OpenChat subnets, meaning we need a notarisation delay, or a block rate throttler, that ensures the nodes that are ahead spend longer making and notarizing blocks than it takes the nodes that are behind to execute and certify the blocks.

From the production metrics we have about these overheads, we have deduced that a 300ms ND is enough.

What testing and experiments have been conducted?

For all of our experiments, we have simulated Round Trip Times (RTT), bandwidth, and packet loss, to simulate the network behavior of nodes that we see on the Internet Identity subnet. Check out simulate_network.rs in the IC repo for the source code on how we set up these simulations with the traffic control (tc) Linux utility.
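
For readers unfamiliar with tc, the following sketch builds a `tc ... netem` command that shapes delay, loss, and rate on one interface. It is not taken from simulate_network.rs. The device name, the values in the example, and the choice to apply half of the RTT as egress delay on each side are all assumptions made for illustration.

```python
from typing import List


def netem_command(dev: str, rtt_ms: float, loss_pct: float, rate_mbit: int) -> List[str]:
    """Build a `tc qdisc add ... netem` command (as an argument list) for one interface.

    Assumes each endpoint delays its own egress traffic by half the target RTT.
    """
    one_way_delay_ms = rtt_ms / 2
    return [
        "tc", "qdisc", "add", "dev", dev, "root", "netem",
        "delay", f"{one_way_delay_ms:g}ms",
        "loss", f"{loss_pct:g}%",
        "rate", f"{rate_mbit}mbit",
    ]


# Hypothetical values: 100ms RTT, 0.5% packet loss, 300Mb/s on device eth0.
cmd = netem_command("eth0", 100, 0.5, 300)
```

The list could be passed to `subprocess.run` on a test host with root privileges. Printing it first with `" ".join(cmd)` is a safe way to inspect what would be executed.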

We have mainly stress tested the simulated subnet by flooding it with a large number of ingress messages, installing many canisters to increase load, and killing nodes to verify that nodes can catch up when joining the subnet.

Our tests show that the change is perfectly fine regardless of the subnet type and subnet size.

dsharifi · October 7, 2024, 1:53pm

We will re-submit a new proposal with an updated summary to lower the notarization delay to 300ms.

timk11 · October 7, 2024, 11:05pm

Thanks @dsharifi for this very helpful explanation. Is this material also in a blog post or elsewhere in the online resources? If not, I think it would be well worth adding.

timk11 · October 7, 2024, 11:08pm

I’ve voted to adopt proposal 133315 based on this explanation. This proposal reduces the notarisation delay for subnet uzr34 with the aim of reducing network latency.

LaCosta · October 8, 2024, 12:36am

Thanks for the thorough explanation on this topic. I have voted to adopt proposal 133315 that reduces the notarization delay of the uzr34 subnet (Internet Identity subnet) from 1000ms to 300ms.
