Subnet Management - uzr34 (II)

Proposal 133149

1 degraded node replaced with another Sri Lanka node. Nice and simple, looks good. I’ve adopted.

Decentralisation Stats

Subnet node distance stats (distance between any 2 nodes in the subnet) →

| | Smallest Distance | Average Distance | Largest Distance |
|---|---|---|---|
| EXISTING | 0 km | 7898.216 km | 19448.574 km |
| PROPOSED | 0 km | 7898.216 km | 19448.574 km |
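
For anyone wanting to reproduce distance figures like these, here is a minimal Python sketch. It assumes great-circle (haversine) distance between data centre coordinates, which may not be exactly what was used for the numbers above. The coordinates are approximate, hypothetical placeholders, and the function names are made up for illustration.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def distance_stats(coords):
    """Smallest, average and largest distance over every pair of nodes."""
    distances = [haversine_km(a, b) for a, b in combinations(coords, 2)]
    return min(distances), sum(distances) / len(distances), max(distances)


# Hypothetical, approximate coordinates (not the real subnet data).
# Two entries share the same location, which yields a smallest distance of 0 km.
nodes = [(6.93, 79.85), (6.93, 79.85), (47.37, 8.54), (35.68, 139.69)]

smallest, average, largest = distance_stats(nodes)
print(f"smallest={smallest:.3f} km average={average:.3f} km largest={largest:.3f} km")
```

Swapping one node for another at the same coordinates leaves all three values unchanged, which is consistent with the identical EXISTING and PROPOSED rows.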

Subnet characteristic counts →

| | Continents | Countries | Data Centers | Owners | Node Providers |
|---|---|---|---|---|---|
| EXISTING | 5 | 24 | 31 | 31 | 31 |
| PROPOSED | 5 | 24 | 31 | 31 | 31 |

Largest number of nodes with the same characteristic (e.g. continent, country, data center, etc.) →

| | Continent | Country | Data Center | Owner | Node Provider |
|---|---|---|---|---|---|
| EXISTING | 10 | 3 | 1 | 1 | 1 |
| PROPOSED | 10 | 3 | 1 | 1 | 1 |

See here for acceptable limits → Motion 132136
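
As a rough illustration of how the two count tables above can be derived, here is a minimal Python sketch. It uses five rows copied from the node reference table in this post rather than the full subnet, so its output will not match the figures above. The function and field names are hypothetical, and the Owner column is left out for brevity.

```python
from collections import Counter

FIELDS = ("continent", "country", "data_center", "node_provider")

# A small sample of rows from the node reference table (not the full subnet).
nodes = [
    ("Asia", "India", "Navi Mumbai 1 (nm1)", "Rivram Inc"),
    ("Asia", "India", "New Delhi 1 (nd1)", "Marvelous Web3"),
    ("Asia", "India", "Panvel 2 (pl2)", "Krishna Enterprises"),
    ("Asia", "Japan", "Tokyo (ty1)", "Starbase"),
    ("Europe", "Sweden", "Stockholm 1 (sh1)", "DFINITY Operations SA"),
]


def characteristic_stats(rows):
    """Map each field to (number of distinct values, size of the largest group)."""
    stats = {}
    for index, field in enumerate(FIELDS):
        counts = Counter(row[index] for row in rows)
        stats[field] = (len(counts), max(counts.values()))
    return stats


stats = characteristic_stats(nodes)
for field, (distinct, largest_group) in stats.items():
    print(f"{field}: {distinct} distinct, largest group has {largest_group} node(s)")
```

The first number per field corresponds to the "characteristic counts" table and the second to the "largest number of nodes with the same characteristic" table.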

The above subnet information is illustrated below, followed by a node reference table:

Map Description
  • Red marker represents a removed node (transparent center for overlap visibility)
  • Green marker represents an added node
  • Blue marker represents an unchanged node
  • Highlighted patches represent the country the above nodes sit within (red if the country is removed, green if added, otherwise grey)
  • Light grey markers with yellow borders are examples of unassigned nodes that would be viable candidates for joining the subnet according to formal decentralisation coefficients (so this proposal can be viewed in the context of alternative solutions that are not being used)

Table
Continent Country Data Center Owner Node Provider Node Status
--- Asia Sri Lanka Colombo 1 (cm1) OrionStellar Geodd Pvt Ltd suty3-goyd2-t6ngb-dsgzv-vkvt6-w2x4t-lioxl-2xsaa-ztaly-y2ov2-hae DEGRADED
+++ Asia Sri Lanka Colombo 1 (cm1) OrionStellar Geodd Pvt Ltd fhg3q-muslh-pp7ur-hcivl-q3kof-mju7a-kjyyf-hjifg-nsa35-nqjv3-7qe UNASSIGNED
Americas Argentina CABA 1 (ar1) SyT - Servicios y Telecomunicaciones S.A. Mariano Stoll v27at-hedf7-4a2my-tboq6-escdm-77krt-2qfuq-zjptf-z2sbk-vd7zs-xae UP
Oceania Australia Queensland 1 (sc1) NEXTDC ANYPOINT PTY LTD zjiki-pzvnv-m4rnn-fodt3-poon4-uldx7-d3wkq-gptsu-g2mjs-boi35-3qe UP
Europe Belgium Antwerp (an1) Datacenter United Allusion xu2zg-nns7x-z67l6-foa5w-yc4ek-addku-g2hqg-c3jdg-wabdu-5ouaw-yae UP
Americas Canada Fremont (fm1) Hurricane Electric Boolean Bit, LLC z4jw5-v4ee6-aa7gr-5axkc-4ocjy-v5vv5-inwc6-ma4hw-mb7jv-3skxy-eqe UP
Americas Canada Toronto (to1) Cyxtera Blockchain Development Labs x3cey-uerdd-53a7n-d45e2-gjsnd-airg5-nrs5i-4xujk-c5ynl-4pie6-yqe UP
Europe Switzerland Zurich 3 (zh3) Nine.Ch Tomahawk.vc v2pkj-vpsow-fp24q-zqwfj-p3nek-m52xz-oz6ra-blmra-73voj-jvwb5-gae UP
Asia China HongKong 1 (hk1) Unicom Pindar Technology Limited w4ri3-ytfnq-jg3z3-qseka-se4xe-b2fl2-km766-ruzwd-riw72-6bifs-4ae UP
Americas Costa Rica Bogota 1 (bg1) EdgeUno Geeta Kalwani aajth-ndp7x-ro5ok-yikyd-4i7xn-5k5ki-e3d37-hh4gn-s4opz-bnzxf-4qe UP
Europe Czechia South Moravian Region 1 (bn1) Master Internet Maksym Ishchenko zgtrt-4vlgr-pbytl-t2yqq-qf4nk-wyoos-vrfpu-hxqcw-tcnfu-73kjb-pae UP
Europe Spain Madrid 1 (ma1) Ginernet Ivanov Oleksandr yi6r6-u4kax-jphcr-jcqr5-t3zpm-gmp3b-2hiew-iinpf-sgjos-eabha-aqe UP
Europe Estonia Tallinn 2 (ta2) Telia DC Artem Horodyskyi zgeaf-fcq4e-fcnht-g7mpg-sb7ff-r6awk-zvkwp-gkloc-rr6jl-ghsse-mqe UP
Europe France Paris 1 (pr1) Celeste Carbon Twelve atjbz-kcjz7-y4mgn-t5wqp-3emfk-6mtlx-ln5i7-4pixf-ocjgh-hfu77-bqe UP
Asia Georgia Tbilisi 1 (tb1) Cloud9 George Bassadone uouxk-c246i-dgxzd-ql3a5-koofn-mclrv-toplo-bg76d-l4dzk-ngb3c-nae UP
Europe Croatia Zagreb 1 (zg1) Anonstake Anonstake q3vac-kcwo2-ruiht-nflb7-ifoev-vkjcw-quybi-ugvgn-pqfwp-jntxi-dqe UP
Asia India Navi Mumbai 1 (nm1) Rivram Rivram Inc ecxbl-3dp33-mpskv-yvs6f-674ct-tpr4d-dhzdg-jfgf4-gzhny-n6zzx-lqe UP
Asia India New Delhi 1 (nd1) Marvelous Web3 DC Marvelous Web3 67t6p-i4h3c-msv6p-kmbmm-rr6gj-z3nix-d6lo2-mq3q4-3h6rb-lwkbc-lae UP
Asia India Panvel 2 (pl2) Yotta Krishna Enterprises dyycg-wc45f-jwks2-abddo-m3n5r-o5kxy-g5xhm-7ve4u-3tlk7-i7xec-oae UP
Asia Israel Tel Aviv 1 (tv1) Interhost GeoNodes LLC rfkza-27bii-6jan4-u4zll-lkvmz-snmao-irmlj-arpdd-kyxrg-xnq3a-7ae UP
Asia Japan Tokyo (ty1) Equinix Starbase go5zz-xs6yg-mylwl-v7uob-7bg4b-wjzhe-vmrwe-uy7mz-ckaz2-idm33-rqe UP
Asia Korea (the Republic of) Seoul 2 (kr2) Gasan Web3game wjwzb-q3ogf-fi3po-kf6y6-wzuuj-3ac3m-kjvab-fufsm-z2skq-kthkx-xae UP
Europe Poland Warszawa 2 (wa2) Central Tower DC Bohatyrov Volodymyr mswad-oq7wj-5r4yy-b5qoy-cmv7z-wzfb3-ktn6l-rcnrz-mni2f-lsys6-wqe UP
Asia Singapore Singapore 2 (sg2) Telin OneSixtyTwo Digital Capital qp3lh-25yxy-dlk4t-ay73d-frr4t-3kmi5-35kqg-3vvbq-26qhh-6xrdr-oqe UP
Europe Slovenia Ljubljana (lj1) Posita.si Fractal Labs AG 6adxp-p7u63-xsdtk-lo6oc-vpqmi-44hgt-yv652-cbm5p-mssge-wsrz6-oqe UP
Europe Sweden Stockholm 1 (sh1) Digital Realty DFINITY Operations SA vgfnl-4phvh-44pk3-yshmp-ckwz3-qnzob-l5wnj-pqn2j-vv5jh-3oewk-xqe UP
Americas United States of America (the) Chicago 3 (ch3) CyrusOne MI Servers gtfa3-saq3t-ymlel-lsf6d-ans7b-cr45x-xg5np-xbxyt-nxfrt-iynyy-5qe UP
Americas United States of America (the) Orlando (or1) Datasite Giant Leaf, LLC z5a4h-43szy-vvp4j-xorii-l6yma-4iyzt-7o3ry-frvqe-azkit-5iag2-rqe UP
Americas United States of America (the) Panama City 1 (pc1) Navegalo Bianca-Martina Rohner y7bml-csbq7-euzyf-njmvm-qfftp-iy7lc-wisaq-jlmul-sdo7p-7lkx4-3ae UP
Africa South Africa Cape Town 2 (ct2) Teraco Kontrapunt (Pty) Ltd kgo2t-vidyw-yw2g5-pqwrt-nr227-rbq2o-pog27-zarc2-dfrlw-vvjge-4qe UP
Africa South Africa Gauteng 2 (jb2) Africa Data Centres Karel Frank xav3a-kdo3a-2rgbg-o6vnk-clat5-bcc7w-vmnej-z55rx-mfx26-7xugo-tqe UP
Africa South Africa Gauteng 3 (jb3) Xneelo Wolkboer (Pty) Ltd kwryq-ezysk-c4ono-aet7a-hh6h5-4o3bb-a33et-ef4g5-42tot-zaek6-fae UP

Known Neurons to follow if you're too busy to keep on top of things like this

If you found this analysis helpful and would like to follow the vote of the LORIMER known neuron in the future, consider configuring LORIMER as a followee for the Subnet Management topic.

Other good neurons to follow:

  • Synapse (follows the LORIMER and CodeGov known neurons for Subnet Management, and is a generally well informed known neuron to follow on numerous other topics)

  • CodeGov (actively reviews and votes on Subnet Management proposals, and is well informed on numerous other technical topics)

  • WaterNeuron (the WaterNeuron DAO frequently discuss proposals like this in order to vote responsibly based on DAO consensus)

2 Likes

Voted to adopt Proposal 133149.

This proposal replaces a node in subnet uzr34 (suty3, which appears as “Status: Degraded” in the IC dashboard) with another node from the same node provider and data centre, so it has no impact on Nakamoto coefficients or target topology parameters.

2 Likes

Voted to adopt with nothing to object.

Voted to adopt proposal 133149. Replaces a degraded node without changing the Nakamoto Coefficient.

DFINITY will submit an NNS proposal today to reduce the notarization delay on the Internet Identity subnet, uzr34, similar to what has happened on other subnets in recent weeks (you can find all details in this forum thread).

After the successful rollout of the Application subnets, we propose a gradual rollout for the System subnets, starting with the Internet Identity subnet.

4 Likes

Thanks for this announcement @dsharifi, it seems to have aligned perfectly with my lunch break :yum:

Are you able to elaborate on the choice of subnet? The SNS subnet is technically an application subnet (albeit a special one). It has 34 nodes. I’d expect this to be a safer choice for starting the production changes on the large subnets (or maybe the fiduciary subnet). If something unexpected goes wrong in production, the II subnet would probably have one of the largest blast radii, wouldn’t it?

At the moment this change has only been deployed to 13 node subnets (all of them).

2 Likes

Voted to adopt proposal 133307. The subnet id and the delay are correct. I don’t think that having a larger number of nodes per subnet would be an issue here, as there are other limits in place that protect the subnet.

1 Like

Larger subnets take longer to disseminate artifacts. That’s why this subnet currently has a delay of 1000ms instead of 600ms.

@dsharifi, is setting this subnet to 300ms intentional? Could you elaborate? Perhaps this relates to the optimisation in last week’s IC OS proposal making the delay adaptive based on network conditions?

2 Likes

You mean the dynamic delay? Also, would you have been more comfortable with a reduction to only 600? I guess we will have to wait for an official answer.

1 Like

Also I see from using the ic-admin tool that the current notarisation delay is 1000ms, not 600ms as the proposal says. I presume this might have been a typo but is there a case for lowering it to 600ms first? Is there some testing to help guide this decision?

@dsharifi @LaCosta

1 Like

Hi,

Indeed, the current notarization delay is 1000ms on the Internet Identity subnet and all other System subnets. I forgot to change and update our proposal template summary description to 1000ms when submitting the proposal. So yes, the 600ms in the summary is an error/typo.

Yes, we have done extensive benchmarks with testnets with 40 and 31 nodes that indicate it is safe to lower the notarization delay down to 300ms. In the benchmarks we also simulate RTT, packet loss, and bandwidth to mimic the production topology based on our metrics. The benchmarks are the same ones we did with the Application testnets. The benchmarks involve:

  • Stress testing the subnet with a high load of Ingress messages, filling every execution round.
  • Killing nodes on the subnet for an extended duration, then restarting them to verify that they can state sync, catch up, and participate in block making.
4 Likes

Thanks for clearing this.

Thanks for the extra explanation @dsharifi. I’m afraid I have to reject the proposal as it fundamentally misrepresents itself to voters, which needs to be a no no (even if it’s by accident) to avoid building up potentially dangerous precedent (and promoting bad voting culture).

Regarding the change itself (aside from the wording), could you clarify why the same delay is being used for subnets that are more than twice the size of smaller subnets using that delay? My understanding is that finalisation rates are dependent on the size of the subnet. Are you planning to reduce the delay for 13 node subnets even further?

Could you also comment on the choice of subnet? Is the II subnet considered lower risk than the SNS or Fiduciary subnets? This is the first time such a change is being rolled out to a large subnet (and the magnitude of the difference compared to the existing configuration is significantly greater).

2 Likes

I’ve voted to reject proposal 133307. As clarified by @dsharifi above (and thank you for explaining!) the current notarisation delay was mistakenly given as 600ms in the proposal instead of 1000ms. This is a key detail as it means that the magnitude of the change is greater than what voters may be given to understand. The description of the testing is very reassuring. However, from looking through recent Subnet Management proposals I’ve noticed that a number of nodes have had a sharp decrease in performance and have been listed for removal (from a subnet) following the decrease in notarisation delay for their respective subnets. Is this a valid concern and perhaps a greater risk for the larger subnets? I’m leaning towards favouring a smaller decrease at first, perhaps to 600ms, but I’m very open to being persuaded otherwise.

2 Likes

Voted to reject proposal 133307. Thanks for the thorough explanation of the proposal, but I have to agree that even if it is just a simple mistake regarding the previous notarization delay mentioned in the proposal, it could still mislead people who might otherwise have formed a different opinion, as @timk11 and @Lorimer had. I also think that more information should be provided up front, especially when scaling these proposals to bigger and more relevant subnets. I would also like to hear more about how the stress testing on these subnets works. For example, do you simulate the distances between nodes so that the testnet behaves similarly to the targeted subnet? Is there a way to verify the performance of those testnets?

2 Likes

Thanks for the response. I agree. This proposal should be rejected and re-submitted to not set a precedent where we vote on proposals that have misleading summaries, intentional or not.

Before answering your questions, I think it’s worth giving some context on what the notarization delay is, why it is needed, and how we choose a correct value for it.

What is a notarization delay?

When a node receives a block proposal, it verifies the block content and waits for a short duration before notarizing (signing) the block and sending its notarization to all its peers in the subnet. The time a node waits before sending the notarization share is what we call the notarization delay (ND).

For a block to be finalized, a block maker needs to send a block proposal to its peer nodes in the subnet, and at least 2/3 * subnet_size + 1 of the nodes must notarize the block and send their notarization shares to the other nodes.

The ND serves as a throttle on the maximum finalization rate of a subnet. That is, a block can never be finalized faster than the ND, as the ND is the minimum time it takes for nodes to share a notarization of a block after they receive a block proposal. So the finalization rate in a subnet will always be less than 1 block / ND; in this case, 1 block / 0.3s ≈ 3.33 blocks/s.

In practice, the block rate is lower than the theoretical maximum, as there are overheads in processing a block, the execution time of canister calls, networking latencies between peers, etc.
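
To put numbers on that upper bound, here is a minimal Python sketch that evaluates 1 block / ND for the three delays mentioned in this thread (1000ms, 600ms and 300ms). The function name is hypothetical, and the result is only the theoretical ceiling, not an observed block rate.

```python
def max_finalization_rate(nd_ms: float) -> float:
    """Theoretical upper bound on blocks per second for a notarization delay in ms."""
    return 1000.0 / nd_ms


for nd_ms in (1000, 600, 300):
    rate = max_finalization_rate(nd_ms)
    print(f"ND = {nd_ms} ms -> at most {rate:.2f} blocks/s")
```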

Why do we have a notarization delay?

A node can fall behind the rest of the nodes in a subnet, for example, due to networking issues, crashing and restarting, or newly joining a subnet. When a node is behind, it needs to catch up with the rest of the subnet to participate in the consensus (block proposals) of the subnet. To catch up, the node typically downloads the state of the subnet at some checkpoint block height and must replay the missing blocks at a higher rate than the subnet is finalizing blocks. If it cannot replay blocks at a higher rate than the rest of the subnet is finalizing them, it will never be able to catch up, as it will always lag behind the rest of the subnet.

To make it possible for nodes to catch up if they are behind, we use the ND as a throttle for the rest of the subnet to slow down the finalization rate, such that the node that is behind can replay blocks at a faster rate than the subnet is finalizing blocks.

How much we throttle a subnet’s finalization rate by (the value of the ND) is independent of the subnet size. The ND is there to ensure that all nodes can participate in consensus and block making of a subnet, and that a node that falls behind can replay blocks at a faster rate than the subnet finalizes new ones.
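
The catch-up argument can be sketched with a simple rate model. This is an illustrative Python example under stated assumptions, not how the replica actually measures anything: the lagging node replays one block every `replay_s` seconds, the subnet finalizes one block every `interval_s` seconds (which is at least the ND), and state sync time is ignored. All names and numbers are hypothetical.

```python
from typing import Optional


def catch_up_seconds(lag_blocks: int, replay_s: float, interval_s: float) -> Optional[float]:
    """Time for a lagging node to close a gap of lag_blocks, or None if it never can.

    The gap shrinks at (1 / replay_s - 1 / interval_s) blocks per second, so it
    only closes when replaying a block is faster than finalizing a new one.
    """
    if replay_s >= interval_s:
        return None  # the node replays no faster than the subnet finalizes
    closing_rate = 1.0 / replay_s - 1.0 / interval_s
    return lag_blocks / closing_rate


# A node 1000 blocks behind, replaying a block in 0.2 s while the subnet
# finalizes one every 0.3 s, needs roughly 600 s to catch up.
print(catch_up_seconds(1000, replay_s=0.2, interval_s=0.3))

# If replay takes 0.4 s per block, the node falls further behind forever.
print(catch_up_seconds(1000, replay_s=0.4, interval_s=0.3))
```

In this model the subnet size does not appear at all: only the per-block replay time and the finalization interval matter.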

What factors do we need to consider for the notarisation delay?

Available networking bandwidth on nodes

A node that is behind needs to be able to download old blocks and replay them at a faster rate than the rest of the subnet finalizes new ones. This means that the time the catching-up node spends per block must be less than the time the rest of the subnet spends per block:
block_download_time + block_processing_time < ND + block_processing_time

Subtracting block_processing_time from both sides gives:

block_download_time < ND

In the worst case scenario, where we have full blocks, each block has a max size of 4MB. A node must also have a minimum of 300 Mb/s available bandwidth to meet the IC spec. Assume 200Mb/s is available for consensus.

block_size / bandwidth < ND

4MB / 200Mb/s < ND

32Mb / 200Mb/s < ND

0.16s < ND

Thus, from a networking perspective, we need a notarisation delay of at least 160ms in order for a node to catch up in a scenario where we are replaying blocks in a subnet that is under full load.
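
The arithmetic above can be checked with a few lines of Python. This sketch simply restates block_size / bandwidth < ND using the 4MB block size and 200Mb/s figure given above; the function name is hypothetical.

```python
def min_nd_seconds(block_size_mbytes: float, bandwidth_mbits: float) -> float:
    """Lower bound on the ND (in seconds) so that one block downloads within it."""
    block_size_mbits = block_size_mbytes * 8  # 1 byte = 8 bits
    return block_size_mbits / bandwidth_mbits


# Worst case from the text: full 4 MB blocks, 200 Mb/s available for consensus.
print(f"{min_nd_seconds(4, 200) * 1000:.0f} ms")
```

With the full 300 Mb/s from the IC spec available instead of 200 Mb/s, the same formula would give a lower bound of roughly 107ms.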

Overhead per block.

There is also an unavoidable overhead of processing blocks and executing them once they are finalized. Once a block with messages is notarized, the node will execute the messages in that block and certify it in parallel to making new blocks. The high-level idea is that if the execution and certification steps take too long, then the block rate will be bottlenecked by execution and certification.

This is something we have observed in busier subnets, such as the OpenChat subnets, meaning we need a notarisation delay, or block rate throttle, that ensures the nodes that are ahead spend longer making and notarizing blocks than the nodes that are behind spend executing and certifying them.

From the production metrics we have about these overheads, we have deduced that a 300ms ND is enough.

What testing and experiments have been conducted?

For all of our experiments, we have simulated Round Trip Times (RTT), bandwidth, and packet loss, to simulate the network behavior of nodes that we see on the Internet Identity subnet. Check out simulate_network.rs in the IC repo for the source code on how we set up these simulations with the traffic control (tc) Linux utility.
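
For readers unfamiliar with tc, the following Python sketch shows the general shape of a netem rule that adds delay, packet loss and a bandwidth cap to an interface. It is not taken from simulate_network.rs: the helper name, interface and values are hypothetical, and it assumes the RTT is split evenly as a one-way delay on each node's outgoing traffic. The function only builds the command string and does not run it.

```python
def netem_command(interface: str, rtt_ms: float, loss_pct: float, rate_mbit: float) -> str:
    """Build a `tc qdisc ... netem` command that emulates delay, loss and bandwidth."""
    one_way_ms = rtt_ms / 2  # assumption: each side contributes half of the RTT
    return (
        f"tc qdisc add dev {interface} root netem "
        f"delay {one_way_ms:g}ms loss {loss_pct:g}% rate {rate_mbit:g}mbit"
    )


# Hypothetical values: 120 ms RTT, 0.5% packet loss, 300 Mb/s bandwidth.
print(netem_command("eth0", rtt_ms=120, loss_pct=0.5, rate_mbit=300))
```

Applying such a rule requires root privileges on the node under test.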

We have mainly stress tested the simulated subnet by flooding it with a large number of ingress messages, installing many canisters to increase load, and killing nodes to verify that nodes can catch up when joining the subnet.

Our tests show that the change is perfectly fine regardless of the subnet type and subnet size.

6 Likes

We will re-submit a new proposal with an updated summary to lower the notarization delay to 300ms.

4 Likes

Thanks @dsharifi for this very helpful explanation. Is this material also in a blog post or elsewhere in the online resources? If not, I think it would be well worth adding.

2 Likes

I’ve voted to adopt proposal 133315 based on this explanation. This proposal reduces the notarisation delay for subnet uzr34 with the aim of reducing network latency.

2 Likes

Thanks for the thorough explanation on this topic. I have voted to adopt proposal 133315 that reduces the notarization delay of the uzr34 subnet (Internet Identity subnet) from 1000ms to 300ms.

1 Like