Subnet Management - io67a (Application)

Lorimer · August 22, 2024, 7:18pm

This topic is intended to capture Subnet Management activities over time for the io67a subnet, providing a place to ask questions and make observations about the management of this subnet.

At the time of creating this topic the current subnet configuration is as follows:

Expand

{
  "version": 44805,
  "records": [
    {
      "key": "subnet_record_io67a-2jmkw-zup3h-snbwi-g6a5n-rm5dn-b6png-lvdpl-nqnto-yih6l-gqe",
      "version": 44805,
      "value": {
        "membership": [
          "2ajl7-n6xum-2j7km-24scw-5tirj-whjlx-756pq-jbcua-utl3q-mts5r-hae",
          "2xph2-xvn4v-pc5xz-3upfj-etpio-qaf2k-d2s3v-jnceu-vcher-qhqav-rqe",
          "cp7d4-uiulo-a46se-yzti4-6qpmz-4rhug-d6rln-7aa5r-nc63x-6q7u7-aae",
          "plbgg-2silg-344nm-n7zqv-lpy4j-uktsq-darh7-shrma-srui4-xrcgp-dae",
          "q3w37-sdo2u-z72qf-hpesy-rgqes-lzflk-aescx-c5ivv-qdbty-s6pgc-jae",
          "tkdjq-rhpgw-7ubus-5w3nx-jrsdi-w4q65-e457d-yt2pt-hsz22-yehmk-uae",
          "xsa4m-ko3b7-2zvcf-56q55-ahyqi-2qkho-obuu3-l4otg-4z6fn-n2r6i-cae",
          "q2ucv-x7dv5-hheao-ocsye-jbg4z-enm75-ss62d-ehqhj-zwwm3-cap5q-tqe",
          "c37f7-3shrz-2mk6g-e2zvp-3z6xc-uuk6r-6yegs-53uzt-c3vtf-dacsv-tae",
          "j63cj-up2z6-xh7m5-m2t5m-t4xi6-6tqz2-ibj4a-mmm2s-7o2bv-ynnc6-xae",
          "ahekl-ihkqy-u4jqo-5kjvc-ygeuv-ejmin-25kld-7vyop-d6unr-uhlrv-wqe",
          "eexw3-cwz3z-6antd-yoyyc-radeu-jmt66-7p2r7-kjbkn-cij4b-kwqrb-tqe",
          "7pwy2-67vvv-agagj-elfj4-gtef2-jxddy-ywxtt-74afr-aeuil-zecky-iae"
        ],
        "nodes": {},
        "max_ingress_bytes_per_message": 2097152,
        "max_ingress_messages_per_block": 1000,
        "max_block_payload_size": 4194304,
        "unit_delay_millis": 1000,
        "initial_notary_delay_millis": 600,
        "replica_version_id": "6968299131311c836917f0d16d0b1b963526c9b1",
        "dkg_interval_length": 499,
        "start_as_nns": false,
        "subnet_type": "verified_application",
        "features": {
          "canister_sandboxing": false,
          "http_requests": true,
          "sev_enabled": false
        },
        "max_number_of_canisters": 120000,
        "ssh_readonly_access": [],
        "ssh_backup_access": [],
        "ecdsa_config": null,
        "chain_key_config": null
      }
    }
  ]
}

Lorimer · August 22, 2024, 9:06pm

There’s currently an open ‘Update Subnet Config’ Subnet Management proposal for this subnet → Proposal: 132123 - ICP Dashboard (internetcomputer.org)

Proposal Summary:

This proposal sets the notarisation delay of the subnet to 300ms, down from 600ms. The change will increase the block rate of the subnet, aimed to reduce latency of update calls.

600ms (the current configuration) is the default configuration for application subnets.
At the moment there are only two configurations used for initial_notary_delay_millis.

1000ms: used by four critical subnets (large subnets) - tdb26 (nns), pzp6e (fiduciary), uzr34 (II), x33ed (SNS)
600ms: used by all other subnets (13 nodes subnets)

Larger subnets naturally require larger delays. However @dsharifi announced earlier today plans to increase the block rate, thanks to numerous performance enhancements that have been implemented.

@dsharifi are you able to provide some information about why this subnet was chosen to roll out this new config? Presumably eventually all 13 node subnets will use this configuration? What about the larger subnets, will you be aiming to keep the same relative proportions by halving the notary delay to 500ms (or lower)? (I’m only asking because I’m aware that there’s currently a drive to minimise configuration differences between NNS and other subnets).

Thanks in advance

Lerak · August 23, 2024, 7:59am

@Lorimer 300ms is too low for nodes on the other side of the world.

This node is in sc1 data centre, an hour north of Brisbane in Australia. I have been monitoring ping times from a server on AWS in Frankfurt to a node on the io67a subnet q3w37-sdo2u-z72qf-hpesy-rgqes-lzflk-aescx-c5ivv-qdbty-s6pgc-jae :

400ms would be a better number, and a more relavant test, if we want to move all application subnets to a lower number in future.

Lorimer · August 23, 2024, 7:15pm

Thanks @Lerak. Nice idea to get an indication of network latency. It would be interesting to hear more from the networking team about how representative they consider this to be (@dsharifi).

My understanding is that the initial_notary_delay_millis is the minimum amount of time that all nodes need to wait to notarize (but they can wait longer, with notarisation falling back to other ranked blocks). Setting this somewhere close to the network latency that can be reasonably expected under ideal conditions may be intentional (I’m not sure). If this is set too low for all nodes to be able to disseminate artifacts in time, my understanding is that it’s more likely that multiple notarised blocks at the same height will occur (but the subnet will still be able to make progress, albeit on multiple chains that will need pruning by the finalisation process).

@Manu, are you able to provide some insight into the trade-offs involved in setting the initial_notary_delay_millis for optimal subnet performance? (and worst and best case scenarios that can be expected)

Lorimer · August 24, 2024, 7:29am

I’ve had a few more thoughts after sleeping on this.

dsharifi · August 26, 2024, 7:25am

Hi @Lerak,

Given that artifacts are pushed with the new P2P implementation, a node only needs 1/2 RTT for its notarization share to reach a peer.

We have done extensive performance tests, where we have simulated the network topology of the io67a subnet by simulating RTTs, packet loss, and bandwidth to mimic production settings.

Of course, if we see any regressions we will propose adjusting the delays again.

Manu · August 26, 2024, 7:55am

Hi @Lerak! I guess you’re thinking about the risk of nodes with higher ping being unable to propose blocks. So in addition to what @dsharifi said, note that this initial notary delay is only the start of when notaries start signing the block from the highest priority block maker. Only after initial_notary_delay + unit_delay (which would be 300 ms initial_notary_delay if this proposal passes + 1000ms unit delay) do notaries fall back to supporting blocks from lower priority block makers.

We will keep an eye on the block making metrics to ensure that no problems are introduced.

dsharifi · August 26, 2024, 8:00am

Hi @Lorimer,

Thanks for creating this new thread specifically for the io67a proposal!

Yes, the goal is indeed for all 13 node subnets to be configured to have an initial notary delay of 300ms.

We don’t have a specific value in place yet for the larger subnets. We are still doing extensive benchmarks to ensure that we find a safe value such that all nodes can make block proposals, and keep up in larger subnets. As we are doing a gradual rollout, we will not propose to adjust the larger subnets until we have adjusted all 13 node Application subnets with metrics to back up that everything works as expected.

Lorimer · August 26, 2024, 8:43am

Thanks @dsharifi, sounds great I’ve voted ‘yes’

Lerak · August 26, 2024, 1:20pm

@Manu Yes - I am concerned that it can affect the node’s trustworthy metrics stats. Thus when this node is chosen to be the blockmaker - will there now be an increased probability of timing out?

Manu · August 26, 2024, 1:35pm

Theoretically yes, right now the node has to propose its block in 1600 ms, and with this proposal it reduced to 1300 ms, so it’s not a huge reduction. We will keep an eye on the trustworthy node metrics to see if we see any noticeable differences, and please let us know if you notice anything.

timk11 · November 9, 2024, 5:20am

Voted to adopt proposal 134040, as the reasoning is sound and the proposal description matches the payload.

This proposal is intended to replace a dead node, ahekl, which node appears as “Status: Offline” on the dashboard. As seen in the proposal (which I verified using the DRE tool), the proposed change leaves the target topology within the requirements and improves decentralisation with respect to country.

Lorimer · November 9, 2024, 10:44pm

Proposal 134040

TLDR: This proposal replaces an offline node in the US with an unassigned node in Latvia. I’ve adopted.

This is a great proposal. The subnet is currently violating the IC Target Topology, with 3 nodes in the same country (when the limit is 2). After this proposal executes there will be no more than 2 nodes in the same country.

Decentralisation Stats

Subnet node distance stats (distance between any 2 nodes in the subnet) →

	Smallest Distance	Average Distance	Largest Distance
EXISTING	304.223 km	8920.217 km	16748.078 km
PROPOSED	304.223 km	8500.17 km (-4.7%)	16748.078 km

This proposal slightly reduces decentralisation, considered purely in terms of geographic distance (and therefore there’s a slight theoretical reduction in localised disaster resilience).

Subnet characteristic counts →

	Continents	Countries	Data Centers	Owners	Node Providers	Node Operator
EXISTING	5	11	13	13	13	13
PROPOSED	5	12 (+8.3%)	13	13	13	13

This proposal slightly improves decentralisation in terms of jurisdiction diversity.

Largest number of nodes with the same characteristic (e.g. continent, country, data center, etc.) →

	Continent	Country	Data Center	Owner	Node Provider	Node Operator
EXISTING	4	3	1	1	1	1
PROPOSED	4	2 (-33.33%)	1	1	1	1

See here for acceptable limits → Motion 132136

The above subnet information is illustrated below, followed by a node reference table:

Map Description

Red marker represents a removed node (transparent center for overlap visibility)
Green marker represents an added node
Blue marker represents an unchanged node
Highlighted patches represent the country the above nodes sit within (red if the country is removed, green if added, otherwise grey)
Light grey markers with yellow borders are examples of unassigned nodes that would be viable candidates for joining the subnet according to formal decentralisation coefficients (so this proposal can be viewed in the context of alternative solutions that are not being used)

Table

	Continent	Country	Data Center	Owner	Node Provider	Node Operator	Node	Status
`---`	~~Americas~~	~~United States of America (the)~~	~~Jacksonville (jv1)~~	~~Tierpoint~~	~~9Yards Capital~~	~~wmrev~~	~~ahekl-ihkqy-u4jqo-5kjvc-ygeuv-ejmin-25kld-7vyop-d6unr-uhlrv-wqe~~	~~DOWN~~
`+++`	Europe	Latvia	Riga 1 (rg1)	DEAC	Vladyslav Popov	7mdax	poyg5-cbmcm-a372x-j72lt-zm4rz-4sf5v-ajtle-hyqkt-rdgaz-eqzmi-5qe	UNASSIGNED
	Oceania	Australia	Queensland 1 (sc1)	NEXTDC	Karel Frank	f3toa	q3w37-sdo2u-z72qf-hpesy-rgqes-lzflk-aescx-c5ivv-qdbty-s6pgc-jae	UP
	Europe	Belgium	Brussels (br1)	Digital Realty	Allusion	mjeqs	eexw3-cwz3z-6antd-yoyyc-radeu-jmt66-7p2r7-kjbkn-cij4b-kwqrb-tqe	UP
	Americas	Canada	Vancouver (bc1)	Cyxtera	Blockchain Development Labs	feb2q	xsa4m-ko3b7-2zvcf-56q55-ahyqi-2qkho-obuu3-l4otg-4z6fn-n2r6i-cae	UP
	Europe	Switzerland	Zurich 4 (zh4)	Nine.Ch	Tomahawk.vc	paxme	j63cj-up2z6-xh7m5-m2t5m-t4xi6-6tqz2-ibj4a-mmm2s-7o2bv-ynnc6-xae	UP
	Asia	China	HongKong 4 (hk4)	hkntt	Web3game	dg7of	tkdjq-rhpgw-7ubus-5w3nx-jrsdi-w4q65-e457d-yt2pt-hsz22-yehmk-uae	UP
	Asia	Georgia	Tbilisi 1 (tb1)	Cloud9	George Bassadone	yhfy4	cp7d4-uiulo-a46se-yzti4-6qpmz-4rhug-d6rln-7aa5r-nc63x-6q7u7-aae	UP
	Asia	Japan	Tokyo 2 (ty2)	Equinix	Starbase	dpt4y	7pwy2-67vvv-agagj-elfj4-gtef2-jxddy-ywxtt-74afr-aeuil-zecky-iae	UP
	Asia	Singapore	Singapore 3 (sg3)	Racks Central	OneSixtyTwo Digital Capital	5mhxl	2ajl7-n6xum-2j7km-24scw-5tirj-whjlx-756pq-jbcua-utl3q-mts5r-hae	UP
	Europe	Slovenia	Ljubljana 2 (lj2)	Anonstake	Anonstake	eu5wc	c37f7-3shrz-2mk6g-e2zvp-3z6xc-uuk6r-6yegs-53uzt-c3vtf-dacsv-tae	UP
	Americas	United States of America (the)	Boston (bo1)	INAP	DFINITY USA Research LLC	ut325	q2ucv-x7dv5-hheao-ocsye-jbg4z-enm75-ss62d-ehqhj-zwwm3-cap5q-tqe	UP
	Americas	United States of America (the)	Panama City 1 (pc1)	Navegalo	Bianca-Martina Rohner	qaes5	2xph2-xvn4v-pc5xz-3upfj-etpio-qaf2k-d2s3v-jnceu-vcher-qhqav-rqe	UP
	Africa	South Africa	Cape Town 1 (ct1)	Africa Data Centres	Illusions In Art (Pty) Ltd	2aemz	plbgg-2silg-344nm-n7zqv-lpy4j-uktsq-darh7-shrma-srui4-xrcgp-dae	UP

Known Neurons to follow if you're too busy to keep on top of things like this

If you found this analysis helpful and would like to follow the vote of the LORIMER known neuron in the future, consider configuring LORIMER as a followee for the Subnet Management topic.

Another good neuron to follow is Synapse (follows the LORIMER and CodeGov known neurons for Subnet Management, and is a generally well informed known neuron to follow on numerous other topics)

LaCosta · November 11, 2024, 11:17am

Voted to adopt proposal 134040. The proposal replaces one nodes from subnet io67a:
Removed Node: ahekl, Dashboard Status Offline
Added Node: poyg5.
The proposal was verified using the DRE tool to verify the metrics stated and on top of replacing the dead node, the replacement improves decentralization on the country metric, reducing the number of US nodes from 2 to 1.

ZackDS · November 11, 2024, 11:49am

Voted to adopt proposal #134040.

The proposal is correct in that it replaces offline dead node from Jacksonville, US ahekl , with Riga1, LV node poyg5 while improving decentralization.

ZanderG · December 2, 2024, 9:07am

Hello @Manu,
When you made this reply, we were somewhat sceptical about the impact of reducing the notarization delay on the blockmaker failure rate. Over the past three months, we monitored the blockmaker failure rates for the two subnets (eq6en and w4rem) with the highest median failure rates that experienced a reduction in notarization delay from 600ms to 300ms. Below are the results.

Please note that the transparent lines in the graph represent the actual data points, while the solid lines show the 7-day moving averages. From the graph, it appears that there has been a drastic increase in the block failure rate.
These findings lead me to ask you and @dsharifi the following questions:

Each time a blockmaker fails to produce a block and a new blockmaker is chosen, doesn’t this overall increase the network latency (considering the time taken by the first blockmaker to fail and the time taken by the new blockmaker to create the block)?
Is there a specific threshold where Dfinity would consider the failure rate unacceptably high?

Manu · December 2, 2024, 9:33am

Yes, block makers missing their opportunity to create a block and having to fall back adds latency. However, always having a slower block rate also increases latency. In my view, we should optimize for latency, because that’s what users actually notice, and we do see that latency has gone down with the increased block rate

ZanderG · December 2, 2024, 11:13am

I agree, @Manu. I just wanted to bring this to your attention. Thank you.

timk11 · December 7, 2024, 3:58am

Voted to adopt proposal 134405.

This proposal replaces node 2xph2 which appears in the dashboard as “Status: Active”, for the purpose of making it available to other subnets in order to improve overall network topology. As shown in the proposal and verified using the DRE tool, decentralisation parameters are unchanged and remain within the requirements of the target topology.

LaCosta · December 8, 2024, 4:28pm

Proposal 134405

Vote: REJECT

I have raised question regarding this kind of proposals previously and adopted them firstly expecting better explanation in the future as can be seen in this two reviews, 134191 and 134192.

Topic		Replies	Views
Subnet Management - 2fq7c (Application) NNS proposal discussions nns , Governance , Subnet-management	62	561	June 27, 2025
Subnet Management - o3ow2 (Application) NNS proposal discussions nns , Governance , Subnet-management	55	418	June 27, 2025
Subnet Management - nl6hn (Application) NNS proposal discussions nns , Governance , Subnet-management	35	293	May 26, 2025
Subnet Management - lhg73 (Application) NNS proposal discussions nns , Governance , Subnet-management	91	776	February 24, 2025
Subnet Management - 4ecnw (Application) NNS proposal discussions nns , Governance , Subnet-management	20	138	February 24, 2025