Subnet Management - 6pbhf (Application)

This topic is intended to capture Subnet Management activities over time for the 6pbhf subnet, providing a place to ask questions and make observations about the management of this subnet.

At the time of creating this topic the current subnet configuration is as follows:

{
  "version": 44309,
  "records": [
    {
      "key": "subnet_record_6pbhf-qzpdk-kuqbr-pklfa-5ehhf-jfjps-zsj6q-57nrl-kzhpd-mu7hc-vae",
      "version": 44309,
      "value": {
        "membership": [
          "l2bzb-vpnu6-jlsae-poddg-5hpnz-bx54x-3khc2-aazpn-bop4m-zforf-uqe",
          "rzm7j-37ied-ynlvt-zma7n-r4bjg-wxg2d-kr4me-ss4yr-3p5se-n4sry-wqe",
          "r4dbq-jplty-hxb2n-yx4pt-dm2vf-m73ro-sqgq7-onnzn-zenvp-dleaz-mae",
          "jj7si-b7rs7-nq7sv-npame-aec3y-2anip-ol5s6-gxbh7-kjbg2-a5u6j-rae",
          "rfkza-27bii-6jan4-u4zll-lkvmz-snmao-irmlj-arpdd-kyxrg-xnq3a-7ae",
          "tgmtp-wy3f4-hqron-bnvc3-scclx-b7fgg-gdnc2-dwvks-dn6ao-gbqso-5ae",
          "efdju-ef2ce-a5jdn-obybl-x6ema-h5lwv-nc2sy-v4hvc-7nltm-aldtv-6ae",
          "7v72g-sof5q-riabw-dzefk-7p74b-wxwzs-dgvbv-rlrxx-2jpjy-zli4s-cqe",
          "w53hu-bdzuz-h7h75-weodb-getvj-rr766-m2rtb-bigdq-l62cj-7atxw-2ae",
          "ogokl-oqium-3p2bk-f3hpo-dr67s-oilge-k4jq5-z5poz-2b2oq-4wxg5-aae",
          "4mm7j-dmeng-7ib4h-yvt3c-g5ed6-i5yar-rrc6w-rp7tk-ej3by-gilvv-kae",
          "vuizy-nfm5v-rapnc-rijer-hijfx-bvjrz-ccxdd-kte3v-awbnq-bdm6m-6qe",
          "l4mrq-cmo2o-ydidi-v2zit-pemyc-itm4j-qw2u3-kwzso-yz5dv-geium-pqe"
        ],
        "nodes": {},
        "max_ingress_bytes_per_message": 2097152,
        "max_ingress_messages_per_block": 1000,
        "max_block_payload_size": 4194304,
        "unit_delay_millis": 1000,
        "initial_notary_delay_millis": 600,
        "replica_version_id": "a3831c87440df4821b435050c8a8fcb3745d86f6",
        "dkg_interval_length": 499,
        "start_as_nns": false,
        "subnet_type": "application",
        "features": {
          "canister_sandboxing": false,
          "http_requests": true,
          "sev_enabled": false
        },
        "max_number_of_canisters": 120000,
        "ssh_readonly_access": [],
        "ssh_backup_access": [],
        "ecdsa_config": null,
        "chain_key_config": null
      }
    }
  ]
}
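
For anyone wanting to sanity-check these values themselves, here's a minimal Python sketch that parses a locally saved copy of the registry output above and prints the parameters most relevant to this thread. The filename is hypothetical; the JSON layout is the one shown above.

```python
import json

# Hypothetical filename: a locally saved copy of the registry output above.
with open("subnet_record.json") as f:
    registry_dump = json.load(f)

record = registry_dump["records"][0]["value"]

print(f"registry version:     {registry_dump['version']}")
print(f"subnet type:          {record['subnet_type']}")
print(f"replica (GuestOS):    {record['replica_version_id']}")
print(f"notarisation delay:   {record['initial_notary_delay_millis']} ms")
print(f"dkg interval length:  {record['dkg_interval_length']}")
print(f"membership size:      {len(record['membership'])} nodes")
```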

There's an open proposal for changing subnet membership: https://dashboard.internetcomputer.org/proposal/131411. This information is presented below:

  • red marker represents a removed node
  • green marker represents an added node
  • highlighted patches represent the country a node sits within

| Change | Country | Data Center | Owner | Node Provider | Node |
|---|---|---|---|---|---|
| --- (removed) | Germany | Munich (mu1) | q.beyond | Staking Facilities | ogokl-oqium-3p2bk-f3hpo-dr67s-oilge-k4jq5-z5poz-2b2oq-4wxg5-aae |
| +++ (added) | Belgium | Antwerp (an1) | Datacenter United | Allusion | mehhn-b5swd-urm2r-cltwk-c5tns-2s2bf-skz74-hv63o-3qdad-ubbfn-iae |
| | Australia | Queensland 1 (sc1) | NEXTDC | ANYPOINT PTY LTD | r4dbq-jplty-hxb2n-yx4pt-dm2vf-m73ro-sqgq7-onnzn-zenvp-dleaz-mae |
| | Switzerland | Geneva 2 (ge2) | SafeHost | Archery Blockchain SCSp | 4mm7j-dmeng-7ib4h-yvt3c-g5ed6-i5yar-rrc6w-rp7tk-ej3by-gilvv-kae |
| | Spain | Madrid 1 (ma1) | Ginernet | Artem Horodyskyi | jj7si-b7rs7-nq7sv-npame-aec3y-2anip-ol5s6-gxbh7-kjbg2-a5u6j-rae |
| | Croatia | Zagreb 1 (zg1) | Anonstake | Anonstake | rzm7j-37ied-ynlvt-zma7n-r4bjg-wxg2d-kr4me-ss4yr-3p5se-n4sry-wqe |
| | Israel | Tel Aviv 1 (tv1) | Interhost | GeoNodes LLC | rfkza-27bii-6jan4-u4zll-lkvmz-snmao-irmlj-arpdd-kyxrg-xnq3a-7ae |
| | Japan | Tokyo 2 (ty2) | Equinix | Starbase | w53hu-bdzuz-h7h75-weodb-getvj-rr766-m2rtb-bigdq-l62cj-7atxw-2ae |
| | Korea (the Republic of) | Seoul 2 (kr2) | Gasan | Web3game | l2bzb-vpnu6-jlsae-poddg-5hpnz-bx54x-3khc2-aazpn-bop4m-zforf-uqe |
| | Romania | Bucharest (bu1) | M247 | Iancu Aurel | vuizy-nfm5v-rapnc-rijer-hijfx-bvjrz-ccxdd-kte3v-awbnq-bdm6m-6qe |
| | Singapore | Singapore (sg1) | Telin | OneSixtyTwo Digital Capital | l4mrq-cmo2o-ydidi-v2zit-pemyc-itm4j-qw2u3-kwzso-yz5dv-geium-pqe |
| | Slovenia | Maribor (mb1) | Posita.si | Fractal Labs AG | efdju-ef2ce-a5jdn-obybl-x6ema-h5lwv-nc2sy-v4hvc-7nltm-aldtv-6ae |
| | Sweden | Stockholm 1 (sh1) | Digital Realty | DFINITY Operations SA | tgmtp-wy3f4-hqron-bnvc3-scclx-b7fgg-gdnc2-dwvks-dn6ao-gbqso-5ae |
| | United States of America (the) | San Jose (sj1) | INAP | Shelburne Ventures, LLC | 7v72g-sof5q-riabw-dzefk-7p74b-wxwzs-dgvbv-rlrxx-2jpjy-zli4s-cqe |

The removed node is replaced with a node based in Belgium. I've verified that this node is currently unassigned.

The proposal mentions that this change is needed because the Munich (mu1) data centre is due for decommissioning. See here for more discussion and references.

Proposal 132145

TLDR: An offline node in Israel is replaced with one in Georgia. This looks good; however, there are a few points I'd like some clarity on before voting:

  • I've noticed that the unassigned nodes are currently on GuestOS version 3d0b3f10417fc6708e8b5d844a0bac5e86f3e17d while the subnet is running GuestOS version 6968299131311c836917f0d16d0b1b963526c9b1. I'm unclear how this is handled. Is the GuestOS version automatically updated for the unassigned node as part of joining the subnet? If so, what's the point of deploying GuestOS versions to unassigned nodes in the first place (e.g. Proposal 131712)?
  • I'm aware of cases where other types of proposals have failed due to the GuestOS version on unassigned nodes, such as when unelecting versions from the registry (e.g. below)

@Luka do you know if GuestOS version inconsistencies can be an issue during 'Change Subnet Membership' proposals (or is the unassigned node's GuestOS version updated automatically to reflect the subnet)?

Decentralisation Stats

Subnet node distance stats (distance between any 2 nodes in the subnet) →

| | Smallest Distance | Average Distance | Largest Distance |
|---|---|---|---|
| EXISTING | 117.442 km | 6831.051 km | 17277.995 km |
| PROPOSED | 117.442 km | 6777.392 km (-0.8%) | 17277.995 km |

This proposal slightly reduces decentralisation, considered purely in terms of geographic distance (and therefore there's a slight theoretical reduction in localised disaster resilience). :-1:
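
For anyone curious how figures like these are derived, pairwise great-circle distances are straightforward to compute. The sketch below is illustrative only: the coordinates are rough, hand-picked data-centre locations (my own assumptions, not registry data), and the official tooling may use a different distance model, so don't expect it to reproduce the exact figures above.

```python
from itertools import combinations
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

# Approximate data-centre coordinates (illustrative assumptions only).
nodes = {
    "ge2 (Geneva)":   (46.20, 6.14),
    "ma1 (Madrid)":   (40.42, -3.70),
    "zg1 (Zagreb)":   (45.82, 15.98),
    "ty2 (Tokyo)":    (35.68, 139.69),
    "kr2 (Seoul)":    (37.57, 126.98),
    "sj1 (San Jose)": (37.34, -121.89),
    # ... add the remaining subnet nodes here
}

distances = [haversine_km(a, b) for a, b in combinations(nodes.values(), 2)]
print(f"smallest: {min(distances):.1f} km")
print(f"average:  {sum(distances) / len(distances):.1f} km")
print(f"largest:  {max(distances):.1f} km")
```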

Subnet characteristic counts →

| | Continents | Countries | Data Centers | Owners | Node Providers |
|---|---|---|---|---|---|
| EXISTING | 4 | 13 | 13 | 13 | 13 |
| PROPOSED | 4 | 13 | 13 | 13 | 13 |

Largest number of nodes with the same characteristic (e.g. continent, country, data center, etc.) →

| | Continent | Country | Data Center | Owner | Node Provider |
|---|---|---|---|---|---|
| EXISTING | 7 | 1 | 1 | 1 | 1 |
| PROPOSED | 7 | 1 | 1 | 1 | 1 |

See here for acceptable limits → Motion 125549 (note that these are due for a slight revision)
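
The two count tables above can be derived mechanically from the node reference table further below. Here's a rough sketch; the field names and the few sample rows are my own, not the output of any official tool, and the list would need the full membership to reproduce the counts exactly.

```python
from collections import Counter

# One entry per node, transcribed from the node reference table below.
# Only a few sample rows are shown; extend with the full membership.
nodes = [
    {"continent": "Asia",    "country": "Georgia",   "dc": "tb1", "owner": "Cloud9", "provider": "George Bassadone"},
    {"continent": "Oceania", "country": "Australia", "dc": "sc1", "owner": "NEXTDC", "provider": "ANYPOINT PTY LTD"},
    {"continent": "Europe",  "country": "Belgium",   "dc": "an1", "owner": "Datacenter United", "provider": "Allusion"},
    # ... remaining nodes
]

for attr in ("continent", "country", "dc", "owner", "provider"):
    counts = Counter(node[attr] for node in nodes)
    print(f"{attr:10s} distinct={len(counts):2d} max_per_value={max(counts.values())}")
```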

The above subnet information is illustrated below, followed by a node reference table:

Map Description
  • Red marker represents a removed node (transparent center for overlap visibility)
  • Green marker represents an added node
  • Blue marker represents an unchanged node
  • Highlighted patches represent the country the above nodes sit within (red if the country is removed, green if added, otherwise grey)
  • Light grey markers with yellow borders are examples of unassigned nodes that would be viable candidates for joining the subnet according to formal decentralisation coefficients (so this proposal can be viewed in the context of alternative solutions that are not being used)

| Change | Continent | Country | Data Center | Owner | Node Provider | Node | Status |
|---|---|---|---|---|---|---|---|
| --- (removed) | Asia | Israel | Tel Aviv 1 (tv1) | Interhost | GeoNodes LLC | rfkza-27bii-6jan4-u4zll-lkvmz-snmao-irmlj-arpdd-kyxrg-xnq3a-7ae | DOWN |
| +++ (added) | Asia | Georgia | Tbilisi 1 (tb1) | Cloud9 | George Bassadone | xqhoe-c5pck-wcytx-owvyr-mntbp-u22ct-nvcqi-bri5w-sjow2-rs4gp-sae | UNASSIGNED |
| | Oceania | Australia | Queensland 1 (sc1) | NEXTDC | ANYPOINT PTY LTD | r4dbq-jplty-hxb2n-yx4pt-dm2vf-m73ro-sqgq7-onnzn-zenvp-dleaz-mae | UP |
| | Europe | Belgium | Antwerp (an1) | Datacenter United | Allusion | mehhn-b5swd-urm2r-cltwk-c5tns-2s2bf-skz74-hv63o-3qdad-ubbfn-iae | UP |
| | Europe | Switzerland | Geneva 2 (ge2) | SafeHost | Archery Blockchain SCSp | 4mm7j-dmeng-7ib4h-yvt3c-g5ed6-i5yar-rrc6w-rp7tk-ej3by-gilvv-kae | UP |
| | Europe | Spain | Madrid 1 (ma1) | Ginernet | Artem Horodyskyi | jj7si-b7rs7-nq7sv-npame-aec3y-2anip-ol5s6-gxbh7-kjbg2-a5u6j-rae | UP |
| | Europe | Croatia | Zagreb 1 (zg1) | Anonstake | Anonstake | rzm7j-37ied-ynlvt-zma7n-r4bjg-wxg2d-kr4me-ss4yr-3p5se-n4sry-wqe | UP |
| | Asia | Japan | Tokyo 2 (ty2) | Equinix | Starbase | w53hu-bdzuz-h7h75-weodb-getvj-rr766-m2rtb-bigdq-l62cj-7atxw-2ae | UP |
| | Asia | Korea (the Republic of) | Seoul 2 (kr2) | Gasan | Web3game | l2bzb-vpnu6-jlsae-poddg-5hpnz-bx54x-3khc2-aazpn-bop4m-zforf-uqe | UP |
| | Europe | Romania | Bucharest (bu1) | M247 | Iancu Aurel | vuizy-nfm5v-rapnc-rijer-hijfx-bvjrz-ccxdd-kte3v-awbnq-bdm6m-6qe | UP |
| | Asia | Singapore | Singapore (sg1) | Telin | OneSixtyTwo Digital Capital | l4mrq-cmo2o-ydidi-v2zit-pemyc-itm4j-qw2u3-kwzso-yz5dv-geium-pqe | UP |
| | Europe | Slovenia | Maribor (mb1) | Posita.si | Fractal Labs AG | efdju-ef2ce-a5jdn-obybl-x6ema-h5lwv-nc2sy-v4hvc-7nltm-aldtv-6ae | UP |
| | Europe | Sweden | Stockholm 1 (sh1) | Digital Realty | DFINITY Operations SA | tgmtp-wy3f4-hqron-bnvc3-scclx-b7fgg-gdnc2-dwvks-dn6ao-gbqso-5ae | UP |
| | Americas | United States of America (the) | San Jose (sj1) | INAP | Shelburne Ventures, LLC | 7v72g-sof5q-riabw-dzefk-7p74b-wxwzs-dgvbv-rlrxx-2jpjy-zli4s-cqe | UP |

Known Neurons to follow if you're too busy to keep on top of things like this

If you found this analysis helpful and would like to follow the vote of the LORIMER known neuron in the future, consider configuring LORIMER as a followee for the Subnet Management topic.

Other good neurons to follow:

  • CodeGov (will soon be committed to actively reviewing and voting on Subnet Management proposals based on those reviews)

  • WaterNeuron (the WaterNeuron DAO frequently discuss proposals like this in order to vote responsibly based on DAO consensus)

We had some issues with this a long time ago, when unassigned nodes were falling behind nodes in the subnets by a lot of versions. Since we started keeping them one version behind nodes in subnets, we haven't had any issues. And yes, the unassigned node is upgraded as soon as it joins the subnet.

Thanks @Luka! Good to know the upgrade happens automatically. I'm still a little unclear on why replica versions get explicitly deployed to unassigned nodes.

If I understand correctly, you're saying that the upgrade path from one replica version to another becomes potentially dangerous if too many intermediary versions are skipped between upgrades? I'm unclear why this would be (could you clarify?). Is there a defined number of versions that is regarded as safe to skip?

The protocol changes over time, and eventually two versions that are very far apart cannot be upgraded/downgraded between each other. The upgrade paths we test are strictly +/-1 version.

Thanks @Luka, this info is really helpful. My thinking was that GuestOS images contain the state and logic of the protocol, but of course there's also the API for interacting with the registry etc., which may have changed between versions. Now I understand why there can be breaking changes between versions (unless the +/-1 version steps are adhered to). Makes sense. Thanks again!
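
As a concrete way to picture that +/-1 rule, here's a small illustrative check. It's only a sketch: the ordered elected-version list and the helper function are hypothetical, not part of any official tooling, and abbreviated hashes stand in for the full release IDs mentioned in this thread.

```python
def within_tested_upgrade_path(elected_versions, current, target):
    """True if `target` is at most one elected release away from `current`.

    `elected_versions` is assumed to be the ordered list of elected GuestOS
    versions, oldest first. This mirrors the '+/-1 version' rule only
    conceptually; it is not an official tool or a guarantee of anything.
    """
    i, j = elected_versions.index(current), elected_versions.index(target)
    return abs(i - j) <= 1

# Hypothetical ordering of elected releases (abbreviated hashes, oldest first).
elected = ["3d0b3f10...", "94fd3809...", "69682991..."]
print(within_tested_upgrade_path(elected, "3d0b3f10...", "69682991..."))  # False: skips a release
print(within_tested_upgrade_path(elected, "94fd3809...", "69682991..."))  # True: adjacent releases
```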

I'm intending to reject this proposal and the other similar one, given that neither of them adheres to this rule. They both skip Release 94fd38099f0e63950eb5d5673b7b9d23780ace2d - ICP Dashboard (internetcomputer.org), due to the unassigned nodes being on:

+/-1 version is not a hard requirement for unassigned nodes. They're anyhow upgraded to the version of the subnet before they join the subnet. In any case it's better to try to replace the node, even if it potentially fails, rather than keep an already failed node in the subnet.

That makes sense, thanks for clarifying. I'll be accepting this proposal shortly.

Hi @Luka, I've been pondering this some more. I'd like to get a better feel for the sorts of things that can go wrong. Are you able to point to some examples?

I think you're mostly right about how this looks, with the exception that the GuestOS image does not contain any protocol data.

I think we only once had the case where some too-old unassigned nodes were not able to update once they joined the subnet. I remember the orchestrator broke, most likely because of some registry API incompatibility. It could be that it couldn't read the config of the subnet from the registry, for example because some new field was added.

Thanks @Luka, do you happen to know the proposal? I'd be interested in learning more about this situation. You mentioned previously that the unassigned nodes are updated to the latest version before joining the subnet (rather than once they've joined the subnet, as mentioned above). I'm still confused about how a failure can occur unless the old replica version somehow takes on some responsibility during the node swap.

I'm only asking because I'm keen to understand this :grin:

The case I'm mentioning was at least two years ago, so I surely won't be able to find it.

Let's take as an example that unassigned nodes haven't been updated for some time. For a node to know that it needs to join a subnet, it needs to read the latest state from the registry. If it cannot parse the registry because the registry now has fields that cannot be parsed by that older version, it will never learn that it joined the subnet and will never be able to upgrade to the subnet's version.

Okay, thanks @Luka. Just to confirm: the node is responsible for initiating its own upgrade to the appropriate subnet version (by communicating with the registry while running its current replica version), rather than the appropriate replica version being pushed to that node as part of a 'Change Subnet Membership' proposal?

If the node joins the subnet but fails to upgrade the replica version (for the reason you've described), presumably the proposal would still be considered executed? Or is the replica upgrade synchronous and/or somehow propagated to inform the proposal result (executed/failed)?

You're correct. Each replica polls the state of the network from the registry to determine which version it should run and which subnet it should join. Once the node is upgraded, there are still consensus mechanisms for the node to replicate the state from the rest of the subnet.

Regarding proposals, successful execution of a proposal merely indicates that the governance canister was able to update the registry successfully.
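
As a rough mental model of what's described above, here's a minimal sketch of the "which version should I run?" decision. The data shapes and names are invented for illustration; the real orchestrator is part of the replica codebase and handles far more than this.

```python
def decide_target_version(registry_snapshot, node_id):
    """Which GuestOS version should this node run, given a registry snapshot?

    Conceptual illustration only: the data shapes and names here are invented,
    and the real orchestrator (part of the replica codebase) handles far more.
    """
    for subnet in registry_snapshot["subnets"]:
        if node_id in subnet["membership"]:
            # Assigned node: run the subnet's replica version.
            return subnet["replica_version_id"]
    # Unassigned node: run the version blessed for unassigned nodes.
    return registry_snapshot["unassigned_nodes_version"]

# Toy snapshot loosely mirroring the situation discussed in this thread.
snapshot = {
    "unassigned_nodes_version": "3d0b3f10...",
    "subnets": [
        {"membership": ["node-a", "node-b"], "replica_version_id": "69682991..."},
    ],
}
print(decide_target_version(snapshot, "node-a"))  # assigned   -> subnet's version
print(decide_target_version(snapshot, "node-x"))  # unassigned -> unassigned-nodes version
```

If a too-old node cannot even parse the registry snapshot (the failure mode described earlier), it never reaches this decision and therefore never upgrades.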

Proposal 132219

This proposal sets the notarisation delay of the subnet to 300 ms, down from 600 ms. The change will increase the block rate of the subnet, with the aim of reducing the latency of update calls.

This is the same subnet config update that was applied to the canary subnet last week, which has held a steady and impressive block rate since. More context is available here → Reducing End to End latencies on the Internet Computer

Here are the current metrics for this subnet. A question I'll be asking on the Subnet Management General thread (see reference below this post) is why this update is being rolled out to so many subnets at once, each with different finalisation rates and transaction profiles (e.g. peaks and troughs, whereas the canary subnet was always steady, even prior to the update). I'm wondering if this limits the representativeness of the results on that canary subnet.
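
For reference, the field being changed appears to be `initial_notary_delay_millis` in the subnet record quoted at the top of this thread. A quick sketch comparing the current value with the proposed one, reusing the hypothetical `subnet_record.json` file from the earlier example:

```python
import json

PROPOSED_NOTARY_DELAY_MS = 300  # value from Proposal 132219

with open("subnet_record.json") as f:  # hypothetical local copy of the record above
    record = json.load(f)["records"][0]["value"]

current_ms = record["initial_notary_delay_millis"]
print(f"initial_notary_delay_millis: {current_ms} ms -> {PROPOSED_NOTARY_DELAY_MS} ms")
```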

I just submitted a proposal to replace an unhealthy node:

@sat This node appears as "Status: Active" on the dashboard and currently appears as "status: UP" using the decentralization tool, but appeared as "alert: IC_Networking_StateSyncLoop" and "status: DEGRADED" when I ran an earlier check at 06:21 UTC. Should we presume that the node has now recovered its health?

This node being "l2bzb"?
That particular node keeps going degraded and recovering.
You can check that here: Node Provider Rewards
Or by fetching and analyzing trustworthy node metrics yourself.
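
For anyone who wants to try the "analyse the metrics yourself" route, here's a rough sketch of the analysis step. The input is a made-up local JSON file of per-node block statistics (the trustworthy node metrics report blocks proposed and block failures per node); it is not the exact output format of any particular tool, and the threshold is arbitrary.

```python
import json

# Hypothetical local file of daily per-node block stats, e.g.:
# [{"node_id": "l2bzb-...", "blocks_proposed": 412, "blocks_failed": 38}, ...]
with open("node_metrics.json") as f:
    daily_stats = json.load(f)

FAILURE_RATE_THRESHOLD = 0.10  # arbitrary cut-off for flagging a node

for entry in daily_stats:
    total = entry["blocks_proposed"] + entry["blocks_failed"]
    failure_rate = entry["blocks_failed"] / total if total else 0.0
    flag = "check this node" if failure_rate > FAILURE_RATE_THRESHOLD else "ok"
    print(f"{entry['node_id']}: failure rate {failure_rate:.1%} ({flag})")
```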

Yes, that's the one. Sorry - that was a sketchy copy and paste from my draft review. Thanks for clarifying that!
