Subnet Management - lhg73 (Application)

This topic is intended to capture Subnet Management activities over time for the lhg73 subnet, providing a place to ask questions and make observations about the management of this subnet.

At the time of creating this topic the current subnet configuration is as follows:

{
  "version": 44598,
  "records": [
    {
      "key": "subnet_record_lhg73-sax6z-2zank-6oer2-575lz-zgbxx-ptudx-5korm-fy7we-kh4hl-pqe",
      "version": 44598,
      "value": {
        "membership": [
          "cilsw-jxcbi-qvp5o-7cylv-up5nj-2yykt-jtzha-s2uao-ee7uy-nprfm-vae",
          "ihttm-45oz5-an5mg-i2jtb-fayst-s47j6-vmuwr-fqotf-mp2il-n5s5x-cae",
          "bptaj-nejw4-osqqa-zwrej-ysl2o-5ffgj-hkjr6-2w6fi-jczex-vjutw-iae",
          "ixo23-jxvux-ktqca-bje7d-py56s-yvjy5-zpxrk-fmlxt-zhuhg-wu5bc-wqe",
          "bz73b-igxp6-g5dbn-f3ran-p2t5o-b4q42-guoec-imv4z-mvo25-7p4sp-2ae",
          "uj5bp-c66bz-meslf-u4uts-q3njs-tgfnr-bueq6-372lq-yz7so-4fu27-3qe",
          "rsp26-d2hko-kvacs-6mdca-dumka-qxiyw-4yzkp-cuwgr-lj7je-v7e6z-4qe",
          "pdo46-iehoo-x2gfu-t5qu5-y3e64-cdymo-eioop-h6f4a-zebwa-fenb4-xae",
          "ognrk-q4exl-3wf25-yrrsy-mtezk-e3qww-k6s5v-2pikz-gto6z-dyl2y-eae",
          "ddbl6-37efl-b75e4-jpfsb-zioa6-ilvzo-tldwy-fnbhm-nbuoy-66cza-uqe",
          "jfryc-owgdd-a7pp4-lao2c-anza2-nryvi-gqkmu-m2moj-4hzai-zfdiy-4qe",
          "mihvd-umv3j-cjsl2-bfsdu-td7aw-2y6if-aw4fn-cghkm-v2oxd-kj75q-cae",
          "pfmqh-xphm4-h4wkn-nafsx-aix6u-h4gbl-owfhy-wnm4y-vkqyu-nmlhl-aqe"
        ],
        "nodes": {},
        "max_ingress_bytes_per_message": 2097152,
        "max_ingress_messages_per_block": 1000,
        "max_block_payload_size": 4194304,
        "unit_delay_millis": 1000,
        "initial_notary_delay_millis": 600,
        "replica_version_id": "3d0b3f10417fc6708e8b5d844a0bac5e86f3e17d",
        "dkg_interval_length": 499,
        "start_as_nns": false,
        "subnet_type": "application",
        "features": {
          "canister_sandboxing": false,
          "http_requests": true,
          "sev_enabled": false
        },
        "max_number_of_canisters": 120000,
        "ssh_readonly_access": [
          ""
        ],
        "ssh_backup_access": [],
        "ecdsa_config": null,
        "chain_key_config": null
      }
    }
  ]
}
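
For anyone who wants to pull the headline numbers out of a record like this, here is a minimal Python sketch. It assumes the JSON has the same shape as the dump above. `summarise_subnet_record` is a hypothetical helper (not part of any IC tooling), and the membership list is trimmed to two entries to keep the example short.

```python
def summarise_subnet_record(registry_dump: dict) -> dict:
    """Extract a few headline parameters from a dump shaped like the one above."""
    value = registry_dump["records"][0]["value"]
    return {
        "node_count": len(value["membership"]),
        "subnet_type": value["subnet_type"],
        "initial_notary_delay_millis": value["initial_notary_delay_millis"],
        # Convert bytes to MiB for readability.
        "max_block_payload_mib": value["max_block_payload_size"] / (1024 * 1024),
        "replica_version_id": value["replica_version_id"],
    }


# Trimmed stand-in for the record above (only two of the 13 members are listed).
example_dump = {
    "version": 44598,
    "records": [
        {
            "key": "subnet_record_lhg73-sax6z-2zank-6oer2-575lz-zgbxx-ptudx-5korm-fy7we-kh4hl-pqe",
            "version": 44598,
            "value": {
                "membership": [
                    "cilsw-jxcbi-qvp5o-7cylv-up5nj-2yykt-jtzha-s2uao-ee7uy-nprfm-vae",
                    "ihttm-45oz5-an5mg-i2jtb-fayst-s47j6-vmuwr-fqotf-mp2il-n5s5x-cae",
                ],
                "max_block_payload_size": 4194304,
                "initial_notary_delay_millis": 600,
                "replica_version_id": "3d0b3f10417fc6708e8b5d844a0bac5e86f3e17d",
                "subnet_type": "application",
            },
        }
    ],
}

summary = summarise_subnet_record(example_dump)
print(summary)
```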

There’s an open proposal for changing subnet membership - https://dashboard.internetcomputer.org/proposal/131704. This information is presented below:

  • red marker represents a removed node (transparent center for overlap visibility)
  • green marker represents an added node
  • blue marker represents an unchanged node
  • highlighted patches represent the country a node sits within (red if the country is removed, green if added, otherwise grey)

Table
| Change | Country | Data Center | Owner | Node Provider | Node | Status |
| --- | --- | --- | --- | --- | --- | --- |
| --- | Costa Rica | Bogota 1 (bg1) | EdgeUno | Geeta Kalwani | ihttm-45oz5-an5mg-i2jtb-fayst-s47j6-vmuwr-fqotf-mp2il-n5s5x-cae | DEGRADED |
| --- | Estonia | Tallinn 2 (ta2) | Telia DC | Vladyslav Popov | bptaj-nejw4-osqqa-zwrej-ysl2o-5ffgj-hkjr6-2w6fi-jczex-vjutw-iae | UP |
| +++ | Switzerland | Zurich 4 (zh4) | Nine.Ch | Tomahawk.vc | p4khz-nv35h-omz5j-3lflh-f473a-nwumw-yi74i-xwozx-ykk5t-heidc-rqe | UNASSIGNED |
| +++ | Germany | Munich (mu1) | q.beyond | Staking Facilities | om3xx-z7r5z-22dkp-rfta4-drccc-vlbaq-k7bep-pwemg-vquay-hxc34-7qe | UNASSIGNED |
|  | Belgium | Antwerp (an1) | Datacenter United | Allusion | mihvd-umv3j-cjsl2-bfsdu-td7aw-2y6if-aw4fn-cghkm-v2oxd-kj75q-cae | UP |
|  | China | HongKong 1 (hk1) | Unicom | Pindar Technology Limited | uj5bp-c66bz-meslf-u4uts-q3njs-tgfnr-bueq6-372lq-yz7so-4fu27-3qe | UP |
|  | Georgia | Tbilisi 1 (tb1) | Cloud9 | George Bassadone | ognrk-q4exl-3wf25-yrrsy-mtezk-e3qww-k6s5v-2pikz-gto6z-dyl2y-eae | UP |
|  | India | Navi Mumbai 1 (nm1) | Rivram | Rivram Inc | pdo46-iehoo-x2gfu-t5qu5-y3e64-cdymo-eioop-h6f4a-zebwa-fenb4-xae | UP |
|  | Japan | Tokyo (ty1) | Equinix | Starbase | pfmqh-xphm4-h4wkn-nafsx-aix6u-h4gbl-owfhy-wnm4y-vkqyu-nmlhl-aqe | UP |
|  | Korea (the Republic of) | Seoul 1 (sl1) | Megazone Cloud | Neptune Partners | ixo23-jxvux-ktqca-bje7d-py56s-yvjy5-zpxrk-fmlxt-zhuhg-wu5bc-wqe | UP |
|  | Netherlands (the) | Marseille (mr1) | Digital Realty | DFINITY Operations SA | rsp26-d2hko-kvacs-6mdca-dumka-qxiyw-4yzkp-cuwgr-lj7je-v7e6z-4qe | UP |
|  | Singapore | Singapore 3 (sg3) | Racks Central | OneSixtyTwo Digital Capital | bz73b-igxp6-g5dbn-f3ran-p2t5o-b4q42-guoec-imv4z-mvo25-7p4sp-2ae | UP |
|  | Slovenia | Maribor (mb1) | Posita.si | Fractal Labs AG | jfryc-owgdd-a7pp4-lao2c-anza2-nryvi-gqkmu-m2moj-4hzai-zfdiy-4qe | UP |
|  | United States of America (the) | Chicago 3 (ch3) | CyrusOne | MI Servers | cilsw-jxcbi-qvp5o-7cylv-up5nj-2yykt-jtzha-s2uao-ee7uy-nprfm-vae | UP |
|  | United States of America (the) | Vancouver (bc1) | Cyxtera | Blockchain Development Labs | ddbl6-37efl-b75e4-jpfsb-zioa6-ilvzo-tldwy-fnbhm-nbuoy-66cza-uqe | UP |

The proposal summary states:

Motivation: replacing 1 unhealthy node; replacing 1 node to improve subnet decentralization

The unhealthy node refers to ihttm-45oz5-an5mg-i2jtb-fayst-s47j6-vmuwr-fqotf-mp2il-n5s5x-cae which is degraded.

The other node removed by this proposal (bptaj-nejw4-osqqa-zwrej-ysl2o-5ffgj-hkjr6-2w6fi-jczex-vjutw-iae) appears healthy, is located in Estonia, and is owned by Telia DC (which doesn’t own any other node in this subnet). I think it’s unclear why this node is being removed…

The 2 nodes proposed to replace these far-flung nodes are both located in central Europe (Germany and Switzerland), practically on each other’s doorstep by comparison.

I’ve rejected this proposal because the outcome seems in clear contradiction with the proposal summary. The summary could have been clearer about what it’s trying to achieve and why.


@timk11 has raised a similar concern here


Related


Thanks @Sat for being on top of this, and for the excellent level of communication as the recovery process has unfolded :+1:

Cycle burn and transactions have shot up, and blocks are steadily following (albeit at a slower pace). Would be interesting to know why this latter metric is taking a longer time to bounce back.


Okay, looking good. I think I’ll go to sleep now. :sleeping:

image


DFINITY will submit an NNS proposal today to reduce the notarization delay on the subnet, lhg73, similar to what has happened on other subnets in recent weeks (you can find all details in this forum thread).


I just submitted a proposal to replace an unhealthy node and optimize decentralization a bit:


Voted to adopt proposal 133081.

This proposal replaces a dead node (pfmqh, which appears as “Status: Offline” on the dashboard) and additionally replaces another node for the purpose of improving network decentralisation. As seen in the proposal (which I verified using the DRE tool), the overall effect of these additions is to improve decentralisation with respect to continents only. The status with respect to the target topology is unchanged and remains within the requirements.
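
To make "improves decentralisation with respect to continents only" concrete, here is a rough Python sketch of a Nakamoto coefficient. It assumes the definition "smallest number of groups that together hold more than one third of the subnet's nodes"; the DRE tool's exact calculation may differ. The continent counts are read off the node reference table for this proposal, and `nakamoto_coefficient` is a hypothetical helper.

```python
def nakamoto_coefficient(group_sizes: dict) -> int:
    """Smallest number of groups whose combined nodes exceed 1/3 of the total.

    The one-third threshold is an assumption made for this sketch.
    """
    sizes = sorted(group_sizes.values(), reverse=True)
    total = sum(sizes)
    running = 0
    for groups_needed, size in enumerate(sizes, start=1):
        running += size
        # Integer comparison avoids floating-point issues with total / 3.
        if running * 3 > total:
            return groups_needed
    return len(sizes)


# Nodes per continent before and after proposal 133081 (13 nodes in both cases).
existing = {"Asia": 6, "Europe": 4, "Americas": 3}
proposed = {"Asia": 4, "Europe": 4, "Americas": 3, "Oceania": 1, "Africa": 1}

# Every node sits in a different country, before and after.
countries = {f"country-{i}": 1 for i in range(13)}

print("continent, existing:", nakamoto_coefficient(existing))
print("continent, proposed:", nakamoto_coefficient(proposed))
print("country (unchanged):", nakamoto_coefficient(countries))
```

Under that assumed definition the continent coefficient rises while the country coefficient stays the same, which is consistent with the description above.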


Voted to adopt proposal 133074. The initial_notary_delay_millis is set to 300 and the subnet_id is correct.


Proposal 133081

Thanks @Sat! 2 removed nodes, in China (up) and Japan (offline), replaced with nodes in South Africa (unassigned) and Australia (unassigned). This proposal also increases decentralisation in terms of average geographic distance between nodes and continent diversity. :+1: I’ve voted to adopt.

Decentralisation Stats

Subnet node distance stats (distance between any 2 nodes in the subnet) →

|  | Smallest Distance | Average Distance | Largest Distance |
| --- | --- | --- | --- |
| EXISTING | 317.676 km | 7583.196 km | 18505.029 km |
| PROPOSED | 317.676 km | 8678.125 km (+14.4%) | 18505.029 km |

This proposal increases decentralisation, considered purely in terms of geographic distance (and therefore there’s a slight theoretical increase in localised disaster resilience). :+1:
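
As an illustration of how pairwise distance stats like these could be derived, here is a small Python sketch using the haversine (great-circle) formula. Whether the tool behind the numbers above uses haversine is an assumption, and the three city coordinates are rough values chosen for illustration only, not the real data centre locations.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_stats(coords):
    """Return (smallest, average, largest) distance over every pair of nodes."""
    distances = [haversine_km(a, b) for a, b in combinations(coords.values(), 2)]
    return min(distances), sum(distances) / len(distances), max(distances)


# Approximate city coordinates, for illustration only.
coords = {
    "Zurich": (47.38, 8.54),
    "Tokyo": (35.68, 139.69),
    "Chicago": (41.88, -87.63),
}

smallest, average, largest = distance_stats(coords)
print(f"smallest={smallest:.3f} km average={average:.3f} km largest={largest:.3f} km")
```

Swapping a node for one further from the rest raises the average even when the smallest and largest pair are untouched, which is the pattern in the table above.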

Subnet characteristic counts →

|  | Continents | Countries | Data Centers | Owners | Node Providers |
| --- | --- | --- | --- | --- | --- |
| EXISTING | 3 | 13 | 13 | 13 | 13 |
| PROPOSED | 5 (+66.7%) | 13 | 13 | 13 | 13 |

This proposal slightly improves decentralisation in terms of continent diversity. :+1:

Largest number of nodes with the same characteristic (e.g. continent, country, data center, etc.) →

|  | Continent | Country | Data Center | Owner | Node Provider |
| --- | --- | --- | --- | --- | --- |
| EXISTING | 6 | 1 | 1 | 1 | 1 |
| PROPOSED | 4 (-33.3%) | 1 | 1 | 1 | 1 |
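
The counts and percentage changes in these tables can be reproduced with a few lines of Python. This is only a sketch: the continent labels are taken from the node reference table for this proposal, percentages are assumed to be relative to the EXISTING value, and `characteristic_stats` and `percent_change` are hypothetical helper names.

```python
from collections import Counter


def characteristic_stats(values):
    """Return (number of distinct values, size of the largest group)."""
    counts = Counter(values)
    return len(counts), max(counts.values())


def percent_change(existing, proposed):
    """Change relative to the existing value, rounded to one decimal place."""
    return round((proposed - existing) / existing * 100, 1)


# Continent of each of the 13 nodes, before and after the proposal.
existing_continents = ["Asia"] * 6 + ["Europe"] * 4 + ["Americas"] * 3
proposed_continents = ["Asia"] * 4 + ["Europe"] * 4 + ["Americas"] * 3 + ["Oceania", "Africa"]

existing_distinct, existing_max = characteristic_stats(existing_continents)
proposed_distinct, proposed_max = characteristic_stats(proposed_continents)

print("distinct continents:", existing_distinct, "->", proposed_distinct,
      f"({percent_change(existing_distinct, proposed_distinct):+}%)")
print("largest continent group:", existing_max, "->", proposed_max,
      f"({percent_change(existing_max, proposed_max):+}%)")
print("average distance change:", f"{percent_change(7583.196, 8678.125):+}%")
```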

See here for acceptable limits → Motion 132136

The above subnet information is illustrated below, followed by a node reference table:

Map Description
  • Red marker represents a removed node (transparent center for overlap visibility)
  • Green marker represents an added node
  • Blue marker represents an unchanged node
  • Highlighted patches represent the country the above nodes sit within (red if the country is removed, green if added, otherwise grey)
  • Light grey markers with yellow borders are examples of unassigned nodes that would be viable candidates for joining the subnet according to formal decentralisation coefficients (so this proposal can be viewed in the context of alternative solutions that are not being used)

Table
| Change | Continent | Country | Data Center | Owner | Node Provider | Node | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| --- | Asia | China | HongKong 1 (hk1) | Unicom | Pindar Technology Limited | uj5bp-c66bz-meslf-u4uts-q3njs-tgfnr-bueq6-372lq-yz7so-4fu27-3qe | UP |
| --- | Asia | Japan | Tokyo (ty1) | Equinix | Starbase | pfmqh-xphm4-h4wkn-nafsx-aix6u-h4gbl-owfhy-wnm4y-vkqyu-nmlhl-aqe | DOWN |
| +++ | Oceania | Australia | Queensland 1 (sc1) | NEXTDC | ANYPOINT PTY LTD | 56ovz-lrvyd-gggsl-qtenl-uuokx-p7t3t-rg6mc-6lc5l-usfqb-fygiv-aqe | UNASSIGNED |
| +++ | Africa | South Africa | Gauteng 2 (jb2) | Africa Data Centres | Honeycomb Capital (Pty) Ltd | 5v4on-bsceg-rdgxe-zcqqf-l5wnq-fpxw7-x3ktj-3x4fs-o2cny-uzhor-vqe | UNASSIGNED |
|  | Europe | Belgium | Antwerp (an1) | Datacenter United | Allusion | mihvd-umv3j-cjsl2-bfsdu-td7aw-2y6if-aw4fn-cghkm-v2oxd-kj75q-cae | UP |
|  | Americas | Canada | Vancouver (bc1) | Cyxtera | Blockchain Development Labs | ddbl6-37efl-b75e4-jpfsb-zioa6-ilvzo-tldwy-fnbhm-nbuoy-66cza-uqe | UP |
|  | Americas | Costa Rica | Bogota 1 (bg1) | EdgeUno | Geeta Kalwani | ihttm-45oz5-an5mg-i2jtb-fayst-s47j6-vmuwr-fqotf-mp2il-n5s5x-cae | UP |
|  | Europe | Germany | Marseille (mr1) | Digital Realty | DFINITY Operations SA | rsp26-d2hko-kvacs-6mdca-dumka-qxiyw-4yzkp-cuwgr-lj7je-v7e6z-4qe | UP |
|  | Europe | Estonia | Tallinn 2 (ta2) | Telia DC | Vladyslav Popov | bptaj-nejw4-osqqa-zwrej-ysl2o-5ffgj-hkjr6-2w6fi-jczex-vjutw-iae | UP |
|  | Asia | Georgia | Tbilisi 1 (tb1) | Cloud9 | George Bassadone | ognrk-q4exl-3wf25-yrrsy-mtezk-e3qww-k6s5v-2pikz-gto6z-dyl2y-eae | UP |
|  | Asia | India | Navi Mumbai 1 (nm1) | Rivram | Rivram Inc | pdo46-iehoo-x2gfu-t5qu5-y3e64-cdymo-eioop-h6f4a-zebwa-fenb4-xae | UP |
|  | Asia | Korea (the Republic of) | Seoul 1 (sl1) | Megazone Cloud | Neptune Partners | ixo23-jxvux-ktqca-bje7d-py56s-yvjy5-zpxrk-fmlxt-zhuhg-wu5bc-wqe | UP |
|  | Asia | Singapore | Singapore 3 (sg3) | Racks Central | OneSixtyTwo Digital Capital | bz73b-igxp6-g5dbn-f3ran-p2t5o-b4q42-guoec-imv4z-mvo25-7p4sp-2ae | UP |
|  | Europe | Slovenia | Maribor (mb1) | Posita.si | Fractal Labs AG | jfryc-owgdd-a7pp4-lao2c-anza2-nryvi-gqkmu-m2moj-4hzai-zfdiy-4qe | UP |
|  | Americas | United States of America (the) | Chicago 3 (ch3) | CyrusOne | MI Servers | cilsw-jxcbi-qvp5o-7cylv-up5nj-2yykt-jtzha-s2uao-ee7uy-nprfm-vae | UP |

Known Neurons to follow if you're too busy to keep on top of things like this

If you found this analysis helpful and would like to follow the vote of the LORIMER known neuron in the future, consider configuring LORIMER as a followee for the Subnet Management topic.

Other good neurons to follow:

  • Synapse (follows the LORIMER and CodeGov known neurons for Subnet Management, and is a generally well informed known neuron to follow on numerous other topics)

  • CodeGov (actively reviews and votes on Subnet Management proposals, and is well informed on numerous other technical topics)

  • WaterNeuron (the WaterNeuron DAO frequently discuss proposals like this in order to vote responsibly based on DAO consensus)


Voted to adopt proposal 133081.

The proposal replaces the dead node pfmqh, which shows an Offline status on the dashboard, in the lhg73 subnet.

The proposal also takes the opportunity to replace another node on the subnet in order to increase the Nakamoto coefficient of the continent metric, as verified with the DRE tool.


I also voted to adopt proposal 133081, since the motivation and node details, including the impact on decentralization, match my findings.


Proposal 133150

Decentralisation stats are unaffected by this proposal. It replaces a ‘degraded’ node with another node in Singapore. The node in question is actually currently ‘up’ rather than ‘degraded’. However, it is consistently failing a very small fraction of blocks.

image

@sat, are you able to provide more information about what it actually means for a node to be considered degraded? i.e. What’s the threshold that a node needs to cross to go from being considered ‘up’ to ‘degraded’?


Decentralisation Stats

Subnet node distance stats (distance between any 2 nodes in the subnet) →

|  | Smallest Distance | Average Distance | Largest Distance |
| --- | --- | --- | --- |
| EXISTING | 317.676 km | 8678.125 km | 18505.029 km |
| PROPOSED | 317.676 km | 8678.125 km | 18505.029 km |

Subnet characteristic counts →

|  | Continents | Countries | Data Centers | Owners | Node Providers |
| --- | --- | --- | --- | --- | --- |
| EXISTING | 5 | 13 | 13 | 13 | 13 |
| PROPOSED | 5 | 13 | 13 | 13 | 13 |

Largest number of nodes with the same characteristic (e.g. continent, country, data center, etc.) →

|  | Continent | Country | Data Center | Owner | Node Provider |
| --- | --- | --- | --- | --- | --- |
| EXISTING | 4 | 1 | 1 | 1 | 1 |
| PROPOSED | 4 | 1 | 1 | 1 | 1 |

See here for acceptable limits → Motion 132136

The above subnet information is illustrated below, followed by a node reference table:

Map Description
  • Red marker represents a removed node (transparent center for overlap visibility)
  • Green marker represents an added node
  • Blue marker represents an unchanged node
  • Highlighted patches represent the country the above nodes sit within (red if the country is removed, green if added, otherwise grey)
  • Light grey markers with yellow borders are examples of unassigned nodes that would be viable candidates for joining the subnet according to formal decentralisation coefficients (so this proposal can be viewed in the context of alternative solutions that are not being used)

Table
| Change | Continent | Country | Data Center | Owner | Node Provider | Node | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| --- | Asia | Singapore | Singapore 3 (sg3) | Racks Central | OneSixtyTwo Digital Capital | bz73b-igxp6-g5dbn-f3ran-p2t5o-b4q42-guoec-imv4z-mvo25-7p4sp-2ae | UP |
| +++ | Asia | Singapore | Singapore 2 (sg2) | Telin | OneSixtyTwo Digital Capital | cpywp-n4j5f-ja44p-oykxm-umz7h-fk6v2-rowix-bkwc4-ly4fw-tvu6c-mae | UNASSIGNED |
|  | Oceania | Australia | Queensland 1 (sc1) | NEXTDC | ANYPOINT PTY LTD | 56ovz-lrvyd-gggsl-qtenl-uuokx-p7t3t-rg6mc-6lc5l-usfqb-fygiv-aqe | UP |
|  | Europe | Belgium | Antwerp (an1) | Datacenter United | Allusion | mihvd-umv3j-cjsl2-bfsdu-td7aw-2y6if-aw4fn-cghkm-v2oxd-kj75q-cae | UP |
|  | Americas | Canada | Vancouver (bc1) | Cyxtera | Blockchain Development Labs | ddbl6-37efl-b75e4-jpfsb-zioa6-ilvzo-tldwy-fnbhm-nbuoy-66cza-uqe | UP |
|  | Americas | Costa Rica | Bogota 1 (bg1) | EdgeUno | Geeta Kalwani | ihttm-45oz5-an5mg-i2jtb-fayst-s47j6-vmuwr-fqotf-mp2il-n5s5x-cae | UP |
|  | Europe | Germany | Marseille (mr1) | Digital Realty | DFINITY Operations SA | rsp26-d2hko-kvacs-6mdca-dumka-qxiyw-4yzkp-cuwgr-lj7je-v7e6z-4qe | UP |
|  | Europe | Estonia | Tallinn 2 (ta2) | Telia DC | Vladyslav Popov | bptaj-nejw4-osqqa-zwrej-ysl2o-5ffgj-hkjr6-2w6fi-jczex-vjutw-iae | UP |
|  | Asia | Georgia | Tbilisi 1 (tb1) | Cloud9 | George Bassadone | ognrk-q4exl-3wf25-yrrsy-mtezk-e3qww-k6s5v-2pikz-gto6z-dyl2y-eae | UP |
|  | Asia | India | Navi Mumbai 1 (nm1) | Rivram | Rivram Inc | pdo46-iehoo-x2gfu-t5qu5-y3e64-cdymo-eioop-h6f4a-zebwa-fenb4-xae | UP |
|  | Asia | Korea (the Republic of) | Seoul 1 (sl1) | Megazone Cloud | Neptune Partners | ixo23-jxvux-ktqca-bje7d-py56s-yvjy5-zpxrk-fmlxt-zhuhg-wu5bc-wqe | UP |
|  | Europe | Slovenia | Maribor (mb1) | Posita.si | Fractal Labs AG | jfryc-owgdd-a7pp4-lao2c-anza2-nryvi-gqkmu-m2moj-4hzai-zfdiy-4qe | UP |
|  | Americas | United States of America (the) | Chicago 3 (ch3) | CyrusOne | MI Servers | cilsw-jxcbi-qvp5o-7cylv-up5nj-2yykt-jtzha-s2uao-ee7uy-nprfm-vae | UP |
|  | Africa | South Africa | Gauteng 2 (jb2) | Africa Data Centres | Honeycomb Capital (Pty) Ltd | 5v4on-bsceg-rdgxe-zcqqf-l5wnq-fpxw7-x3ktj-3x4fs-o2cny-uzhor-vqe | UP |


Voted to adopt Proposal 133150.

This proposal replaces a node in subnet lhg73: bz73b, which appears as “Status: Active” in the IC dashboard but shows a small but consistent rate of failed blocks in the Node Provider Rewards tool. The replacement is another node from the same node provider, in a different data centre in the same city, so there is no impact on Nakamoto coefficients or target topology parameters.
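
For a sense of what a "small but consistent rate of failed blocks" means numerically, here is a minimal Python sketch. It assumes the rate is defined as failures divided by total block-making attempts (proposed plus failed); the Node Provider Rewards tool may define it differently. The function name and the sample counts are hypothetical.

```python
def block_failure_rate(blocks_proposed: int, block_failures: int) -> float:
    """Fraction of block-making attempts that failed (assumed definition)."""
    attempts = blocks_proposed + block_failures
    if attempts == 0:
        # No attempts recorded, so there is nothing to hold against the node.
        return 0.0
    return block_failures / attempts


# Hypothetical daily counts for a node that fails a small share of its blocks.
daily_counts = [(990, 10), (1485, 15), (1980, 20)]

for proposed, failed in daily_counts:
    print(f"{block_failure_rate(proposed, failed):.2%}")
```

A rate that stays at the same low level day after day, rather than spiking once, is the kind of pattern described above.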


Adopted the proposal of replacing this node from sg3.

That is a difficult question. So there are two types of “transactions” on the IC: updates (potentially mutating canister state), and queries (strictly read only). Queries are faster and do not go through consensus. Updates are slower and go through consensus. Obviously, both are important.
The issue is that “trustworthy node metrics” do not accurately track query handling on the node. It’s a known issue, and we’re looking for an additional “trustworthy” metric we could use for this purpose. But it’s a fairly difficult problem since any such data needs to go through consensus to be trustworthy – and queries don’t.

In the meantime, we rely on our observability stack to catch some of these issues.
We have a lot of internal alerts configured in the monitoring stack for the mainnet nodes and these internal alerts are mirrored in the public dashboard. That’s why you can see some nodes as Degraded in the public dashboard - this indicates that some alerts are firing for the nodes.

Our DRE tooling will automatically take into account node health when replacing nodes and will only pick nodes without alerts as “healthy” replacements. I hope that over time we’ll have to rely less on the internal observability stack and more on the data provided by the IC itself.


Thanks for explaining @sat, this is interesting. What does the observability stack do to detect query issues (does it periodically query a sample of nodes and compare results or something)?

Out of interest, is there a reasonable explanation for why a node may be performing well for update calls but badly for queries?

The observability stack has access to raw replica metrics, so it can see behavior when serving both updates and queries.
Although, to be completely open, in this case the replica on this node was rejecting update calls as well, with a message that the ingress message has an invalid expiry. You can look it up on the forum; there are many user complaints. However, this still doesn’t show up in trustworthy metrics, since the node was correctly creating blocks. It was just rejecting ingress.
Obviously, this should be reflected in the node rewards in future, to incentivise node providers to resolve such issues quickly. We’ll get there, step by step.


Voted to adopt proposal 133150. Replaces a degraded node without changing the Nakamoto Coefficient.