This topic is intended to capture Subnet Management activities over time for the fuqsr subnet, providing a place to ask questions and make observations about the management of this subnet.
At the time of creating this topic the current subnet configuration is as follows:
The removed node is replaced with another node that’s also based in Germany. I’ve verified that this node is currently unassigned.
The proposal mentions that this change is needed due to the Munich (mu1) data centre being due for decommissioning. I have some questions about this @DRE-Team (please let me know if I should address these questions elsewhere in the future).
This data centre appears to provide nodes for 21 subnets, but proposals have currently been submitted for only 9 of them (one each). Are the other proposals due to follow? Can I ask why they're not all submitted together?
Is there publicly accessible information anywhere that announces the planned decommissioning of this data centre?
Hi @Lorimer. Not sure if you saw, but there’s an answer to a related question about this set of proposals in this post. I’m not sure who is best to tag for questions of this type.
Thanks for providing further context @andrea. I have some questions about CUP proposals:
How should a community member go about verifying that the proposed CUP is valid? This is particularly relevant for manually created CUPs during subnet recovery.
Why does key resharing require a catch-up package? Is this simply to ensure that in a worst case scenario the subnet does not recover to an earlier state prior to acquiring the key (thereby losing the key)?
Subnets create a CUP every epoch without halting and restarting. Why does the subnet need to be halted and restarted when resharing keys?
Thanks @andrea, I’m looking forward to these sorts of proposals being announced on these threads in the future, if that’s feasible.
Yes, that’s indeed tricky right now. I believe people are looking into improving the entire recovery process, but I am not the most up to date person on this. Let me ping another team member.
Why does key resharing require a catch-up package? Is this simply to ensure that in a worst case scenario the subnet does not recover to an earlier state prior to acquiring the key (thereby losing the key)?
Currently the only mechanism in place to deliver threshold keys to a subnet is via the registry in a catch-up package. This is the mechanism used during subnet creation and recovery, where the NIDKG keys are delivered to the subnet while it is not operating, as these are needed by the consensus protocol. The same mechanism is reused by ECDSA/Schnorr. The main reason for reusing this is that it is convenient and it does not introduce extra complexity. E.g. CUPs reference a height, which makes it easy for the nodes of a subnet to determine if the CUP in the registry is more recent than the local one and decide which one they should be using. If the keys were included, e.g., as separate records in the registry, the nodes would need to monitor this record for changes across multiple registry versions to decide whether they should use it or not.
In principle you could deliver the keys to a running subnet in other ways, e.g. using XNet communication. This is appealing but it adds some difficulties:
Key resharing may fail: the source subnet may initiate the key resharing (e.g. as a result of a proposal or a subnet record update), but only deliver the key at a later time. In the meantime the subnet may have changed topology, and the delivered key may be unusable on the target subnet. These failures would need to be handled in some way.
The registry should reflect whether a key resharing failed, e.g. by not including the key ID in the subnet record. If the registry is used to deliver the keys, it is possible to perform certain checks before updating the subnet record. If the keys are delivered in other ways, it becomes more difficult to reflect the result in the registry, or it may cause the registry to have long-running open call contexts with multiple subnets, which may not be desirable.
Subnets create a CUP every epoch without halting and restarting. Why does the subnet need to be halted and restarted when resharing keys?
In normal operation the subnet has all the information to create and agree on a CUP. In this case the key is on a different subnet, and it needs to be delivered to the backup subnet before it can be included in a normal CUP. The subnet has to be stalled, because the recovery CUP includes the last reported state hash from the subnet and the new key, but most importantly has a larger height than the last CUP of the subnet. If the recovery CUP were created without stalling the subnet, then either some subnet state would be lost, or the recovery CUP would be ignored by the subnet if it had already moved past that height.
Anyway, having the possibility of resharing to a running subnet definitely sounds like a good idea. So far these proposals have been very rare, so it did not seem very critical to support this given the extra complexity. As more keys are added to the IC and more dapps depend on them, this may change in the future.
Thanks for the detailed explanation @andrea! This is very helpful and much appreciated.
If there’s practically no means for the community to inspect and verify a catch up package, it seems the only prudent way to handle these proposals is to abstain or reject them (not with the intention of blocking the proposal, but to highlight the inability to cast a confident vote that doesn’t require trust). Does this sound like a reasonable way of looking at this?
One potential path to make progress on this front would be along the lines of
let replicas somehow expose the checkpoint hashes that they have
in case of an incident that requires recovery, introduce a proposal that lets replicas on a subnet create a special checkpoint at a certain height and stop there
With that, in case of some stall or crash loop or so, we could submit a proposal that lets replicas take a checkpoint at the latest computable state, which users can then see because the checkpoint hashes are exposed, and then finally a proposal can set a new recovery CUP with that state hash.
This sounds great. I think until something like this is implemented I’ll plan to reject these sorts of proposals (just to keep some visibility on the need for this feature).
As a separate but related question, it’s often critical for these proposals to be accepted promptly to minimise disruption - in a theoretical future where governance power is significantly more decentralised than it is now, what mechanisms would be employed to encourage these sorts of proposals to be reviewed promptly by the community (or is this too far away to be a concern right now)?
Yeah, I thought about it a bit. With the neuron following mechanism, and with plans to incentivise being followed, I expect that in the near future a handful of voting neurons will hold significant voting power and will be required to reach >50% VP. Those neurons would have a lot of responsibility, and I think one part of that responsibility would be that they can somehow be contacted in case of emergency and help with quick verification and voting on urgent proposals (which should be extremely rare).
With periodic confirmation, hopefully neuron holders once in a while carefully think about who they follow and whether the neuron they follow is doing a good job handling that responsibility or not. If the neuron I follow does not actually help vote quickly in case of emergency, I would consider delegating my voting power to someone else.
This proposal sets the notarisation delay of the subnet to 300ms, down from 600ms. The change will increase the block rate of the subnet, with the aim of reducing the latency of update calls.
This looks like a low risk config update for this subnet. fuqsr is an application subnet with 0 canisters, and it’s also the backup test key subnet. I do have a couple of questions though.
The subnet is processing 0.00 transactions per second, yet blocks are being produced (1.28 per second). What do these blocks contain, if not transactions?
The subnet is burning cycles periodically, but there are no canisters running. Can I ask what the cause of these burnt cycles and transactionless blocks is?
The cycles are being burnt through threshold ECDSA and Schnorr signatures. For every tECDSA/tSchnorr key, there is typically only one subnet that creates signatures with that key, and signing requests are routed to that subnet. fuqsr is the signing subnet for the test ECDSA and Schnorr keys. What happens is that certain canisters on other subnets request signatures, which are routed to fuqsr where the signature is created, leading to the cycles being burnt there.