Subnet Management - uzr34 (II)

That is a lot of questions @Lorimer. Let me try to answer them.

This is indeed what we are doing. We are carefully evaluating various aspects though. Every decision has pros and cons and sometimes balancing pros and cons isn’t straightforward. Blindly following all previous decisions is not necessarily the best action in every situation, IMHO. But we do our best to follow adopted motion proposals, and to make course corrections when needed.

We have done such things in the past, and that’s perfectly possible. However, from my experience it may happen that such nodes again move out of the subnet, e.g. when they become unhealthy, and then get added to some other subnets later on. So it’s not a silver bullet, especially when there is no support in tooling for this. So rather than doing ad-hoc changes like this, I’d prefer if we make a change in the tooling.
To make this discussion more productive – any suggestions on how to do this?

Simple. We spend a ton of time answering questions in this forum instead of doing actual work. We have a very small team, and we already do a ton of things. Adding a link to the forum post is a fair amount of work and cannot be done quickly because: a) we use the dre tool, written in Rust, for submitting these proposals, and I haven’t yet found a crate to talk to the Discourse API from Rust, b) adding a link to the forum post, e.g. in the version elect proposals, is a three-step process: (1) create a stub forum thread, (2) submit a proposal with a link to the stub forum thread, and (3) update the forum thread/post contents. Obviously it can be done, but it’s work.
I’m now wondering whether we could call Python from Rust (since there is a Python client for Discourse) or whether we should implement this client in Rust.
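
To make the three-step process above concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: `ForumClient`, `InMemoryForum`, `submit_with_forum_link` and the `submit_proposal` callback are hypothetical names, not part of the dre tool or of any Discourse client, and the in-memory forum only stands in for real API calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Protocol, Tuple


class ForumClient(Protocol):
    """Hypothetical interface; a real one would wrap the Discourse API."""

    def create_topic(self, title: str, body: str) -> str: ...

    def update_topic(self, url: str, body: str) -> None: ...


@dataclass
class InMemoryForum:
    """Stand-in forum, used only so the sketch runs without a network."""

    base_url: str = "https://forum.example/t"  # placeholder URL
    topics: Dict[str, str] = field(default_factory=dict)

    def create_topic(self, title: str, body: str) -> str:
        url = f"{self.base_url}/{len(self.topics) + 1}"
        self.topics[url] = body
        return url

    def update_topic(self, url: str, body: str) -> None:
        if url not in self.topics:
            raise KeyError(url)
        self.topics[url] = body


def submit_with_forum_link(
    forum: ForumClient,
    submit_proposal: Callable[[str], int],  # hypothetical: returns a proposal id
    title: str,
    summary: str,
) -> Tuple[str, int]:
    # (1) Create a stub forum thread so that a URL exists.
    url = forum.create_topic(title, "Placeholder; details to follow.")
    # (2) Submit the proposal with a link to the stub thread.
    proposal_id = submit_proposal(f"{summary}\n\nForum discussion: {url}")
    # (3) Update the thread now that the proposal id is known.
    forum.update_topic(url, f"Proposal {proposal_id}: {summary}")
    return url, proposal_id
```

The ordering is the awkward part: the thread has to exist before the proposal is submitted, but its real contents are only known afterwards, hence the stub and the later update.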

In principle yes, although aggressively insisting that others take a particular action without considering their response and without providing sufficient evidence that the particular action must be taken urgently could be seen in a different light :wink: To be perfectly honest, I’d appreciate a bit more considerate (less “catastrophic”) view. But you did have great points, and made an outstanding analysis. So thank you!
As a result of that, we did find an important discrepancy between the dre tooling and the target topology, so I see that as a fantastic outcome. We will now either update the dre tooling or submit another motion proposal, depending on the outcome of additional analysis that we will have to conduct to determine the potential impact of this change. To set expectations right, it will likely take a few weeks before we finish this and agree which changes need to be made. In the meantime we’ll likely continue using the tooling as is.

SEV-SNP is not being used right now on the regular IC nodes, due to the subnet recovery challenges. This has been de-prioritized due to the estimated effort needed and unclear benefit (would everyone suddenly jump onto the IC if there was SEV-SNP? – it’s unclear). SEV-SNP will be used on Boundary Nodes, but I’m not sure what the timeline for this is. The information on whether a node supports SEV-SNP will be in the registry, but AFAIK this field is not used yet.

In principle yes. We have all the necessary pieces. In practice, it’s work. One would have to terminate all activities of the node in the subnet (wait for the next CUP, etc.), then, once the node is no longer used in the subnet, prune all data on the node. Then add all data from the new subnet to the node (state sync, which takes anywhere from a few minutes or hours to a few days, depending on the subnet size and the link speed), and then finally the node becomes an active and productive member.
So yes, it can be done in theory. In practice it’s something that needs development work. Would you like to help with this, to get your hands dirty?
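
As a starting point for anyone who wants to pick this up, the sequence described above can be written down as a small state machine. This is only a sketch of the ordering: the state names are made up for illustration and do not correspond to anything in the IC codebase or the dre tool.

```python
from enum import Enum, auto
from typing import Dict, List


class NodeState(Enum):
    # Hypothetical state names, one per step described above.
    ACTIVE_IN_OLD_SUBNET = auto()
    LEAVING = auto()        # winding down, e.g. waiting for the next CUP
    REMOVED = auto()        # no longer used in the old subnet
    PRUNED = auto()         # old subnet data deleted from the node
    STATE_SYNCING = auto()  # fetching the new subnet's state
    ACTIVE_IN_NEW_SUBNET = auto()


# Each state has exactly one successor; the last state has none.
NEXT: Dict[NodeState, NodeState] = {
    NodeState.ACTIVE_IN_OLD_SUBNET: NodeState.LEAVING,
    NodeState.LEAVING: NodeState.REMOVED,
    NodeState.REMOVED: NodeState.PRUNED,
    NodeState.PRUNED: NodeState.STATE_SYNCING,
    NodeState.STATE_SYNCING: NodeState.ACTIVE_IN_NEW_SUBNET,
}


def advance(state: NodeState) -> NodeState:
    """Move to the next step, refusing to go past the final state."""
    if state not in NEXT:
        raise ValueError(f"{state.name} is a final state")
    return NEXT[state]


def migration_path(start: NodeState) -> List[NodeState]:
    """List every state from `start` to the end of the migration."""
    path = [start]
    while path[-1] in NEXT:
        path.append(NEXT[path[-1]])
    return path
```

The point of keeping the steps strictly linear is that pruning must not begin before the node has actually left the old subnet, and the node must not count as productive before state sync completes.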
