Can I ask how DFINITY go about deciding when to unelect replica versions and which versions to unelect? I noticed that this proposal unelects 5 replica versions, none of which are currently running on any subnets, but once this proposal is executed at least one subnet will have no earlier version that it can roll back to (if it encountered a latent bug or incompatibility issue in the version it’s currently running).
Based on IC-OS election proposal history, there currently appear to be 7 blessed replica versions registered, 5 of which would be unelected by this proposal. I’ve listed these below, ordered by elected date, and crossed out the versions that would be unelected.
- ~~9866a6f~~, elected 2024-05-13 (proposal 129697), UNELECTION PROPOSED, running on 0 subnets
- ~~2c4566b~~, elected 2024-05-13 (proposal 129696), UNELECTION PROPOSED, running on 0 subnets
- ~~30bf45e~~, elected 2024-05-16 (proposal 129706), UNELECTION PROPOSED, running on 0 subnets
- ~~b6b2ef4~~, elected 2024-05-20 (proposal 129747), UNELECTION PROPOSED, running on 0 subnets
- ~~5ba1412~~, elected 2024-05-20 (proposal 129746), UNELECTION PROPOSED, running on 0 subnets
- ec35ebd, elected 2024-05-27 (proposal 130083), running on 1 subnet
- b9a0f18, elected 2024-06-03 (proposal 130134), running on 36 subnets
Relevant Subnet Version History
I’ve focused on the IC-OS version history of a few of the most important subnets below. The current replica version is in bold. To its left are the previously deployed versions (crossed out if due to be unelected), and to its right are versions that have not yet been deployed to that subnet and are not due to be unelected.
- tdb26 (system), has been running ec35ebd since 2024-06-03 (4 days):
~~2c4566b~~, ~~5ba1412~~, **ec35ebd**, b9a0f18
- uzr34 (system), has been running b9a0f18 since 2024-06-04 (3 days):
~~2c4566b~~, ~~30bf45e~~, ~~5ba1412~~, ec35ebd, **b9a0f18**
- w4rem (system), has been running b9a0f18 since 2024-06-06 (2 days):
~~9866a6f~~, ~~b6b2ef4~~, ec35ebd, **b9a0f18**
- x33ed (application), has been running b9a0f18 since 2024-06-06 (1 day):
~~2c4566b~~, ~~5ba1412~~, ec35ebd, **b9a0f18**
- pzp6e (fiduciary), has been running b9a0f18 since 2024-06-06 (1 day):
~~30bf45e~~, ~~5ba1412~~, ec35ebd, **b9a0f18**
In case there’s an unexpected need to roll back to the previously deployed version, it seems sensible to always leave at least one previously deployed version per subnet in the registry. That isn’t the case for the tdb26 system subnet: its only options would be to roll forward or, if necessary, await a new IC-OS release, which seems suboptimal and potentially dangerous.
While this is an unlikely occurrence, I don’t think it’s inconceivable, particularly given that subnets can have different configurations and some may encounter an incompatibility with an IC-OS version that other subnets don’t. I’m not saying that I expect any issues in this particular case. I’m explaining why I’d like to know whether a policy is being adhered to when deciding which replica versions to keep in the registry (and what that policy is).
I asked this same sort of question a few weeks ago.
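To make the rollback concern concrete, here is a minimal sketch of the check I have in mind. The data is copied from the subnet histories listed above; the function names are my own for illustration and are not part of any DFINITY tooling.

```python
# Versions this proposal would unelect (from the list above).
UNELECT = {"9866a6f", "2c4566b", "30bf45e", "b6b2ef4", "5ba1412"}

# Versions deployed to each subnet so far, oldest first; the last entry is
# the version currently running (from the subnet histories above).
DEPLOYED = {
    "tdb26": ["2c4566b", "5ba1412", "ec35ebd"],
    "uzr34": ["2c4566b", "30bf45e", "5ba1412", "ec35ebd", "b9a0f18"],
    "w4rem": ["9866a6f", "b6b2ef4", "ec35ebd", "b9a0f18"],
    "x33ed": ["2c4566b", "5ba1412", "ec35ebd", "b9a0f18"],
    "pzp6e": ["30bf45e", "5ba1412", "ec35ebd", "b9a0f18"],
}


def rollback_targets(history, unelect):
    """Return previously deployed versions that stay elected, newest first."""
    *previous, _current = history
    return [v for v in reversed(previous) if v not in unelect]


def subnets_without_rollback(deployed, unelect):
    """Return the subnets left with no previously deployed, still-elected version."""
    return sorted(
        subnet
        for subnet, history in deployed.items()
        if not rollback_targets(history, unelect)
    )


print(subnets_without_rollback(DEPLOYED, UNELECT))  # ['tdb26']
```

Running a check like this against every subnet before submitting an unelection proposal would flag cases such as tdb26.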
I’m also wondering why unelections are batched. Why not maintain a sliding window of some constant size (number of versions), and always unelect the oldest replica with every new replica election?
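A rough sketch of the sliding-window idea, to show what I mean. The class name and the window size of 3 are arbitrary assumptions for illustration, not a description of how the registry or NNS actually works.

```python
from collections import deque


class ElectedWindow:
    """Hypothetical fixed-size window of elected replica versions."""

    def __init__(self, size):
        if size < 1:
            raise ValueError("window size must be at least 1")
        self.size = size
        self.versions = deque()  # oldest elected version on the left

    def elect(self, version):
        """Elect a version; return the version to unelect, or None."""
        self.versions.append(version)
        if len(self.versions) > self.size:
            return self.versions.popleft()
        return None


window = ElectedWindow(size=3)
retired = [window.elect(v) for v in ["30bf45e", "5ba1412", "ec35ebd", "b9a0f18"]]
print(retired)                # [None, None, None, '30bf45e']
print(list(window.versions))  # ['5ba1412', 'ec35ebd', 'b9a0f18']
```

A real policy would presumably also need to skip the unelection if the oldest version is still running on a subnet, which this sketch does not handle.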
Can I also confirm that when ec35ebd gets unelected in a future proposal, we don’t need to worry about this retiring the host OS version that’s also identifiable by this same version hash?
As a side note, it looks like the dashboard is failing to load proposals, e.g. 130315. If that link doesn’t fail, try refreshing the page. I’m frequently getting "An error occurred while loading the proposal."
Thanks in advance