The two SHA256 sums printed above from a) the downloaded CDN image and b) the locally built image, must be identical, and must match the SHA256 from the payload of the NNS proposal.
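The three-way comparison described above can be sketched in a few lines of Python. This is only an illustration, not the official verification script: the function names are made up for this example, and the expected hash is assumed to have already been copied by hand from the proposal payload.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_images(cdn_image, local_image, proposal_sha256):
    """True only if both images hash to the value given in the proposal.

    cdn_image and local_image are paths to the downloaded and the locally
    built image; proposal_sha256 is the hex string from the proposal payload.
    """
    cdn = sha256_file(cdn_image)
    local = sha256_file(local_image)
    return cdn == local == proposal_sha256.strip().lower()
```

If `verify_images` returns `False`, printing the two digests side by side shows whether the CDN image, the local build, or both differ from the proposal.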
I see that proposal 130082 has the same revision (ec35ebd2) as this one, except that it lists a larger number of commits and has a different build hash.
Building dependency tree…
Reading state information…
jq is already the newest version (1.6-2.1ubuntu3).
curl is already the newest version (7.81.0-1ubuntu1.16).
git is already the newest version (1:2.34.1-1ubuntu1.10).
podman is already the newest version (3.4.4+ds1-1ubuntu1.22.04.2).
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
2024/05/24 | 09:53:46 | 1716544426 [-] Field .payload.replica_version_to_elect does not exist in proposal-body.json
And the screenshot from when I tried manually (a few minutes ago):
(I am running a repro check that is already a few months old)
I noted that this proposal retires 12 IC-OS versions all in one go. I got curious about the details and scripted some analysis, which raised a couple questions for me. If you’re able to answer either of these when you get a chance that would be much appreciated.
Does Dfinity have a well defined policy for when it is and isn’t appropriate to retire a specific IC-OS version? i.e. a means of minimising risks, for example maintaining a sufficient number of versions that can be rolled back to in the event of a latent IC-OS bug.
Can I ask why Dfinity moved away from the ‘RetireReplicaVersion’/‘BlessReplicaVersion’ NNS functions in late 2023 (in favour of using a single ‘ReviseElectedGuestosVersions’ function)? I see the motion where RetireReplicaVersion was proposed, but not where it was replaced (presumably because it’s still available, just not being used). If a member of the community wants to object to the unelection of some IC-OS versions (due to rollback risk tolerance), but they have no issue with the election aspect of the proposal, it seems they have no option but to reject the whole proposal (which previously wasn’t an issue, because the NNS functions were decoupled, allowing separate proposals).
The current proposal seems like a potentially good example for the points above.
Based on IC-OS election proposal history, there currently appear to be 17 blessed replica versions stored in the registry canister (so that they can be readily deployed), 12 of which would be removed by this proposal. I’ve listed these below, ordered by election date, and marked the versions that would be unelected/removed.
19dbb5c, elected 2024-04-15 (proposal 129081), UNELECTION PROPOSED, running on 0 subnets
02dcaf3, elected 2024-04-15 (proposal 129084), UNELECTION PROPOSED, running on 0 subnets
abcea3e, elected 2024-04-22 (proposal 129378), UNELECTION PROPOSED, running on 0 subnets
0a51fd7, elected 2024-04-22 (proposal 129379), UNELECTION PROPOSED, running on 0 subnets
33dd2ef, elected 2024-04-22 (proposal 129408), UNELECTION PROPOSED, running on 0 subnets
4e9b02f, elected 2024-04-23 (proposal 129423), UNELECTION PROPOSED, running on 0 subnets
687de34, elected 2024-04-24 (proposal 129427), UNELECTION PROPOSED, running on 0 subnets
63acf4f, elected 2024-04-24 (proposal 129428), UNELECTION PROPOSED, running on 0 subnets
80e0363, elected 2024-04-29 (proposal 129493), UNELECTION PROPOSED, running on 0 subnets
5e285dc, elected 2024-04-29 (proposal 129494), UNELECTION PROPOSED, running on 0 subnets
bb76748, elected 2024-05-06 (proposal 129627), UNELECTION PROPOSED, running on 0 subnets
f58424c, elected 2024-05-06 (proposal 129628), UNELECTION PROPOSED, running on 0 subnets
2c4566b, elected 2024-05-13 (proposal 129696), running on 1 subnets
9866a6f, elected 2024-05-13 (proposal 129697), running on 0 subnets
30bf45e, elected 2024-05-16 (proposal 129706), running on 0 subnets
5ba1412, elected 2024-05-20 (proposal 129746), running on 3 subnets
b6b2ef4, elected 2024-05-20 (proposal 129747), running on 33 subnets
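As a rough sketch of the kind of tally behind the list above (not the actual analysis script), the entries can be transcribed into a small table and summarised. The tuple layout and the `summarize` helper are assumptions made for this illustration.

```python
# (short commit hash, unelection proposed?, subnets currently running it),
# transcribed from the list above.
VERSIONS = [
    ("19dbb5c", True, 0), ("02dcaf3", True, 0), ("abcea3e", True, 0),
    ("0a51fd7", True, 0), ("33dd2ef", True, 0), ("4e9b02f", True, 0),
    ("687de34", True, 0), ("63acf4f", True, 0), ("80e0363", True, 0),
    ("5e285dc", True, 0), ("bb76748", True, 0), ("f58424c", True, 0),
    ("2c4566b", False, 1), ("9866a6f", False, 0), ("30bf45e", False, 0),
    ("5ba1412", False, 3), ("b6b2ef4", False, 33),
]


def summarize(versions):
    """Split versions into those to unelect and those remaining, and flag
    any version proposed for unelection that a subnet is still running."""
    to_unelect = [commit for commit, unelect, _ in versions if unelect]
    remaining = [commit for commit, unelect, _ in versions if not unelect]
    still_in_use = [
        commit for commit, unelect, subnets in versions
        if unelect and subnets > 0
    ]
    return to_unelect, remaining, still_in_use


to_unelect, remaining, still_in_use = summarize(VERSIONS)
print(len(to_unelect), "to unelect;", len(remaining), "remaining:", remaining)
print("proposed for unelection but still deployed:", still_in_use)
```

With the data above, 12 versions are proposed for unelection, 5 remain elected, and none of the 12 is running on any subnet.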
Relevant Subnet Version History
I’ve focused on the subnet IC-OS version history of a few of the most important subnets below. The current version is in bold, on the left of which are prior deployed versions (crossed out if due to be unelected), and on the right of which are versions that have not yet been deployed to that subnet and are not due to be unelected.
tdb26 (system), has been running 2c4566b since 2024-05-21 (4 days):
In case there’s an unexpected need to roll back to the prior deployed version, it seems sensible to always leave at least one prior deployed version for each subnet in the registry (otherwise the only option would be to roll forward, or await a new IC-OS release if necessary, which seems suboptimal or possibly dangerous due to the wait).
All but 1 of the subnets above will still be able to roll back to a prior version after this proposal is executed, but as far as I can tell tdb26 won’t be able to: it would only be able to roll forward to a version that has not yet been deployed to that subnet (or await a patch election). It’s been running that version for 4 days now, so I’m obviously talking about something that’s unlikely. But I’m still interested in whether there’s some sort of risk aversion policy being adhered to when it comes to unelecting specific IC-OS versions, and what a community member should be looking out for when asserting that such a policy is being adhered to.
Any context you’re able to provide would be very helpful. Thanks in advance!
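The rollback check described above could be expressed roughly as follows. This is a sketch under stated assumptions: `rollback_targets` is a hypothetical helper, the set of versions left elected is taken from the list earlier in this post, and the deployment histories in the example are placeholders (the `prior-*` names are invented), not real subnet data.

```python
def rollback_targets(history, elected_after):
    """Return previously deployed versions that would still be elected.

    history: versions deployed to one subnet, oldest first, current last.
    elected_after: set of versions still elected once the proposal executes.
    The result is ordered most recently deployed first.
    """
    current = history[-1]
    targets = []
    for version in reversed(history[:-1]):
        if version != current and version in elected_after and version not in targets:
            targets.append(version)
    return targets


# Versions that would remain elected, per the list earlier in this post.
ELECTED_AFTER = {"2c4566b", "9866a6f", "30bf45e", "5ba1412", "b6b2ef4"}

# Placeholder histories: only the current versions come from this post.
no_rollback = rollback_targets(["prior-a", "prior-b", "2c4566b"], ELECTED_AFTER)
has_rollback = rollback_targets(["prior-a", "5ba1412", "b6b2ef4"], ELECTED_AFTER)
print(no_rollback, has_rollback)
```

An empty result means the subnet could only roll forward (or wait for a new election), which is the situation described for tdb26.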
I’ve noted that HostOS version elections used to take place under the topic of Node Admin, but have now been switched to IC OS Version Election for clarity. However the history of Node Admin HostOS elections and deployments seems a little confusing. Am I right in observing that the last HostOS version to be elected (ec140b7) has never been deployed to a node, and instead they’re all still running the prior elected version (e268b98)? I’ve gathered this from reviewing Node Admin proposal history.
If I’ve understood this correctly, can I ask why ec140b7 was never deployed to any nodes? Presumably there’s no need to worry about compatibility issues with nodes upgrading from e268b98 straight to ec35ebd2 (skipping ec140b7)?
Reviewers for the CodeGov project have completed our review of these replica updates.
Proposal ID: 130082
Vote: ADOPT
Full report on OpenChat
Note: 2 CodeGov reviewers elected to Reject this proposal due to hash mismatches in the IC-OS Verification build. The other reviewers elected to Adopt. There was a lot of discussion in this proposal review thread linked above as well as posted here and here on the forum. There wasn’t a direct post on the forum for this proposal, so our questions seem to be scattered in different locations.
At the time of this comment on the forum there are still 2 days left in the voting period, which means there is still plenty of time for others to review the proposal and vote independently.
We had several very good reviews of the Release Notes on these proposals by @Zane, @cyberowl, @ZackDS, @massimoalbarello, @ilbert, @hpeebles, and @Lorimer. The IC-OS Verification was also performed by @jwiegley, @tiago89, and @Gekctek. I recommend folks take a look and see the excellent work that was performed on these reviews by the entire CodeGov team. Feel free to comment here or in the thread of each respective proposal in our community on OpenChat if you have any questions or suggestions about these reviews.
Absolutely! The HostOS rollouts are not fully automated (yet) so the forum posts do not get automatically created. I guess it should be fine to reuse the same forum thread to discuss both.
You are right @Lorimer, impressive observation and analysis skills! The second HostOS version was caused by me heavily underestimating the time that would be needed to roll out a single HostOS version. So I submitted both election proposals hoping that the first one would be rolled out quickly, but without really calculating or checking whether the following could technically be done. But then we needed to roll out very slowly (it was the first HostOS rollout ever), and the following could not be easily set up, so the community needed to vote on each step, and as a result we could have at most 2 steps per week. So after a month of rollout, the second version didn’t make much sense to roll out.
We have now reorganized the topics (thanks to the amazing @aterga who did all the work!) and the following for the rollout is already set up, so the HostOS rollout should be much faster and easier both for the community (== less voting) and for us.