The two SHA256 sums printed above, from a) the downloaded CDN image and b) the locally built image, must be identical, and must match the SHA256 sum in the payload of the NNS proposal.
This week we have a special (new) proposal type to upgrade HostOS as well.
The upgrade procedure is essentially the same as for the GuestOS (the regular upgrades). The only difference is that HostOS upgrades are not performed per subnet, but per list of nodes. The lists of nodes to be upgraded to this new version will be provided in separate NNS proposals, which will also need to be voted in by the community.
The new(er) HostOS versions have updated system packages and are therefore recommended for security reasons.
At the time of this comment on the forum there are still 2 days left in the voting period, which means there is still plenty of time for others to review the proposal and vote independently.
We had several very good reviews of the Release Notes on these proposals by @Zane, @cyberowl, @ZackDS, @massimoalbarello, and @ilbert. The IC-OS Verification was also performed by @Gekctek and @tiago89. I recommend folks take a look and see the excellent work that was performed on these reviews by the entire CodeGov team. Feel free to comment here if you have any questions or suggestions.
Hey @pietrodimarco, please see below the comments from the CodeGov team that may be of interest to the owners of each of these changes. Nothing of significant concern was identified, but there were a few comments worth noting.
[222ce6539]
Comment by @ilbert
What was happening before this fix? It seems like a big change, and it suggests the tests before were all incorrect.
[73c9f946e]
Comment by @ilbert: ecdsa_sign_share doesn't need #[allow(clippy::too_many_arguments)] here anymore. It should be removed.
[9401b085b]
Comment by @massimoalbarello
It's not clear to me why, in upload_chunk, when allocating memory on a best-effort basis, they compute the freezing threshold based on the new memory usage and then compare it to the cycle balance. I was expecting the freezing threshold to be fixed and the check to be performed as something like `balance - new_memory_usage < threshold`.
Comment by @ilbert
It also introduces a can_insert_chunk function, which checks that a chunk doesn't exceed the maximum allowed size and is called from the insert_chunk function.
The commit description doesn't mention this.
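For readers following along, here is a rough sketch of the relationship between the two functions as described above. This is not the actual replica code: only the names can_insert_chunk and insert_chunk come from the commit; ChunkStore, its fields, and the 1 MiB limit are illustrative assumptions.

```rust
// Hypothetical sketch; ChunkStore and CHUNK_SIZE_LIMIT are assumptions.
const CHUNK_SIZE_LIMIT: usize = 1024 * 1024; // assumed 1 MiB cap

struct ChunkStore {
    chunks: Vec<Vec<u8>>,
}

impl ChunkStore {
    // Validation only: rejects a chunk larger than the maximum allowed size.
    fn can_insert_chunk(&self, chunk: &[u8]) -> Result<(), String> {
        if chunk.len() > CHUNK_SIZE_LIMIT {
            return Err(format!(
                "chunk of {} bytes exceeds the {} byte limit",
                chunk.len(),
                CHUNK_SIZE_LIMIT
            ));
        }
        Ok(())
    }

    // insert_chunk performs the validation before storing the chunk.
    fn insert_chunk(&mut self, chunk: Vec<u8>) -> Result<(), String> {
        self.can_insert_chunk(&chunk)?;
        self.chunks.push(chunk);
        Ok(())
    }
}
```

Splitting the check out of the insert path is what makes it worth a mention in the commit description: callers can validate a chunk without mutating the store.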
[770a1faa7]
Comment by @cyberowl
Adds the clear_chunk_store method.
The commit says: "The only conditions on this method are that it has to be called by a controller of the canister."
That is true in canister_manager.rs but not in execution_environment.rs. I hope that is intentional.
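For concreteness, a hypothetical sketch of the stated condition. CanisterState, PrincipalId, and the signature here are illustrative stand-ins, not the real replica types; only the method name and the controller precondition come from the commit.

```rust
use std::collections::BTreeSet;

// Stand-in for the real PrincipalId type.
type PrincipalId = String;

struct CanisterState {
    controllers: BTreeSet<PrincipalId>,
}

// The only precondition: the caller must be a controller of the canister.
fn clear_chunk_store(
    canister: &CanisterState,
    sender: &PrincipalId,
    chunks: &mut Vec<Vec<u8>>,
) -> Result<(), String> {
    if !canister.controllers.contains(sender) {
        return Err(format!("{sender} is not a controller of the canister"));
    }
    chunks.clear();
    Ok(())
}
```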
[1011aa44d]
Comment by @massimoalbarello
When uploading a chunk, check if cycles will need to be reserved and take that into account when checking whether the freezing threshold is reached. I have the same doubt here as in [9401b085b].
I believe these are questions asked by @ilbert last week within the CodeGov chat group that may not have been brought up on the forum. They were originally related to proposal 125320, which was IC release c2e5826. I thought I'd post them here in case we can get an answer from DFINITY.
Do you know where these complexity parameters come from? Who decides them? Do they impact how computation costs are calculated for canisters? Link
Is there a place where we can find the calculations and/or motivation of these values? Link
One other set of questions that was asked in the CodeGov chat group this weekend by @ZackDS that I thought I’d post here to get feedback from DFINITY…
Why is the Update Elected HostOS Versions proposal under the Node Admin topic? Why not under Replica revisions? It seems this will get more complicated every time a list of nodes has to be voted on.
Also, the payload contains the host-os/update-img/update-img.tar.zst file, but nothing about how to verify it or which "updated system packages" it contains. What is the recommended way to review these changes?
At the time of this commit, no existing test that uses the with_mainnet_config option was relying on the correct update image URL in the registry, so no test was affected by this bug. It only came to my attention because I started making changes to one of our orchestrator system tests upgrade_downgrade_{app,nns}_subnet that would now rely on correct registry data. So 222ce6539 was a preparation for the orchestrator test change.
Said test change was merged afterwards in [afee4a82] (which had to be reverted, and then fixed in commit [de9f7c123]). The first commit message describes the reasoning for the test change. TL;DR: the orchestrator now starts its upgrade cycle from the mainnet version instead of the branch version, matching the tests we perform during qualification of a release. This discrepancy has previously caused bugs to slip past CI, only to be found later during release qualification.
Please let me know if there’s anything unclear. I’m currently on vacation, so I might have a delayed response time.
Thanks @ilbert for the useful suggestions regarding [e0c11ccab] and [73c9f946e]! I implemented them and they will be published in one of the next releases.
When setting the freezing threshold, users set the threshold as a time duration t with the expectation that if the canister gets frozen, they will have at least time t until it gets uninstalled. The number of cycles needed to keep the canister installed for time t will vary depending on the canister’s memory usage because a higher memory usage means the canister burns more cycles per second.
That’s why we need to compute the threshold based on the new memory usage. Does that make sense?
I don’t think there’s anything we can do about that now. Besides, can_insert_chunk is an internal implementation detail, not a public API. So it’s probably fine that it’s not mentioned in the commit message.
Yes, this is intentional: the message is actually executed in canister_manager.rs, so it's fine to check the controller condition there. This is also the same as what we do for all the other management canister APIs (install_code, stop_canister, etc.).
In this case cycles and reserved cycles are fungible when checking against the freezing threshold. So the calculation ends up the same either way.
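To illustrate the "same either way" point with a toy check (names are illustrative, and it assumes the amount to reserve never exceeds the balance): subtracting the reserved cycles from the balance or adding them to the threshold is the same inequality, so both formulations agree.

```rust
// Sketch of the equivalence; assumes to_reserve <= balance.
fn frozen_subtract_from_balance(balance: u128, to_reserve: u128, threshold: u128) -> bool {
    balance - to_reserve < threshold
}

fn frozen_add_to_threshold(balance: u128, to_reserve: u128, threshold: u128) -> bool {
    balance < threshold + to_reserve
}
```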
@wpb I have an idea about how to make this feedback sharing more streamlined. Here's what we can do instead of you writing these summaries on the forum, then somebody from DFINITY needing to figure out the affected teams and ping them internally, then the teams needing to scan through the relevant commits, etc.
HostOS upgrades are separate from the replica. The replica is where the core IC protocol resides: all the consensus, networking, and canister logic. HostOS is the operating system of the underlying hypervisor. HostOS is limited in its logic by design; its primary job is to launch the replica VM. So in this way, Node Admin is an appropriate name.
In the future, HostOS version elections should definitely include the IC-OS Verification section, the same way replica version election proposals do. This was an oversight, and the next HostOS proposal will include it.
Regarding how to review these changes, this is a little trickier because you’ve actually been doing HostOS reviews the whole time you’ve been doing replica reviews, as replica release notes have included changes to the replica and to the HostOS. We are discussing ways of improving the HostOS review process, and are open to feedback regarding it.
Yes, this seems like a good suggestion. I’ll ask the CodeGov team to take this approach for a few weeks to see if it works for everyone. It’s probably reasonable for our reviewers to link the GitHub conversation in their report.
One thing I like about our current format is that posting on the forum offers visibility to the fact that we are performing reviews and that DFINITY has been responding. It feels like a good collaborative effort. I can see how the current format is more work than necessary though and am happy to consider alternate suggestions.