We are happy to announce that voting will soon be open for two new releases. These will be action-packed because each one combines two weeks of changes into a single release.
The first proposal is for the standard release – find the information for the feature release at the bottom of this post.
[88160bf34] Execution,Runtime: Charge for chunked install on hash mismatch
[879331e82] IDX,Consensus,Cross Chain,Execution,Runtime: fix existing cargo clippy errors and make sure we run cargo clippy on the whole repository only with relevant lints
[76f5ebf17] Message Routing: read api_boundary_nodes from the registry
[ad2d9b3b9] Message Routing,Execution: Save Api Boundary Nodes in ReplicatedState metadata
[3c9dd30e5] Networking,Message Routing,Runtime: remove the dependency on the old static_assertions crate
[ef23f44ad] Networking,NNS: remove various unused dependencies
[58aa6d6cb] Node: Updating container base images refs [2024-02-06-1029]
[f7c33787c] Node: Updating container base images refs [2024-02-01-0814]
The two SHA256 sums printed above from a) the downloaded CDN image and b) the locally built image,
must be identical, and must match the SHA256 from the payload of the NNS proposal.
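The three-way comparison described above can be sketched in a few lines of Python. This is only an illustration, not the verification script shipped with the proposal: the function names and the file paths passed in are hypothetical, and the proposal hash is assumed to be copied by hand from the proposal payload.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(cdn_image, local_image, proposal_sha256):
    """Compare the CDN image, the local build, and the proposal hash.

    All three values must be identical for the check to pass.
    The arguments are hypothetical paths and a hex string supplied by the caller.
    """
    cdn_hash = sha256_of_file(cdn_image)
    local_hash = sha256_of_file(local_image)
    expected = proposal_sha256.strip().lower()
    return {
        "cdn": cdn_hash,
        "local": local_hash,
        "proposal": expected,
        "ok": cdn_hash == local_hash == expected,
    }
```

A result with `"ok": False` shows which of the three values disagrees, which is the detail reviewers were posting when they reported a mismatch.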
It looks like this was a big change (based on the number of commits). Do you mind explaining why there wasn’t a Replica Version Management proposal last week? Is this a new normal to submit these proposals every other week or was this just an anomaly? I’m interested from a resource planning perspective for CodeGov. Thanks for any insight you can provide.
Yes, last week there was no RVM proposal, because we were fixing tests that weren’t passing. This week is extraordinary because of that – the changes from last week accumulated into this week’s release.
We’re looking into the SetupOS nonreproducibility issue. Please be patient.
Would you please help us recalibrate expectations on the IC-OS Verification? DFINITY has spent a considerable amount of time working on the reproducibility of the SetupOS and HostOS hashes, and most of the time it works, but historically only the GuestOS hash was expected to match between the sha256sum of the local build, the sha256sum of the downloaded build, and the sha256sum reported in the proposal payload. Even though the SetupOS and the HostOS match most of the time these days, the payload of the proposal still only references the GuestOS.
The mismatch of the SetupOS hash in the proposal this week poses an interesting dilemma. I’ve only seen one reviewer for CodeGov who posted a match for SetupOS, but 4 other reviewers so far have reported a mismatch. My inclination is that our team should vote to Reject this proposal since the IC-OS Verification script that is provided in the proposal does not result in reproducible builds for all sha256 sums that are evaluated by the script. The Summary of the proposal no longer indicates that IC-OS Verification should only include GuestOS even though that is the only sha256sum and the only release package url that is listed in the payload.
Should there be an expectation that IC-OS Verification matches for GuestOS, SetupOS, AND HostOS or do we still only need to make decisions based on GuestOS? Personally, I want to hold us to the higher standard of expecting all builds to match, but just want to verify if this position is consistent with expectations presented in the proposal.
For reference, here is a link to the thread on OpenChat where CodeGov reviewers are posting their results. They still have 24 hours to complete their reviews, so it is actively changing at this time. Some people post their IC-OS verification first and then come back to post their full review when they are done reviewing the code.
@wpb Reproducibility is a very hard problem, but also very important for us. So far we have spent considerable effort on 1) making builds reproducible at all, and 2) building CI infrastructure that will catch non-reproducibility. In the last few weeks I have also noticed a few CI heisenfailures in the repro checks, but they would always succeed when we tried to repro locally.
Regarding guest/host/setup OS: in these proposals we care about GuestOS. In the HostOS version elect proposals we care about HostOS. SetupOS is only used when deploying new nodes. We have no proposal for SetupOS yet, since NPs are free to pick any version to deploy a new node and there is no way to prevent them from doing so. We can consider adding a proposal for SetupOS as well, but that should be considered a separate activity from the GuestOS upgrades.
I will reach out to the IDX team and ask them to take a deeper look into the reproducibility failures. More than one person has reported them now, so these are no longer only internal heisenbugs; you have seen them in the wild as well. Maybe they can get the binaries that were built by the CodeGov members, which could help with debugging and finding the root cause.
There’s one more question I’d like to tackle here: how many reproductions do we need in order to claim a successful rebuild and verify that a build is indeed what it claims to be? Why do we need reproducibility in the first place? With reproducible builds we verify that the code does not contain any unintended changes, which means that all we need is two or three independent and trustworthy rebuilds with the same sha256 sum. Do we have those? Or do you all get different results?
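The "two or three independent rebuilds" criterion can be written down as a small tally. This is a sketch under assumptions: the reviewer labels and the default threshold of 2 are made up for illustration, and it does not judge whether the rebuilds are actually independent or trustworthy.

```python
from collections import Counter


def enough_matching_rebuilds(reported, proposal_sha256, required=2):
    """Return True if at least `required` rebuilds match the proposal hash.

    `reported` maps a (hypothetical) reviewer label to the sha256 they obtained.
    """
    expected = proposal_sha256.strip().lower()
    counts = Counter(value.strip().lower() for value in reported.values())
    return counts[expected] >= required


def distinct_hashes(reported):
    """Return every distinct hash seen with its number of reports, most common first."""
    counts = Counter(value.strip().lower() for value in reported.values())
    return counts.most_common()
```

If `distinct_hashes` returns more than one entry, the mismatching reviewers are the ones whose build artifacts would be worth collecting for debugging.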
I’ll also ping @marko, as he was most active in developing the reproducible builds and the corresponding CI checks.
Something happened, because for months (dozens of proposals) the builds were reproducible.
One reviewer noted that the build order mattered. Building the latest version (v694) first, and then v692, allowed v692 to pass. (I haven’t tested this though.)
@tiago89 @wpb it would be great if you could try to prune podman images and then try again. In the past we had some problems with stale images being used.
The CodeGov neuron has voted to reject proposal 127692 based on consensus of our voting members who are configured as Followees. There are a variety of reasons why we voted to reject, which I will summarize later today when all the reviews are complete.
At the time of this comment on the forum there are still 2 days left in the voting period, which means there is still plenty of time for others to review the proposal and vote independently.
We had several very good reviews of the Release Notes on these proposals by @Zane, @cyberowl, @ZackDS, @massimoalbarello, @ilbert, @Gekctek, and @hpeebles. The IC-OS Verification was also performed by @jwiegley and @tiago89. I recommend folks take a look and see the excellent work that was performed on these reviews by the entire CodeGov team. Feel free to comment here or in the thread of each respective proposal in our community on OpenChat if you have any questions or suggestions about these reviews.
As mentioned previously, we had 3 reviewers who could not reproduce the build for the SetupOS and 4 reviewers who had complete build failures and could not verify any of the hashes that are included in the IC-OS verification script for proposal 127692. However, there were 2 reviewers who were successful. We understand that the primary objective is to verify the GuestOS, but most of our reviewers opted for a more conservative position by voting to reject 127692 due to the inconsistencies that we were observing.
It is also noteworthy that @hpeebles, @ilbert, and @massimoalbarello commented that there were many commits included in this release that we have seen in previous proposals. @massimoalbarello noticed that at least one of these prior commits was not preceded by a proposal to revert the previous changes. Hence, it raises questions and some concern about whether or not there was a mix-up in the release notes or which commits should be included in this release.
For proposal 127692, we had 2 vote Adopt and 7 vote Reject, which was sufficient to reach consensus to reject.
It was also observed that the similarities between these proposals (since 127694 just enables one additional feature relative to 127692) made several reviewers (@Zane, @cyberowl, @tiago89) uncomfortable adopting proposal 127694 even though the IC-OS verification passed. Several other reviewers (@ZackDS, @massimoalbarello, @ilbert) adopted 127694 while also expressing concern that the right answer might be to reject. I think @Zane made a good argument in his description of how he voted when he said “as other reviewers have reported I’ve had issues consistently reproducing the hashes for SetupOS. During the first attempt I got mismatching hashes, Interestingly, after building proposal 127694, I retried and, this time, all images validated successfully. Due to this inconsistencies and considering there is no urgency in pushing this build on mainnet, I’ve voted to reject it.”
For proposal 127694, we had 6 vote Adopt and 3 vote Reject, which was not sufficient to reach consensus (since we have 12 Followees configured for this proposal topic). Hence, I have taken the action to manually vote to Reject this proposal after giving consideration to the feedback provided by our reviewers. This is the more conservative approach. In almost all cases the choice to Adopt a Replica Version Management is very easy because the CodeGov reviewers are unanimous. In this case, there were too many discrepancies for us to sign off on these proposals.
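The two outcomes above are consistent with a following rule in which a neuron adopts when more than half of its Followees adopt and rejects when at least half of them reject. That rule is an assumption here, since the thread does not spell it out, and the function name is hypothetical. A minimal sketch:

```python
def followee_outcome(adopt, reject, followees):
    """Return how a following neuron would vote under the assumed rule.

    Assumption: adopt needs a strict majority of Followees,
    reject needs at least half of them; otherwise no automatic vote is cast.
    """
    if adopt + reject > followees:
        raise ValueError("more votes than configured Followees")
    if adopt * 2 > followees:
        return "adopt"
    if reject * 2 >= followees:
        return "reject"
    return "undecided"
```

With 12 Followees, `followee_outcome(2, 7, 12)` gives `"reject"` for 127692, while `followee_outcome(6, 3, 12)` gives `"undecided"` for 127694, which is why a manual vote was needed.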
We hope that folks will take a closer look at our reviews for this proposal here (127692) and here (127694).
Since SetupOS is used by node providers to streamline hypervisor and virtual machine installation, I suppose a SetupOS mismatch is not as critical.
However, since HostOS and GuestOS versions can be combined many-to-many, and presumably newer versions of HostOS have better security and other guarantees at the hypervisor level, I have an indirect question.
Is any impact analysis done for installing a newer GuestOS on an older HostOS? The real intent of this question is: with which HostOS version is the proposed GuestOS tested?
Hello there!
After thorough examination, it has come to our attention that these proposals contain incorrect Release Notes, which could potentially lead to confusion and misalignment within our ecosystem.
In light of this, DFINITY is proposing that we reject the current proposals (127694, 127692).
Furthermore, to address the issues identified and ensure that we are moving forward with clarity and precision, we are preparing to introduce two new proposals as replacements for 127694 and 127692. These proposals will exclusively include changes made since the latest rolled-out RC (release-2024-01-25_14-09), ensuring that all modifications are accurately reflected and communicated.
We kindly ask for your support in this matter by voting to reject the current proposals (127694, 127692) with incorrect Release Notes. Your participation is invaluable as we strive to make decisions that best serve the interests of our community and the ongoing development of the ICP ecosystem.
Thank you for your attention to this important matter.
The two SHA256 sums printed above from a) the downloaded CDN image and b) the locally built image, must be identical, and must match the SHA256 from the payload of the NNS proposal.
@wpb @tiago89 we double- and triple-checked this time, and the same version is now reproducible. The IDX team will follow up separately with an explanation of what happened. Spoiler: one file in GuestOS has a timestamp that is off by +2 seconds, for some very odd and so far unexplained reason. No other differences. They are still looking into this.
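A timestamp-only difference like this is easy to miss when only the final hashes are compared. As a rough illustration, assuming the two builds can be unpacked into plain tar archives (which may not match how the IDX team inspected the images), the members can be compared by size and modification time. The function names are hypothetical.

```python
import tarfile


def tar_metadata(path):
    """Map each member name in a tar archive to its (size, mtime) pair."""
    with tarfile.open(path) as archive:
        return {m.name: (m.size, int(m.mtime)) for m in archive.getmembers()}


def metadata_diff(path_a, path_b):
    """List members whose size or mtime differ, or that exist in only one archive.

    Each entry is (name, metadata_in_a, metadata_in_b); a missing member is None.
    """
    meta_a = tar_metadata(path_a)
    meta_b = tar_metadata(path_b)
    diffs = []
    for name in sorted(set(meta_a) | set(meta_b)):
        if meta_a.get(name) != meta_b.get(name):
            diffs.append((name, meta_a.get(name), meta_b.get(name)))
    return diffs
```

This only compares metadata; identical sizes and timestamps do not prove identical contents, so it complements the hash check rather than replacing it.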
The current plan is to have these follow-up proposals adopted on Wed/Thu and to roll out these versions by the end of the week. After that, we hope to return to the regular release cycle.