Voting for a new IC release - e268b98

Hello there!

We are happy to announce that voting is now open for a new IC release .
The NNS proposal is here: IC NNS Proposal 125503 .

Here is a summary of the changes since the last release:

  • [2e14994b5] Consensus: feat: Don’t relay shares during retransmissions
  • [222ce6539] Consensus: fix: Set correct URLs in registry when using with_mainnet_config
  • [5bd3b0dd5] Crypto: chore: Remove placeholder_to_delete from CspNiDkgTranscript production code
  • [9474fe89b] Crypto: chore: set correct name for the vault’s response serialization metric
  • [843f26d32] Crypto: feat: Add support at the crypto component layer for P-256 threshold ECDSA
  • [e89564add] Crypto: fix(cketh): Return an error when receipient’s address is blocked [override-didc-check]
  • [e0c11ccab] Crypto: perf: add I-DKG benchmarks with remote vault
  • [73c9f946e] Crypto: perf: deserialize transcripts internally and remove obsolete logging of public key
  • [4f22c3d0e] Crypto: perf: do not redundantly deserialize dealings in IDKG
  • [cce4fc2e0] Crypto: perf: efficient byte vector serialization in RPC with the crypto vault
  • [5c392939c] Crypto: perf: use remote vault in NI-DKG and threshold_sig benchmarks
  • [8ec3e3ed3] Execution: fix: add instructions-per-byte factor to data sending api calls
  • [5ea44375c] Message Routing: chore: LSMT feature flag
  • [55e5d9a85] Networking: Add new FR1 + CH1 InfraDC prefixes
  • [16f3b2c5f] Networking: chore: don’t leak the consensus time across the stack.
  • [dc09a9c33] Networking: chore: don’t relay canister http shares
  • [b48aa61d9] Networking: fix(consensus_manager): redo consensus manager metrics and add dashboard
  • [25b42af3f] Networking: fix: implement get_all_validated_by_filter for all clients
  • [ed32583ca] Node: Add qemu required dependencies to HostOS
  • [cebeff443] Node: Enable HostOS upgrades
  • [2202dc0d5] Node: Fix VSOCK timeout errors
  • [9c18b1f3b] Node: chore: hardware gen metric rename
  • [8d77465ca] Node: feat: Add chip_id to AddNodePayload during node registration
  • [8b7676e29] Runtime: Account for Call generated by Grow instructions
  • [9401b085b] Runtime: Check freezing threshold on upload_chunk
  • [770a1faa7] Runtime: Implement clear_chunk_store
  • [f5d4d47b2] Runtime: Implement stored_chunks
  • [49c1acbd0] Runtime: Limit size of Wasm chunk store
  • [1011aa44d] Runtime: Reserve cycles in upload_chunk
  • [4afcb7050] Runtime: Set epoch from query handler
  • [f3216c1d7] Runtime: Switch CanisterId::new to infallable CanisterId::unchecked_from_principal
  • [c7ed3f9d6] Runtime: Upgrade serde
  • [3d44d7ed0] Runtime: Upgrade wasmtime to version 13.0.1
  • [74651414a] Runtime: Wrapper around Prometheus and fix Farm deployment scripts
  • [6b1ab9104] Runtime: chore: Additional system api simplifications
  • Various tech-debt management: code refactoring, docs, bug fixes, test updates

Link to the forum post: Voting for a new IC release - e268b98

IC-OS Verification

To build and verify the IC-OS disk image, run:

# From https://github.com/dfinity/ic#verifying-releases
sudo apt-get install -y curl && curl --proto '=https' --tlsv1.2 -sSLO https://raw.githubusercontent.com/dfinity/ic/e268b9807f1ab4ae65d7b29fe70a3b358d014d6a/gitlab-ci/tools/repro-check.sh && chmod +x repro-check.sh && ./repro-check.sh -c e268b9807f1ab4ae65d7b29fe70a3b358d014d6a

The two SHA256 sums printed above from a) the downloaded CDN image and b) the locally built image, must be identical, and must match the SHA256 from the payload of the NNS proposal.

4 Likes

This week we have a special (new) proposal type, to upgrade HostOS as well.

The upgrade procedure is essentially the same as for the GuestOS (the regular upgrades). The difference is only that the HostOS upgrades won’t be performed per subnet, but per list of nodes. The lists of nodes to be upgraded to this new version will be provided in separate NNS proposals, and will also need to be voted in by the community.

The new(er) HostOS versions have updated system packages and are therefore recommended for security reasons.

3 Likes

Reviewers for the CodeGov project have completed our review of these replica updates.

Proposal ID: 125504
Vote: ADOPT
Full report: CodeGov portal on DSCVR

Proposal ID: 125503
Vote: ADOPT
Full report: CodeGov portal on DSCVR

Neuron: CodeGov
NeuronID: 2649066124191664356
Voting history: Dashboard
Website: codegov.org

At the time of this comment on the forum there are still 2 days left in the voting period, which means there is still plenty of time for others to review the proposal and vote independently.

We had several very good reviews of the Release Notes on these proposals by @Zane, @cyberowl, @ZackDS, @massimoalbarello, and @ilbert. The IC-OS Verification was also performed by @Gekctek and @tiago89. I recommend folks talk a look and see the excellent work that was performed on these reviews by the entire CodeGov team. Feel free to comment here if you have any questions or suggestions.

1 Like

Hey @pietrodimarco please see below the comments from the CodeGov team that may be of interest to each of the change owners for these updates. Nothing was identified of significant concern, but there were a few comments worth knowing.

[222ce6539]
Comment by @ilbert
What was happening before this fix? Seems like a big change and that the tests before where all not correct.

[e0c11ccab]
Comment by @ilbert
There may be a typo here: wowe .

[73c9f946e]
Comment by @ilbert
ecdsa_sign_share doesn’t need #[allow(clippy::too_many_arguments)] anymore here. Must be removed.

[9401b085b]
Comment by @massimoalbarello
not clear to me why in upload_chunk when allocating memory in best effort they compute the freeze threshold based on the new memory usage and then compare it to the cycle balance. I was expecting the freezing threshold to be fixed and the check to be performed as something like balance - new_memory_usage < threshold .

Comment by @ilbert
Introduces also can_insert_chunk function, that checks if the chunk doesn’t exceed the maximum allowed size and is called in the insert_chunk function.
Commit description is missing this.

[770a1faa7]
Comment by @cyberowl
Adds method clear_chunk_store
So it says in the commit The only conditions on this method are that it has to be called by a controller of the canister.
That is true in canister_manager.rs but not execution_environment.rs. I hope that is intentional.

[1011aa44d]
Comment by @massimoalbarello
when uploading a chunk, check if cycles will need to be reserved and take it into account when checking if the freezing threshold is reached. Here i have the same doubt as in [9401b085b].

[4afcb7050]
Comment by @ilbert
Typo here and here: missmatchmismatch .

[f3216c1d7]
Comment by @ilbert
Is try_from at here still required now that we don’t return a result anymore?

1 Like

@pietrodimarco @sat @Luka

I believe these are questions asked by @ilbert last week from within the CodeGov chat group that may not have been brought up on the forum. They were originally related to proposal 125320, which was IC release c2e5826. I thought I’d post it here in case we can get an answer from DFINITY.

Do you know where do these complexity parameters come from? Who decides them? Do they impact how computation costs are calculated for canisters? Link

Is there a place where we can find the calculations and/or motivation of these values? Link

1 Like

One other set of questions that was asked in the CodeGov chat group this weekend by @ZackDS that I thought I’d post here to get feedback from DFINITY…

Why is the Update Elected HostOS Versions under the Node Admin topic? Why not under Replica revisions? Now it seems it will get more complicated every time a list of nodes has to be voted on.

Also, the payload contains the host-os/update-img/update-img.tar.zst file, but nothing about how to check it or what “updated system packages” it contains. What is the recommendation for how to review these changes?

Thanks @wpb and @ilbert for asking! Big fan of CodeGov.

Adressing commit [222ce6539]:

At the time of this commit, no existing test that uses the with_mainnet_config option was relying on the correct update image URL in the registry, so no test was affected by this bug. It only came to my attention because I started making changes to one of our orchestrator system tests upgrade_downgrade_{app,nns}_subnet that would now rely on correct registry data. So 222ce6539 was a preparation for the orchestrator test change.

Said test change was merged afterwards in [afee4a82] (which had to be reverted, and then fixed in commit [de9f7c123]). The first commit message describes the reasoning for the test change. Tl:dr: the orchestrator now starts its upgrade cycle from the mainnet instead of the branch version, matching the tests we perform during qualification of a release. This discrepancy has previously caused bugs to slip past CI, only to be found later during release qualification.

Please let me know if there’s anything unclear. I’m currently on vacation, so I might have a delayed response time.

3 Likes

Thanks @ilbert for the useful suggestions regarding [e0c11ccab] and [73c9f946e]! I implemented them and they will be published in one of the next releases.

3 Likes

Thanks for the review @ilbert, @massimoalbarello, and @cyberowl!

When setting the freezing threshold, users set the threshold as a time duration t with the expectation that if the canister gets frozen, they will have at least time t until it gets uninstalled. The number of cycles needed to keep the canister installed for time t will vary depending on the canister’s memory usage because a higher memory usage means the canister burns more cycles per second.

That’s why we need to compute the threshold based on the new memory usage. Does that make sense?

I don’t think there’s anything we can do about that now. Besides, can_insert_chunk is an internal implementation detail, not a public API. So it’s probably fine that it’s not mentioned in the commit message.

Yes this is intentional - the message is actually executed in canister_manager.rs so it’s fine to check the controller condition there. This is also the same as what we do for all the other management canister APIs (install_code, stop_canister, etc.).

In this case cycles and reserved cycles are fungible when checking against the freezing threshold. So the calculation ends up the same either way.

Good point. Yes, we can probably remove it now.

2 Likes

@wpb I have an idea about how to make this feedback sharing more streamlined. Here’s what we can do instead of you writing these summaries on the forum and then somebody from DFINITY needs to figure out the affected teams and ping them internally, and then thew teams need to scan through the relevant commits, etc, etc…

Any GodeGov team member can share the feedback directly on the commit on https://github.com/dfinity/ic/commit/<COMMIT_ID>. The only prerequisite is to ping the commit author explicitly using the “@” token in the comment itself like this: Merge branch 'kpop/remove_new_from_cup_without_bytes' into 'master' · dfinity/ic@fd5d498 · GitHub

Then the commit author will get an email and address the concern on GitHub.

WDYT?

2 Likes

I can help speak to a few of these questions!

HostOS upgrades are separate from the replica. The replica is where the core-IC protocol resides—all the consensus, networking, and canister logic. HostOS is the operating system of the underlining hypervisor. HostOS is limited in its logic by design. It’s primary job is to launch the replica VM. So in this way, Node Admin is an appropriate name.

In the future, hostOS version elections should definitely include the IC-OS Verification section, the same way replica version election proposals have them. This was an oversight, and the next HostOS proposal will include it.

Regarding how to review these changes, this is a little trickier because you’ve actually been doing HostOS reviews the whole time you’ve been doing replica reviews, as replica release notes have included changes to the replica and to the HostOS. We are discussing ways of improving the HostOS review process, and are open to feedback regarding it.

Yes, this seems like a good suggestion. I’ll ask the CodeGov team to take this approach for a few weeks to see if it works for everyone. It’s probably reasonable for our reviewers to link the GitHub conversation in their report.

One thing I like about our current format is that posting on the forum offers visibility to the fact that we are performing reviews and that DFINITY has been responding. It feels like a good collaborative effort. I can see how the current format is more work than necessary though and am happy to consider alternate suggestions.

Test Post of a GitHub Comment with “@” Token

2 Likes