Proposal to elect new release rc--2024-05-29_23-02

Hello there!

We are happy to announce that voting is now open for a new IC release.
The NNS proposal is here: IC NNS Proposal 130134.

Here is a summary of the changes since the last release:

Release Notes for release-2024-05-29_23-02-base (b9a0f18dd5d6019e3241f205de797bca0d9cc3f8)

Changelog since git revision ec35ebd252d4ffb151d2cfceba3a86c4fb87c6d6

Features:

  • 335fc27c7 Consensus(schnorr): Make AlgorithmId configurable during key generation
  • fef100263 Consensus(schnorr): Implement tSchnorr pre-signature state machine
  • 819a7ef93 Execution,Message Routing: Implement CanisterQueue
  • c38afe52f Node: Consolidate rootfs utils #10

Bugfixes:

  • 7b6d4fa70 Consensus(replica-firewall): Bump MAX_SIMULTANEOUS_CONNECTIONS_PER_IP_ADDRESS from 100 to 1000
  • eef5ea2e8 Node: start adapters after network service

Performance improvements:

  • 51f4c2bc9 Message Routing,Runtime: Mark files readonly after writing

Chores:

  • 0268d0dfd Consensus(schnorr): Rename QuadrupleId to PreSigId
  • 2317d527d Consensus(schnorr): Make MasterPublicKey in EcdsaReshareRequest mandatory again
  • fec35b9d1 Consensus: Rename payload validation error variants from Permanent and Transient to InvalidArtifact and ValidationFailure
  • f45570024 Crypto: use the rand version from the Cargo workspace
  • 877bbb187 Execution,Message Routing: Add idkg_dealings_contexts to SubnetCallContextManager for generalizing ComputeInitialEcdsaDealings method
  • b09cb57a7 Execution,Runtime: Add ic00 methods SchnorrPublicKey & SignWithSchnorr with stubs
  • fe9cbd239 Execution,Runtime: Remove anonymous query service
  • 97fe1543d Execution,Runtime: Wrap ECDSA subnet public keys into iDKG keys and pass them to execution layer
  • acf9f8c4a Message Routing,Execution: wrap ECDSA signing subnets and keys held into iDKG and pass them to execution layer
  • 2d9dc6710 Networking: axum metrics exporter
  • 9955c24ce Node: Consolidate setup-ssh-account-keys
  • 69112f9df Node: Bring manageboot.sh scripts closer in line
  • 9e8efce13 Node: Consolidate fetch-property.sh
  • 76fa5d119 Node: Organize metrics-proxy under monitoring/ component
  • 69422fbf5 Runtime,Execution: Switch system api impl to usize args

Refactoring:

  • d2f77b88a Crypto: split IDkgComputeSecretSharesInternalError
  • c020b9cc1 Crypto: make extracting the node id from the tls cert agnostic to rustls types
  • d87c3cab3 Execution,Runtime: Use a single query execution service for both user and anonymous queries
  • 6742513a6 Networking: improve the documentation for the StateSync/P2P api and rename some methods
  • be08a7718 Networking,Crypto: add external deps and adjust naming

Tests:

  • 5183b96ee Crypto: change some test code
  • c2ae94708 Execution: Cleanup formatting iDKG keys in tests
  • ce2222b6c IDX,Networking: add testonly to crypto test utils and adjust the dependents
  • ce82b5e26 Message Routing,Crypto(crypto): make tests and benchmarks in //rs/certification/… reproducible
  • 9e637ff67 Networking: disable jaeger outside system tests
  • 839b98b82 Networking(http-endpoint): Test that the http/1.1 ALPN header is set.
  • a5e0b84b2 Node: Update bare-metal-test IP addresses
  • 618441d6b Node,DRE: remove the unused directory /testnet/tests/

Other changes:

  • 1a83813dc Boundary Nodes,Crypto,Execution,Runtime,Networking,Message Routing: add quinn-udp external dep in preparation of the rustls upgrade and bump the versions of some core external deps
  • c33b1c7eb Execution,Consensus: Query Stats Empty Stats count metrics
  • f1d7facf8 Execution,Message Routing,Interface: Drop long-since deprecated CanisterQueues::input_schedule proto field
  • 3b7cf1031 Node: Add new LN1 IPv6 /56 subnet
  • 19ec57733 Node: Updating container base images refs [2024-05-23-0826]
  • 1b9b3c5dc Runtime,Execution: Add stable memory usage metrics and api_type info

IC-OS Verification

To build and verify the IC-OS disk image, run:

# From https://github.com/dfinity/ic#verifying-releases
sudo apt-get install -y curl && curl --proto '=https' --tlsv1.2 -sSLO https://raw.githubusercontent.com/dfinity/ic/b9a0f18dd5d6019e3241f205de797bca0d9cc3f8/gitlab-ci/tools/repro-check.sh && chmod +x repro-check.sh && ./repro-check.sh -c b9a0f18dd5d6019e3241f205de797bca0d9cc3f8

The two SHA256 sums printed above from a) the downloaded CDN image and b) the locally built image, must be identical, and must match the SHA256 from the payload of the NNS proposal.

4 Likes

Thanks Dfinity team.

Can I ask if you’re able to provide an estimate on how long it’s likely to take to address the XNET issue (and when we should expect the temporary increase to the max number of allowed connections to be reverted)? It seems this increases potential for DoS / resource exhaustion in the meantime.

On a separate note, changes to improve test reproducibility seem good, but this proposal also seems to include changes that could theoretically reduce test reproducibility (5183b96ee which switches use of fixed/deterministic UNIX_EPOCH in favour of non-deterministic SystemTime::now()). I’m not sure why this is being done (the commit message didn’t make it clearer - ‘change some test code’ - the merge commit is clearer, but still focused on the what rather than the why) so I thought I’d go ahead and ask. :slightly_smiling_face:

1 Like

You’re right that there’s not a lot of context provided in the original commit message. The corresponding merge commit that applies the changes to the master branch tends to provide more context (in this case it’s this one). As I understand it there’s an issue with inter-subnet comms hitting the current limit, so it’s being temporarily increased until the issue can be resolved. Some further questions:

  • In a worst case scenario, what happens if the 1000 limit is also expended by the XNET issue? What are the downsides with leaving it as it is and fast-tracking the XNET fix (rather than accommodating the issue in the meantime).

  • More generally, would it be worth adjusting proposal summaries so that they reference the merge commits instead of the original commit, given that these tend to have far more informative commit messages that the community will benefit from having easy access to (as evidenced by DGDG’s question)

1 Like

Reviewers for the CodeGov project have completed our review of this replica update.

Proposal ID: 130134
Vote: ADOPT
Full report on OpenChat

At the time of this comment on the forum there are still 2 days left in the voting period, which means there is still plenty of time for others to review the proposal and vote independently.

We had several very good reviews of the Release Notes on these proposals by @Zane, @cyberowl, @ZackDS, @massimoalbarello, @ilbert, @hpeebles, and @Lorimer. The IC-OS Verification was also performed by @tiago89. I recommend folks take a look and see the excellent work that was performed on these reviews by the entire CodeGov team. Feel free to comment here or in the thread of each respective proposal in our community on OpenChat if you have any questions or suggestions about these reviews.

2 Likes

Hello there!

We are happy to announce that voting is now open for a new IC release.
The NNS proposal is here: IC NNS Proposal 130400.

Here is a summary of the changes since the last release:

Release Notes for release-2024-05-29_23-02-hotfix-nns (42284da596a2596361f305b8d6d6097b0f40e6d6)

Changelog since git revision b9a0f18dd5d6019e3241f205de797bca0d9cc3f8

Bugfixes:

  • 42284da59 Interface: Disable new storage layer

IC-OS Verification

To build and verify the IC-OS disk image, run:

# From https://github.com/dfinity/ic#verifying-releases
sudo apt-get install -y curl && curl --proto '=https' --tlsv1.2 -sSLO https://raw.githubusercontent.com/dfinity/ic/42284da596a2596361f305b8d6d6097b0f40e6d6/gitlab-ci/tools/repro-check.sh && chmod +x repro-check.sh && ./repro-check.sh -c 42284da596a2596361f305b8d6d6097b0f40e6d6

The two SHA256 sums printed above from a) the downloaded CDN image and b) the locally built image, must be identical, and must match the SHA256 from the payload of the NNS proposal.

5 Likes

@DRE-Team from the duplicated proposals which one will you reject first or second ? Thanks

3 Likes

The two proposals https://dashboard.internetcomputer.org/proposal/130399 and https://dashboard.internetcomputer.org/proposal/130400 are identical. We submitted two proposals accidentally since the NNS subnet wasn’t confirming that the proposal was submitted so we were submitting blind.

DFINITY plans to vote:
https://dashboard.internetcomputer.org/proposal/130399 REJECT and
https://dashboard.internetcomputer.org/proposal/130400 ADOPT, to be on the safe side.

6 Likes

I’m not even sure what happened here, other than that the LSMT feature (new storage layer) is involved. Here’s some related discussion if you’re interested. :slight_smile:

I expect DFINITY will provide a detailed post-mortem when they have time.

(this post was in response to a question from DGDG that has since been deleted, asking if I’d anticipated this issue)

2 Likes

Reviewers for the CodeGov project have completed our review of these replica updates.

Proposal ID: 130399
Vote: REJECT
Full report on OpenChat
NOTE: The CodeGov followed the lead of DFINITY regarding which of these identical proposals to reject. Thank you for letting us know your plans.

Proposal ID: 130400
Vote: ADOPT
Full report on OpenChat

At the time of this comment on the forum there are still 2 days left in the voting period, which means there is still plenty of time for others to review the proposal and vote independently.

3 Likes

Apologies for the weak communication here. There will be a postmortem yes, and the communication will be an important part of the discussion in the post mortem (internally at least).
My view is basically that the communication especially in such cases needs to be superb and yet the very same engineers who are involved in the technical process of the recovery cannot be responsible for the communication as well since they will be occupied by the recovery.
So we clearly need to assign a separate person to do communication. We’ll have to discuss who can do this and how.

Note that this is my person view, not the official view of the Foundation.

Communication aside, what happened here was that the performance of the nns subnet was periodically getting bad. It started happening once per day from Tuesday, for a short time, then on Thursday it got bad for maybe 20 min (technical details will be in the post mortem), and on yhen Friday we again thought it would be again short but it actually got so bad that it couldn’t recover for multiple hours because it couldn’t even handle regular ingress traffic.

We prepared a new build that has the new storage layer disabled and then I’ve been trying to submit but the nns subnet was not responding after the submission so I ended up submitting two proposals. Only one would be able to be executed due to the invariants on the registry canister but we prefer to be on the safe side (we haven’t recently exercised this invariant in real life, just in tests), so we voted reject on one.

Disabling the new storage layer solved the problem. The engineers have already improved the performance of the new storage layer but it hasn’t been thoroughly tested so we’ll probably skip the nns subnet in the next rollout, to be on the safe side. We’ll share more updates.

In parallel we have been having discussions internally for the nns subnet fault tolerance over the last couple of months. This event might result in the prioritization of that work. We’ll see.

5 Likes

Thank you @wpb, your assistance, help, and support, are much appreciated!

4 Likes