Proposal to elect new release rc--2024-10-03_01-30

Hello there!

We are happy to announce that voting is now open for a new IC release.
The NNS proposal is here: IC NNS Proposal 133309.

Here is a summary of the changes since the last release:

Release Notes for release-2024-10-03_01-30-base (d2657773d007e1b4c0b2dd715c628d24c0d7b5fb)

This release is based on changes since release-2024-09-26_01-31-base (35153c7cb7b9d1da60472ca7e94c693e418f87bd).

Please note that some commits may be excluded from this release if they’re not relevant, or not modifying the GuestOS image. Additionally, descriptions of some changes might have been slightly modified to fit the release notes format.

To see a full list of commits added since last release, compare the revisions on GitHub.

Features:

  • ebe9a6230 Execution,Interface: Charge idle canisters for full execution (#1806)
  • fcb719280 Execution,Interface: Charge canisters for full execution (#1782)
  • 15c174d21 Execution,Interface: Limit backtrace visibility (#1624)
  • 8596e9813 Execution,Interface,Message Routing: Keep track of shed inbound responses (#1173)
  • 1a1c213f3 Execution,Interface,Networking: Increase install_code limit for application subnets (#1705)
  • 6cb46aac8 Interface(sns-cli): Add sns health command (#1711)
  • 735935aa2 Interface,Networking: Introduce p2p slot table limit and limit allowed ingress slots per peer (#1213)
  • 87ed92725 Node: Upgrade GuestOS to 24.04 (#938)
  • 47590772d Node: Upgrade HostOS to 24.04 (#1588)
  • 09ddd7d5b Node: Change monitoring strategy for GuestOS VM (#1586)

Bugfixes:

  • 60f1d5562 Execution,Interface: Cap ingress induction debit for cleanup callback (#1777)
  • ba5ffe01a Execution,Interface: Fix full execution round definition (#1772)
  • d2657773d Execution,Interface,Networking: Tweak instruction overhead per canister (#1819)
  • a9ebaa9e9 Interface,Networking: use OnceCell to store nns certificate delegation and use it in https outcalls transform function (#875)
  • 77dc52029 Node: query_nns_nodes bug (#1665)

Chores:

  • e773cf5df Consensus,Interface(consensus): avoid recomputing the block hash when notarizing a block (#1726)
  • c972dc928 Consensus,Interface: Remove unused pool reader functions (#1721)
  • 9fe63e2f7 Crypto,Interface(crypto): Clean up BIP340 signature processing (#1233)
  • 726cb686a Execution,Interface: Apply priority credit at the round start (#1736)
  • 286f2cbbe Execution,Interface: Update comments (#1739)
  • fa2329782 Execution,Interface,Message Routing: Drop CanisterQueue::QueueItem proto, part 1 (#1797)
  • f8f2d84f3 Execution,Interface,Message Routing: Drop old canister queue implementations (#1733)
  • 6ed86361e Interface: duplicate btc header validation to main repo #769 (#1766)
  • 0161abba3 Interface: move the xnet endpoint under rs/http_endpoints and share ownership with the NET team (#1762)
  • 3bbabefb7 Interface(Ledger-Suite): move icp and icrc ledger suites (#1682)
  • 42f2bd3d4 Interface: boundary nodes massive cleanup (#1771)
  • e2cb3d638 Interface: upgrade prost and tonic crates (#1738)
  • 9c08b9984 Interface: Implement saturating sub for AmountOf (#1740)
  • f7791372e Interface: remove old hyper and bump prost and tonic versions (#1597)
  • d66fdcb4c Interface: bump rust version to 1.81 (#1645)
  • c39a8b35b Interface,Message Routing: Refactor list_state_heights and make it an associated method (#1690)
  • a4e281d92 Interface,Message Routing: use the local config for determing the Socket addr of the xnet server (#1372)
  • 91d8f93ed Interface,Message Routing: upgrade hyper in xnet and use http 2 (#1506)
  • d9ae74c7d Interface,Networking: remove the is_beyond_last_checkpoint check when serving requests (#1643)
  • 6a2eca082 Interface,Networking: Fix stale doc for enabled sync v3 endpoint (#1704)
  • 0279b0f4f Interface,Node: upgrade clap (#1763)
  • a34cbd96a Interface,Node: Remove ipv6_address and make ipv6_prefix required in config tool (#1684)
  • 90ad56b73 Owners(IDX): Upgrade bazel to 7.3.1 (#1695)
  • 10b880941 Node: Update Base Image Refs [2024-10-01-1619] (#1783)
  • 3929437f7 Node: Update Base Image Refs [2024-09-30-2122] (#1759)

Refactoring:

  • afad27aa2 Consensus,Interface: improve docs and methods names in the p2p interface (#1465)
  • 54c3542bc Execution,Interface: Move ongoing_long_install_code into drain_subnet_queues (#1761)
  • 3221c5936 Execution,Interface,Message Routing: Typed canister queues and references (#1697)
  • 41a9d9db7 Interface,Node: refactor os_tools and networking code (#1666)
  • 37b9754a8 Owners(IDX): rename merge base env var for candid checks (#1696)

Tests:

  • 5b4a6e3a5 Execution,Interface: Future proof canister snapshots (#1677)

Full list of changes (including the ones that are not relevant to GuestOS) can be found on GitHub.

IC-OS Verification

To build and verify the IC-OS disk image, run:

# From https://github.com/dfinity/ic#verifying-releases
sudo apt-get install -y curl && curl --proto '=https' --tlsv1.2 -sSLO https://raw.githubusercontent.com/dfinity/ic/d2657773d007e1b4c0b2dd715c628d24c0d7b5fb/ci/tools/repro-check.sh && chmod +x repro-check.sh && ./repro-check.sh -c d2657773d007e1b4c0b2dd715c628d24c0d7b5fb --guestos

The two SHA256 sums printed above, from a) the downloaded CDN image and b) the locally built image, must be identical and must match the SHA256 from the payload of the NNS proposal.

While not required for this NNS proposal, as we are only electing a new GuestOS version here, you have the option to verify the build reproducibility of the HostOS by passing --hostos to the script above instead of --guestos, or the SetupOS by passing --setupos.


Hello there!

We are happy to announce that voting is now open for a new IC release.
The NNS proposal is here: IC NNS Proposal 133310.

Here is a summary of the changes since the last release:

Release Notes for release-2024-10-03_01-30-revert-ubuntu-22-04 (1ff0e709f0d0984a4f9ab06456db177c4b6e48a0)

This release is based on changes since release-2024-10-03_01-30-base (d2657773d007e1b4c0b2dd715c628d24c0d7b5fb).

Please note that some commits may be excluded from this release if they’re not relevant, or not modifying the GuestOS image. Additionally, descriptions of some changes might have been slightly modified to fit the release notes format.

To see a full list of commits added since last release, compare the revisions on GitHub.

Other changes:

  • 1ff0e709f Node: Revert “feat: Upgrade GuestOS to 24.04 (#938)”

IC-OS Verification

To build and verify the IC-OS disk image, run:

# From https://github.com/dfinity/ic#verifying-releases
sudo apt-get install -y curl && curl --proto '=https' --tlsv1.2 -sSLO https://raw.githubusercontent.com/dfinity/ic/1ff0e709f0d0984a4f9ab06456db177c4b6e48a0/ci/tools/repro-check.sh && chmod +x repro-check.sh && ./repro-check.sh -c 1ff0e709f0d0984a4f9ab06456db177c4b6e48a0 --guestos

The two SHA256 sums printed above, from a) the downloaded CDN image and b) the locally built image, must be identical and must match the SHA256 from the payload of the NNS proposal.

While not required for this NNS proposal, as we are only electing a new GuestOS version here, you have the option to verify the build reproducibility of the HostOS by passing --hostos to the script above instead of --guestos, or the SetupOS by passing --setupos.

  • bf83b7081 Interface,Message Routing: List all PageMaps function in CheckpointLayout (#1779)

was omitted from the notes because its code changes are not yet actively used: the new functionality is only used in tests. cc @stefan.schneider


OK folks, here it goes: I voted to reject.

Proposal 133309
AMD machine:

Intel machine:

I really hate to ping people (including the DRE team) on a weekend, so I will wait for others to comment. On the bright side, at least the GuestOS hash is a match. I will follow this with a review of the listed commits.
(To be edited.)

Proposal 133310


Same result, with the only change being the rollback of the previous “Upgrade GuestOS to 24.04” commit.

Update: I tried more than 15 times to get at least one successful build, and it failed every time.
I used four clean installs (Ubuntu 22.04 Desktop and Server, and Ubuntu 24.04 Desktop and Server), and even keeping the cache from the previous build (which I can still build without errors, by the way) did not help. The only thing the following 3 machines had in common was the ISP.

  1. Desktop PC Asus Expert Center D9 with Intel Core i7-11700, 32GB DDR4, 512GB NVMe SSD.
  2. Laptop Lenovo IdeaPad Slim 3 with Intel Core i5-12450H, 16GB DDR4, 1TB NVMe SSD.
  3. Custom-built desktop on the AM4 platform with AMD Ryzen 9 5950X, 4x16=64GB DDR4, 2TB NVMe SSD.
    Last week's build:

    Then:

@DRE-Team, could it be that you use the same machine when testing the build? Thanks.


I have reviewed all commits listed in this proposal and, in my opinion, they all look fine. I have also run the build verification script, which completed successfully, so I have voted to adopt the proposal.

Full review:

Features:

ebe9a6230 Execution,Interface: Charge idle canisters for full execution (#1806)
Review: Looks fine + matches description
Notes: Treats idle canisters that were processed in a round as fully executed, so that the scheduler lowers their priority in future rounds. This makes the scheduler fairer, especially when there are lots of idle canisters.

fcb719280 Execution,Interface: Charge canisters for full execution (#1782)
Review: Looks fine + matches description
Notes: At the end of each round, charges canisters which were fully executed (they have no more messages or were scheduled first on a core).
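To make the two scheduler notes above concrete, here is a toy model of "charging" fully executed canisters. It is a sketch under stated assumptions, not the IC scheduler's code: the Canister struct, the fixed charge, and the rule of spreading the total charge evenly so that priorities sum to zero are all illustrative simplifications.

```rust
/// Hypothetical, simplified model of the idea described above; this is not the
/// IC scheduler's actual code, and all names and numbers are illustrative.
#[derive(Debug, Clone)]
pub struct Canister {
    pub id: u32,
    pub accumulated_priority: i64,
    /// Set when the canister was idle or ran out of messages this round.
    pub fully_executed: bool,
}

/// Charges every fully executed canister `charge` points, then spreads the
/// total charge evenly across all canisters so priorities still sum to zero.
pub fn charge_full_execution(canisters: &mut [Canister], charge: i64) {
    if canisters.is_empty() {
        return;
    }
    let mut total_charged = 0;
    for c in canisters.iter_mut().filter(|c| c.fully_executed) {
        c.accumulated_priority -= charge;
        total_charged += charge;
    }
    // Integer division: any remainder is simply dropped in this sketch.
    let share = total_charged / canisters.len() as i64;
    for c in canisters.iter_mut() {
        c.accumulated_priority += share;
    }
}

/// Orders canisters for the next round: highest accumulated priority first,
/// ties broken by id.
pub fn next_round_order(canisters: &[Canister]) -> Vec<u32> {
    let mut order: Vec<&Canister> = canisters.iter().collect();
    order.sort_by(|a, b| {
        b.accumulated_priority
            .cmp(&a.accumulated_priority)
            .then(a.id.cmp(&b.id))
    });
    order.iter().map(|c| c.id).collect()
}
```

With four canisters, two of which were fully executed, a charge of 100 leaves the executed ones at -50 and the others at +50, so the next round starts with the two that did not get to finish.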

15c174d21 Execution,Interface: Limit backtrace visibility (#1624)
Review: Looks fine + matches description
Notes: Ensures the internals of a canister aren’t leaked by removing backtraces from errors if the caller doesn’t have permission to view the canister logs.
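A minimal sketch of the redaction rule, assuming a simplified error type with an optional backtrace field. CanisterError and error_for_caller are hypothetical names, not the replica's API:

```rust
/// Hypothetical error type; the real replica error types differ.
#[derive(Debug, Clone, PartialEq)]
pub struct CanisterError {
    pub message: String,
    pub backtrace: Option<String>,
}

/// Returns the error as the caller should see it: the backtrace is kept only
/// if the caller is allowed to read the canister's logs.
pub fn error_for_caller(err: CanisterError, caller_can_view_logs: bool) -> CanisterError {
    if caller_can_view_logs {
        err
    } else {
        // Struct update syntax: copy everything except the backtrace.
        CanisterError { backtrace: None, ..err }
    }
}
```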

8596e9813 Execution,Interface,Message Routing: Keep track of shed inbound responses (#1173)
Review: Looks fine + matches description
Notes: Whenever an inbound response message is dropped, its ID and CallbackId are added to the new shed_responses map. This map is then used when processing the inbound message queue to build an appropriate error response for the dropped message.
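The bookkeeping described above can be sketched as follows. This is a simplified model under assumptions: the queue holds item ids, payloads live in a separate pool, and a shed response is later surfaced as a SYS_UNKNOWN reject. InputQueue, Response, and shed are illustrative names rather than the replica's actual types.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical identifiers; the real ones are distinct newtypes.
pub type CallbackId = u64;
pub type ItemId = u64;

#[derive(Debug, Clone, PartialEq)]
pub enum Response {
    Reply { callback: CallbackId, payload: Vec<u8> },
    /// Reject generated locally for a response that was shed.
    SysUnknownReject { callback: CallbackId },
}

impl Response {
    fn callback(&self) -> CallbackId {
        match self {
            Response::Reply { callback, .. } | Response::SysUnknownReject { callback } => *callback,
        }
    }
}

/// Invariant: every id in `queue` is either in `pool` or in `shed_responses`.
#[derive(Default)]
pub struct InputQueue {
    queue: VecDeque<ItemId>,
    pool: BTreeMap<ItemId, Response>,
    shed_responses: BTreeMap<ItemId, CallbackId>,
}

impl InputQueue {
    pub fn push(&mut self, id: ItemId, response: Response) {
        self.queue.push_back(id);
        self.pool.insert(id, response);
    }

    /// Drops the response payload (e.g. under memory pressure) but remembers
    /// which callback it was for. Returns false if `id` is not in the pool.
    pub fn shed(&mut self, id: ItemId) -> bool {
        match self.pool.remove(&id) {
            Some(response) => {
                self.shed_responses.insert(id, response.callback());
                true
            }
            None => false,
        }
    }

    /// Pops the next response; a shed one comes back as a SYS_UNKNOWN reject.
    pub fn pop(&mut self) -> Option<Response> {
        let id = self.queue.pop_front()?;
        if let Some(response) = self.pool.remove(&id) {
            return Some(response);
        }
        let callback = self.shed_responses.remove(&id)?;
        Some(Response::SysUnknownReject { callback })
    }
}
```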

1a1c213f3 Execution,Interface,Networking: Increase install_code limit for application subnets (#1705)
Review: Looks fine + matches description
Notes: I asked for this! It bumps the instruction limit on Application subnets for install_code messages to 300B, matching the instruction limit on “Verified Application” subnets.
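For scale, here is the arithmetic behind the change, sketched with a hypothetical one-field config struct. The reviews in this thread place the real constants in rs/config/src/subnet_config.rs and quote the old limit as 40 * 5 * B; the struct and method bodies below are assumptions that only mirror that description.

```rust
/// One billion instructions.
const B: u64 = 1_000_000_000;

/// Old application-subnet limit, written the way it is quoted in the reviews.
const OLD_MAX_INSTRUCTIONS_PER_INSTALL_CODE: u64 = 40 * 5 * B;
/// New limit after #1705.
const MAX_INSTRUCTIONS_PER_INSTALL_CODE: u64 = 300 * B;

/// Hypothetical, trimmed-down config struct with a single field.
#[derive(Debug, Clone, PartialEq)]
pub struct SchedulerConfig {
    pub max_instructions_per_install_code: u64,
}

impl SchedulerConfig {
    pub fn application_subnet() -> Self {
        Self { max_instructions_per_install_code: MAX_INSTRUCTIONS_PER_INSTALL_CODE }
    }

    /// Verified application subnets reuse the application-subnet settings
    /// instead of overriding the install_code limit explicitly.
    pub fn verified_application_subnet() -> Self {
        Self::application_subnet()
    }
}
```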

6cb46aac8 Interface(sns-cli): Add sns health command (#1711)
Review: Looks fine + matches description
Notes: Adds the health command to the SNS CLI tool, which outputs stats for each SNS: any canisters low on memory, any canisters low on cycles, and the number of SNS canister upgrades remaining.

735935aa2 Interface,Networking: Introduce p2p slot table limit and limit allowed ingress slots per peer (#1213)
Review: Looks fine + matches description
Notes: Adds a configurable limit on how many artifacts of a given type the consensus manager may contain at a time from each peer, then sets this limit to 50,000 for ingress messages.
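A minimal sketch of a per-peer slot limit, assuming each peer owns a map from slot numbers to artifacts and that overwriting an already occupied slot is always allowed. SlotTable and its fields are hypothetical, and the limit_exceeded counter stands in for the "limit exceeded" metric mentioned in another review:

```rust
use std::collections::HashMap;

/// Hypothetical identifiers for illustration.
pub type PeerId = u32;
pub type SlotId = u64;

/// Per-peer slot table with a cap on how many slots each peer may occupy.
pub struct SlotTable<A> {
    slot_limit: usize,
    per_peer: HashMap<PeerId, HashMap<SlotId, A>>,
    /// Simple counter standing in for a "slot limit exceeded" metric.
    pub limit_exceeded: u64,
}

impl<A> SlotTable<A> {
    pub fn new(slot_limit: usize) -> Self {
        Self { slot_limit, per_peer: HashMap::new(), limit_exceeded: 0 }
    }

    /// Stores `artifact` in `slot` for `peer`. Overwriting an existing slot is
    /// always allowed; claiming a new slot fails once the peer is at the limit.
    pub fn insert(&mut self, peer: PeerId, slot: SlotId, artifact: A) -> bool {
        let slots = self.per_peer.entry(peer).or_default();
        if !slots.contains_key(&slot) && slots.len() >= self.slot_limit {
            self.limit_exceeded += 1;
            return false;
        }
        slots.insert(slot, artifact);
        true
    }
}
```

Bounding each peer separately means a single misbehaving peer can fill at most its own share of slots, rather than the whole table.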

87ed92725 Node: Upgrade GuestOS to 24.04 (#938)
Review: Looks fine + matches description
Notes: Bumps the GuestOS base images to versions built using Ubuntu 24.04 then makes other minor changes required to work with the new versions.

47590772d Node: Upgrade HostOS to 24.04 (#1588)
Review: Looks fine + matches description
Notes: Bumps the HostOS base images to versions built using Ubuntu 24.04 then makes other minor changes required to work with the new versions.

09ddd7d5b Node: Change monitoring strategy for GuestOS VM (#1586)
Review: Looks fine + matches description
Notes: Switches the monitor-guestos.sh service to using virsh to manage the liveness of the GuestOS VM.

Bugfixes:

60f1d5562 Execution,Interface: Cap ingress induction debit for cleanup callback (#1777)
Review: Looks fine + matches description
Notes: If a canister traps, it then runs a cleanup callback. This change ensures that there are always enough cycles to run the cleanup callback by cancelling ingress message induction charges if the canister wouldn’t otherwise have enough cycles to process the callback.
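One way to picture the cap, as a hedged sketch rather than the actual fix: assume the canister tracks its balance and an outstanding ingress induction debit, and that before the cleanup callback runs we cancel just enough of that debit (never more than the debit itself) to leave the cycles the callback needs. CyclesState and cap_debit_for_cleanup are hypothetical names.

```rust
/// Hypothetical cycles bookkeeping; names and semantics are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct CyclesState {
    pub balance: u128,
    /// Ingress induction charges that have been accrued but not yet deducted.
    pub ingress_induction_debit: u128,
}

impl CyclesState {
    /// Cycles actually available once the outstanding debit is applied.
    pub fn available(&self) -> u128 {
        self.balance.saturating_sub(self.ingress_induction_debit)
    }

    /// Before running a cleanup callback that needs `required` cycles, cancel
    /// just enough of the ingress induction debit (capped at the debit itself)
    /// so that `required` cycles remain if at all possible.
    /// Returns the amount of debit that was cancelled.
    pub fn cap_debit_for_cleanup(&mut self, required: u128) -> u128 {
        let max_debit = self.balance.saturating_sub(required);
        let cancelled = self.ingress_induction_debit.saturating_sub(max_debit);
        self.ingress_induction_debit -= cancelled;
        cancelled
    }
}
```

For example, with a balance of 100, a debit of 80, and a callback needing 50, only 30 of the debit is cancelled, leaving exactly 50 available.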

ba5ffe01a Execution,Interface: Fix full execution round definition (#1772)
Review: Looks fine + matches description
Notes: Fixes the criteria under which the last_full_execution_round field of a canister is set.

d2657773d Execution,Interface,Networking: Tweak instruction overhead per canister (#1819)
Review: Looks fine + matches description
Notes: Reduces the number of instructions charged to spin up a new canister sandbox process from 8M to 4M.

a9ebaa9e9 Interface,Networking: use OnceCell to store nns certificate delegation and use it in https outcalls transform function (#875)
Review: Looks fine + matches description
Notes: Switches from using RwLock to OnceCell for passing the NNS certificate delegation around and fixes a bug where the certificate wasn’t being passed to the https outcalls transform function.
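The write-once, read-many pattern can be sketched with std::sync::OnceLock, the standard library's thread-safe once-cell. Which OnceCell type the commit actually uses is not stated here, and CertificateDelegation and DelegationStore are hypothetical stand-ins:

```rust
use std::sync::{Arc, OnceLock};
use std::thread;

/// Hypothetical stand-in for the NNS certificate delegation.
#[derive(Debug, Clone, PartialEq)]
pub struct CertificateDelegation {
    pub bytes: Vec<u8>,
}

/// Write-once, read-many holder: readers take no lock once it is set.
#[derive(Default)]
pub struct DelegationStore {
    inner: OnceLock<CertificateDelegation>,
}

impl DelegationStore {
    /// Returns true if this call stored the value, false if it was already set.
    pub fn set(&self, delegation: CertificateDelegation) -> bool {
        self.inner.set(delegation).is_ok()
    }

    pub fn get(&self) -> Option<&CertificateDelegation> {
        self.inner.get()
    }
}

/// Reads the delegation concurrently from `n` threads.
pub fn read_from_threads(store: Arc<DelegationStore>, n: usize) -> Vec<Option<usize>> {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let store = Arc::clone(&store);
            thread::spawn(move || store.get().map(|d| d.bytes.len()))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Unlike an RwLock<Option<T>>, readers never block on a lock, and a second set is rejected rather than silently overwriting the first value.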

77dc52029 Node: query_nns_nodes bug (#1665)
Review: Looks fine + matches description
Notes: Fixes two bugs within the assemble_nns_nodes_list function. The first makes the metric argument optional under certain conditions, and the second makes the query_nns_nodes function return an error if NNS_URL_LIST is empty.

Chores:

e773cf5df Consensus,Interface(consensus): avoid recomputing the block hash when notarizing a block (#1726)
Review: Looks fine + matches description
Notes: When notarizing a block proposal, grab the block hash from the proposal content rather than calculating the hash itself.
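A sketch of the "hash once, reuse everywhere" idea, assuming a HashedBlock wrapper that computes the hash at construction. DefaultHasher is only a stand-in for the cryptographic hash a real implementation would use, and Block, Notarization, and notarize_block here are simplified, hypothetical versions:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical block type; the real one carries far more fields.
#[derive(Debug, Clone, Hash)]
pub struct Block {
    pub height: u64,
    pub payload: Vec<u8>,
}

/// A block bundled with a hash computed exactly once, at construction time.
pub struct HashedBlock {
    block: Block,
    hash: u64,
}

impl HashedBlock {
    pub fn new(block: Block) -> Self {
        // DefaultHasher stands in for the cryptographic hash used in practice.
        let mut hasher = DefaultHasher::new();
        block.hash(&mut hasher);
        let hash = hasher.finish();
        Self { block, hash }
    }

    pub fn get_hash(&self) -> u64 {
        self.hash
    }
}

#[derive(Debug, PartialEq)]
pub struct Notarization {
    pub height: u64,
    pub block_hash: u64,
}

/// Notarizes by reading the cached hash instead of re-hashing the block.
pub fn notarize_block(hashed: &HashedBlock) -> Notarization {
    Notarization { height: hashed.block.height, block_hash: hashed.get_hash() }
}
```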

c972dc928 Consensus,Interface: Remove unused pool reader functions (#1721)
Review: Looks fine + matches description
Notes: Removes some unused code.

9fe63e2f7 Crypto,Interface(crypto): Clean up BIP340 signature processing (#1233)
Review: Looks fine + matches description
Notes: Simplifies the BIP340 signature processing by removing the handling around the y coordinate of the public key being odd because it turns out this special handling isn’t needed.

726cb686a Execution,Interface: Apply priority credit at the round start (#1736)
Review: Looks fine + matches description
Notes: Calls apply_priority_credit at the round start rather than having to remember the data and then apply it at the end of the round.

286f2cbbe Execution,Interface: Update comments (#1739)
Review: Looks fine + matches description
Notes: Just updates some comments.

fa2329782 Execution,Interface,Message Routing: Drop CanisterQueue::QueueItem proto, part 1 (#1797)
Review: Looks fine + matches description
Notes: This is the first step towards a simpler proto format for CanisterQueues. It introduces a new version of the queues field, deprecates the old version, and handles deserialization from either field; the old field will be removed in a subsequent release.
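The two-step migration can be sketched like this, assuming plain Rust structs in place of the generated protobuf types (all names are hypothetical). Decoding accepts either field; encoding only writes the new one, so the deprecated field can be dropped once no stored state still uses it:

```rust
/// Hypothetical shapes, loosely modelled on the description above; the real
/// protobuf messages and field names differ.
#[derive(Debug, Clone, PartialEq)]
pub enum LegacyQueueItem {
    Reference(u64),
}

#[derive(Debug, Default)]
pub struct CanisterQueueProto {
    /// Deprecated: still read so that older state can be loaded.
    pub deprecated_queue: Vec<LegacyQueueItem>,
    /// New field: plain numeric references.
    pub queue: Vec<u64>,
}

/// Decodes from whichever field is populated; both populated is an error.
pub fn decode_queue(proto: &CanisterQueueProto) -> Result<Vec<u64>, String> {
    match (proto.deprecated_queue.is_empty(), proto.queue.is_empty()) {
        (false, false) => Err("both deprecated and new queue fields are set".to_string()),
        (false, true) => Ok(proto
            .deprecated_queue
            .iter()
            .map(|item| match item {
                LegacyQueueItem::Reference(r) => *r,
            })
            .collect()),
        (true, _) => Ok(proto.queue.clone()),
    }
}

/// Encoding only ever writes the new field.
pub fn encode_queue(queue: &[u64]) -> CanisterQueueProto {
    CanisterQueueProto { deprecated_queue: Vec::new(), queue: queue.to_vec() }
}
```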

f8f2d84f3 Execution,Interface,Message Routing: Drop old canister queue implementations (#1733)
Review: Looks fine + matches description
Notes: Removes the code for the old canister queue implementation since everything is now switched over to the new implementation.

6ed86361e Interface: duplicate btc header validation to main repo #769 (#1766)
Review: Looks fine + matches description
Notes: Copies validation code from the bitcoin-canister repo into the IC repo to remove the dependency on the ic-btc-validation crate, allowing the 2 repos to update their bitcoin dependencies independently.

0161abba3 Interface: move the xnet endpoint under rs/http_endpoints and share ownership with the NET team (#1762)
Review: Looks fine + matches description
Notes: Simply moves the crate and renames it from ic-xnet-endpoint to ic-http-endpoints-xnet.

3bbabefb7 Interface(Ledger-Suite): move icp and icrc ledger suites (#1682)
Review: Looks fine + matches description
Notes: Moves the ICP and ICRC ledgers into the new ledger_suites directory.

42f2bd3d4 Interface: boundary nodes massive cleanup (#1771)
Review: Looks fine + matches description
Notes: Removes a load of code which is no longer used from the boundary nodes.

e2cb3d638 Interface: upgrade prost and tonic crates (#1738)
Review: Looks fine + matches description
Notes: Bumps prost from 0.13.2 to 0.13.3 and tonic from 0.12.0 to 0.12.3 + bumps a few other dependencies + replaces many type names with Self.

9c08b9984 Interface: Implement saturating sub for AmountOf (#1740)
Review: Looks fine + matches description
Notes: Simplifies by using saturating_sub to calculate min_height rather than performing the bounds checks manually.
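For reference, saturating subtraction on a unit-tagged amount might look like the following. AmountOf here is a cut-down, hypothetical version built on u64::saturating_sub, and min_height is an illustrative caller rather than the code touched by the commit:

```rust
use std::marker::PhantomData;

/// Marker type tagging an amount as a height (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HeightTag;

/// Cut-down, hypothetical version of a unit-tagged amount.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AmountOf<Unit>(u64, PhantomData<Unit>);

impl<Unit> AmountOf<Unit> {
    pub const fn new(value: u64) -> Self {
        Self(value, PhantomData)
    }

    pub fn get(&self) -> u64 {
        self.0
    }

    /// Subtraction that clamps at zero instead of underflowing.
    pub fn saturating_sub(&self, rhs: Self) -> Self {
        Self::new(self.0.saturating_sub(rhs.0))
    }
}

pub type Height = AmountOf<HeightTag>;

/// Hypothetical caller: the lowest height to keep, `window` below `latest`.
pub fn min_height(latest: Height, window: Height) -> Height {
    latest.saturating_sub(window)
}
```

Replacing a manual "if latest > window" check with saturating_sub keeps the clamp-at-zero behaviour in one place.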

f7791372e Interface: remove old hyper and bump prost and tonic versions (#1597)
Review: Looks fine + matches description
Notes: Consolidates hyper dependency version by using the workspace version and bumps prost to 0.13.2 and tonic to 0.12.2.

d66fdcb4c Interface: bump rust version to 1.81 (#1645)
Review: Looks fine + matches description
Notes: Bumps the Rust version from 1.80.0 to 1.81.0.

c39a8b35b Interface,Message Routing: Refactor list_state_heights and make it an associated method (#1690)
Review: Looks fine + matches description
Notes: Removes list_state_heights from the StateManager trait and makes it an associated method of StateManagerImpl.

a4e281d92 Interface,Message Routing: use the local config for determing the Socket addr of the xnet server (#1372)
Review: Looks fine + matches description
Notes: Avoids calling into the Registry when starting up the Xnet server and instead reads all the settings from the local config.

91d8f93ed Interface,Message Routing: upgrade hyper in xnet and use http 2 (#1506)
Review: Looks fine + matches description
Notes: Fairly big change to the Xnet endpoint to make it use HTTP/2 and to update its code to work similarly to some of the other servers (e.g. the HTTPS outcalls adapter and the Bitcoin adapter).

d9ae74c7d Interface,Networking: remove the is_beyond_last_checkpoint check when serving requests (#1643)
Review: Looks fine + matches description
Notes: Removes the unnecessary is_beyond_last_checkpoint check from the Bitcoin adapter.

6a2eca082 Interface,Networking: Fix stale doc for enabled sync v3 endpoint (#1704)
Review: Looks fine + matches description
Notes: Fixes a comment now that the value contains the list of subnets for which the call V3 functionality is disabled, rather than enabled.

0279b0f4f Interface,Node: upgrade clap (#1763)
Review: Looks fine + matches description
Notes: Bumps clap from 4.4.6 to 4.5.18.

a34cbd96a Interface,Node: Remove ipv6_address and make ipv6_prefix required in config tool (#1684)
Review: Looks fine + matches description
Notes: Removes the ipv6_address field from NetworkInfo and NetworkSettings and makes ipv6_prefix required rather than optional.

90ad56b73 Owners(IDX): Upgrade bazel to 7.3.1 (#1695)
Review: Looks fine + matches description
Notes: Bumps Bazel from 7.0.1 to 7.3.1.

10b880941 Node: Update Base Image Refs [2024-10-01-1619] (#1783)
Review: Looks fine + matches description
Notes: Updates the base image references.

3929437f7 Node: Update Base Image Refs [2024-09-30-2122] (#1759)
Review: Looks fine + matches description
Notes: Updates the base image references.

Refactoring:

afad27aa2 Consensus,Interface: improve docs and methods names in the p2p interface (#1465)
Review: Looks fine + matches description
Notes: Renames get_all_validated to get_all_for_broadcast and updates some doc comments.

54c3542bc Execution,Interface: Move ongoing_long_install_code into drain_subnet_queues (#1761)
Review: Looks fine + matches description
Notes: Moves the calculation of ongoing_long_install_code into drain_subnet_queues rather than performing it just before calling that function. This also means it is only calculated in precisely the scenarios where it is required.

3221c5936 Execution,Interface,Message Routing: Typed canister queues and references (#1697)
Review: Looks fine + matches description
Notes: Makes CanisterQueue generic, then defines type aliases for each type of queue (e.g. type InputQueue = CanisterQueue<CanisterInput>).
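A minimal sketch of the pattern, assuming queues that store numeric references. CanisterInput and CanisterOutput are used here as zero-sized marker types, which may not match how the commit defines them, and next_input is a hypothetical helper:

```rust
use std::collections::VecDeque;
use std::marker::PhantomData;

/// Marker types (hypothetical) that tag what a queue holds.
pub struct CanisterInput;
pub struct CanisterOutput;

/// One generic queue implementation, tagged at the type level.
pub struct CanisterQueue<T> {
    items: VecDeque<u64>,
    capacity: usize,
    _kind: PhantomData<T>,
}

impl<T> CanisterQueue<T> {
    pub fn new(capacity: usize) -> Self {
        Self { items: VecDeque::new(), capacity, _kind: PhantomData }
    }

    /// Enqueues a reference; hands it back if the queue is full.
    pub fn push(&mut self, reference: u64) -> Result<(), u64> {
        if self.items.len() >= self.capacity {
            return Err(reference);
        }
        self.items.push_back(reference);
        Ok(())
    }

    pub fn pop(&mut self) -> Option<u64> {
        self.items.pop_front()
    }
}

pub type InputQueue = CanisterQueue<CanisterInput>;
pub type OutputQueue = CanisterQueue<CanisterOutput>;

/// Only accepts input queues: passing an OutputQueue is a compile-time error.
pub fn next_input(queue: &mut InputQueue) -> Option<u64> {
    queue.pop()
}
```

Because InputQueue and OutputQueue are distinct types, mixing them up is caught by the compiler, while the push/pop logic is written only once.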

41a9d9db7 Interface,Node: refactor os_tools and networking code (#1666)
Review: Looks fine + matches description
Notes: Validates the optional mgmt_mac and deployment_name input args at the top level rather than within the generate_network_config library function.

37b9754a8 Owners(IDX): rename merge base env var for candid checks (#1696)
Review: Looks fine + matches description
Notes: Renames CI_PULL_REQUEST_TARGET_BRANCH_SHA to MERGE_BASE_SHA.

Tests:

5b4a6e3a5 Execution,Interface: Future proof canister snapshots (#1677)
Review: Looks fine + matches description
Notes: Adds a test that will fail compilation if any fields are added to the canister snapshot state, requiring the developer to see the test and read the comments which detail how to safely add fields.
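The usual Rust technique for this kind of compile-time guard is an exhaustive destructuring pattern without the .. rest pattern. A sketch, with a hypothetical CanisterSnapshot whose fields are made up for illustration (the actual test may be structured differently):

```rust
/// Hypothetical snapshot type; the real canister snapshot has different fields.
#[derive(Debug, Clone, Default)]
pub struct CanisterSnapshot {
    pub canister_version: u64,
    pub wasm_module: Vec<u8>,
    pub stable_memory: Vec<u8>,
}

/// Destructures every field by name, with no `..` rest pattern. Adding a
/// field to `CanisterSnapshot` makes this function fail to compile, which
/// forces whoever adds it to come here and read the guidance.
pub fn snapshot_size_bytes(snapshot: &CanisterSnapshot) -> usize {
    let CanisterSnapshot {
        canister_version: _,
        wasm_module,
        stable_memory,
    } = snapshot;
    wasm_module.len() + stable_memory.len()
}
```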

I have also successfully run the build verification script for 1ff0e709f0d0984a4f9ab06456db177c4b6e48a0, so I have voted to adopt that proposal too.


I just want to add that several people on the team have gotten this error and although it might be intermittent it does make me want to proceed with caution before voting to adopt.


It appears that 5 out of 6 team members for CodeGov have received one or more build failures upon multiple attempts. Some have not yet been able to obtain a build success. I’m very curious if DFINITY saw many build failures on this proposal in their tests. I think it has been a while since there have been so many build failures and it would be good to know why the DFINITY tests likely showed deterministic builds while the community builds keep failing.


Thanks for preemptively providing an explanation for this, @Luka. Governance-wise, I think there’s a general attack vector if commits like that are allowed to be obscured from GuestOS election proposal change logs. Precedent like this could be used to more easily obscure malicious changes in future commits (by future contributors). Attacks do not need to be actioned by one commit in one release, but can be spread over many releases (until a relatively harmless-looking commit is used to activate dormant code). I don’t reject to be intentionally difficult, and I’ve written a whole lot about this before.

Also note that build hash verification was sketchy this week, as mentioned by @ZackDS, @cyberowl and @wpb. Although I could reproduce the GuestOS hash, I couldn’t for HostOS and SetupOS. Given discussions last week, these other hashes are now just as important (even if the current deployment target is GuestOS).

TLDR: I’m voting to reject both proposals.


133309

Build successful, but as mentioned, the hashes generated on my machine do not entirely match (CDN vs. local build).

There are 101 commits since the previous release, 46 of which are referenced in this proposal. There are 52 files that have been modified both by commits referenced in this proposal and by commits that weren’t. Having skimmed through these, I can see that at least 1 commit that should have been included has been omitted from the GuestOS change log presented in the proposal summary.

All commits that I had time to review appear to match their commit messages well and seem reasonable. My comments on each of these commits are below.

Features:

  • ebe9a6230 Execution,Interface: Charge idle canisters for full execution (#1806)

    • Scheduler enhancement foreshadowed here. Marks idle canisters as fully executed to rotate the round schedule faster.
  • fcb719280 Execution,Interface: Charge canisters for full execution (#1782)

    • The changes are consistent with the commit message, which explains the even distribution of points charged across canisters.
  • 15c174d21 Execution,Interface: Limit backtrace visibility (#1624)

    • Security enhancement to prevent accidental exposure of internal state in error logs.
  • 8596e9813 Execution,Interface,Message Routing: Keep track of shed inbound responses (#1173)

    • Introduces a mechanism to track shed inbound responses. Modifies the CanisterQueues structure to handle these responses and generate appropriate reject responses.
  • 1a1c213f3 Execution,Interface,Networking: Increase install_code limit for application subnets (#1705)

    • Makes application subnets more similar to verified application subnets. MAX_INSTRUCTIONS_PER_INSTALL_CODE is updated from 40 * 5 * B to 300 * B. The verified_application_subnet function now calls Self::application_subnet() instead of setting max_instructions_per_install_code explicitly.
  • 6cb46aac8 Interface(sns-cli): Add sns health command (#1711)

    • Introduces a command to the sns-cli, which checks the health of SNS canisters by evaluating memory consumption, cycles, and remaining upgrade steps
  • 735935aa2 Interface,Networking: Introduce p2p slot table limit and limit allowed ingress slots per peer (#1213)

    • Changes include adding a slot_limit parameter to various functions and structures, updating metrics to track when the slot table limit is exceeded, and modifying logic to enforce the slot limit.
  • 87ed92725 Node: Upgrade GuestOS to 24.04 (#938)

    • See mention of this regarding proposal 133310 below (still has outstanding questions from last week)
  • 47590772d Node: Upgrade HostOS to 24.04 (#1588) + 09ddd7d5b Node: Change monitoring strategy for GuestOS VM (#1586)

    • HostOS upgrade. Again, raises similar questions to the ones last week. Updates Dockerfile, system configurations, and GuestOS monitoring to align with the new OS version, but I’m unclear on some of the changes, such as removing a ‘reproducibility fix’

Bugfixes:

  • 60f1d5562 Execution,Interface: Cap ingress induction debit for cleanup callback (#1777)

    • Addresses an edge case in the cleanup callback related to cycles balance. This change caps the ingress induction debit, ensuring the cleanup callback can complete successfully
  • ba5ffe01a Execution,Interface: Fix full execution round definition (#1772)

    • Refines the definition of a full execution round and ensures that canisters are correctly marked as fully executed
  • d2657773d Execution,Interface,Networking: Tweak instruction overhead per canister (#1819)

    • Instruction overhead per canister is adjusted to better reflect the actual system performance (according to metrics gathered by DFINITY)
  • a9ebaa9e9 Interface,Networking: use OnceCell to store nns certificate delegation and use it in https outcalls transform function (#875)

    • Aims to fix a bug where no certificate was passed to the HTTPS outcalls transform function. The use of OnceCell seems appropriate for a value that is set once and read many times (reducing the overhead of locking)
  • 77dc52029 Node: query_nns_nodes bug (#1665)

    • Improves the robustness of the fetch-property.sh script and the query_nns_nodes function

Chores:

  • e773cf5df Consensus,Interface(consensus): avoid recomputing the block hash when notarizing a block (#1726)

    • Modifies the notarize_block function to use a HashedBlock instead of a Block. This allows the function to use a precomputed hash (get_hash()) rather than recomputing it.
  • c972dc928 Consensus,Interface: Remove unused pool reader functions (#1721)

    • Unused functions get_dkg_payloads and get_replica_version_from_highest_catch_up_package were removed from pool_reader.rs
  • 9fe63e2f7 Crypto,Interface(crypto): Clean up BIP340 signature processing (#1233)

    • Appears to be a simplification that aligns with the BIP340 specification. Removing unnecessary operations reduces the potential for errors and vulnerabilities, so looks good.
  • 726cb686a Execution,Interface: Apply priority credit at the round start (#1736)

    • Simplifies the code
  • 286f2cbbe Execution,Interface: Update comments (#1739)

    • Updates comments to improve code clarity
  • fa2329782 Execution,Interface,Message Routing: Drop CanisterQueue::QueueItem proto, part 1 (#1797) + f8f2d84f3 Execution,Interface,Message Routing: Drop old canister queue implementations (#1733)

    • Removes CanisterQueue::QueueItem proto and old queue implementations and simplifies the representation of canister queues.
  • 6ed86361e Interface: duplicate btc header validation to main repo #769 (#1766)

    • Decoupling repos. Looks reasonable.

I did not review the remaining commits, as I ran low on available time.

I’ve also validated the unelection component of this proposal below.

There currently appear to be 10 blessed replica versions registered, 4 of which would be unelected by this proposal. These versions are not running on any subnets or on any unassigned nodes, so it appears safe to unelect them.

I’ve listed these below, ordered by elected date, and marked the versions that would be unelected.

  • afe1a18, elected 2024-09-16 (proposal 132481), UNELECTION PROPOSED, running on 0 subnets
  • 1799735, elected 2024-09-16 (proposal 132482), UNELECTION PROPOSED, running on 0 subnets
  • c664899, elected 2024-09-18 (proposal 132547), UNELECTION PROPOSED, running on 0 subnets
  • cacf86a, elected 2024-09-18 (proposal 132548), UNELECTION PROPOSED, running on 0 subnets
  • 0441f40, elected 2024-09-23 (proposal 133061), running on 0 subnets
  • 7f6a81f, elected 2024-09-23 (proposal 133062), running on 1 subnet, and all unassigned nodes (since proposal 133159)
  • c87abf7, elected 2024-09-23 (proposal 133063), running on 0 subnets
  • 35153c7, elected 2024-09-30 (proposal 133142), running on 1 subnet
  • d101161, elected 2024-09-30 (proposal 133143), running on 35 subnets
  • c43a488, elected 2024-09-30 (proposal 133144), running on 0 subnets

133310

Build successful, but as mentioned, the hashes generated on my machine do not entirely match (CDN vs. local build).

This proposal is largely the same as 133309 (above), except that this proposal reverts one of 133309’s commits (the GuestOS upgrade to 24.04). Note that this has the same effect as last week, which did this but in reverse (where the second proposal upgraded the GuestOS instead of reverting it). The point is that in both cases there are two elected GuestOS’ that should differ only in that one has the 24.04 upgrade, and the other doesn’t. I say should because →

☝️ Are you able to chase up some commentary on this, @DRE-Team?


Proposal: 133309

Summary:

  1. Verified that the local build hash matches the hashes from the CDN and in the proposal payload: “9baed2781ac8d14db9b315fd002e92bd2ced0e2bc68a370961dcadd548cab1b3”.
  2. Verified the code changes and release notes.

Features:

ebe9a6230

Verified that the charge_idle_canisters functionality has been added to charge canisters for their full execution. Relevant test cases have also been added.

fcb719280

This extends the commit above: the functionality has been added to the scheduler and the round scheduler, and relevant test cases have been updated.

15c174d21

The backtrace has been removed in the event the caller does not have permission to view the canister’s log, and hence the visibility is limited. Code changes in hypervisor.rs match the release notes.

8596e9813

Various changes have been made to CallBackReference, canister queues, and pb_queues. The change set appears to shed response messages under memory pressure while still keeping track of them, so that a SYS_UNKNOWN reject can be produced.

1a1c213f3

Verified that the code changes increase the maximum instruction limit for install_code messages from 200B to 300B in rs/config/src/subnet_config.rs.

6cb46aac8

Verified that the code changes add new functionality to check the health of an SNS. It checks, and reports on the CLI, whether the SNS DAO is missing upgrades needed to be current.

735935aa2

The unvalidated ingress pool is now bounded to 50K rather than being unbounded. This is implemented by updating the limits on the slot table. Changes are in the file rs/replica/setup_ic_network/src/lib.rs.

87ed92725

Verified that the GuestOS is upgraded to 24.04.

47590772d

Verified that the HostOS is upgraded to 24.04, along with a minor bump of the Python version to 3.12.
A couple of dependent packages, such as ssh, are also added to the host environment.

09ddd7d5b

The monitoring strategy for GuestOS has been modified. Functions to enable and disable GuestOS have been removed, and the service type has been changed from “forking” to “oneshot”.

Bugfixes:

60f1d5562

Verified the fix for an edge case in a cleanup callback that previously accumulated some cycle balance change when there was an accumulated ingress induction debit.
A remove_charge_from_ingress_induction_cycles_debit function has been added and is now called from handle_wasm_execution_of_cleanup_callback.

Ba5ffe01a

The full execution round definition has been fixed: it now checks whether this is a first execution or a continuation of install_code and accounts for it.

D2657773d

Instruction overhead per canister has been reduced from 8M to 4M.

A9ebaa9e9

A OnceCell is now used to store the NNS certificate delegation, which is later used in HTTPS outcalls. Code changes have been verified in call_v3.rs, where RwLock<Option< is replaced by OnceCell.

77dc52029

Two bug fixes have been made to get the proper NNS URL string in assemble_nns_nodes_list and fetch_property.

Chores:

E773cf5df

Changes have been made to the self.notarize_block function to avoid recomputing the block hash.

C972dc928

Unused function get_dkg_payloads() has been removed.

9fe63e2f7

A cleanup has been done around BIP340 key generation.

726cb686a

The priority credit, which is not part of the serialized state, is now applied at the start of the round. The code is also simplified by moving this functionality out of initialize_inner_round.

286f2cbbe

Verified minor comment updates.

Fa2329782

Removed the CanisterQueue::QueueItem proto and replaced it with a plain numeric reference instead of an enum.

F8f2d84f3

Removed the older implementation of canister queues, which was kept for backward compatibility. Since mainnet is now running a replica version that encodes the new CanisterQueues, the older implementation is no longer needed.

6ed86361e

Various changes have been made to the bitcoin validation code. Essentially, it copies over code from dfinity/bitcoin-canister into the ic repo. This has been done to decouple the two repos.

0161abba3

The xnet endpoint is moved under http_endpoints.

3bbabefb7

This is a major refactor that moves the ICP and ICRC ledger suites into their respective folders.

42f2bd3d4

A massive cleanup has been done to the boundary node code: icx-proxy, certificate-syncer, denylist-updater, and nginx-related code have been removed.

E2cb3d638

Minor version updates have been made to prost (0.13.2 to 0.13.3) and tonic (0.12.2 to 0.12.3).

9c08b9984

SaturatingSub for AmountOf has been added to amountof.rs (phantom types have been updated).

F7791372e

Version upgrades to hyper, prost, and tonic

D66fdcb4c

Verified that rust version has been updated from 1.80 to 1.81

C39a8b35b

Verified that code changes have been made to list_state_heights, making it an associated method.

A4e281d92

Changes have been made to how the XNetEndpoint config is loaded. It is now loaded from the local config, and the IPv4 address is calculated.

91d8f93ed

handle_xnet_request() and start_server() have been added to the xnet endpoint service to use HTTP/2.

D9ae74c7d

The non-functional is_beyond_last_checkpoint() has been removed.

6a2eca082

Minor doc change

0279b0f4f

Clap has been upgraded from 4.4.6 to 4.5.18

A34cbd96a

In SetupOS, the IPv6 address has been removed; instead, the IPv6 prefix has been added to the config tool.

90ad56b73

Bazel has been upgraded to 7.3.1

10b880941

Updated base image references.

3929437f7

Updated base image references.

Refactoring:

Afad27aa2

Verified the refactor in p2p interfaces. Certain functions have been renamed for clarity, such as get_all_validated to get_all_for_broadcast.

54c3542bc

The code for ongoing_long_install_code variable assignment has been moved to another place.

3221c5936

Stricter typing has been introduced, for example CanisterQueue to InputQueue. It is always better to catch errors at compile time.

41a9d9db7

A refactor has been done in os_tools and the networking code: generated_mac_address has been extended, and functionality such as unformatted and formatted MAC address display has been added.

37b9754a8

Changes have been made to the CI flow to improve clarity. MERGE_BASE_SHA has been added.

Tests:

5b4a6e3a5

Tests have been added to future-proof canister snapshots.

Proposal: 133310

Summary:

  1. Verified that the build hashes from the local build, the CDN, and the proposal payload all match: “b87f9bcd173ab16842090b90b2fadf96ee8eb22df71ea33755684210f3297201”

Other changes:

1ff0e709f

Verified that this is a revert of the GuestOS upgrade commit made in Proposal 133309.

Proposal 133309

The local, CDN and the payload hash matches.

Features

[ebe9a6230]
Functionality added to charge idle canisters for full execution.

[fcb719280]
Applies charges to fully executed canisters, and the points charged are distributed evenly across canisters.

[15c174d21]
If the caller doesn’t have access to canister logs, this change removes the backtrace from the error message so that the canister doesn’t accidentally reveal internal state or code.

[8596e9813]
The inbound message response is dropped from the pool, but the CallbackId is stored to produce a SYS_UNKNOWN reject response.

[1a1c213f3]
increases the install_code message limit for application subnets from 200B to 300B

[6cb46aac8]
Adds health command to the sns cli
bazel run :sns --config=local -- health

[735935aa2]
Introduces an optional config to bound the slot table and a hard bound of 50k for the ingress pool

[87ed92725]
Upgrades guest os to 24.04

[47590772d]
Upgrades host os to 24.04

[09ddd7d5b]
Changes to introduce virsh to manage GuestOS vm.

Bugfixes

[60f1d5562]
This change caps the ingress induction debit by the amount removed from the cycles balance during execution. This ensures that there are always enough cycles to run the cleanup callback.

[ba5ffe01a]
Changes match the description. Implements changes to fix the full execution round definition.
A. When the canister is first scheduled to CPU core.
B. when the canister has nothing to execute or has executed all its instructions.

[d2657773d]
reduces instruction overhead from 8M to 4M

[a9ebaa9e9]
Bugfix to use OnceCell that is shared between different consumers when certificate is not passed to the http callback.

[77dc52029]
Fixes to bugs
A. updates fetch-property to make the metric field optional if metrics.sh does not exist.
B. refactor query_nns_nodes to fail if NNS_URL_LIST is empty

Chores

[e773cf5df]
This change avoids recomputing the block hash when notarizing a block

[c972dc928]
unused functions have been removed

[9fe63e2f7]
Cleans up BIP340 signature processing by removing the code that explicitly flips the public or private keys during signature generation.

[726cb686a]
simplifies code by applying priority credit at the beginning of the round.

[286f2cbbe]
Just some comment updates

[fa2329782]
dropped CanisterQueue::QueueItem usage and started using plain numeric

[f8f2d84f3]
Removed old canister queues, since mainnet is running a replica version that uses the new CanisterQueues.

[6ed86361e]
copies validation code from bitcoin-canister repository in the ic repository to remove dependency so the bitcoin crate can be updated independently.

[0161abba3]
xnet endpoint is moved to http_endpoints; the change matches the description

[3bbabefb7]
A. move the icp ledger suite to its own subdirectory
B. move the icrc1 ledger suite to its own subdirectory

[42f2bd3d4]
Removes code for icx-proxy, certificate-syncer, denylist-updater, nginx related stuff which is no longer needed.

[e2cb3d638]
upgrade prost from 0.13.2 to 0.13.3 and tonic from 0.12.2 to 0.12.3

[9c08b9984]
Added implementation of saturating_sub and removed min_height code block

[f7791372e]
bumps prost and tonic versions

[d66fdcb4c]
upgrades rust version to 1.81

[c39a8b35b]
This PR makes following changes:
A. removes checkpoint_heights from the result of list_state_heights
B. removes list_state_heights from StateReader trait and makes it an associated method

[a4e281d92]
updates to use registry only for discovery and not to start local servers.

[91d8f93ed]
upgrades hyper in xnet to use http2, description matches the code changes.

[d9ae74c7d]
This change removes the is_beyond_last_checkpoint check.

[6a2eca082]
Small comment fixes

[0279b0f4f]
upgrades clap from 4.4.6 to 4.5.18

[a34cbd96a]
removes ipv6_address and makes ipv6_prefix required instead of optional.

[90ad56b73]
upgrades bazel from 7.0.1 to 7.3.1

[10b880941]
updates base container image references.

[3929437f7]
updates base container image references.

Refactoring

[afad27aa2]
updates get_all_validated method to get_all_for_broadcast and updates some comments

[54c3542bc]
no functional changes; moves ongoing_long_install_code to drain_subnet_queues.

[3221c5936]
Added typing to canister queues to ensure that input queues only have inbound references and output queues have only outbound references.

[41a9d9db7]
code refactoring for os_tools and networking code. changes match the description.

[37b9754a8]
renames CI_PULL_REQUEST_TARGET_BRANCH_SHA environment variable to MERGE_BASE_SHA

Tests

[5b4a6e3a5]
Adds test cases that require developers making changes to canister state to think about edge cases and repercussions for the canister snapshot logic

Proposal 133310

The local, CDN and the payload hash matches.

[1ff0e709f]
revert of commit 87ed927 in proposal 133309

Proposal 133309

The local, CDN, and payload hashes are all consistent and match as expected.
Voting to adopt

Features

  • [ebe9a6230]: Introduced a feature to charge idle canisters for their complete execution time.
  • [fcb719280]: Implemented a balanced approach to evenly distribute charges across fully executed canisters.
  • [15c174d21]: Removed backtrace details from error messages when the caller lacks access, preventing accidental exposure of internal canister states.
  • [8596e9813]: Modified inbound message responses—dropped from the pool while storing the CallbackId to produce a SYS_UNKNOWN reject response.
  • [1a1c213f3]: Increased the install_code instruction limit for application subnets from 200B to 300B instructions.
  • [6cb46aac8]: Added a health check command for the sns CLI (bazel run :sns --config=local -- health).
  • [735935aa2]: Introduced configuration options to bound slot tables and set a hard limit of 50k for the ingress pool.
  • [87ed92725]: Upgraded GuestOS to Ubuntu 24.04.
  • [47590772d]: Upgraded the HostOS to Ubuntu 24.04.
  • [09ddd7d5b]: Added virsh support to manage GuestOS VMs.

Bug Fixes

  • [60f1d5562]: Capped ingress induction debit to ensure sufficient cycles remain for cleanup callbacks during execution.
  • [ba5ffe01a]: Refined the definition of a full execution round, covering when canisters start on a CPU core and when they run out of instructions or tasks.
  • [d2657773d]: Reduced instruction overhead from 8M to 4M.
  • [a9ebaa9e9]: Used a OnceCell shared between consumers to store the NNS certificate delegation when certificates aren’t passed to HTTP callbacks.
  • [77dc52029]: Minor fixes to fetch-property and query_nns_nodes for better error handling.

Chores

  • [e773cf5df]: Avoided redundant block hash recomputation during notarization.
  • [9fe63e2f7]: Cleaned up BIP340 signature processing by removing unnecessary key flipping.
  • [726cb686a]: Simplified round processing by applying priority credits early.
  • [286f2cbbe]: Made minor comment updates.
  • [fa2329782]: Replaced CanisterQueue::QueueItem with a simple numeric type.
  • [f8f2d84f3]: Removed old canister queues since the mainnet now uses new CanisterQueues.
  • [6ed86361e]: Copied validation code from bitcoin-canister into the IC repo to remove dependencies.
  • [0161abba3]: Moved xnet endpoint to http_endpoints.
  • [3bbabefb7]: Reorganized ICP and ICRC1 ledger code into separate subdirectories.
  • [42f2bd3d4]: Removed obsolete components (icx-proxy, certificate-syncer, and nginx).
  • [e2cb3d638]: Upgraded prost to 0.13.3 and tonic to 0.12.3.
  • [9c08b9984]: Implemented saturating_sub and removed the min_height code.
  • [f7791372e]: Updated prost and tonic versions.
  • [d66fdcb4c]: Upgraded Rust version to 1.81.
  • [c39a8b35b]: Cleaned up list_state_heights in StateReader trait.
  • [a4e281d92]: Now using registry for discovery only, avoiding local server start-ups.
  • [91d8f93ed]: Upgraded hyper in xnet to support HTTP/2.
  • [d9ae74c7d]: Removed is_beyond_last_checkpoint check.
  • [6a2eca082]: Minor comment corrections.
  • [0279b0f4f]: Upgraded clap from 4.4.6 to 4.5.18.
  • [a34cbd96a]: Removed ipv6_address and made ipv6_prefix mandatory.
  • [90ad56b73]: Upgraded Bazel to 7.3.1.
  • [10b880941], [3929437f7]: Updated base container image references.

Refactoring

  • [afad27aa2]: Renamed get_all_validated to get_all_for_broadcast and clarified comments.
  • [54c3542bc]: Moved ongoing_long_install_code to drain_subnet_queues.
  • [3221c5936]: Typed canister queues to ensure correct queue references.
  • [41a9d9db7]: Refactored os_tools and networking code for clarity.
  • [37b9754a8]: Renamed CI_PULL_REQUEST_TARGET_BRANCH_SHA to MERGE_BASE_SHA.

Tests

  • [5b4a6e3a5]: Added tests to ensure canister state changes consider edge cases and canister_snapshot logic.

Proposal 133310

The local, CDN, and payload hashes are all consistent and match as expected.
Voting to adopt

  • [1ff0e709f]: Reverted commit 87ed927 from proposal 133309.

Proposal 133309

Commits match but I couldn’t build the replica successfully. Voted to reject.

[ebe9a6230] Added a charge_idle_canisters method, which iterates over the round schedule queue and marks all idle canisters as fully executed until it finds a non-idle canister. The method is then called inside inner_round. The type of fully_executed_canister_ids has also been changed from a Vec of canister IDs to a BTreeSet.
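A minimal sketch of why the switch to a BTreeSet matters, assuming canister IDs are plain u64 values and the round schedule is an ordered list of (ID, is_idle) pairs. CanisterId and this charge_idle_canisters signature are simplified stand-ins, not the replica’s actual code.

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for the replica's CanisterId type (assumption).
type CanisterId = u64;

/// Walk the round schedule in order and mark idle canisters as fully executed,
/// stopping at the first canister that still has work to do.
fn charge_idle_canisters(
    schedule: &[(CanisterId, bool)], // (canister ID, is_idle)
    fully_executed: &mut BTreeSet<CanisterId>,
) {
    for &(id, is_idle) in schedule {
        if !is_idle {
            break;
        }
        fully_executed.insert(id);
    }
}
```

Because a BTreeSet ignores repeated inserts, a canister that is charged here and marked again later in the round still appears only once, and iteration yields IDs in sorted order.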

[fcb719280] Added a finish_canister_execution method, which inserts into the fully_executed_canister_ids vector all canisters that were fully executed (i.e. whose message queue has been exhausted), have finished an execution slice, or were scheduled first. This vector is then passed to the finish_round method, which updates the priority credit of executed canisters and updates the accumulated priority of all canisters by dividing the remaining allocation equally.

[15c174d21] Remove backtrace data from HypervisorError if the caller is not authorized to access canister logs.

[8596e9813] The CanisterQueues struct now has a shed_responses field. This is a map of inbound best-effort responses that have been shed, containing the CallbackId indexed by the ID of the response in the message pool. This way, messages can be dropped from CanisterQueues but still produce reject responses when calling peek_input() or pop_input().
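The shed_responses idea can be sketched as follows, under heavy simplification: message IDs and callback IDs are plain u64 values, message bodies are strings, and Queues, Input, and shed_response are hypothetical names rather than the replica’s API. The point is that a shed response frees its body but keeps enough information (the CallbackId) to surface a SYS_UNKNOWN reject when popped.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Simplified stand-ins for the replica's ID types (assumption).
type MessageId = u64;
type CallbackId = u64;

/// What a consumer sees when popping the input queue.
#[derive(Debug, PartialEq)]
enum Input {
    /// A response whose body is still in the message pool.
    Response(String),
    /// A synthetic SYS_UNKNOWN reject for a response that was shed.
    UnknownReject(CallbackId),
}

#[derive(Default)]
struct Queues {
    input_queue: VecDeque<MessageId>,                // ordered references into the pool
    pool: BTreeMap<MessageId, String>,               // message bodies
    shed_responses: BTreeMap<MessageId, CallbackId>, // shed response ID -> callback
}

impl Queues {
    /// Drop a response body (e.g. under memory pressure) but remember its callback.
    fn shed_response(&mut self, id: MessageId, callback: CallbackId) {
        if self.pool.remove(&id).is_some() {
            self.shed_responses.insert(id, callback);
        }
    }

    /// Pop the next input; shed responses surface as SYS_UNKNOWN rejects.
    fn pop_input(&mut self) -> Option<Input> {
        while let Some(id) = self.input_queue.pop_front() {
            if let Some(body) = self.pool.remove(&id) {
                return Some(Input::Response(body));
            }
            if let Some(callback) = self.shed_responses.remove(&id) {
                return Some(Input::UnknownReject(callback));
            }
            // Otherwise the reference is stale; skip it.
        }
        None
    }
}
```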

[1a1c213f3] Increased MAX_INSTRUCTIONS_PER_INSTALL_CODE limit for application subnets from 40B to 300B.

[60f1d5562] Ensures the cleanup callback always runs by checking whether the canister can afford to pay the ingress_induction_cycles_debit on top of the cycles paid during execution. If the amount owed is greater than the balance, the removed cycles are subtracted from the ingress_induction_cycles_debit.
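A simplified model of the rule described above, assuming a single cycles balance and a single debit field. CyclesAccount and charge_cleanup_callback are hypothetical names, and the replica’s actual accounting is more involved; this only illustrates the capping logic.

```rust
/// Simplified per-canister cycles bookkeeping (hypothetical type).
struct CyclesAccount {
    balance: u128,
    ingress_induction_cycles_debit: u128,
}

impl CyclesAccount {
    /// Charge `removed` cycles for a cleanup callback. If the balance cannot cover
    /// both this charge and the outstanding debit, the charge is deducted from the
    /// debit so that the callback can still be paid for.
    fn charge_cleanup_callback(&mut self, removed: u128) {
        let owed = removed.saturating_add(self.ingress_induction_cycles_debit);
        if owed > self.balance {
            self.ingress_induction_cycles_debit =
                self.ingress_induction_cycles_debit.saturating_sub(removed);
        }
        self.balance = self.balance.saturating_sub(removed);
    }
}
```

For example, with a balance of 100 and a debit of 80, charging 30 leaves a balance of 70 and a debit of 50, so the remaining debit is still covered.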

[ba5ffe01a] Define full execution round as either having executed all canister messages, being the first canister scheduled in a round or a canister which has completed an execution slice.

[d2657773d] Reduced INSTRUCTION_OVERHEAD_PER_CANISTER constant from 8M instructions to 4M.

[726cb686a] Apply priority credit to canisters in apply_scheduling_strategy directly instead of having to track non_zero_priority_credit_canister_ids and update the priority during round execution.

[286f2cbbe] Updated comments of drain_subnet_queues.

[fa2329782] Starts the process of deprecating the old canister queue type based on enum types, in favour of directly storing the numeric reference. The old type has been renamed to deprecated_queue, and queue now uses the new type, i.e. u64. Code has been added to convert between the two versions.

[f8f2d84f3] Removed old canister queue implementations and protobuf definitions.

[c39a8b35b] Removed list_state_heights from StateManager trait and instead added it as an associated method of StateManagerImpl.

[a4e281d92] Use the local config to determine the socket address when starting the replica, instead of retrieving it from the registry.

[d66fdcb4c] Bump rust version to 1.81

[54c3542bc] Slightly simplified inner_round by moving ongoing_long_install_code processing to drain_subnet_queues method.

[3221c5936] Distinguish canister queues and references based on their type, i.e. Inbound/Outbound, so that only the correct message type can be inserted.
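The general technique here is phantom-type tagging. A minimal sketch, assuming simplified names (Inbound, Outbound, Reference, CanisterQueue, InputQueue, OutputQueue are illustrative stand-ins, not necessarily the replica’s definitions): the direction exists only at compile time, so pushing an outbound reference into an input queue is a type error rather than a runtime bug.

```rust
use std::collections::VecDeque;
use std::marker::PhantomData;

/// Marker types: they exist only at compile time.
struct Inbound;
struct Outbound;

/// A message reference tagged with its direction (hypothetical, simplified).
struct Reference<Kind> {
    id: u64,
    _kind: PhantomData<Kind>,
}

impl<Kind> Reference<Kind> {
    fn new(id: u64) -> Self {
        Self { id, _kind: PhantomData }
    }
}

/// A queue that can only hold references of one direction.
struct CanisterQueue<Kind> {
    items: VecDeque<Reference<Kind>>,
}

impl<Kind> CanisterQueue<Kind> {
    fn new() -> Self {
        Self { items: VecDeque::new() }
    }
    fn push(&mut self, r: Reference<Kind>) {
        self.items.push_back(r);
    }
    fn pop(&mut self) -> Option<Reference<Kind>> {
        self.items.pop_front()
    }
}

type InputQueue = CanisterQueue<Inbound>;
type OutputQueue = CanisterQueue<Outbound>;

// input_queue.push(Reference::<Outbound>::new(2)); // would not compile: mismatched types
```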

[5b4a6e3a5] Added canister_snapshot_change_guard_do_not_modify_without_reading_doc_comment as failsafe to prevent accidental SystemState changes breaking the canister snapshots.

Proposal 133310

Commits match but I couldn’t build the replica successfully. Voted to reject.

proposal - 133309


Vote: REJECT (Intermittent hash error)

Hash: Intermittent Error

Feedback: NONE

Features

[ebe9a6230]
Switch from using a Vec<CanisterId> to a BTreeSet<CanisterId> for fully_executed_canister_ids ensures that canisters are stored without duplicates and in sorted order, which is particularly important with the introduction of charge_idle_canisters. This new method charges idle canisters early in the round, and the BTreeSet prevents these canisters from being added multiple times during the different stages of execution.

[fcb719280]
The total_compute_allocation_percent field in the RoundSchedule struct is used to ensure that the total compute allocation across all canisters is accounted for when adjusting their execution priority and distributing any available free capacity in each round.

[15c174d21]
Checks the LogVisibilityV2 setting to determine if the caller has the necessary permissions to access the canister’s logs. If the caller is not authorized, the backtrace information is stripped from any HypervisorError before the result is returned, ensuring that detailed error information is only accessible to authorized entities.
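A rough sketch of the check described above, with simplified types: LogVisibility stands in for LogVisibilityV2, principals are plain strings, and HypervisorError is reduced to a message plus an optional backtrace. The rule that controllers keep access under an allow-list is an assumption of this sketch.

```rust
/// Simplified stand-in for the canister's log visibility setting.
enum LogVisibility {
    Controllers,
    Public,
    AllowedViewers(Vec<String>),
}

/// Simplified error type that may carry a backtrace.
#[derive(Debug, PartialEq)]
struct HypervisorError {
    message: String,
    backtrace: Option<String>,
}

/// Decide whether `caller` may read the canister's logs.
fn can_view_logs(visibility: &LogVisibility, caller: &str, controllers: &[String]) -> bool {
    let is_controller = controllers.iter().any(|c| c == caller);
    match visibility {
        LogVisibility::Public => true,
        LogVisibility::Controllers => is_controller,
        // Assumption: controllers keep access when an allow-list is set.
        LogVisibility::AllowedViewers(viewers) => {
            is_controller || viewers.iter().any(|v| v == caller)
        }
    }
}

/// Strip the backtrace before returning the error to an unauthorized caller.
fn redact_for_caller(
    mut err: HypervisorError,
    visibility: &LogVisibility,
    caller: &str,
    controllers: &[String],
) -> HypervisorError {
    if !can_view_logs(visibility, caller, controllers) {
        err.backtrace = None;
    }
    err
}
```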

[8596e9813]
Removal of redundant queues_mut() calls when handling input queues across several modules

shed_responses to track callbacks for expired or shed responses

Optimize the handling of canister input and output queues by eliminating redundant queues_mut() calls and introducing shed_responses to better manage callbacks for expired or shed responses

[1a1c213f3]
Increasing the cycle allocations for canister operations across several test cases, with the values being updated from 100 billion cycles to 121 billion cycles in most cases

MAX_INSTRUCTIONS_PER_INSTALL_CODE limit has been significantly raised from 200 billion to 300 billion to accommodate larger state sizes during upgrades.

[6cb46aac8]
ic-sns-root as a dependency across various components such as Cargo.toml, BUILD.bazel
Methods for querying version information and upgrade steps.
Methods in RootCanister to retrieve summaries and manage SNS canisters
Renaming of certain functions to better reflect their purpose, such as query_mainline_sns_upgrade_steps.

[735935aa2]
Adds a slot limit feature to the ConsensusManager
Prevents exceeding this limit for slot entries associated with peers
Adds new metrics to track occurrences when the slot limit is exceeded
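A minimal sketch of a per-peer slot limit, assuming peers are identified by u32, slots by u64, and artifacts by raw bytes. SlotTable, insert, and limit_exceeded_total are hypothetical names; the counter stands in for the new metric. Updating an existing slot is always allowed, while opening a new slot beyond the limit is rejected.

```rust
use std::collections::HashMap;

/// Hypothetical peer identifier.
type PeerId = u32;

/// Simplified slot table: each peer may occupy at most `slot_limit` slots.
struct SlotTable {
    slot_limit: usize,
    slots: HashMap<PeerId, HashMap<u64, Vec<u8>>>, // peer -> slot number -> artifact
    limit_exceeded_total: u64,                     // stand-in for a metric counter
}

impl SlotTable {
    fn new(slot_limit: usize) -> Self {
        Self { slot_limit, slots: HashMap::new(), limit_exceeded_total: 0 }
    }

    /// Insert or overwrite an entry; reject *new* slots beyond the per-peer limit.
    fn insert(&mut self, peer: PeerId, slot: u64, artifact: Vec<u8>) -> bool {
        let peer_slots = self.slots.entry(peer).or_default();
        if !peer_slots.contains_key(&slot) && peer_slots.len() >= self.slot_limit {
            self.limit_exceeded_total += 1;
            return false;
        }
        peer_slots.insert(slot, artifact);
        true
    }
}
```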

[87ed92725]
Improve security and configuration handling within the system.

machine ID relabeling was updated with a systemd command for consistency

encryption setup now interacts with cryptsetup-pre.target, and several SELinux policy tweaks were introduced to enhance security

[47590772d]
Change GuestOS loader to a larger firmware file (OVMF_CODE_4M.fd)
Remove the migratable='off' option from the QEMU CPU configuration
Dockerfile was updated to fix a reproducibility issue by replacing /bin/dash with /bin/bash

cleanup of Python bytecode and enabling SSH

[09ddd7d5b]
Change GuestOS service type from “forking” to “oneshot” with RemainAfterExit=true
Remove the PIDFile setting
enable and disable functions for autostarting the GuestOS virtual machine have been removed from the startup and shutdown scripts

Bugfixes

[60f1d5562]
Adds a check to ensure that a canister’s balance is sufficient to cover the debit and removed cycles during cleanup operations

[ba5ffe01a]
Adds an is_first_iteration flag to multiple functions in the scheduler to distinguish the behavior of the first canister execution in a round from subsequent executions

[d2657773d]
Update INSTRUCTION_OVERHEAD_PER_CANISTER

[a9ebaa9e9]
Replace the RwLock pattern with OnceCell for managing CertificateDelegation across various modules.

Avoids repeated locking and unlocking of resources.
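A sketch of the before/after pattern, assuming std::sync::OnceLock as a stand-in (the exact OnceCell type used in the commit isn’t shown here) and a hypothetical CertificateDelegation wrapper around raw bytes. With RwLock<Option<T>>, every read takes a lock; with a once-cell, the value is written at most once and later reads are lock-free.

```rust
use std::sync::{Arc, OnceLock, RwLock};
use std::thread;

/// Hypothetical stand-in for the real certificate delegation type.
#[derive(Clone, Debug, PartialEq)]
struct CertificateDelegation(Vec<u8>);

/// Before: every reader takes a lock and clones out of an Option.
fn read_with_rwlock(slot: &RwLock<Option<CertificateDelegation>>) -> Option<CertificateDelegation> {
    slot.read().unwrap().clone()
}

/// After: the value is written at most once, and reads are lock-free `get()`s.
fn read_with_once_lock(slot: &OnceLock<CertificateDelegation>) -> Option<&CertificateDelegation> {
    slot.get()
}

/// Several consumers race to set the shared cell; exactly one `set` succeeds.
fn publish_once(slot: Arc<OnceLock<CertificateDelegation>>) -> bool {
    let handles: Vec<_> = (0..4u8)
        .map(|i| {
            let slot = Arc::clone(&slot);
            // Later attempts get Err back and leave the stored value untouched.
            thread::spawn(move || slot.set(CertificateDelegation(vec![i])).is_ok())
        })
        .collect();
    let winners = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&ok| ok)
        .count();
    winners == 1
}
```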

[77dc52029]
improve the handling of error conditions and argument validation across multiple scripts.
make the --metric argument optional unless a metrics.sh script is present

Chores

[e773cf5df]
Operate on HashedBlock instead of Block objects. Avoid recompute.
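The caching idea can be sketched like this, assuming a simplified Block and using std’s DefaultHasher purely as a stand-in (the replica uses a cryptographic hash, which DefaultHasher is not). The hash is computed once when the HashedBlock is built, and notarize only reads it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified block.
#[derive(Hash)]
struct Block {
    height: u64,
    payload: Vec<u8>,
}

/// Pairs a block with its hash, computed exactly once at construction.
struct HashedBlock {
    block: Block,
    hash: u64,
}

impl HashedBlock {
    fn new(block: Block) -> Self {
        let mut hasher = DefaultHasher::new();
        block.hash(&mut hasher);
        let hash = hasher.finish();
        Self { block, hash }
    }

    fn hash(&self) -> u64 {
        self.hash
    }
}

/// Notarizing only needs the cached hash, so nothing is re-hashed here.
fn notarize(block: &HashedBlock) -> (u64, u64) {
    (block.block.height, block.hash())
}
```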

[c972dc928]
Remove the imports ic_logger::ReplicaLogger, ic_interfaces_registry::RegistryClient, ic_types::crypto::threshold_sig::ni_dkg::NiDkgDealing, and ic_types::replica_config::ReplicaConfig, along with the associated methods and functionality that involved logging, replica version lookups, and DKG payloads.

Rewrite PoolReader struct

[9fe63e2f7]
In sign_bip340_with_aux_rand, the signing key is created directly from the secret key, without the need to manually check whether the y-coordinate of the public key is odd and then negate the scalar

[726cb686a]
Remove the long_execution_mode reset and priority credit logic; the apply_priority_credit function is updated to handle resetting long_execution_mode after long-running executions

[286f2cbbe]
Fix comments

[fa2329782]
Replace QueueItem structure in the FIFO queue with uint64 reference, maintaining backward compatibility by keeping the deprecated queue alongside the new one.

[f8f2d84f3]
Remove the InputOutputQueue and QueueEntry message types along with the associated fields like input_queues and output_queues from the CanisterQueues structure.

[6ed86361e]
The ic-btc-validation module was moved and reorganized under a dedicated directory, consolidating the validation logic for better modularity. This also involved adjusting dependencies to accommodate the new structure, but the core logic of Bitcoin validation remains largely unchanged.

[0161abba3]
renaming and relocating the ic-xnet-endpoint crate to ic-http-endpoints-xnet, consolidating it under the http_endpoints module.

[3bbabefb7]
renaming and restructuring key directories and files from the rosetta-api namespace to the ledger_suite namespace. This affects various components like the ledger, index, and archive canisters

[42f2bd3d4]
Remove several dependencies, including httptest, hyperlocal-next, and rustls-native-certs

Removal of the certificate-syncer service and associated files.

configurations related to nginx and certificate synchronization have been simplified, with the certificate-issuer now handling certificate retrieval and processing independently of the removed services

[e2cb3d638]
Renames compile to compile_protos in various build.rs files across multiple services

upgrade prost and tonic crates

enum variant references across different modules are adjusted to use Self instead of repeating the type name directly for clarity and consistency

[9c08b9984]
Implements the SaturatingSub trait from num_traits for the AmountOf<Unit, Repr> type
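A self-contained sketch of the idea. To avoid an external dependency it defines a local SaturatingSub trait with the same method shape as num_traits::SaturatingSub; AmountOf, HeightTag, and Height here are simplified stand-ins, not the real amountof.rs definitions.

```rust
use std::marker::PhantomData;

/// Local stand-in mirroring num_traits::SaturatingSub (same method shape).
trait SaturatingSub: Sized {
    fn saturating_sub(&self, v: &Self) -> Self;
}

impl SaturatingSub for u64 {
    fn saturating_sub(&self, v: &Self) -> Self {
        // Clamp at zero instead of underflowing.
        self.checked_sub(*v).unwrap_or(0)
    }
}

/// Marker type: exists only at compile time, so heights can't mix with other units.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HeightTag {}

/// Simplified phantom-typed amount.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AmountOf<Unit, Repr> {
    value: Repr,
    _unit: PhantomData<Unit>,
}

impl<Unit, Repr> AmountOf<Unit, Repr> {
    fn new(value: Repr) -> Self {
        Self { value, _unit: PhantomData }
    }
}

/// Saturating subtraction lifts from the representation to the tagged amount.
impl<Unit, Repr: SaturatingSub> SaturatingSub for AmountOf<Unit, Repr> {
    fn saturating_sub(&self, v: &Self) -> Self {
        Self::new(self.value.saturating_sub(&v.value))
    }
}

type Height = AmountOf<HeightTag, u64>;
```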

[f7791372e]
Update prost, prost-build, and tonic dependencies to their latest versions (from 0.12.x to 0.13.2 for prost and from 0.11.0 to 0.12.2 for tonic)

[d66fdcb4c]
Update docker workflows from ghcr.io/dfinity/ic-build@sha256:d883e7f6e5a355d63f6d8a294cfbeb47161bc27055e34730c21c6eeae0617acb to a newer version ghcr.io/dfinity/ic-build@sha256:115daa5ad5149182bb0416cbe5730f305be3bb2f48df576bc2c23067eefce84b

improve error inspection and logging, such as using .inspect_err()

[c39a8b35b]
Remove list_state_heights method from the StateManager trait

moved and re-implemented in the StateManagerImpl with some refactoring for better internal handling of state snapshots

[a4e281d92]
Replacement of the XNetEndpointConfig with a Config structure

Removal of legacy test configurations and helper modules

[91d8f93ed]
Update dependencies across multiple packages, moving from hyper version 0.14.30 to hyper version 1.4.1 and replacing ic-xnet-hyper with axum and hyper-util.

Refactor for tokio::runtime::Handle and axum

[d9ae74c7d]
Removes ic_btc_validation crate dependency
Removes is_beyond_last_checkpoint

[6a2eca082]
Fix comment

[0279b0f4f]
clap from version 4.4.6 to 4.5.18 and anstream from 0.6.4 to 0.6.15.
Add is_terminal_polyfill

[a34cbd96a]
Remove ipv6_address configuration option, making ipv6_prefix the only required setting for configuring network settings

[90ad56b73]
Bumps the Bazel version from 7.0.1 to 7.3.1 and updates the container image hashes used in various GitHub workflow files to reflect a new build image sha256:2c6fc0aa92ada647e42790cbdac3199b27a1407d9e90ff6e5a97a69acac24041

[10b880941]
Update base images

[3929437f7]
Update base images

Refactoring

[afad27aa2]
Rename get_all_validated method to get_all_for_broadcast across multiple artifact pool modules

implementations of the get_all_validated method have been removed when they were unused or deemed unnecessary.

[54c3542bc]
Moves ongoing_long_install_code from inner_round to drain_subnet_queues

[3221c5936]
Replace direct collections of iterators with cloned collections in queue processing

The types used for handling message queues were refined, with queues and pools renamed and retyped to better manage memory and expired messages.

[41a9d9db7]
The MAC generation function removes redundant parameters
IPMI-based MAC retrieval was integrated as a fallback when a management MAC is not provided in the deployment settings

[37b9754a8]
CI_PULL_REQUEST_TARGET_BRANCH_SHA now MERGE_BASE_SHA

Tests

[5b4a6e3a5]
Extra guards around canister snapshot changes.

proposal - 133310

Vote: REJECT (Intermittent hash error) depends on proposal - 133309

Hash: Intermittent Error

Feedback: NONE

[1ff0e709f]
Reverts commit 87ed9272513168a85cdcf1bb52b232c2a1c7493c

Proposal 133309

Hashes do not match.
REJECTED.


Proposal 133310

Hashes do not match.
REJECTED.

Would someone who produced a HostOS image that did not match the predicted SHA mind sharing it? This would be immensely helpful in deducing which files in HostOS are causing the nondeterminism. Thanks in advance.

A clarification: the proposal in this thread only involves upgrading GuestOS, which seems to have reproduced in all reported instances. Nonetheless, we’re likely going to opt for the most cautious option.

I’d like to get that source of nondeterminism fixed ASAP.

@Zane @cyberowl @ZackDS @ilbert

Here is a file share to upload your HostOS images (please upload your HostOS update image update-img.tar.zst): NODE-1490: HostOS non-determinism - Google Drive

Note that you do NOT need to upload GuestOS or SetupOS images (SetupOS images contain a HostOS image, so we can assume the non-determinism is in HostOS)

Within the file share, there is a folder for each of d2657773 (the initial release) and 1ff0e709 (the subsequent release).

For those that had non-determinism issues, I’ve created folders with each of your account handles. Please place the file in your respective folders. This way, we can inspect the issue and identify the source of the nondeterminism.

And thank you for your work verifying the IC-OS releases! The whole community thanks you!

Hello @Zane @cyberowl @ZackDS @ilbert,

We appreciate your help verifying the builds. The Node team is looking into what might be causing the reproducibility issue, and getting the “bad” HostOS image from one of you would help us diagnose it more easily.

I would like to point out that this particular proposal is only about upgrading the replicas (guestOS). You will notice in all your screenshots that guestOS was in fact built reproducibly and verified by you: The hashes for guestOS match and you can feel comfortable voting this proposal in.

Hey @dmanu, @andrewbattat, @raymondk,

It looks like you missed me out. :upside_down_face:

Note that I collapse a lot of details in my reviews under collapsible sections to cut down on the noise in this thread (I hope others will start doing this too).

My build hash verification screenshots were probably missed as a result of this.

Anyway, I’ve created a @Lorimer folder and uploaded my update-img.tar.zst for 1ff0e709f0d. I’m in the process of rebuilding d2657773d007 so that I can get that one to you too (assuming I can reproduce the non-reproducible build :wink:)

I would have gotten this to you sooner, but I have a full time job during the day, and I was otherwise tied up this evening.

BTW, could line 122 of this HostOS upgrade commit be significant?

# Fix reproducibility issue. Notes in hostos/context/Dockerfile

There was a similar reversion in last week’s GuestOS upgrade commit:

# Clear additional files that may lead to indeterministic build.

Update: I’ve uploaded both images now. However, when I compared them, I noticed that they’re binary-identical… I must have done something wrong. I retrieved the file from /artifacts/icos/hostos/update-img.tar.zst. Is that not the correct output folder?

Unfortunately I don’t have the old ones, so I built d2657773 again; it failed the same way, and I uploaded it. Let me know if you need me to build the second one as well.
Edit: actually, I’ll build it, and if it fails I’ll upload that one too.


And it’s done.

As a side note, I’m not so sure that last part should be considered accurate, given the discussions on last week’s proposal →

:upside_down_face: