Proposal to elect new release rc--2024-10-11_14-35

Hello there!

We are happy to announce that voting is now open for a new IC release.
The NNS proposal is here: IC NNS Proposal 133396.

Here is a summary of the changes since the last release:

Release Notes for release-2024-10-11_14-35-base (6fb2fd1f0512c81578ea0d1984c48851e6c08b0d)

This release is based on changes since release-2024-10-03_01-30-base (d2657773d007e1b4c0b2dd715c628d24c0d7b5fb).

Please note that some commits may be excluded from this release if they’re not relevant, or not modifying the GuestOS image. Additionally, descriptions of some changes might have been slightly modified to fit the release notes format.

To see a full list of commits added since last release, compare the revisions on GitHub.

This release diverges from the latest release. The merge base is fa2329782561f1b4a5d27052147023e75727e1fa. The change that was removed from this release was a cherry-pick from the master branch (reintroduced as commit e9afa6f54).

Features:

  • d6957f09a Consensus,Interface: Enable pprof-based flamegraphs in ingress manager benchmarks (#1853)
  • 9df94b4f7 Execution,Interface: Enable storage reservation mechanism on verified application subnets (#1930)
  • 8105c7140 Execution,Interface,Message Routing: Trigger callback expiration in StateMachine (#1832)
  • b0cb8a12e Execution,Interface,Message Routing: Implement callback expiration (#1699)
  • d70b9eb6f Interface(ICP-Ledger): Add test icp allowance getter endpoint (#1934)
  • 05d54e257 Interface(cketh): Use EVM-RPC canister 2.0.0 (#1831)
  • d1db89ed7 Interface(ICRC-ledger): Implement V2 for ICRC ledger - use memory manager during upgrade (#1414)
  • c8d029531 Interface: Propagate execution mode (wasm64/32) to replica (#1784)
  • e17d99af7 Node: replace fetch-mgmt-mac.sh with hostos_tool command (#1883)

Bugfixes:

  • b53c6cfe6 Execution,Interface,Message Routing: Prevent duplicates of aborted or paused responses (#1851)
  • 4c17f87e8 Interface: cargo build (#1866)
  • 6fb2fd1f0 Interface,Message Routing,Networking: fix the regression in XNET (#1992)
  • aa2de1256 Node(k8s-testnets): allow ssh access to bn nodes in k8s (#1793)
  • 9f068bb16 Node: Fix handling of microcode for 24.04 (#1888)
  • fcad095e7 Node: verbose logging service file failure (#1858)
  • fdbd50e3e Node: Small change in how we enable systemd services (#1824)

Chores:

  • 41030a8ad Consensus,Interface(consensus): add metrics for how long it takes to compute a bouncer function (#1880)
  • 717c3a3a7 Consensus,Interface: Revert custom impl ExhaustiveSet for RejectCode (#1834)
  • 28ac05e1f Execution,Interface: Revert risky changes for load issues (#1936) (reverts d2657773d/e9afa6f54, ebe9a6230, fcb719280 from last week’s release, and b141ebe3c from this week’s release)
    • author: Andr Bere | b141ebe3c Execution,Interface: Consolidate scheduling logic (#1815)
    • author: Dimi Sarl | e9afa6f54 Execution,Interface,Networking: Tweak instruction overhead per canister (#1819)
  • 1e88b9dda Execution,Interface: clap 4 migration (#1871)
  • 43ae0b304 Execution,Interface: Upgrade Wasmtime to v.25 (#1847)
  • fcbc91f0a Interface: update ic-cdk to 0.16.0 (#1868)
  • 5b82b0e27 Interface,Networking: Bump hyper-util to 0.1.9 (#1781)
  • da3de2d4a Interface,Networking: enable sync v3 calls on all subnets except NNS subnet (#1938)
  • b9ae85afa Interface,Networking: Change v3 call feature gate to const bool flag (#1924)
  • aee21c80d Owners: upgrade rustls (#1912)
  • 3bc150483 Owners: Upgrade Wasmtime v25 dependencies (#1848)
  • 839976182 Owners: upgrade strum and remove redundant feature (#1795)
  • c12572f3a Node: Change how the build time is calculated (#1876)
  • c918618eb Node: assorted ic-os bash script clean-ups and tweaks (#1857)
  • eada4b26a Node(ic): Update python formatting rules for the monorepo (#1751)
  • 926a05687 Node: Update Base Image Refs [2024-10-03-1220] (#1823)
  • 4cece3a67 Node: Update Base Image Refs [2024-10-02-1854] (#1810)

Refactoring:

  • f7a7fd7c8 Execution,Interface,Message Routing: Refactor struct task queue to have separate field for paused aborted tasks (#1867)
  • 501d3aa82 Execution,Interface,Message Routing: Encapsulate the CallContextManager within SystemState (#1498)
  • 5127f0463 Execution,Interface,Message Routing: Refactor task_queue (#1490)
  • a7d5b717a Interface,Node: Config types refactor (#1667)
  • c65c725dd Node: remove dead code in generate-replica-config.sh (#1943)
  • d544428d8 Node: miscellaneous icos refactoring and clean-up (#1937)

Full list of changes (including the ones that are not relevant to GuestOS) can be found on GitHub.

IC-OS Verification

To build and verify the IC-OS disk image, run:

# From https://github.com/dfinity/ic#verifying-releases
sudo apt-get install -y curl && curl --proto '=https' --tlsv1.2 -sSLO https://raw.githubusercontent.com/dfinity/ic/6fb2fd1f0512c81578ea0d1984c48851e6c08b0d/ci/tools/repro-check.sh && chmod +x repro-check.sh && ./repro-check.sh -c 6fb2fd1f0512c81578ea0d1984c48851e6c08b0d --guestos

The two SHA256 sums printed above from a) the downloaded CDN image and b) the locally built image, must be identical, and must match the SHA256 from the payload of the NNS proposal.
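The acceptance rule above is a three-way equality check. Here is a minimal Rust sketch of it, assuming the three digests have already been copied out as hex strings. The helper names (`normalise`, `is_sha256_hex`, `hashes_agree`) are hypothetical and are not part of repro-check.sh.

```rust
/// Normalises a hex digest: trims whitespace and lowercases it.
fn normalise(digest: &str) -> String {
    digest.trim().to_ascii_lowercase()
}

/// A SHA256 digest is 32 bytes, i.e. 64 hexadecimal characters.
fn is_sha256_hex(digest: &str) -> bool {
    digest.len() == 64 && digest.chars().all(|c| c.is_ascii_hexdigit())
}

/// Hypothetical helper: true only if all three digests are well-formed
/// and identical (CDN image, local build, NNS proposal payload).
fn hashes_agree(cdn: &str, local: &str, proposal: &str) -> bool {
    let (cdn, local, proposal) = (normalise(cdn), normalise(local), normalise(proposal));
    is_sha256_hex(&cdn) && cdn == local && local == proposal
}
```

A single mismatch among the three values is enough to reject the build, which is why the check is a conjunction rather than a pairwise comparison of only two sums.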

While not required for this NNS proposal, as we are only electing a new GuestOS version here, you have the option to verify the build reproducibility of the HostOS by passing --hostos to the script above instead of --guestos, or the SetupOS by passing --setupos.

Hello there!

We are happy to announce that voting is now open for a new IC release.
The NNS proposal is here: IC NNS Proposal 133397.

Here is a summary of the changes since the last release:

Release Notes for release-2024-10-11_14-35-overload (aba60ffbc46acfc8990bf4d5685c1360bd7026b9)

This release is based on changes since release-2024-10-11_14-35-base (6fb2fd1f0512c81578ea0d1984c48851e6c08b0d).

Please note that some commits may be excluded from this release if they’re not relevant, or not modifying the GuestOS image. Additionally, descriptions of some changes might have been slightly modified to fit the release notes format.

To see a full list of commits added since last release, compare the revisions on GitHub.

This release diverges from the latest release. The merge base is 5b82b0e27c0af680aa593dd2edca9c0688f85985. The change that was removed from the base release was a cherry-pick from master (reintroduced as commit aba60ffbc).

Features:

  • 2b2d97de9 Execution,Interface: Charge idle canisters for full execution (#1806)
  • 340580ebd Execution,Interface: Charge canisters for full execution (#1782)
  • 6b78f2d91 Execution,Interface,Networking: Increase max sandbox count
  • 4ad4ba368 Execution,Interface,Networking: Increase per canister overhead

Bugfixes:

  • aba60ffbc Interface,Message Routing,Networking: fix the regression in XNET (#1992)

Other changes:

  • 430a75a0b Execution,Interface,Networking: Revert “feat: Increase per canister overhead”

IC-OS Verification

To build and verify the IC-OS disk image, run:

# From https://github.com/dfinity/ic#verifying-releases
sudo apt-get install -y curl && curl --proto '=https' --tlsv1.2 -sSLO https://raw.githubusercontent.com/dfinity/ic/aba60ffbc46acfc8990bf4d5685c1360bd7026b9/ci/tools/repro-check.sh && chmod +x repro-check.sh && ./repro-check.sh -c aba60ffbc46acfc8990bf4d5685c1360bd7026b9 --guestos

The two SHA256 sums printed above from a) the downloaded CDN image and b) the locally built image, must be identical, and must match the SHA256 from the payload of the NNS proposal.

While not required for this NNS proposal, as we are only electing a new GuestOS version here, you have the option to verify the build reproducibility of the HostOS by passing --hostos to the script above instead of --guestos, or the SetupOS by passing --setupos.

Hi @Lorimer @ZackDS @wpb @Zane @yuvika @zenithcode @ipsita @cyberowl and other CodeGov reviewers.
We kindly request your urgent review and approval of the following two proposals:

  1. 133396 Release: 2024-10-11_14-35-base: This release is intended to decrease XNet failure rate during stream pulls and help reduce the growing number of open file descriptors on IC nodes.

  2. 133397 Release: 2024-10-11_14-35-overload: This release introduces performance improvements for managing IC subnets, particularly in response to increased load (as outlined here).

Given the importance of these updates (Status Page), we would like to adopt these releases as soon as possible to verify the effectiveness of the XNet improvements proposed, and to implement any necessary further improvements.
We kindly ask you to verify and approve these releases at your earliest convenience.

Proposal 133396 - release-2024-10-11_14-35-base. Voted to adopt.

This took longer than usual due to the limited download speed for the OS images “pushed by CI”.
Builds fine and all hashes match this time. All the commits match as well.

Features:
d6957f09a Consensus,Interface:
Defines a criterion group named benches for benchmarking the build_payload and validate_payload functions.
The config parameter enables profiling with the PProfProfiler, setting a maximum profiling time of 499 seconds and generating flamegraph output files in the bazel output directory.
Does the same for handle_ingress as well.
9df94b4f7 Execution,Interface:
The verified application subnets also get the storage reservation mechanism enabled.
Two tests are modified. The canister_request_take_canister_cycles_reserved_for_app_and_verified_app_subnets test verifies that when a canister request takes a canister snapshot, the reserved cycles are correctly updated, considering the subnet type and memory usage.
The wasm_memory_grow_reserves_cycles test verifies that growing the stable memory of a canister using the stable64_grow instruction reserves cycles.
The cycles_reserved_for_app_and_verified_app_subnets function is a helper that runs a given test function with both the SubnetType::Application and SubnetType::VerifiedApplication subnet types.
8105c7140 Execution,Interface,Message Routing:
Adds a counter metric named METRIC_TIMED_OUT_CALLBACKS_TOTAL to message_routing to track the total number of expired best-effort callbacks.
The execute_round function is responsible for executing a round within the state machine, processing batches of messages and updating the replicated state. It is defined as a method of the StateMachineImpl struct.
The code added to it processes timed-out callbacks:
Identifies timed-out callbacks using state.time_out_callbacks().
Increments the metric timed_out_callbacks_total to track the total count.
Handles errors encountered during the timeout process.
Another addition is the has_expired_callbacks function, which checks whether there are any best-effort callbacks that have not yet been expired but whose deadlines have passed.
Lastly, the time_out_callbacks function processes expired callbacks within the state machine. It identifies canisters with expired callbacks, processes those callbacks, and returns a tuple containing the expired_callback_count and the errors vector.
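To make the shape of that return value concrete, here is a minimal Rust sketch of a time-out pass that yields a count plus a list of errors. Everything in it is a simplified assumption for illustration: the `Callback` and `Canister` types, the `accepts_input` flag used to simulate a failure, and the use of plain integers for IDs and time are not the replica's actual types.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified callback: only the deadline matters here.
struct Callback {
    /// None models a guaranteed-response call, which never expires.
    deadline_secs: Option<u64>,
    expired: bool,
}

/// Hypothetical stand-in for a canister's callback table.
struct Canister {
    callbacks: BTreeMap<u64, Callback>,
    /// false simulates a failure to enqueue the time-out notification.
    accepts_input: bool,
}

/// Expires every best-effort callback whose deadline has passed and
/// returns (expired_callback_count, errors).
fn time_out_callbacks(
    canisters: &mut BTreeMap<u64, Canister>,
    now_secs: u64,
) -> (usize, Vec<String>) {
    let mut expired_callback_count = 0;
    let mut errors = Vec::new();
    for (canister_id, canister) in canisters.iter_mut() {
        let accepts_input = canister.accepts_input;
        for (callback_id, callback) in canister.callbacks.iter_mut() {
            let due = matches!(callback.deadline_secs, Some(d) if d < now_secs);
            if callback.expired || !due {
                continue;
            }
            callback.expired = true;
            expired_callback_count += 1;
            if !accepts_input {
                errors.push(format!(
                    "canister {canister_id}: could not enqueue time-out for callback {callback_id}"
                ));
            }
        }
    }
    (expired_callback_count, errors)
}
```

A caller in the style of execute_round would add the count to the timed-out callbacks metric and log each error, and running the pass twice with the same time expires nothing new.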
b0cb8a12e Execution,Interface,Message Routing:
Besides the abundance of tests, the added time_out_callbacks function processes expired best-effort callbacks within a canister’s state. It identifies expired callbacks, checks for conflicts with aborted or paused callbacks, and enqueues deadline-expired input messages.
The call_context_manager_mut function retrieves a mutable reference to the CallContextManager based on the canister status.
d70b9eb6f Interface(ICP-Ledger):
The added test_icp_allowance_getter_unavailable_in_prod test verifies that the allowance method is not available in production environments.
The test_get_icp_approval test verifies the functionality of the allowance method in an ICP token canister. It ensures that a user can approve another user to spend a specific amount of their tokens.
05d54e257 Interface(cketh):
Read about the changes in Release v2.0.0 of internet-computer-protocol/evm-rpc-canister on GitHub.
d1db89ed7 Interface(ICRC-ledger):
Matches description.
c8d029531 Interface:
Added the is_wasm64 flag.
The check_correct_execution_state function tests whether the is_wasm64 property of the execution state is set correctly based on the WASM module type.
The wasm64_correct_execution_state and wasm32_correct_execution_state tests call the check_correct_execution_state function with true and false respectively, to test both WASM64 and WASM32 scenarios.
e17d99af7 Node:
The changed code handles the Commands::FetchMacAddress command, which uses IPMI to retrieve the management MAC address of the device.

Bugfixes:
b53c6cfe6 Execution,Interface,Message Routing:
The added code prevents enqueuing a second response for a callback that is already executing (aborted or paused).
Checks if the message (msg) is a response.
If the message is a response, retrieves the aborted or paused response from the current state.
If an aborted or paused response exists, compares the originator_reply_callback fields of the current response and the aborted or paused response.
If the callback IDs match, it means the callback is already executing, so the code returns without enqueuing a second response.
The should_enqueue_input function determines whether a response should be enqueued for processing based on its details and the current state of the call context manager.
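The duplicate check described above can be sketched in a few lines of Rust. This is a simplified illustration, not the replica's code: the `Message` enum, the `should_enqueue` name and the use of a bare `u64` for the callback ID are assumptions.

```rust
/// Hypothetical, simplified message type.
enum Message {
    Request,
    Response { originator_reply_callback: u64 },
}

/// Returns false when `msg` is a response for the same callback as a
/// response whose execution is currently aborted or paused, so that a
/// second response for that callback is never enqueued.
fn should_enqueue(msg: &Message, aborted_or_paused_callback: Option<u64>) -> bool {
    match (msg, aborted_or_paused_callback) {
        (Message::Response { originator_reply_callback }, Some(in_flight))
            if *originator_reply_callback == in_flight =>
        {
            false
        }
        _ => true,
    }
}
```

Requests, and responses for any other callback, pass through unchanged. Only a response whose callback ID equals the in-flight one is dropped.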
4c17f87e8 Interface, 6fb2fd1f0 Interface,Message Routing,Networking, aa2de1256 Node(k8s-testnets), fcad095e7 Node and fdbd50e3e Node all match their descriptions.
9f068bb16 Node:
Will follow up on this one to see if it fixes the build determinism issues on my Intel machine.

Chores:
41030a8ad Consensus,Interface(consensus):
The BouncerMetrics struct tracks metrics related to the bouncer component; specifically, it tracks the duration of the update operation.
717c3a3a7 Consensus,Interface: and 28ac05e1f Execution,Interface: Both reverts match their descriptions and make sense, especially for a “clean RC”.
Worth mentioning is the addition of the apply_priority_credit function, which applies the accumulated priority credit to the scheduler state and resets the long execution mode.
It subtracts the accumulated priority credit from the scheduler_state.accumulated_priority field using std::mem::take, which moves the credit value out and leaves the default value in its place.
It then sets scheduler_state.long_execution_mode to its default value, indicating that the canister is no longer in long execution mode.
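For readers less familiar with std::mem::take, here is a minimal Rust sketch of the pattern. The field names follow the review above, but the struct layout, the integer types and the `LongExecutionMode` variants are assumptions made for illustration.

```rust
/// Hypothetical two-state execution mode; the first variant is the default.
#[derive(Debug, Default, PartialEq)]
enum LongExecutionMode {
    #[default]
    Opportunistic,
    Prioritized,
}

/// Hypothetical, simplified scheduler state.
#[derive(Debug, Default)]
struct SchedulerState {
    accumulated_priority: i64,
    priority_credit: i64,
    long_execution_mode: LongExecutionMode,
}

fn apply_priority_credit(state: &mut SchedulerState) {
    // mem::take returns the current credit and leaves 0 (the default) behind,
    // so the credit is applied exactly once.
    state.accumulated_priority -= std::mem::take(&mut state.priority_credit);
    // The canister is no longer in long execution mode.
    state.long_execution_mode = LongExecutionMode::default();
}
```

Using take rather than reading the field and zeroing it in a second statement keeps the "apply and reset" step in one expression, which makes it harder to forget the reset.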

The following commits are updates/upgrades/bumps and all match their descriptions.
1e88b9dda Execution,Interface:
43ae0b304 Execution,Interface:
fcbc91f0a Interface:
5b82b0e27 Interface,Networking:
aee21c80d Owners:
3bc150483 Owners:
839976182 Owners:
eada4b26a Node(ic):
926a05687 Node:
4cece3a67 Node:

da3de2d4a Interface,Networking: This reverts b9ae85afa Interface,Networking.
c12572f3a Node: Very good explanation.
c918618eb Node: Exactly what it describes

All the refactoring also matches description and LGTM.

Proposal 133397 - release-2024-10-11_14-35-overload. Voted to adopt.

Builds fine and the GuestOS hash is a match; the rest is as can be seen.

Features:
2b2d97de9 Execution,Interface:
The charge_idle_canisters function marks idle canisters in the schedule as fully executed. It iterates through the ordered list of new execution canisters, checks their next execution status, and finishes those that are idle.
Looking at the charge_idle_canisters_for_full_execution_round test we can see it simulates a full execution round with idle and busy canisters to verify the correct application of priority credits and the marking of idle canisters as fully executed.
The test breaks down as follows:
Creates a SchedulerTestBuilder instance with the specified configuration, including scheduler cores, maximum instructions, and other parameters.
Executes an initial round to initialize the scheduler.
Creates idle canisters by simply creating them without sending any ingress messages.
Creates busy canisters by sending multiple ingress messages to keep them active.
Iterates through a specified number of rounds.
Executes each round using test.execute_round(ExecutionRoundType::OrdinaryRound).
Calculates the total accumulated priority and total priority credit for all canisters.
Asserts that idle canisters have been correctly marked as fully executed and that the accumulated priority and priority credit are balanced.
A deep dive into priority credits would probably make it easier to understand the whole picture, but that is beyond the scope of this review.
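The core of charge_idle_canisters, as described above, can be sketched as follows. This is a simplified Rust illustration under stated assumptions: the `NextExecution` variants, the `Canister` struct and the use of a `BTreeSet<u64>` to record fully executed canisters are hypothetical, not the scheduler's real types.

```rust
use std::collections::BTreeSet;

/// Hypothetical view of what a canister would do next.
#[derive(PartialEq)]
enum NextExecution {
    Idle,
    StartNew,
}

/// Hypothetical, simplified canister entry in the round schedule.
struct Canister {
    id: u64,
    next_execution: NextExecution,
}

/// Marks every canister with nothing left to execute as fully executed,
/// so that it is charged for the round like a busy canister would be.
fn charge_idle_canisters(
    ordered_new_execution_canisters: &[Canister],
    fully_executed: &mut BTreeSet<u64>,
) {
    for canister in ordered_new_execution_canisters {
        if canister.next_execution == NextExecution::Idle {
            fully_executed.insert(canister.id);
        }
    }
}
```

Canisters that still have work scheduled are left alone here, since they are expected to be marked as fully executed (or not) by the execution path itself.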

340580ebd Execution,Interface:
The finish_canister_execution function marks a canister as fully executed for a round, updates its scheduler state, and adds it to the list of fully executed canisters.
The finish_round function processes the results of a completed round, charging canisters for full executions and updating their scheduler states.
Calculates a multiplier based on the number of scheduler cores and canisters.
Iterates over the fully_executed_canister_ids.
For each canister:
Updates the total_charged_priority by adding 100 * multiplier.
Increases the priority_credit in the canister’s scheduler_state by 100 * multiplier.
Calculates the total free capacity by subtracting the total allocated capacity from the total charged priority.
Calculates the free capacity per canister by dividing the total free capacity by the number of canisters.
Iterates over all canisters.
Calculates the effective compute allocation for each canister by adding the free capacity per canister to the existing compute allocation.
Increases the accumulated_priority in the canister’s scheduler_state by the effective compute allocation.
Optimizing the iterations and calculations here, especially for large numbers of canisters, could be considered a potential improvement.
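The arithmetic in the steps above is easier to follow with numbers. Below is a minimal Rust sketch of the same sequence. Several details are assumptions rather than facts from the commit: the multiplier is taken to be cores times canisters, the free capacity is clamped at zero, priorities are plain `i64` values, and the `SchedulerState` struct is a simplified stand-in.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified per-canister scheduler state.
#[derive(Default)]
struct SchedulerState {
    compute_allocation_percent: i64,
    accumulated_priority: i64,
    priority_credit: i64,
}

fn finish_round(
    canisters: &mut BTreeMap<u64, SchedulerState>,
    fully_executed: &[u64],
    scheduler_cores: usize,
) {
    let n = canisters.len().max(1) as i64;
    // Assumption: multiplier = cores x canisters, which keeps the later
    // integer division by the number of canisters exact.
    let multiplier = (scheduler_cores as i64 * n).max(1);

    // Charge every fully executed canister 100 (percent) x multiplier.
    let mut total_charged_priority = 0i64;
    for id in fully_executed {
        if let Some(state) = canisters.get_mut(id) {
            total_charged_priority += 100 * multiplier;
            state.priority_credit += 100 * multiplier;
        }
    }

    // Whatever was charged beyond the explicit allocations is free capacity,
    // split evenly across all canisters.
    let total_allocated: i64 = canisters
        .values()
        .map(|s| s.compute_allocation_percent * multiplier)
        .sum();
    let free_per_canister = (total_charged_priority - total_allocated).max(0) / n;

    for state in canisters.values_mut() {
        let effective = state.compute_allocation_percent * multiplier + free_per_canister;
        state.accumulated_priority += effective;
    }
}
```

With two canisters, one core, no compute allocations and only canister 1 fully executed, the multiplier is 2, canister 1 is charged a credit of 200, and each canister gains 100 accumulated priority. The total priority handed out equals the total credit charged, which is the balance the test in 2b2d97de9 asserts.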

6b78f2d91 Execution,Interface,Networking:
Doubles the DEFAULT_MAX_SANDBOX_COUNT.

4ad4ba368 Execution,Interface,Networking:
This one is a bit more interesting: it doubles INSTRUCTION_OVERHEAD_PER_CANISTER, which represents the estimated instruction overhead per canister, and it is then reverted by 430a75a0b Execution,Interface,Networking under “Other changes”.

Thanks for these releases DFINITY

TLDR: I’m voting to adopt both proposals


133396

Build successful and hashes generated on my machine match (CDN and local build), and the GuestOS hash matches the proposal payload.

No non-deterministic builds.

There are 107 commits since the previous release, 39 of which are referenced in this proposal. There are 24 files that have been modified both by commits referenced in this proposal as well as commits that weren’t. I would have expected to see de4876fe2 in the proposal summary change log (as a HostOS change) given the talk from a few weeks ago.

I initially thought another commit may be missing until I realised it was organised under a reversion (nice! thanks for sorting this @Luka).


Regarding the divergence and cherry pick (so no divergence after all) - I don’t see why merging wasn’t preferred over cherry picking here. I think that would have been a lot clearer (particularly given that the git graph would then illustrate exactly what’s happening). I’ve mentioned this sort of thing in more detail in previous releases (e.g. rc--2024-07-25_21-03, rc--2024-08-29_01-30).


All commits appear to match their commit messages well and seem reasonable - including callback expiration, storage reservation for verified application subnets, addressing a regression in the XNET, introduction of profiling tools, new metrics, dependency updates, refactoring, and some reversions.

I’ve also reviewed the unelection component of this proposal below.

There currently appear to be 9 blessed replica versions registered, 6 of which would be unelected by this proposal. These unelected versions are not running on any subnets, nor any unassigned nodes, so they appear safe to unelect.
  • 0441f40, elected 2024-09-23 (proposal 133061), UNELECTION PROPOSED, running on 0 subnets
  • 7f6a81f, elected 2024-09-23 (proposal 133062), UNELECTION PROPOSED, running on 0 subnets
  • c87abf7, elected 2024-09-23 (proposal 133063), UNELECTION PROPOSED, running on 0 subnets
  • 35153c7, elected 2024-09-30 (proposal 133142), UNELECTION PROPOSED, running on 0 subnets
  • d101161, elected 2024-09-30 (proposal 133143), UNELECTION PROPOSED, running on 0 subnets
  • c43a488, elected 2024-09-30 (proposal 133144), UNELECTION PROPOSED, running on 0 subnets
  • d265777, elected 2024-10-08 (proposal 133309), running on 0 subnets
  • 1ff0e70, elected 2024-10-08 (proposal 133310), running on 0 subnets
  • f0c923e, elected 2024-10-09 (proposal 133327), running on 37 subnets and all unassigned nodes (since proposal 133372)
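The safety condition used for the list above is simple enough to state in code: a version may be unelected only if nothing is running it. Here is a minimal Rust sketch, where the `Version` struct and the `unsafe_unelections` function are hypothetical names for illustration and not part of any IC tooling.

```rust
/// Hypothetical summary of one blessed replica version.
struct Version {
    id: &'static str,
    subnets: u32,
    runs_on_unassigned_nodes: bool,
    unelection_proposed: bool,
}

/// Returns the IDs of versions that are proposed for unelection
/// while still running somewhere. An empty result means the
/// unelection is safe by this criterion.
fn unsafe_unelections(versions: &[Version]) -> Vec<&'static str> {
    versions
        .iter()
        .filter(|v| v.unelection_proposed && (v.subnets > 0 || v.runs_on_unassigned_nodes))
        .map(|v| v.id)
        .collect()
}
```

Fed with the nine entries listed above, the function would return an empty list, because all six versions proposed for unelection run on 0 subnets and on no unassigned nodes.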

133397

Build successful and GuestOS + HostOS hashes generated on my machine match (CDN and local build), and the GuestOS hash matches the proposal payload. SetupOS is still having reproducibility issues, but it's not the target of the proposal.

The proposal summary says this release is based on changes since release-2024-10-11_14-35-base (6fb2fd1f). This is not strictly the case. The release is actually based on changes since 5b82b0e2; 6fb2fd1 is instead cherry picked. The end result is the same though (this release branch has additional changes not present in the other one).

All commits appear to match their commit messages well and seem reasonable. I’m glad to see monitoring enhancements along with these changes. I’m assuming the impact of increasing the sandbox count and the task spawning changes will be closely monitored.

I hope these changes play well in production.

I haven’t had time to do a full review. I’ve only managed to review the “Features” group of commits, but all of them have looked fine to me. I have also successfully run the build verification scripts for both proposals, so I have voted to adopt both.

Features:

d6957f09a Consensus,Interface: Enable pprof-based flamegraphs in ingress manager benchmarks (#1853)
Review: Looks fine + matches description
Notes: Allows devs to capture flamegraphs for the ingress manager to dig into where CPU time is spent.

9df94b4f7 Execution,Interface: Enable storage reservation mechanism on verified application subnets (#1930)
Review: Looks fine + matches description
Notes: Makes the max_storage_reservation_period setting the same for Application and Verified Application subnets which now means that both of them now use the same CyclesAccountManagerConfig settings.

8105c7140 Execution,Interface,Message Routing: Trigger callback expiration in StateMachine (#1832)
Review: Looks fine + matches description
Notes: Calls state.time_out_callbacks(); near the start of each execution round to process the response callbacks which have just expired.

b0cb8a12e Execution,Interface,Message Routing: Implement callback expiration (#1699)
Review: Looks fine + matches description
Notes: Implements SystemState::time_out_callbacks which gets the expired callbacks and pushes a deadline_expired input for each one to notify the target canisters that the responses have timed out.

d70b9eb6f Interface(ICP-Ledger): Add test icp allowance getter endpoint (#1934)
Review: Looks fine + matches description
Notes: Adds the icp_allowance endpoint to the ICP ledger which is only used in test mode and works with AccountIdentifiers as opposed to icrc2_allowance which works with owner + subaccount pairs.

05d54e257 Interface(cketh): Use EVM-RPC canister 2.0.0 (#1831)
Review: Looks fine + matches description
Notes: Bumps the EVM RPC canister + the evm_rpc_types dependency then updates the various clients to work with the new types.

d1db89ed7 Interface(ICRC-ledger): Implement V2 for ICRC ledger - use memory manager during upgrade (#1414)
Review: Looks fine + matches description
Notes: Removes the old version of pre_upgrade from the ICRC ledger now that the migration is complete to the version that uses MemoryManager rather than simply using the default stable memory writer.

c8d029531 Interface: Propagate execution mode (wasm64/32) to replica (#1784)
Review: Looks fine + matches description
Notes: Add is_wasm64 boolean to the ExecutionState so that in subsequent PRs the cycles charges can be adjusted accordingly.

e17d99af7 Node: replace fetch-mgmt-mac.sh with hostos_tool command (#1883)
Review: Looks fine + matches description
Notes: Replaces the fetch-mgmt-mac.sh with a new command within the hostos_tool.

VOTE: ADOPT

Hash: MATCH

Features:

[d6957f09a]
criterion crate for benchmarking, alongside the pprof profiler

flamegraph generation for performance profiling in specific benchmark tests

[9df94b4f7]
cycles are reserved and managed during tests for memory allocation, canister upgrades, and other operations on both application and verified subnets.

tests properly simulate real-world scenarios by consolidating cycle management logic across different subnet types

[8105c7140]
new metric for tracking timed-out callbacks in addition to timed-out messages

improving monitoring and error handling for message routing.
handles expired callbacks more robustly by queuing “deadline expired” responses

[b0cb8a12e]
track and handle expired callbacks in message queues, ensuring “deadline expired” responses are correctly enqueued when callbacks expire.

[d70b9eb6f]
“icp-allowance-getter,” query ICP allowance for a specified account and spender
Bazel build configuration has been updated to use a new rust_ledger_canister wrapper

[05d54e257]
candid crate from 0.10.6 to 0.10.10 and updating the thiserror crate from 1.0.62 to 1.0.64 across various dependencies

new evm_rpc_types crate

[d1db89ed7]
remove support for the “next-migration-version-memory-manager” feature from the ICRC1 ledger

[c8d029531]
new is_wasm64 flag in the ExecutionState and related components

make the replica aware of whether a canister is executing in Wasm32 or Wasm64 mode

[e17d99af7]
replace the old method of fetching the MAC address using the fetch-mgmt-mac.sh script with a new command in the hostos_tool called fetch-mac-address

Bugfixes:

[b53c6cfe6]
should_enqueue_input was moved from call_context_manager to system_state. It determines whether an incoming response should be enqueued based on its callback ID, matching the originator, respondent, and deadline against the expected values, and returns an error if they don’t match or if the callback is unknown.

[4c17f87e8]
Add strum_macros and flag --locked for cargo build.

[6fb2fd1f0]
two new metrics (connections_total and closed_connections_total) in the XNetEndpointMetrics struct, which track the total number of accepted and closed XNet TCP connections

[aa2de1256]
Not sure why this deviated from the rest.

[9f068bb16]
new microcode configuration files for both AMD and Intel architectures

[fcad095e7]
Limit logging

[fdbd50e3e]
Dockerfiles now exclude certain service files (those ending with @.service) from being automatically enabled by systemctl

Chores:

[41030a8ad]
BouncerMetrics struct to track the duration of the bouncer functions across various modules, including consensus, certification, DKG, and IDKG

[717c3a3a7]
Fix RejectCode for ExhaustiveSet

[28ac05e1f]
Reverts changes
The main changes are around SchedulerImpl and mainly focus on refining the scheduling process and metrics tracking

SchedulerImpl::compute_capacity_percent was added to dynamically calculate available execution capacity based on the number of scheduler cores, ensuring optimal resource allocation for both short and long-running executions.

[1e88b9dda]
Update version of the clap crate
simplifying the argument handling with clap’s newer APIs like get_one, num_args, and contains_id instead of the older value_of and is_present

[43ae0b304]
Upgrade cranelift dependencies from version 0.111.0 to 0.112.1 and corresponding updates to wasmtime components from 24.0.0 to 25.0.1

[fcbc91f0a]
ic-cdk and ic-cdk-macros from version 0.13.5 to 0.16.0, alongside related dependencies

[5b82b0e27]
bytes from 1.7.1 to 1.7.2 and hyper-util from 0.1.7 to 0.1.9

tower-layer dependency has been removed

[da3de2d4a]
Reverts, check if a subnet is whitelisted for serving synchronous responses to v3 update calls by using a list (SUBNETS_WITH_DISABLED_SYNCHRONOUS_CALL_V3) of subnets that are excluded, along with the function enable_synchronous_call_handler_for_v3_endpoint

[b9ae85afa]
Set ENABLE_SYNCHRONOUS_CALL_V3 to false

[aee21c80d]
Update rustls version

[3bc150483]
Update wasm dependencies

[839976182]
Update strum

[c12572f3a]
release_build flag to distinguish between release and non-release builds

checking build-time to checking commit-time to determine if an installation image is more than six weeks old
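As an illustration of the six-week rule mentioned above, here is a minimal Rust sketch that derives the image age from the commit time instead of the build time. The function name, the constant name and the use of Unix timestamps in seconds are assumptions; the actual check lives in the ic-os scripts and tooling.

```rust
use std::time::Duration;

/// Six weeks, expressed in seconds.
const MAX_IMAGE_AGE: Duration = Duration::from_secs(6 * 7 * 24 * 60 * 60);

/// Hypothetical helper: true if the image's commit is more than six
/// weeks older than `now_unix`. A commit time in the future counts as
/// age zero thanks to the saturating subtraction.
fn image_too_old(commit_time_unix: u64, now_unix: u64) -> bool {
    now_unix.saturating_sub(commit_time_unix) > MAX_IMAGE_AGE.as_secs()
}
```

Keying the check on commit time means that rebuilding an old commit today does not make the resulting image look fresh.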

[c918618eb]
Remove unnecessary calls to source the metrics.sh script across various IC OS components

[eada4b26a]
formatting and style improvements
adhere more strictly to Python’s typing and linting conventions

[926a05687]
Update base image

[4cece3a67]
Update base image

Refactoring:

[f7a7fd7c8]
Simplify the handling of paused and aborted tasks within the TaskQueue by replacing a set of paused tasks with a single paused or aborted task
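A minimal Rust sketch of what a queue with a single paused-or-aborted slot could look like is shown below. The `Task` variants, the field names and the rule that the paused or aborted task is always served first are assumptions made for illustration, not a copy of the refactored TaskQueue.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified task type; the u64 stands in for an execution ID.
#[derive(Debug, PartialEq)]
enum Task {
    Heartbeat,
    GlobalTimer,
    Paused(u64),
    Aborted(u64),
}

/// At most one paused or aborted task is held in its own field;
/// all other tasks go into the regular queue.
#[derive(Default)]
struct TaskQueue {
    paused_or_aborted_task: Option<Task>,
    queue: VecDeque<Task>,
}

impl TaskQueue {
    fn enqueue(&mut self, task: Task) {
        if matches!(task, Task::Paused(_) | Task::Aborted(_)) {
            assert!(
                self.paused_or_aborted_task.is_none(),
                "only one paused or aborted task is allowed"
            );
            self.paused_or_aborted_task = Some(task);
        } else {
            self.queue.push_back(task);
        }
    }

    /// The paused or aborted task, if any, is always served first.
    fn pop_front(&mut self) -> Option<Task> {
        self.paused_or_aborted_task
            .take()
            .or_else(|| self.queue.pop_front())
    }
}
```

Holding the special task in an `Option` makes the "at most one" invariant part of the type, instead of something that has to be checked against the contents of a collection.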

[501d3aa82]
begin_stopping and start_canister to manage transitions between Running, Stopping, and Stopped states

[5127f0463]
New func abort_paused_executions_and_return_tasks used in abort_canister that consolidates code. Refactors check_dts_invariants

TaskQueue structure that consolidates and simplifies task management for canisters, including handling paused, aborted, and system tasks like Heartbeat and GlobalTimer.

[a7d5b717a]
types for IPv4 and IPv6 settings, such as Ipv4Config, DeterministicIpv6Config, and FixedIpv6Config

more explicit types

[c65c725dd]
Removes the script's check for an empty hostname, which derived a new hostname from the IPv6 address in that case

[d544428d8]
Removes the use of the hostname variable, deriving the hostname from the MAC address instead

Replacing ipv6_subnet with ipv6_prefix_length

VOTE: ADOPT

Hash: MATCH

Features:

[2b2d97de9]
New func charge_idle_canisters iterates through a list of canisters and charges idle canisters by finalizing their execution if no further execution steps are pending

[340580ebd]
track fully executed canisters during each round, allowing for more accurate computation of priority credits and allocation of compute resources

[6b78f2d91]
Increase DEFAULT_MAX_SANDBOX_COUNT

[4ad4ba368]
Increase INSTRUCTION_OVERHEAD_PER_CANISTER

Bugfixes:

[aba60ffbc]
Similar to 6fb2fd1f0

Other changes:

[430a75a0b]
Decrease INSTRUCTION_OVERHEAD_PER_CANISTER

Proposal - 133396

Summary:

  1. Vote to adopt YES
  2. Build hashes match

Features:

d6957f09a

Verified that pprof + flamegraph profiler has been added to generate flamegraphs.

9df94b4f7

Verified the change to increase the max_storage_reservation_period from 0 to 300 for verified subnets.

8105c7140

Verified that the trigger callback expiration is added to statemachine. Changes can be verified in state_machine.rs

b0cb8a12e

Verified that the changes have been made to message_pool for callback expiration.

d70b9eb6f

An endpoint is added to test icp allowance getter in the icp ledger code.

05d54e257

Verified EVM-RPC canister 2.0.0 is now being used in ckETH. Various parts of the code have been refactored to use the evm_rpc_client instead of candid.

d1db89ed7

Verified that memory manager is being used now during upgrades. Certain code around pre_upgrade in icrc1 ledger canister has been removed.

c8d029531

Verified that certain changes have been added to support wasm64 with a flag.

e17d99af7

Minor changes in hostos to replace shell script with hostos_tool to fetch mac related details.

Bugfixes:

b53c6cfe6

Verified the bug fix to prevent duplicates of aborted or paused responses. Fn should_enqueue_input was added to address it.

4c17f87e8

Minor fixes to cargo build

6fb2fd1f0

A few more changes have been made to expose additional metrics such as METIC_CONNECTIOn and closed connections.

aa2de1256

Minor change to allow boundary nodes by exposing fda6:8d22:43e1::/48

9f068bb16

Minor code fixes for microcode for OS 24.04

fcad095e7

The logging service has been changed to restart only on failure, which was not happening before because of the verbose flag.

fdbd50e3e

A small change has been made to the Dockerfiles for BoundaryOS, GuestOS and HostOS in how systemd services are enabled.

Chores:

41030a8ad

A new BouncerMetrics has been added to measure the computation of the bouncer function.

717c3a3a7

The custom implementation for RejectCode has been reverted.

28ac05e1f

A couple of commits have been reverted for load issues.

1e88b9dda

Clap was migrated to 4.0.0

43ae0b304

Verified the Wasmtime upgrade to v25

fcbc91f0a

ic-cdk has been updated to 0.16, and the resulting breaking changes were also addressed.

5b82b0e27

Verified that hyper-util has been upgraded to 0.1.9

da3de2d4a

Verified the change that enables synchronous v3 calls on all subnets except the NNS subnet.
The function enable_synchronous_call_handler_for_v3_endpoint implements this.
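
As a rough illustration of the rule described here (synchronous v3 calls everywhere except on the NNS subnet), a minimal sketch could look like the following. The SubnetId type and the sync_v3_calls_enabled helper are hypothetical names made up for this example and are not taken from the replica code.

```rust
/// Hypothetical subnet identifier, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SubnetId(u64);

/// Returns true when the synchronous v3 call handler should be enabled,
/// i.e. on every subnet except the NNS subnet.
fn sync_v3_calls_enabled(own_subnet: SubnetId, nns_subnet: SubnetId) -> bool {
    own_subnet != nns_subnet
}
```

The only input that matters in this sketch is whether the node's own subnet is the NNS subnet.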

b9ae85afa

Verified that the v3 call feature gate was changed from a function call to a bool constant.

aee21c80d

Verified that rustls has been bumped to 0.23.14

3bc150483

Wasmtime v25 dependencies have been upgraded. Packages like wasmparser and wasmprinter were bumped.

839976182

There is a minor version upgrade of strum to 0.26.4

c12572f3a

Little cleanup around bash scripts in ic-os

eada4b26a

Verified that this commit has a few changes to the Python formatting rules for the monorepo

926a05687

Verified base image reference has been updated accordingly.

4cece3a67

Same as above, verified the base image references update.

Refactoring:

Various parts of the code have been refactored via commits f7a7fd7c8, 501d3aa82, 5127f0463, a7d5b717a, c65c725dd, d544428d8

Proposal - 133397

Summary

  1. Vote to adopt yes.
  2. Build hash matches

Features:

2b2d97de9

Changes made to charge idle canisters for full execution. Relevant changes were also made to the scheduler.

340580ebd

More changes have been made to charge for each fully executed canister.

6b78f2d91

MAX_SANDBOX_COUNT has been increased from 1_000 to 2_000

4ad4ba368

The per-canister overhead has again been increased from 4ms to 8ms, and the instruction overhead from 8M to 16M.
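
The figures quoted here (8M instructions alongside 4ms, 16M alongside 8ms) imply a rate of roughly 2 instructions per nanosecond. A small sketch of that arithmetic follows; the rate constant and the helper name are assumptions made for illustration and are not taken from the replica code.

```rust
/// Assumed conversion rate implied by the figures above:
/// 8M instructions correspond to about 4 ms.
const ASSUMED_INSTRUCTIONS_PER_MS: u64 = 2_000_000;

/// Converts an instruction overhead into milliseconds under the assumed rate.
fn overhead_ms(instructions: u64) -> u64 {
    instructions / ASSUMED_INSTRUCTIONS_PER_MS
}
```

Under this assumption, overhead_ms(8_000_000) gives 4 and overhead_ms(16_000_000) gives 8, matching the numbers in the review.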

Bugfixes:

aba60ffbc

Minor bug fix for an XNET regression

Other changes:

430a75a0b

The change from commit 4ad4ba368 is reverted again here.

Proposal 133396

Hashes Match

[d6957f09a]
Adds the pprof + flamegraph profiler to the criterion config, with the ability to generate flamegraphs by passing --profile-time to the benchmark.

[9df94b4f7]
Enables the storage reservation mechanism on verified application subnets. This reduces the configuration gap between application subnets and verified application subnets.

[8105c7140]
Triggers the expiration of best-effort callbacks at the top of every DSM round.

[b0cb8a12e]
Generates a reject response on best-effort callback expiration and adds it as SYS_UNKNOWN when making peek/pop operations.

[d70b9eb6f]
Adds a test ICP allowance endpoint to fetch an allowance given AccountIdentifiers. It is only available in the test wasm.

[05d54e257]
updates the version of EVM RPC client used by the ckETH and ckERC20

[d1db89ed7]
Changes match the description: the memory manager is used during upgrade.

[c8d029531]
Adds is_wasm64 flag to the canister execution state. It is used to track the cycle consumption and is calculated the first time a canister is installed or upgraded.

[e17d99af7]
Replaces fetch-mgmt-mac.sh with the hostos_tool command, which helps with code reuse and the config refactor.

Bugfixes

[b53c6cfe6]
explicitly check the callback ids of paused or aborted responses in SystemState::push_input() and SystemState::induct_messages_to_self() to avoid duplicates.

[4c17f87e8]
changes match the description, updates github automation.

[6fb2fd1f0]
Spins up tasks and detaches them as soon as a new connection arrives, not only after the TLS handshake.

[aa2de1256]
allows ssh access to nodes in k8s

[9f068bb16]
changes match the description, intel microcode initramfs changed to switch to 24.04

[fcad095e7]
prevents restart loop for verbose-logging.sh.

[fdbd50e3e]
changes in systemd services on how it is enabled.

Chores

[41030a8ad]
adds metric to bouncer function.

[717c3a3a7]
reverts custom impl ExhaustiveSet for RejectCode as it is no longer needed since SysUnknown is supported by mainnet replicas.

[28ac05e1f]
changes match the description, this pr reverts all the changes which are considered risky.

[1e88b9dda]
clap 4 migration, following the instructions in clap/CHANGELOG.md (clap-rs/clap on GitHub)

[43ae0b304]
upgrades Wasmtime to v.25

[fcbc91f0a]
updates ic-cdk from 0.13.5 to 0.16.0

[5b82b0e27]
updated check sum, versions and other parameters in github automation.

[da3de2d4a]
Reverts #1924.
Enables the sync endpoint on all subnets except the NNS subnet.

[b9ae85afa]
updates v3 call feature to boolean.

[aee21c80d]
Upgrades rustls:
A. rustls from 0.23.12 to 0.23.14
B. hyper-rustls from 0.27.2 to 0.27.3
C. rustls-webpki from 0.102.6 to 0.102.8
and some more upgrades.

[3bc150483]
upgrade wasmtime versions

[839976182]
updates strum versions, checksum, url and sha256

[c12572f3a]
changes build time calculation from current time to commit timestamp

[c918618eb]
changes match the description, cleans up bash script in ic-os

[eada4b26a]
updates in python linting and formatting.

[926a05687]
updates base container image references.

[4cece3a67]
updates base container image references.

Refactoring

[f7a7fd7c8]
code refactoring for simplifying task queues logic.

[501d3aa82]
Code refactoring: Encapsulates CallContextManager within SystemState

[5127f0463]
refactors task queues

[a7d5b717a]
refactoring config types

[c65c725dd]
removes dead code

[d544428d8]
code refactoring and clean up for ic-os

voting to adopt

Proposal 133397

Features

[2b2d97de9]
Code changes match the description: charges idle canisters for full execution.

[340580ebd]
This change applies a charge for fully executed canisters; the charge is evenly distributed among all the canisters.

[6b78f2d91]
DEFAULT_MAX_SANDBOX_COUNT increased from 1_000 to 2_000

[4ad4ba368]
increase INSTRUCTION_OVERHEAD_PER_CANISTER from 8M to 16M

Bugfixes

[aba60ffbc]
changes match the description. spin up tasks and detach them as soon as a new connection arrives.

Other changes

[430a75a0b]
Reverts commit 4ad4ba3, reducing INSTRUCTION_OVERHEAD_PER_CANISTER to 8M.

Hashes Match
voting to adopt

Proposal 133396

Hashes Match

Features

  • [d6957f09a]: Added a pprof and flamegraph profiler to the criterion configuration, allowing flamegraph generation with the --profile-time flag during benchmarking.
  • [9df94b4f7]: Enabled a storage reservation mechanism on verified application subnets, narrowing the configuration differences between application and verified subnets.
  • [8105c7140]: Set best-effort callbacks to expire at the start of every DSM round.
  • [b0cb8a12e]: Generates a reject response when best-effort callbacks expire, marking them as SYS_UNKNOWN during peek/pop operations.
  • [d70b9eb6f]: Added an endpoint for testing ICP allowance, allowing the retrieval of an allowance using AccountIdentifiers. This is available only in the test WASM.
  • [05d54e257]: Updated the EVM RPC client version for ckETH and ckERC20.
  • [d1db89ed7]: Updated to use the memory manager during upgrades.
  • [c8d029531]: Introduced an is_wasm64 flag to the canister execution state, used to track cycle consumption upon canister installation or upgrade.
  • [e17d99af7]: Replaced fetch-mgmt-mac.sh with a new hostos_tool command, improving code reuse and configuration.

Bug Fixes

  • [b53c6cfe6]: Added explicit checks for callback IDs in paused or aborted responses within SystemState::push_input() and SystemState::induct_messages_to_self() to prevent duplicates.
  • [4c17f87e8]: Updated GitHub automation scripts.
  • [6fb2fd1f0]: Tasks now spin up and detach when a new connection arrives, instead of waiting until after the TLS handshake.
  • [aa2de1256]: Enabled SSH access for nodes in Kubernetes.
  • [9f068bb16]: Switched Intel microcode initramfs for Ubuntu 24.04 compatibility.
  • [fcad095e7]: Prevented a restart loop for the verbose-logging.sh script.
  • [fdbd50e3e]: Modified the way systemd services are enabled.

Chores

  • [41030a8ad]: Added a metric to the bouncer function.
  • [717c3a3a7]: Reverted custom ExhaustiveSet implementation for RejectCode since SysUnknown is now supported on mainnet replicas.
  • [28ac05e1f]: Rolled back risky changes from a previous pull request.
  • [1e88b9dda]: Migrated clap to version 4, following the changelog instructions.
  • [43ae0b304]: Upgraded Wasmtime to version 25.
  • [fcbc91f0a]: Updated ic-cdk from 0.13.5 to 0.16.0.
  • [5b82b0e27]: Updated checksums, versions, and other parameters in GitHub automation.
  • [da3de2d4a]: Reverted PR #1924 and re-enabled the sync endpoint on all subnets except NNS.
  • [b9ae85afa]: Updated v3 call feature to a boolean format.
  • [aee21c80d]: Upgraded rustls (from 0.23.12 to 0.23.14) and other associated libraries.
  • [3bc150483]: Updated Wasmtime versions.
  • [839976182]: Updated strum library versions and corresponding checksum and URL.
  • [c12572f3a]: Adjusted build time calculation to use commit timestamp.
  • [c918618eb]: Cleaned up bash scripts in ic-os.
  • [eada4b26a]: Updated Python linting and formatting.
  • [926a05687], [4cece3a67]: Updated base container image references.

Refactoring

  • [f7a7fd7c8]: Refactored task queue logic for simplification.
  • [501d3aa82]: Encapsulated CallContextManager within SystemState.
  • [5127f0463]: Refined task queue structure.
  • [a7d5b717a]: Refactored configuration types.
  • [c65c725dd]: Removed unused code.
  • [d544428d8]: Cleaned up and refactored code in ic-os.

Proposal 133397

Hashes Match

voting to adopt

Features

  • [2b2d97de9]: Introduced a feature to charge idle canisters for full execution.
  • [340580ebd]: Applied charges for fully executed canisters, distributing the cost evenly among them.
  • [6b78f2d91]: Increased DEFAULT_MAX_SANDBOX_COUNT from 1,000 to 2,000.
  • [4ad4ba368]: Raised INSTRUCTION_OVERHEAD_PER_CANISTER from 8M to 16M.

Bug Fixes

  • [aba60ffbc]: Fixed a bug so that tasks now spin up and detach as soon as a new connection arrives instead of only after the TLS handshake.

Other Changes

  • [430a75a0b]: Reverted a prior change, reducing INSTRUCTION_OVERHEAD_PER_CANISTER back to 8M.

Proposal 133396

Build is successful and verified commits check out. Adopted.

[9df94b4f7] Enabled the storage reservation mechanism by default on verified application subnets, using the same value as on app subnets.

[8105c7140] Added a time_out_callbacks method to ReplicatedState. It iterates through all canisters, filters those with expired callbacks, and then executes the SystemState version of time_out_callbacks on each of them, keeping track of the total number of expired callbacks and of any errors that occurred while trying to time out the callbacks. The method is called by the StateMachine in execute_round.
A new metric has also been added to keep track of the count of expired callbacks.
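
The iterate, filter and sum flow described above can be sketched roughly as follows. This is a simplified illustration only: the struct fields, the u64 ids and deadlines, and the has_expired_callbacks helper are assumptions made for the example, and the error tracking and metric mentioned in the review are left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical, heavily simplified stand-in for a canister's system state.
struct SystemState {
    /// Deadlines (in seconds) of outstanding best-effort callbacks, keyed by callback id.
    callback_deadlines: BTreeMap<u64, u64>,
}

impl SystemState {
    /// True if at least one callback deadline lies before `now`.
    fn has_expired_callbacks(&self, now: u64) -> bool {
        self.callback_deadlines.values().any(|&deadline| deadline < now)
    }

    /// Removes expired callbacks and returns how many were timed out.
    fn time_out_callbacks(&mut self, now: u64) -> usize {
        let before = self.callback_deadlines.len();
        self.callback_deadlines.retain(|_, deadline| *deadline >= now);
        before - self.callback_deadlines.len()
    }
}

/// Hypothetical stand-in for the replicated state: canister id -> system state.
struct ReplicatedState {
    canisters: BTreeMap<u64, SystemState>,
}

impl ReplicatedState {
    /// Times out callbacks on every canister that has expired ones and
    /// returns the total number of expired callbacks.
    fn time_out_callbacks(&mut self, now: u64) -> usize {
        self.canisters
            .values_mut()
            .filter(|canister| canister.has_expired_callbacks(now))
            .map(|canister| canister.time_out_callbacks(now))
            .sum()
    }
}
```

In this sketch a round would call ReplicatedState::time_out_callbacks once with the current time and use the returned count for a metric.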

[b0cb8a12e] Added an expired_callbacks field to the CanisterQueues and MessageStoreImpl structs, alongside the protobuf definition for it. This is a map containing the callback ids of stale best-effort callbacks which haven't been enqueued. A method named time_out_callbacks has been added to SystemState; it triggers callback expiration by calling the call context manager, and then each expired callback that hasn't already started execution or produced a response is added to the expired_callbacks map. Eventually it will be popped by the canister queues message store and produce a DeadlineExpired response.
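
A minimal sketch of that bookkeeping is shown below. The types are stand-ins invented for this example (a BTreeSet is used where the review mentions a map, and mark_expired / pop_expired are hypothetical method names), so it only illustrates the idea of recording expired callback ids and later turning them into DeadlineExpired responses.

```rust
use std::collections::BTreeSet;

/// Stand-in for the reject response generated when a callback expires.
#[derive(Debug, PartialEq, Eq)]
enum Response {
    DeadlineExpired { callback_id: u64 },
}

/// Hypothetical, simplified canister queues holding expired callback ids.
#[derive(Default)]
struct CanisterQueues {
    expired_callbacks: BTreeSet<u64>,
}

impl CanisterQueues {
    /// Records an expired callback unless it already started executing
    /// or already produced a response.
    fn mark_expired(&mut self, callback_id: u64, already_handled: bool) {
        if !already_handled {
            self.expired_callbacks.insert(callback_id);
        }
    }

    /// Pops one expired callback, if any, as a DeadlineExpired response.
    fn pop_expired(&mut self) -> Option<Response> {
        let callback_id = self.expired_callbacks.pop_first()?;
        Some(Response::DeadlineExpired { callback_id })
    }
}
```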

[b53c6cfe6] Added a check to should_enqueue_input and induct_messages_to_self to avoid enqueuing a response a second time for an already executing callback. This is done by checking whether the response's originator_reply_callback matches that of the aborted or paused response at the head of the task queue, if there is one. Left a comment on the commit (fix: [MR-523] Prevent duplicates of aborted or paused responses (#1851), dfinity/ic@b53c6cf on GitHub).
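
The duplicate check described above could be sketched like this. The Task and CallbackId types and the should_enqueue_response function are simplified assumptions for illustration, not the actual definitions in the replica.

```rust
/// Hypothetical callback identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct CallbackId(u64);

/// Simplified task kinds; only paused and aborted responses matter here.
enum Task {
    PausedResponse(CallbackId),
    AbortedResponse(CallbackId),
    Other,
}

/// Returns false when the incoming response targets the same callback as the
/// paused or aborted response at the head of the task queue (a duplicate).
fn should_enqueue_response(task_queue_head: Option<&Task>, incoming: CallbackId) -> bool {
    match task_queue_head {
        Some(Task::PausedResponse(id)) | Some(Task::AbortedResponse(id)) => *id != incoming,
        _ => true,
    }
}
```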

[05d54e257] Bump evm_rpc to 2.0 and add evm_rpc_types dependency to ckETH and ckERC20 minter.

[c8d029531] Added an is_wasm64 flag to the ExecutionState and ExecutionStateBits structs. In a follow-up PR it will be used in the execution layer to determine cycle pricing.

[28ac05e1f] Reverted a number of changes related to scheduling/load balancing. INSTRUCTION_OVERHEAD_PER_CANISTER has been increased back to its old value, idle canisters at the front of the round schedule are no longer marked as fully executed, and the updated priority credit logic has been reverted.

[1e88b9dda] Migrated the clap crate to version 4.

[43ae0b304] Bump Wasmtime crate to v.25 and disable wasm_extended_const setting.

[fcbc91f0a] Bumped ic-cdk to 0.16.0 and addressed breaking changes.

[f7a7fd7c8] Simplified handling of paused/aborted tasks in the TaskQueue by leveraging the assumption that at any given time there is at most one of them.
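
One way to picture the "at most one paused or aborted task" assumption is a queue with a single dedicated slot for such a task. The sketch below is an illustration under that assumption; the Task variants and method names are hypothetical and not taken from the actual TaskQueue implementation.

```rust
use std::collections::VecDeque;

/// Simplified task kinds, for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum Task {
    Paused(u64),
    Aborted(u64),
    Heartbeat,
}

/// Task queue with one dedicated slot for the paused or aborted task.
#[derive(Default)]
struct TaskQueue {
    paused_or_aborted: Option<Task>,
    queue: VecDeque<Task>,
}

impl TaskQueue {
    /// Stores paused/aborted tasks in the single slot, everything else in the queue.
    fn push(&mut self, task: Task) {
        match task {
            Task::Paused(_) | Task::Aborted(_) => {
                assert!(
                    self.paused_or_aborted.is_none(),
                    "at most one paused or aborted task is expected"
                );
                self.paused_or_aborted = Some(task);
            }
            other => self.queue.push_back(other),
        }
    }

    /// The paused or aborted task, if present, is always returned first.
    fn pop_front(&mut self) -> Option<Task> {
        self.paused_or_aborted
            .take()
            .or_else(|| self.queue.pop_front())
    }
}
```

Keeping the special task in an Option avoids scanning the queue for it, which is the kind of simplification the assumption allows.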

[501d3aa82] The status field of SystemState has been made private and no longer exposes the call context manager directly; instead, all of its methods previously used outside of SystemState have been exposed through wrapper functions.
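
The encapsulation pattern described above (a private field plus public wrapper methods) can be sketched as follows. The module layout, field names and the two wrapper methods are assumptions chosen for the example and do not mirror the real SystemState API.

```rust
mod system_state {
    /// Private to this module: callers cannot reach it directly.
    #[derive(Default)]
    struct CallContextManager {
        callbacks: Vec<u64>,
    }

    /// The manager is a private field; access goes through the wrappers below.
    #[derive(Default)]
    pub struct SystemState {
        call_context_manager: CallContextManager,
    }

    impl SystemState {
        /// Wrapper forwarding to the private call context manager.
        pub fn register_callback(&mut self, callback_id: u64) {
            self.call_context_manager.callbacks.push(callback_id);
        }

        /// Read-only wrapper exposing just the information callers need.
        pub fn callback_count(&self) -> usize {
            self.call_context_manager.callbacks.len()
        }
    }
}
```

Code outside the module can call register_callback and callback_count, but cannot touch the call context manager itself.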

[5127f0463] Define and implement a new type for TaskQueues in SystemState and move on_low_wasm_memory_hook_status to it.

Proposal 133397

Build is successful and verified commits check out. Adopted.

[2b2d97de9], [340580ebd] Re-enabled improvements to canister scheduling from last week.

[6b78f2d91] Increased DEFAULT_MAX_SANDBOX_COUNT const from 1k to 2k.

[4ad4ba368] Increased INSTRUCTION_OVERHEAD_PER_CANISTER to 16M.