Subnet `lhg73` is stalled

Obviously at this point voted to adopt.

1 Like

Indeed, I’ve also adopted :slight_smile:

1 Like

Things look good. The subnet started making progress again. Next steps are:

Tomorrow (Monday):

  • roll out version 5d1beaca74d0bf2ad2e8a37809e1e3ffa5ee9a32 to all subnets that are running 3318f746113e7695ae0b904be5a48820884eb6d8
  • elect a new version c180069c95739773293599d90c23defc12c3084f and roll it out to all subnets that are currently running 290fd2ae93a02e1ab579895c806161e42b6acf2b
  • make the change public, to allow public verification.

Thanks everyone for participating and giving constructive and supportive feedback!

7 Likes

I’ve voted to adopt Subnet Management proposals 132499, 132502 & 132503 (as a CodeGov followee for this topic) as components of a critical hotfix in accordance with the security policy. As seen on the dashboard, the subnet stalled at block height 116106307 at approximately 16:20 UTC, and a CUP block height of 116107000 was executed in proposal 132502. Thanks @sat and team for your fast work on this!

I agree that displaying UTC times in the dashboard is probably a bit better. I’d also like to see a way to verify the CUP state_hash. I searched around for a way to find block details by hash but couldn’t find anything.

6 Likes

Voted to adopt, in accordance with the Security Patch Policy and Procedure.

7 Likes

The next proposal is live:

Voted to adopt, in accordance with the Security Patch Policy and Procedure.

5 Likes

I’ve adopted both 132500 and 132507 in accordance with the Security Patch Policy and Procedure.

6 Likes

Voting “Yes” to adopt both 132500 and 132507 in accordance with the Security Patch Policy and Procedure.

3 Likes

In accordance with my Reject-if-you-can’t-verify policy, I’ve voted to reject 132507

2 Likes

Voted to adopt both 132500 and 132507, in accordance with the Security Patch Policy and Procedure.

1 Like

Voted to adopt, in accordance with the Security Patch Policy and Procedure.

1 Like

I’ve adopted both 132500 and 132507 proposals in accordance with the Security Patch Policy and Procedure.

2 Likes

The changes have now been published in

and

In particular, the change is only removing a deprecated assert:

5 Likes

Thank you for sharing so quickly. Was there anything in particular that actually triggered the check? As I read it, the code checked whether canister_state_changes is not None and then asserted that the removed_cycles field of the system_state_changes is 0. I mean, for example, exceptions that occurred during the removed_cycles call, or canister_state_changes being unexpectedly None, since the handle_wasm_execution_of_cleanup_callback function now behaves the same.

1 Like

I’ve shared the link with @dsarlis; he’s the team lead and the domain expert, so he can provide an answer with much more context than I can.

2 Likes

canister_state_changes is going to be Some(changes) instead of None if the canister made changes to its state that need to be persisted (perfectly normal for a cleanup callback).

The assert that used to be there was checking that the canister’s cycle balance was never reduced as part of those changes while executing the cleanup callback (note that we are talking about what happens during execution; obviously the canister is charged for executing the message, but that happens after execution). This invariant held when the assert was first introduced (about 2 years ago).

In the meantime, things (and some assumptions) have changed. The storage reservation mechanism is one case: if the subnet is above the storage capacity where this mechanism kicks in (450GiB right now), some cycles are reserved and removed from the cycles balance when the canister tries to allocate memory in its cleanup callback. The ic0.cycles_burn API is another case that results in cycles being removed from the canister’s balance (and it’s allowed to be called in a cleanup). So the assert was simply checking the wrong invariant given the current state of the implementation.
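To make the invariant described above concrete, here is a minimal Python sketch. It is not the replica code. The field names canister_state_changes, system_state_changes and removed_cycles are taken from this thread. The class names, the helper functions and the cycle amounts are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SystemStateChanges:
    # Hypothetical stand-in: cycles removed from the balance during execution.
    removed_cycles: int = 0


@dataclass
class CanisterStateChanges:
    # Hypothetical stand-in for the changes a cleanup callback wants persisted.
    system_state_changes: SystemStateChanges


def old_cleanup_check(changes: Optional[CanisterStateChanges]) -> None:
    """Mimics the removed assert: no cycles may be removed during cleanup."""
    if changes is not None:
        if changes.system_state_changes.removed_cycles != 0:
            raise AssertionError("cycles were removed during the cleanup callback")


def violates_old_invariant(changes: Optional[CanisterStateChanges]) -> bool:
    """Returns True if the old check would have fired for these changes."""
    try:
        old_cleanup_check(changes)
    except AssertionError:
        return True
    return False


# No state changes at all: the old check is skipped.
no_changes = None

# Ordinary state changes with an untouched cycle balance: the old check passes.
plain_changes = CanisterStateChanges(SystemStateChanges(removed_cycles=0))

# Cleanup that burned or reserved cycles (illustrative amount): the old check fires,
# even though the thread explains that this is now legitimate behaviour.
burning_changes = CanisterStateChanges(SystemStateChanges(removed_cycles=1_000))

if __name__ == "__main__":
    for name, changes in [
        ("no changes", no_changes),
        ("plain changes", plain_changes),
        ("cycles removed", burning_changes),
    ]:
        print(name, "->", "old assert fires" if violates_old_invariant(changes) else "ok")
```

In this sketch only the third case trips the old check, which matches the two situations named above (storage reservation and ic0.cycles_burn): both lead to a non-zero removed_cycles during a cleanup callback.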

Obviously, we should have caught this when these features were implemented and we have been discussing how to improve this going forward. We will post a post-mortem (as per our usual practice) once we have finalized these discussions.

6 Likes

I have voted to adopt proposals 132499, 132502 and 132503 in accordance with the security policy. Thanks @sat for the work!

I also agree with @Lorimer and @timk11 that using a standard timezone across all platforms is more convenient.

2 Likes

ADOPT both 132500 and 132507 in accordance with the Security Patch Policy and Procedure.

3 Likes

I’ve voted to adopt proposals 132500 and 132507 in accordance with the Security Patch Policy and Procedure.

2 Likes