Blessing/Electing Replica Binary (NNS proposal #26833)

Hey folks,

I told the community that I would try to bring more attention (and explanations) to blessing/electing replica binaries.

I want to draw attention to the latest one submitted: Internet Computer Network Status

A few notes:

  1. Once a binary is blessed/elected, there are subsequent NNS proposals to update each subnet, since each subnet requires its own NNS proposal to update its running code. The last subnet updated is the NNS subnet. We usually spread the proposals over the course of days so that things can be rolled back in case of any issue with a binary. Typically a new binary is released every 1-2 weeks.

  2. Currently, the code in an NNS proposal is not visible before the NNS vote, but we are very close to addressing the technical issue that blocks this. Very soon, all NNS proposals will include the code changes along with the binary. The team has been working on this for a while as a high-priority item.

  3. The intent is that whenever a blessing/electing proposal is submitted, I surface it so people can take a look or ask questions. Just because the code is not visible beforehand does not mean I should not try to get as much visibility on these proposals as possible. Indeed, the foundation typically waits to vote on them in case people bring up issues.

  4. I am human, so I am sometimes slow to create these threads. I am trying to get better at this.

Changelog of what is in this new version:

  • Boundary nodes: TLS cert update fixes
  • Boundary nodes: Cache query calls in nginx
  • Consensus: Update consensus ECDSA payload types
  • Consensus: Fix problems when a node is joining a busy subnet
  • Crypto: Implement TLS client handshake using rustls
  • Crypto: Improve client and server certificate verification
  • Crypto: Refactor IDKG API
  • Crypto: Improve Threshold Signature benchmarks
  • Crypto: Initial implementation of MEGa encryption for IDKG
  • Crypto: Update zkcrypto/pairing dependencies
  • Execution: Introduce a per-canister heap delta limit to share the heap delta capacity more fairly between canisters.
  • Execution: Reduce the instruction limit for executing install_code messages on dedicated subnets. It was initially set to a higher value to support specific use cases, but data from recent install_code requests shows it can safely be lowered to the same value as on non-dedicated subnets.
  • Execution: Adjust cost of various system API calls to better reflect the actual amount of work done in these calls.
  • Execution: Use threadpool::ThreadPool instead of rayon for query handling. Rayon shares thread pools under the hood, which can cause deadlocks (see the sketch after this list).
  • Execution: Use the same instruction limit for executing queries as for update messages. Before, queries used a separate constant that could become out of date whenever the limit for update messages was updated.
  • Execution: Reclaim allocated memory on failed message execution. When a message execution fails, we undo the changes it made; previously, we did not adjust the memory accounting accordingly, so subsequent message executions in the round had less memory available than they should have.
  • Execution: Disable the new signal handler on Windows WSL as it is not properly supported.
  • Execution: When executing queries on wasm modules that have not yet been compiled to native code, prevent multiple concurrent compilation processes.
  • Messaging: Track memory consumed by in-flight canister messages
  • Net: Remove legacy api/v1 HTTP endpoints
  • Net: Add rate limiting per connection
  • Net: Add buffering, rate limiting, and a concurrency limit to the ingress ingestion service
  • Net: Share the threadpool between query execution and the ingress message filter
  • Node: Rework wasm compilation cache
  • Node: GuestOS SELinux policy
  • Node: Basic CPU profiling with pprof
  • P2P: ECDSA message updates, artifact pool improvements
  • Various bugfixes and test updates
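
To make the rayon item above more concrete, here is a minimal sketch of running query jobs on a dedicated threadpool::ThreadPool instead of rayon's shared global pool; everything in it (pool size, job shape) is illustrative and not the replica's actual code. The point is isolation: a dedicated pool runs only the query workload, so blocking in one workload cannot starve or deadlock another.

```rust
// Minimal sketch (illustrative, not the replica's actual code): queries run
// on their own dedicated pool instead of rayon's shared global pool.
use std::sync::mpsc::channel;
use threadpool::ThreadPool;

fn main() {
    // Dedicated pool: only query jobs run here, so they cannot be blocked
    // by (or block) unrelated work sharing a global pool.
    let query_pool = ThreadPool::new(4);
    let (tx, rx) = channel();

    for query_id in 0..8 {
        let tx = tx.clone();
        query_pool.execute(move || {
            // Stand-in for executing a query against canister state.
            tx.send(format!("result of query {}", query_id))
                .expect("receiver should be alive");
        });
    }
    drop(tx); // close the channel so the loop below terminates

    for result in rx {
        println!("{}", result);
    }
}
```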
5 Likes
> Boundary nodes: Cache query calls in nginx

Awesome.

2 Likes

Thanks @diegop, I enjoyed reading a technical proposal for the first time. Keep up the good work.

1 Like

Thanks for sharing, and also thanks to the team for the great work!

Query calls have parameters and callers. Does the boundary node take them into account when caching? (If not, I see a risk that one could trick the boundary node into returning a “wrong” response.)

Or is this really about the HTTP gateway on the boundary nodes (i.e. “cache HTTP requests to canisters”)?

Does this introduce a new way for a canister to trap at possibly any point, similar to but distinct from running out of cycles? More details here would be good: for carefully crafted applications it is important to understand where they can possibly trap, and how to avoid that.

1 Like

I am deferring the first question about query caching to @PaulLiu

Regarding the heap delta rate limiting: individual message execution remains the same as before, so there are no new traps. A message can still modify several GiB of memory and thus exceed the limit; in that case the canister will skip subsequent rounds until the rate-limited heap delta catches up with the actual usage.
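
To illustrate with made-up numbers (the constant and the exact catch-up rule below are hypothetical, not the replica's actual configuration): with an allowance of 256 MiB of heap delta per round, a message that dirties 3 GiB puts the canister roughly 12 rounds' worth of delta ahead, so it sits out about that many rounds before executing again.

```rust
// Illustrative sketch only: the constant and the catch-up rule are made up
// for this example; the real values live in the replica's configuration.
const HEAP_DELTA_PER_ROUND: u64 = 256 * 1024 * 1024; // hypothetical allowance

/// Rounds until the rate-limited heap delta catches up with `delta_bytes`
/// dirtied by a single (uninterrupted) message execution.
fn rounds_to_catch_up(delta_bytes: u64) -> u64 {
    // Ceiling division: a partial round of debt still has to be waited out.
    (delta_bytes + HEAP_DELTA_PER_ROUND - 1) / HEAP_DELTA_PER_ROUND
}

fn main() {
    let dirtied = 3u64 * 1024 * 1024 * 1024; // message modified 3 GiB
    println!("{} rounds to catch up", rounds_to_catch_up(dirtied)); // 12
}
```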

3 Likes

The caching is for query responses. It uses most fields in the request body, except the expiry and the nonce, to build the cache key, which avoids giving wrong responses.
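
For readers curious what that could look like, here is a rough sketch of deriving a cache key from a query request (the field names follow the IC HTTP interface's query call format; the hashing scheme is my own illustration, not the boundary node's actual implementation):

```rust
// Sketch only: hash the fields that identify a query, leaving out
// ingress_expiry and nonce so retries of the same query hit the same
// cache entry. Uses the `sha2` crate.
use sha2::{Digest, Sha256};

struct QueryRequest {
    canister_id: Vec<u8>,
    method_name: String,
    arg: Vec<u8>,
    sender: Vec<u8>,
    ingress_expiry: u64,    // deliberately NOT part of the cache key
    nonce: Option<Vec<u8>>, // deliberately NOT part of the cache key
}

fn cache_key(req: &QueryRequest) -> [u8; 32] {
    let mut hasher = Sha256::new();
    // Length-prefix each field so concatenated fields cannot collide.
    for field in [
        &req.canister_id[..],
        req.method_name.as_bytes(),
        &req.arg[..],
        &req.sender[..],
    ] {
        hasher.update((field.len() as u64).to_be_bytes());
        hasher.update(field);
    }
    hasher.finalize().into()
}

fn main() {
    let req = QueryRequest {
        canister_id: vec![0x01],
        method_name: "get_balance".into(),
        arg: vec![],
        sender: vec![0x04],
        ingress_expiry: 0, // ignored by cache_key
        nonce: None,       // ignored by cache_key
    };
    println!("cache key: {:02x?}", cache_key(&req));
}
```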

2 Likes

Thanks! What is the cache expiry time? (If it is too long, clients validating certificates in the response, e.g. certified variables, might think they are the victim of a replay attack.)

2 Likes

It is 1 second. I wrote more about this as an update in another thread: High User Traffic Incident Retrospective - Thursday September 2, 2021 - #49 by PaulLiu
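
On the replay concern from earlier in the thread: with such a short cache lifetime, a client can simply bound how stale a certified response may be. A minimal sketch of such a freshness check (the names and the threshold are hypothetical, not taken from any IC SDK):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical client-side check: reject certified responses whose
/// certificate timestamp (nanoseconds since the epoch) is older than
/// `max_age_ns`. With a 1-second boundary-node cache, a threshold of a
/// few seconds leaves room for clock skew without accepting stale replays.
fn is_fresh(cert_time_ns: u64, max_age_ns: u64) -> bool {
    let now_ns = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before 1970")
        .as_nanos() as u64;
    // saturating_sub: a certificate slightly "in the future" due to clock
    // skew counts as age zero instead of underflowing.
    now_ns.saturating_sub(cert_time_ns) <= max_age_ns
}

fn main() {
    let now_ns = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_nanos() as u64;
    let cert_from_half_a_second_ago = now_ns - 500_000_000;
    println!("{}", is_fresh(cert_from_half_a_second_ago, 5_000_000_000)); // true
}
```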

2 Likes

Great, thanks, that sounds good and harmless 🙂

1 Like