Incident Handling with the New Boundary Node Architecture

Thanks for explaining further @rbirkner

Hi all,

We are excited to share an update on the final steps in transitioning to the new decentralized boundary architecture. As we previously mentioned, incident handling was the last major blocker, and we are now prepared to tackle it.

With the adoption of proposal 134775, the rate-limit canister has been installed in the II subnet (uzr34) under the control of the NNS. This canister (see diagram below) is designed around two key principles:

  • Enable swift reactions to incidents by protecting vulnerable IC components, such as canisters and subnets
  • Ensure transparency and auditability for the IC community after an incident

In a nutshell, this canister is an append-only store of rate-limit configs. Each config is an ordered set of rate-limit rules [rule_1, rule_2, …], where each rule_i protects one or more canisters or subnets by enforcing a rate limit on the API boundary nodes. See an example of a rate-limit rule here.
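To make the append-only idea concrete, here is a minimal in-memory sketch in Python. It is not the canister's actual code or interface: the names `Rule`, `Config`, and `ConfigStore`, the rule fields, and the numbering of versions from 1 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    # Hypothetical fields; the real rule schema is not described in this post.
    rule_id: str
    target: str  # a canister ID or subnet ID the rule protects
    limit: str   # free-form description of the limit, e.g. "10 req/s"


@dataclass(frozen=True)
class Config:
    version: int
    rules: Tuple[Rule, ...]  # order matters, so keep an immutable sequence


class ConfigStore:
    """Append-only store: configs can be added but never changed or removed."""

    def __init__(self) -> None:
        self._configs: List[Config] = []

    def add_config(self, rules: List[Rule]) -> int:
        # Assumption: versions are assigned sequentially, starting at 1.
        version = len(self._configs) + 1
        self._configs.append(Config(version=version, rules=tuple(rules)))
        return version

    def get_config(self, version: Optional[int] = None) -> Config:
        if not self._configs:
            raise LookupError("no config has been added yet")
        if version is None:
            return self._configs[-1]  # latest config
        if not 1 <= version <= len(self._configs):
            raise LookupError(f"unknown config version {version}")
        return self._configs[version - 1]
```

Because older versions stay retrievable, anyone can later walk through the full history of configs, which is what makes after-the-fact auditing possible.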

Three types of entities will interact with this new canister:

  • Authorized principal (specified in the canister install proposal)
    This principal has write permissions to append new rate-limit configurations and to disclose these configurations once an incident has been resolved.
    Called methods:
    add_config() - invoked during the incident phase
    disclose_rules() - invoked after the incident has been resolved

  • API boundary nodes
    These nodes enforce the latest rate-limit configuration stored in the canister. They periodically fetch the latest configuration from the canister and apply it immediately. API boundary nodes have special permissions to retrieve all rate-limit rules in an unredacted format.
    Called methods:
    get_config() - periodic polling, retrieves the latest unredacted configuration

  • All other IC users
    These users experience the enforced rate limits. For transparency and auditing purposes, they can inspect all rate-limit configurations that have been pushed to the canister. The key difference from API boundary nodes is that the retrieved configurations expose only disclosed information. Rules that are still confidential are redacted, although their presence (for example, their ID) is always shown.
    Called methods:
    get_config(opt version) - retrieves redacted configuration
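The three roles above can be sketched as a small access-control model. This is a simplified illustration in Python, not the canister's real interface: the class name, the caller strings, the argument shapes of `add_config` and `disclose_rules`, and the dictionary layout of the returned rules are all assumptions. In particular, the sketch gives the unredacted view only to API boundary nodes, since the post does not say what the authorized principal sees.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Rule:
    rule_id: str
    description: str        # hypothetical stand-in for the rule's content
    disclosed: bool = False


class RateLimitCanister:
    def __init__(self, authorized: str, api_boundary_nodes: Set[str]) -> None:
        self._authorized = authorized
        self._api_bns = set(api_boundary_nodes)
        self._rules: Dict[str, Rule] = {}
        self._configs: List[List[str]] = []  # each config: ordered rule IDs

    def _require_authorized(self, caller: str) -> None:
        if caller != self._authorized:
            raise PermissionError(f"{caller} may not modify the canister")

    def add_config(self, caller: str, rules: List[Tuple[str, str]]) -> int:
        """Append a config given as (rule_id, description) pairs."""
        self._require_authorized(caller)
        for rule_id, description in rules:
            # Simplification: a rule ID seen before keeps its stored rule.
            self._rules.setdefault(rule_id, Rule(rule_id, description))
        self._configs.append([rule_id for rule_id, _ in rules])
        return len(self._configs)  # assumed: versions start at 1

    def disclose_rules(self, caller: str, rule_ids: List[str]) -> None:
        """Mark rules as public once the incident has been resolved."""
        self._require_authorized(caller)
        for rule_id in rule_ids:
            self._rules[rule_id].disclosed = True

    def get_config(self, caller: str, version: Optional[int] = None) -> List[dict]:
        """Return the rules of a config, redacted unless the caller is an API BN."""
        if not self._configs:
            raise LookupError("no config has been added yet")
        index = len(self._configs) if version is None else version
        if not 1 <= index <= len(self._configs):
            raise LookupError(f"unknown config version {index}")
        full_view = caller in self._api_bns
        result = []
        for rule_id in self._configs[index - 1]:
            rule = self._rules[rule_id]
            visible = full_view or rule.disclosed
            # The rule ID is always shown; the content only when visible.
            result.append({
                "id": rule.rule_id,
                "description": rule.description if visible else None,
            })
        return result
```

In this model, a regular user calling `get_config` during an incident sees every rule ID with its content withheld. After the authorized principal calls `disclose_rules`, the same call returns the content of the disclosed rules.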