Proposal: Sandboxing mechanism for canister wasm execution
Objective
Protect IC nodes and canisters hosted on them from rogue canisters that try to exploit holes in the WebAssembly through maliciously crafted canister code. The attack scenarios include:
- side-channel data reads of secrets in nodes and canisters,
- all classes of remote code execution vulnerabilities in the WebAssembly JIT compiler and runtime
Background
Canister code execution is confined by the WebAssembly runtime. The constraints of the runtime are enforced by a) limiting access through the system API and b) correctness of the JIT-compiled native code derived from the WebAssembly code. The full implementation correctness cannot necessarily be fully assumed due to the complexity of the components.
Additionally, even assuming a perfectly correctly operating JIT compiler, the generated native code executes in the context of its host process. This makes it indistinguishable from all other code executing in the same process to the CPU. As a consequence, the CPU fundamentally cannot be stopped from performing speculative access to any memory reachable to the host process. This can be abused by completely legitimate WebAssembly code to trigger such speculative access. Timing measurements against code execution paths may then reveal secrets.
Stopping this class of attacks at its core is difficult or outright impossible. They can however be rendered harmless by confining the attack scope to a strongly confined “sandbox process”. No sensitive information must be held or be accessible to this process. Information that can be obtained by an attacker through other means (e.g. through the official system API through their own canisters) is not sensitive. As a consequence, a breach of the sandbox process in itself does not gain an attacker anything.
Proposal
Implement a process sandboxing mechanism for canister wasm execution. The scope will be “one sandbox per canister” to protect both the IC node itself as well as other canisters. Guarantee integrity and confidentiality of all system components under the following attack scenarios:
- side channel attack allowing arbitrary memory of the host process through side channels
- WebAssembly runtime flaws allowing remote code execution
The design of the sandbox process is such that the security properties of the IC hold under the assumption that the sandbox process is under complete attacker control.
The implementation consists of a re-design of the canister runtime to allow for separation into multiple processes, and system means to enforce isolation and confinement of the processes.
Compatibility & performance
The sandboxing mechanism will incur no user-visible functional changes to how canisters operate – there is nothing developers need to change about their canisters. It is possible that the introduction of the isolation mechanism has observable performance effects. It is believed that the initial version will exhibit some performance degradation, but it is also believed that a fully optimized revision will long-term even improve performance over the non-sandboxed execution through better memory management parallelism in the system.
Security
This proposal aims to improve security of the IC through canister process sandboxing. Under the assumption of a compromise through specially crafted code in one canister, the system guarantees the confidentiality of the following assets (meaning that a successful attacker cannot read them):
- IC node data itself (including private keys)
- User data of other canisters (including Wasm memory, stable memory and all metadata)
- System data of other canisters (including cycles, tokens & similar assets)
- Artifact pool
- Ingress / egress messages of other canisters
The system guarantees the integrity of the following assets (meaning that a successful attacker cannot modify them):
- IC node data itself (including private keys)
- User data of other canisters (including Wasm memory, stable memory and all metadata)
- System data of all canisters (including cycles, tokens & similar assets)
- Artifact pool
- Ingress / egress messages of all canisters
In the initial version we may not be able to guarantee availability of the system under all attack scenarios. This means that a successful attacker might still be able to hamper or disrupt the IC service until corrective administrative action can be taken.
Development and rollout plan
The risks of the feature are two-fold:
- complexity/stability: sandboxing is an architectural change that incurs stability risks due to complexity of implementation
- performance: we anticipate initial performance degradation, heavily loaded subnets might be pushed beyond their operational bounds
Development and rollout structure is intended to minimize these risks.
An initial functional, slow, and non-launchable version of the feature is under preparation already. It will not be rolled out onto mainnet but will serve as validation for proper design proposals to be published in the course of this process. It also allows very early rigorous continuous testing to validate system stability.
The initially launched version of this feature will provide all of the desired security properties, but may need to make performance compromises. It needs to be rolled out to the IC mainnet in a controlled fashion: We anticipate it to be activated first to head-runner subnet blockchains for live testing (especially regarding performance considerations), and then be activated throughout the entire IC mainnet over time. During this roll-out process, both the code to run “sandboxed” and “non-sandboxed” WebAssembly execution will be part of the image distributed as updates to IC nodes. Per-subnet / per-node customizations will allow selective activation.
Future versions will close the performance gap and will likely also go beyond the performance of the status quo. They will be rolled out as incremental improvements over the sandboxing mechanism activated on all IC nodes.