Summary
Currently, the amount of computation a smart contract can perform per WebAssembly call is limited by the block time. Deterministic time slicing allows for long(er) running, multi-round computations by suspending the execution of an update message at the end of one round and resuming it later.
Status
Discussing
Key people involved
Andriy Berestovskyy (@berestovskyy), Bogdan Warinschi (@bogwar), Ulan Degenbaev(@ulan).
Timeline
1-pager posted on the forum for review: February 2, 2022.
Community Conversation: TBD.
NNS Motion Proposal (to approve design + project): TBD
If the NNS Motion Proposal passes, implementation and deployment would take a few months: April 2022.
Objective
The current implementation of the Internet Computer (IC) does not support multi-round WebAssembly execution. In other words, if a WebAssembly function starts executing in one round, then its execution must finish before the next round can start. This introduces a trade-off between the fast finality and the amount of work a smart contract can do in a single update message execution.
To ensure fast finality, IC imposes a limit on the number of instructions per WebAssembly call. If the limit is exceeded, then the execution traps with an error:
Canister <id> exceeded the cycles limit for single message execution
.
The problem is that the existing limit is not sufficient in some important cases:
- Pre- and post-upgrade hooks of smart contracts. They potentially serialize and deserialize the entire Wasm memory.
- Garbage collection in Motoko smart contracts. The garbage collection potentially touches every byte of the Wasm memory.
Besides these cases, a smart contract may need to perform other long-running computations specific to its domain.
The goal of this project is to fix the problem by supporting multi-round WebAssembly execution of update messages. Note that supporting long-running query messages is out of scope of this proposal.
Proposal
The proposal is to implement deterministic time slicing (DTS) – a mechanism that automatically suspends WebAssembly execution when it reaches the per-round instruction limit and resumes the execution in a subsequent round. The mechanism should be transparent to the smart contract and the developers, such that pausing and resuming are not observable. Specifically, DTS should guarantee atomicity and isolation of a long-running execution:
- atomicity: if an error occurs in any chunk of the execution, then the smart contract rolls back to the state before the entire execution has started.
- isolation: other update messages do not run until the entire execution completes. Queries may be executed concurrently, but they must use the state before the execution has started.
Figure 1. Pausing and resuming WebAssembly execution in the sandbox process.
Developer experience
The main observable effect of DTS is increased instruction limit per WebAssembly execution. Many messages that currently fail due to the instruction limit error, will succeed once DTS is implemented. Developers do not need to take any actions to benefit from DTS.
The initial implementation will gradually increase the instruction limit by 10-50 times. Thus allowing a single message execution to span up to 50 rounds. Multi-round executions will likely have a small performance and memory overhead, which will be reflected in the execution fees. The performance and the costs of single-round messages will not be affected.
As a consequence, the cost of infinite-loop bugs will increase by 50x. One of the follow-up features being considered is to allow developers to lower the instruction limit per method if they know the upper bound and want to guard against the infinite-loop bugs.
Implementation
The initial implementation is estimated to take about 4 SWE-months. Most of the changes will be in the execution layer of IC. The main risks and challenges:
- Multi-round messages will likely have higher performance and memory overhead.
- The message scheduler needs to support multi-round messages while preserving the compute allocation and fairness guarantees.
- Pausing and resuming mechanism may increase complexity of the IC.