Long Term R&D: Verifiable subnet recovery with private state (proposal)

diegop · December 17, 2021, 1:21am

1. Summary

This is a motion proposal for the long-term R&D of the DFINITY Foundation, as part of the follow-up to this post: Motion Proposals on Long Term R&D Plans (Please read this post for context).

This project’s objective

The Internet Computer consists of subnets that should run in a robust manner. That is, as long as fewer than 1/3 of the replicas of a subnet are malicious or unavailable, the subnet will still make progress. If more replicas are unavailable at one time, the subnet would stall, and canisters can no longer process update messages. For such scenarios, the IC supports a subnet recovery operation: via NNS proposals, the governance system can instruct other replicas to take over the responsibility of the subnet immediately.

Current Situation

The foundation has read access to the NNS subnet and its own nodes for all application subnets
Read access to state on application subnets can be granted to an SSH key via NNS proposals (and removed afterwards), which enables reading the latest subnet state and blockchain artifacts
If a subnet is stuck, the foundation can perform subnet recovery by
- obtaining the state + recent blockchain of the subnet, which defines a last good state of the subnet
- making a recovery NNS proposal including that state hash, which also indicates which nodes should take over the responsibility of the subnet
The voters currently cannot verify that this proposal is done correctly
- If we’d release the state + blockchain artifacts, the community could verify the proposal
- But then the state is made public, which is not desirable

This proposal is about ensuring all NNS proposals are verifiable without revealing the state of a subnet.

2. Discussion lead

Jan Camenisch, Manu Drijvers

3. How this R&D proposal is different from previous types

Previous motion proposals have revolved around specific features and tended to have clear, finite goals that are delivered and completed. They tended to be measured in days, weeks, or months.

These motion proposals are different and are defining the long-term plan that the foundation will use, e.g., for hiring and organizational build-out. They have the following traits and patterns:

Their scope is years, not weeks or months as in previous NNS motions
They have a broad direction but are active areas of R&D so they do not have an obvious line of execution.
They involve deep research in cryptography, networking, distributed systems, language, virtual machines, operating systems.
They are meant to match the strengths of where the DFINITY foundation’s expertise is best suited.
Work on these proposals will not start immediately.
There will be many follow-up discussions and proposals on each topic when work is underway and smaller milestones and tasks get defined.

An example may be the R&D for “Scalability” where there will be a team investigating and improving the scalability of the IC at various stages. Different bottlenecks will surface and different goals will be met.

3. How this R&D proposal is similar to what we have seen

We want to double down on the behaviors we think have worked well. These include:

Publicly identifying owners of subject areas to engage and discuss their thinking with the community
Providing periodic updates to the community as things evolve, milestones are reached, proposals are needed, etc.
Presenting more and more R&D thinking early and openly.

This has worked well for the last 6 months so we want to repeat this pattern.

4. Next Steps

Developer forum intro posted
1-pager from the discussion lead posted
NNS Motion proposal submitted

5. What we are asking the community

Ask questions
Read 1-pager
Give feedback
Vote on the motion proposal

Frankly, we do not expect many nitty-gritty details because these are meant to address projects that go on for long time horizons.

The DFINITY foundation’s only goal is to improve the adoption of the IC so we want to sanity-check the projects we see necessary for growing the IC by having you (the ICP community) tell us what you all think of these active R&D threads we have.

6. What this means for the existing Roadmap or Projects

In terms of the current roadmap and proposals executed, those are still being worked on and have priority.

An intellectually honest way to look at these long-term R&D projects is to see them as the upstream or “primordial soup” from which more baked projects emerge. With this lens, these proposals are akin to asking, “what kind of specialties or strengths do we want to make sure the DFINITY foundation has built up?”

Most (if not all) projects that the DFINITY foundation has executed or is executing are born from long-running R&D threads. Even when community feedback tells the foundation, “we need X” or “Y does not work”, it is typically the team with the most relevant R&D area that picks up the short-term feature or project.

diegop · December 17, 2021, 1:23am

Please note:

Some folks have asked if they should vote to “reject” any of the Long Term R&D projects as a way to signal prioritization. The answer is simple: “No, please, ACCEPT”

These long-term R&D projects are the DFINITY foundation’s thesis on the R&D threads it should have across years (3 years is the number we sometimes use internally). We are asking the community to ACCEPT (pending the 1-pager and more community feedback, of course). Prioritization can come at a separate step.

Manu · December 17, 2021, 9:34am

Verifiable subnet recovery with private state

1. Objective

The Internet Computer consists of subnets that should run in a robust manner. That is, as long as fewer than 1/3 of the replicas of a subnet are malicious or unavailable, the subnet will still make progress. If more replicas are unavailable at one time, the subnet would stall, and canisters can no longer process update messages. For such scenarios, the IC supports a subnet recovery operation: via NNS proposals, the governance system can instruct other replicas to take over the responsibility of the subnet immediately. This proposal is about ensuring such NNS proposals are verifiable without revealing the state of a subnet.
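As a rough illustration of the threshold described above, the sketch below computes how many faulty or unavailable replicas a subnet of `n` replicas can tolerate, assuming the bound is "strictly fewer than a third". The function names are hypothetical and not part of any IC API.

```python
def max_tolerated_faults(n: int) -> int:
    """Largest f such that 3 * f < n, i.e. strictly fewer than a third of n."""
    if n < 1:
        raise ValueError("a subnet needs at least one replica")
    return (n - 1) // 3


def subnet_makes_progress(n: int, faulty: int) -> bool:
    """True while the number of faulty or unavailable replicas stays within the bound."""
    return faulty <= max_tolerated_faults(n)
```

For example, under this assumption a 13-replica subnet keeps making progress with 4 replicas down, and stalls (and would need recovery) with 5 down.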

2. Background

The current state of subnet recovery is as follows:

The foundation has read access to the NNS subnet and its own nodes for all application subnets
Read access to state on application subnets can be granted to SSH keys via NNS proposals (and removed afterwards), which enables reading the latest subnet state and blockchain artifacts
If a subnet is stuck, the foundation can perform subnet recovery by
- obtaining the state + recent blockchain of the subnet, which defines a last good state of the subnet
- making a recovery NNS proposal including that state hash, which may also indicate which nodes should take over the responsibility of the subnet
The voters currently cannot verify that this proposal is done correctly
- If we’d release the state + blockchain artifacts, the community could verify the proposal
- But then the state is made public, which is not desirable

3. Milestones

1: Subnet recovery is fully verifiable if the subnet state is public.

Some subnets may hold sensitive information, while others do not. One potential forward direction is to distinguish between these, and first make subnet recovery fully verifiable if the state can be public. This means that every voter can ensure that the state is the correct state and verify all aspects of the proposal.
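To make the first milestone concrete, here is a minimal sketch of what a voter-side check could look like when the state is public: recompute the hash of the published state and compare it with the hash the recovery proposal commits to. Everything here is an assumption for illustration: the `RecoveryProposal` shape, its field names, and the use of plain SHA-256 over a byte string as a stand-in for however the IC actually hashes subnet state.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryProposal:
    # Hypothetical payload shape; the field names are assumptions.
    subnet_id: str
    state_hash: str                          # hex digest the proposal commits to
    replacement_nodes: tuple[str, ...] = ()  # optional: nodes that take over


def hash_state(state: bytes) -> str:
    # Stand-in only: SHA-256 over the serialized state bytes.
    return hashlib.sha256(state).hexdigest()


def voter_verifies(proposal: RecoveryProposal, published_state: bytes) -> bool:
    # A voter recomputes the hash from the published state and compares it
    # with the hash included in the proposal.
    return hash_state(published_state) == proposal.state_hash
```

A voter who downloads the published state can run `voter_verifies(proposal, state)` and vote to reject if it returns `False`.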

2: Encrypted subnets can be recovered in a verifiable manner.

Once subnets run with trusted execution environment capabilities (find the relevant motion proposal here), this can significantly improve the confidentiality guarantees. For such subnets, subnet recovery should still be available, without affecting confidentiality. This requires significant protocol changes and extensions:

The NNS can create encryption keys for every subnet, which are securely held inside the trusted execution environment
A subnet will produce blockchain artifacts that can be validated cryptographically while not revealing subnet state. That likely means that e.g. catch-up packages must sign encrypted state.
From an encrypted state and blockchain artifacts, a verifiable subnet recovery can be performed. That means that the NNS can give the new subnet members the relevant decryption key, and they can recover from the state implied by the encrypted state and the blockchain artifacts.
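The idea in these three points can be sketched as follows: voters check a commitment over the encrypted state without ever holding the key, while new subnet members who receive the key can check the same commitment and then decrypt. This is a toy under loud assumptions: the cipher is an insecure XOR keystream used purely for illustration, a plain SHA-256 hash stands in for the signed blockchain artifacts mentioned above, and all function names are hypothetical.

```python
import hashlib


def toy_keystream_xor(key: bytes, data: bytes) -> bytes:
    # TOY cipher, NOT secure: XOR with a SHAKE-256 keystream derived from the key.
    # Applying it twice with the same key returns the original data.
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def commit(encrypted_state: bytes) -> str:
    # Commitment over the ciphertext only (stand-in for a signed artifact).
    return hashlib.sha256(encrypted_state).hexdigest()


def voter_check(encrypted_state: bytes, committed_hash: str) -> bool:
    # Needs no decryption key: the voter sees only ciphertext and hash.
    return commit(encrypted_state) == committed_hash


def new_member_recover(encrypted_state: bytes, committed_hash: str, key: bytes) -> bytes:
    # A new subnet member verifies the commitment first, then decrypts.
    if not voter_check(encrypted_state, committed_hash):
        raise ValueError("encrypted state does not match the committed hash")
    return toy_keystream_xor(key, encrypted_state)
```

The design point is the separation of roles: verification works on ciphertext alone, so making the proposal verifiable does not require making the state public.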

4. Discussion leads

@Jan, @Manu

5. Skills and Expertise necessary to accomplish this

The first milestone is mainly an engineering effort. The second milestone is a much greater effort, and will require cryptographic protocol design and trusted execution environment expertise.

6. What are we asking the community

Review comments, ask questions, give feedback
Vote accept or reject on NNS Motion
Participate in technical discussions as the motion moves forward

diegop · December 20, 2021, 7:25pm

Proposal is live! Internet Computer Network Status
