Long Term R&D: Storage Subnets (proposal)

Summary

This project aims to improve support for dapps with high storage requirements on the Internet Computer.

Background

Currently the Internet Computer has two types of subnet blockchains: system (with high replication) and application (with medium replication).

System subnets with a high replication factor are necessary for platform-critical services like the NNS. They are costly to operate (many nodes are needed) and slower to update (the finalisation rate is reduced to accommodate the additional nodes), but they offer very high security. Application subnets have medium replication, so they are cheaper and faster to update, but offer slightly lower security.

Objective

As the Internet Computer evolves, it is possible to imagine other types of subnets that operate at different points on the design spectrum and offer different trade-offs between security, speed, and cost.

This motion proposes that the DFINITY organisation invest engineering resources into researching and developing additional types of subnets. More concretely, the motion proposes to explore the concept of storage subnets. The core features of this subnet type will be:

  • It uses node machines with higher storage capacity, and
  • It operates with fewer nodes (a smaller replication factor) than other subnet types.

In order to realise this new subnet type, research into the following topics will be needed:

  • Intra-subnet protocol improvements: to ensure that a subnet with a large state remains operational.
  • Inter-subnet protocol improvements: to ensure that a less secure subnet cannot impact the security and functionality of other subnets.
  • Data integrity improvements: to ensure that data integrity is maintained with lower replication factor.

Why this is important

At 5 USD / GiB / year, the Internet Computer today already has very competitive fees for storage. This feature will allow the IC to offer storage to dapps at even lower cost (albeit with potentially different semantics and guarantees). Lower storage costs will enable a new class of dapps that are prohibitively expensive to run on the IC today, and will help improve the resilience of existing dapps by allowing them to store backups of their data.

Due to the lower replication factor, fewer nodes will be needed to provide the same amount of storage capacity on the IC. Equivalently, for the same number of nodes, the IC can offer more total storage capacity.
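
To make this trade-off concrete, here is a rough back-of-the-envelope sketch comparing effective capacity and per-GiB cost at different replication factors. The node count, per-node disk size, and per-node cost figures are illustrative assumptions, not actual IC parameters.

```python
# Illustrative only: replication factors and per-node figures are assumptions,
# not actual Internet Computer parameters.

NODES = 13                 # nodes in the subnet (assumed)
DISK_PER_NODE_GIB = 4096   # usable storage per node machine (assumed)
COST_PER_NODE_GIB = 0.50   # assumed yearly cost per GiB stored on one node

for replication in (13, 7, 4):
    # Every byte of user data is stored on `replication` nodes,
    # so the subnet's effective capacity shrinks by that factor.
    effective_capacity_gib = NODES * DISK_PER_NODE_GIB / replication
    cost_per_user_gib = replication * COST_PER_NODE_GIB
    print(f"replication={replication:2d}  "
          f"effective capacity={effective_capacity_gib:8.0f} GiB  "
          f"cost per stored GiB={cost_per_user_gib:.2f} USD/year")
```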

Topics under this project

Intra-subnet protocol improvements

The new storage subnets will have larger states, so additional improvements to the protocol and implementation will be needed to ensure that these large states are handled properly. One obvious component that will have to be improved is state synchronization, which allows new or lagging nodes to catch up with the latest state. Improvements will be needed to ensure that nodes can still catch up even when the state is large.
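
As a minimal sketch of the general idea behind chunked, hash-verified state transfer: a catching-up node fetches the state in pieces, checks each piece against a manifest of chunk hashes, and re-requests anything that fails verification. The chunk size, manifest layout, and function names below are hypothetical and only illustrate the concept, not the actual IC implementation.

```python
# Hypothetical sketch of chunked, hash-verified state sync.
# Chunk size, manifest layout, and the peer-fetch callback are assumptions.
import hashlib

CHUNK_SIZE = 1 << 20  # 1 MiB chunks (assumed)

def build_manifest(state: bytes) -> list[str]:
    """Split a state into chunks and record the hash of each chunk."""
    return [hashlib.sha256(state[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(state), CHUNK_SIZE)]

def sync_state(manifest: list[str], fetch_chunk) -> bytes:
    """Rebuild the state by fetching each chunk and checking it against the manifest.

    `fetch_chunk(index)` stands in for asking a peer for chunk `index`;
    a corrupted or truncated chunk is detected and simply re-requested.
    """
    chunks = []
    for index, expected in enumerate(manifest):
        while True:
            chunk = fetch_chunk(index)
            if hashlib.sha256(chunk).hexdigest() == expected:
                chunks.append(chunk)
                break  # chunk verified, move on to the next one
    return b"".join(chunks)

# Example: a "peer" that serves chunks of a locally known state.
state = bytes(range(256)) * 8192          # ~2 MiB of example data
manifest = build_manifest(state)
synced = sync_state(manifest, lambda i: state[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE])
assert synced == state
```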

Inter-subnet protocol improvements

Due to the lower replication factor, the new subnet type might be easier to corrupt or to stall. Protocol improvements will be needed to ensure subnet isolation so that faults in one subnet cannot spread to other subnets.
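
As a toy illustration of one isolation mechanism, the sketch below only applies a message from another subnet if it carries a valid authenticator under that subnet's key, so a corrupted storage subnet cannot inject arbitrary data into its peers. An HMAC stands in for the threshold signature a real protocol would use, and all names and keys are hypothetical.

```python
# Toy sketch of inter-subnet message admission: drop anything that is not
# properly authenticated by the sending subnet. HMAC is a stand-in for a
# threshold signature; keys and names are illustrative assumptions.
import hashlib
import hmac

SENDER_SUBNET_KEY = b"key material for the storage subnet"  # assumed

def accept_xnet_message(payload: bytes, authenticator: bytes) -> bool:
    """Accept a cross-subnet message only if its authenticator verifies."""
    expected = hmac.new(SENDER_SUBNET_KEY, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, authenticator)

msg = b"response for canister xyz"
good_tag = hmac.new(SENDER_SUBNET_KEY, msg, hashlib.sha256).digest()

assert accept_xnet_message(msg, good_tag)                   # authentic: accepted
assert not accept_xnet_message(b"forged state", good_tag)   # forged: rejected
```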

Data integrity improvements

A subnet with a lower replication factor can tolerate fewer corrupted nodes. Protocol improvements will be needed to ensure that as long as at least one honest node is available, data integrity will be guaranteed.
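
One way to see why a single honest replica can suffice: if a reader holds a trusted digest of the data (certified elsewhere in the protocol), it can reject corrupted responses and accept the first copy that matches. The sketch below is an illustration of that idea under those assumptions, not the actual IC mechanism.

```python
# Illustrative sketch: verify replies against a trusted digest so that one
# honest replica is enough to recover the data. Not the actual IC protocol.
import hashlib

def fetch_verified(expected_digest: str, replies: list[bytes]) -> bytes | None:
    """Return the first reply whose SHA-256 matches the trusted digest.

    Corrupted or malicious replies are skipped; as long as at least one
    replica returns the genuine data, integrity is preserved.
    """
    for data in replies:
        if hashlib.sha256(data).hexdigest() == expected_digest:
            return data
    return None  # no honest copy was available

original = b"canister heap page 42"
trusted_digest = hashlib.sha256(original).hexdigest()

# Two replicas lie, one is honest: the honest copy is still recovered.
replies = [b"garbage", b"tampered data", original]
assert fetch_verified(trusted_digest, replies) == original
```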

Discussion leads

@akhilesh.singhania , @bogdanwarinschi , @derlerd-dfinity1

Why the DFINITY Foundation should make this a long-running R&D project

This project will enable an important class of dapps on the IC. Additionally, a number of protocol and implementation improvements required to achieve the goals of this project will also be applicable to other parts of the IC and improve the IC in general.

Skills and Expertise necessary to accomplish this (maybe teams?)

Due to the complexity of the initiative, a broad selection of skills will be required, as outlined below:

  • System design

  • System level software engineering

  • Algorithms, complexity

  • Probability theory

  • Cryptography

  • Deep understanding of Internet Computer consensus

  • API design

At least the following teams are likely required:

  • Research

  • Networking

  • Consensus

  • Message Routing

  • Execution

  • Security

Open Research questions

Multiple mechanisms could achieve the desired goals of this project; the low-replication design described above is only one of them.

Another idea is to use erasure codes to split the data across multiple nodes. With this approach a subnet could have many nodes and high resilience while keeping the total storage overhead small (<2x). However, communication during storage and retrieval is higher (<2x), and the nodes that store the data must compute the codewords. Search also becomes harder, so the suitability of this approach depends on whether the data is fully at rest. Finally, if the set of nodes running the storage network changes, an expensive recoding is required.
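
To make the overhead numbers concrete, here is a minimal sketch of a systematic erasure scheme: the data is split into k shards plus one XOR parity shard, giving a storage overhead of (k+1)/k (well under 2x) while tolerating the loss of any single shard. A production design would use a stronger code (e.g. Reed-Solomon) to tolerate multiple losses; the parameters below are purely illustrative.

```python
# Minimal illustrative erasure-coding sketch: k data shards + 1 XOR parity
# shard. Overhead is (k + 1) / k and any single missing shard can be rebuilt.
# A real deployment would likely use Reed-Solomon to survive multiple losses.

def encode(data: bytes, k: int) -> list[bytes]:
    """Split `data` into k equal shards and append one XOR parity shard."""
    shard_len = -(-len(data) // k)                # ceiling division
    padded = data.ljust(k * shard_len, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = shards[0]
    for s in shards[1:]:
        parity = bytes(a ^ b for a, b in zip(parity, s))
    return shards + [parity]

def reconstruct(shards: list[bytes | None], k: int, data_len: int) -> bytes:
    """Rebuild the original data even if any one shard is missing (None)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    assert len(missing) <= 1, "this toy code tolerates only one missing shard"
    if missing:
        # The missing shard is the XOR of all the remaining ones.
        present = [s for s in shards if s is not None]
        rebuilt = present[0]
        for s in present[1:]:
            rebuilt = bytes(a ^ b for a, b in zip(rebuilt, s))
        shards[missing[0]] = rebuilt
    return b"".join(shards[:k])[:data_len]

data = b"replicated storage is expensive; erasure coding trades CPU for disk"
k = 4
shards = encode(data, k)
print(f"storage overhead: {(k + 1) / k:.2f}x")    # 1.25x, well below 2x

shards[2] = None                                  # lose one shard
assert reconstruct(shards, k, len(data)) == data
```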

Examples where community can integrate into project

In the initial phase of this motion, community input on refining the scope and priorities of this project is highly appreciated. In addition, many technical discussions with the community are anticipated as the research and development of potential technical solutions to address the goals of this proposal move forward.

What we are asking the community

  • Review comments, ask questions, give feedback

  • Vote accept or reject on NNS Motion
