Long Term R&D: Scalability (proposal)

diegop · December 7, 2021, 2:02am

1. Summary

This is a motion proposal for the long-term R&D of the DFINITY Foundation, as part of the follow-up to this post: Motion Proposals on Long Term R&D Plans (Please read this post for context).

This project’s objective

In order to be able to serve millions of smart contracts, the scalability of the Internet Computer protocol and implementation will be improved to support more nodes per subnet to be able to tolerate a higher number of malicious nodes, higher throughput, and less overhead for users as well as for network operations.

2. Discussion lead

Yvonne-Anne Pignolet

3. How this R&D proposal is different from previous types

Previous motion proposals have revolved around specific features and tended to have clear, finite goals that are delivered and completed. They tended to be measured in days, weeks, or months.

These motion proposals are different and are defining the long-term plan that the foundation will use, e.g., for hiring and organizational build-out. They have the following traits and patterns:

Their scope is years, not weeks or months as in previous NNS motions
They have a broad direction but are active areas of R&D so they do not have an obvious line of execution.
They involve deep research in cryptography, networking, distributed systems, language, virtual machines, operating systems.
They are meant to match the strengths of where the DFINITY foundation’s expertise is best suited.
Work on these proposals will not start immediately.
There will be many follow-up discussions and proposals on each topic when work is underway and smaller milestones and tasks get defined.

An example may be the R&D for “Scalability” where there will be a team investigating and improving the scalability of the IC at various stages. Different bottlenecks will surface and different goals will be met.

3. How this R&D proposal is similar to what we have seen

We want to double down on the behaviors we think have worked well. These include:

Publicly identifying owners of subject areas to engage and discuss their thinking with the community
Providing periodic updates to the community as things evolve, milestones reached, proposals are needed, etc…
Presenting more and more R&D thinking early and openly.

This has worked well for the last 6 months so we want to repeat this pattern.

4. Next Steps

Developer forum intro posted
1-pager from the discussion lead posted
NNS Motion proposal submitted

5. What we are asking the community

Ask questions
Read 1-pager
Give feedback
Vote on the motion proposal

Frankly, we do not expect many nitty-gritty details because these are meant to address projects that go on for long time horizons.

The DFINITY foundation’s only goal is to improve the adoption of the IC so we want to sanity-check the projects we see necessary for growing the IC by having you (the ICP community) tell us what you all think of these active R&D threads we have.

6. What this means for the existing Roadmap or Projects

In terms of the current roadmap and proposals executed, those are still being worked on and have priority.

An intellectually honest way to look at this long-term R&D project is to see them as the upstream or “primordial soup” from which more baked projects emerge from. With this lens, these proposals are akin to asking, “what kind of specialties or strengths do we want to make sure DFINITY foundation has built up?”

Most (if not all) projects that the DFINITY foundation has executed or is executing are borne from long-running R&D threads. Even when community feedback tells the foundation, “we need X” or “Y does not work”, it is typically the team with the most relevant R&D area that picks up the short-term feature or project.

diegop · December 7, 2021, 4:46am

Please note:

Some folks gave asked if they should vote to “reject” any of the Long Term R&D projects as a way to signal prioritization. The answer is simple: “No, please, ACCEPT”

These long-term R&D projects are the DFINITY’s foundation’s thesis at R&D threads it should have across years (3 years is the number we sometimes use internally). We are asking the community to ACCEPT (pending 1-pager and more community feedback of course). Prioritization can come at a separate step.

yvonneanne · December 7, 2021, 5:45am

Hi everyone

I’m Yvonne-Anne, one of the researchers at DFINITY. My work is centered around distributed
systems, ranging from the design and analysis of algorithms for reliable and efficient distributed systems despite failures and malicious behaviour to complex network analysis.

While I’m driving this proposal, many other researchers and engineers will be working on the topics of this proposal to realize the vision of the IC to scale with its users and load.

yvonneanne · December 7, 2021, 4:15pm

Scalability Motion proposal

Summary

The current version of the Internet Computer protocol provides a solid foundation for the execution of canister smart contracts which can be enhanced and extended to realize the IC’s potential to serve millions of smart contracts. More precisely, the following shall be investigated and suitable mechanisms designed, analysed, implemented, and tested:

Many more subnets with frequent inter-canister calls across different subnets. The communication overhead for two canisters running on different subnets should be kept to a minimum and new nodes should be able to join subnets fast even if the state of the canisters they host is huge.
Large subnets with hundreds of nodes to tolerate a higher number of faulty nodes, yet exhibit low latency and high throughput for update calls.
Developers and users sending large messages to canisters as well as inter-canister messages.

To achieve this, the current bottlenecks are investigated and then suitable mitigation strategies are designed and implemented. This will involve research and development efforts in many components and require a diverse set of engineers and researchers with expertise in Distributed Systems, Cryptography, Performance Analysis and Management, Operating Systems, and Networking and is expected to take multiple years.

1. Objective

Design, analyse, implement, and test a more scalable version of the IC protocol to achieve higher throughput, more nodes, efficient routing across subnetworks, faster state synchronization, and automatic distributed load balancing despite malicious entities participating in the protocol.

2. Background

The IC currently comprises a system subnet for the NNS canisters running on 37 nodes and a bit more than two dozen app subnets each running on 13 nodes each and maintaining a total of a little over 400 GB of state. Most of the current traffic consists of update and query calls from users to smart contract canisters. With the projected growth and the smart contract canisters becoming more sophisticated, not only will the load on the IC increase, it will also change to feature more inter-canister messages. Moreover, higher security requirements on system subnets (and potentially other subnet types) benefit from higher numbers of nodes per subnet…

3. Why this is important

This project will help developers and consumers of IC dapps in the following ways:

Decreased latency combined with higher throughput & availability - Thanks to the work described in this motion proposal, future versions of the IC protocol and its implementation will be more scalable and robust. As a consequence, developers and users will benefit from decreased latency and higher throughput and availability.
Lower the overhead experienced by users - An enhanced management of node and subnet configuration will simplify adding more nodes and subnets, which in turn alleviates the load per subnet and an improved routing for messages between subnets will lower the overhead experienced by users further.
Orthogonal to these measures, a higher number of nodes in important subnets like the NNS subnet or other system subnets is desirable to make them tolerate more faulty nodes.

4. Topics under this project

In order to enhance the IC, an improved version supports more subnets with more nodes, larger messages and an efficient backup mechanism.

The design, implementation, testing, and analysis of these features will rely on research and development on the following non-exhaustive list of topics:

New routing mechanisms with frequent inter-canister calls across different subnets, relying on deterministic or randomized overlays
Traffic shaping and scheduling algorithms to provide quality of service guarantees despite Byzantine players
Construction of overlays for efficient message dissemination within and between subnets and a robust mechanism to adapt the overlays over time.
Efficient mechanisms to (re)join a subnet, peer discovery and state synchronization
Chunking at different layers of the IC stack
Speculative algorithms for the execution and verification of canister message processing
Separating the dissemination of messages to canisters from ordering them
Erasure coding for reliable broadcast
Distributed monitoring and analysis
Node and subnet configuration and key management in the NNS registry canister

5. Key milestones

The following list of potential milestones indicate the ambitions for this project and will be adapted to suitable values as the corresponding work packages are tackled.
Input on the prioritization of the different milestones is highly appreciated, the current numbering is not supposed to indicate priorities or a sequence in which they will be achieved.

Many subnets: Latency of 90% of the inter-canister messages on 1000 subnets with a “typical” workload distribution is below 5s
Large subnets: Subnets with 200 nodes achieve a block rate of 1/s
Resilience: No bad peer can deteriorate the throughput and latency by more than 5% in and across subnets of at least 50 nodes
Large messages: Update and query requests of up to 10GB are supported

6. Discussion leads

Yvonne-Anne Pignolet is driving the motion proposal, Yotam Harchol, David Derler Manu Drijvers Thomas Locher and other team members will be available for discussions.

7. Why the DFINITY Foundation should make this a long-running R&D project

To support the projected growth and adoption rate with an outstanding development and user experience, the Internet Computer must be enabled to scale out and adapt to its workload automatically while preserving its security guarantees. Therefore, the DFINITY Foundation is committed to investigating and designing the next generation of the IC protocol by solving the above-mentioned problems, so the IC can cope with very high loads and tolerate Byzantine nodes.

8. Skills and Expertise necessary to accomplish this

Tackling the challenges mentioned above relies on research and development efforts in many components and requires a diverse set of engineers and researchers with expertise in Distributed Systems, Cryptography, Performance Analysis and Management, Operating Systems, and Networking and is expected to take multiple years.

9. Open Research questions

How can overlays for the communication between and within subnets of different sizes be constructed and used efficiently?
How can the management of node and subnet configuration and keys be distributed among multiple canisters?
How can a node (re)joining a subnet discover its current overlay peers and obtain the necessary information to participate in the protocol quickly?
How do subnets discover which other subnets have canister messages for them and how do they schedule their nodes to exchange them?
How can the traffic be shaped to utilize the available bandwidth between nodes in the same and different data centers efficiently despite malicious nodes?
How can one design OS and networking scheduling algorithms to provide Quality of service guarantees despite Byzantine players?
How can large messages be partitioned, stored and disseminated efficiently on the different layers of the IC stack?
How can one design speculative algorithms for the execution and verification of canister message processing?
How can the dissemination of messages be separated from ordering them with minimal overhead?
How can the Internet Computer be monitored and analysed in a distributed fashion?

10. Examples where community can integrate into project

Due to the wide scope of required expertise for this motion proposal, it is expected that it will be carried out in tight interaction with the community. In particular, it is planned to organise workshops as the motion proposal evolves to discuss priorities, solution approaches, and implementation. Furthermore, a critical assessment and discussion regarding the security and growth and usage assumptions is of strong interest.

11. What we are asking the community

Review comments, ask questions, give feedback
Vote accept or reject on NNS Motion

jzxchiang · December 8, 2021, 7:53am

Random thought: what if we did on IC what Ethereum does and let anyone run a node for query purposes only?

For example, I could download the catch-up package for a subnet, initialize my node from that, and then use that node to serve query requests. (Periodically also syncing with the latest subnet state by downloading newer catch-up packages.) Certified variables, which are native to IC (but don’t exist on Ethereum AFAIK), could help keep these “query nodes” honest.

Could this scale queries way beyond what’s currently feasible? What am I missing?

yvonneanne · December 10, 2021, 11:32am

Random thought: what if we did on IC what Ethereum does and let anyone run a node for query purposes only?

This is definitely part of the options that should be explored to make queries more scalable. As you identified correctly, making sure that nodes serve correct (and not too stale) responses is crucial. In the future resources used for queries should also be charged for, so this needs to be taken into account as well.

jzxchiang · December 11, 2021, 7:31am

Ah interesting, I’m curious why queries were free when the mainnet launched?

lastmjs · December 11, 2021, 2:50pm

Just a suggestion to keep in mind the possibility of node shuffling in the future, as described and debated here: Shuffling node memberships of subnets: an exploratory conversation

yvonneanne · December 13, 2021, 11:02am

To ensure there is a launchable version in May, we had to simplify a number of components. Only charging for update calls in the beginning was one of the decisions we took.

yvonneanne · December 13, 2021, 11:04am

Can you elaborate on how you think this is related to the scalability motion proposal?

diegop · December 20, 2021, 7:33pm

proposal is live: Internet Computer Network Status

marcio · April 21, 2022, 7:47am

Is there any progress towards the goal of reaching 200 subnet nodes with 1s finality?

I think that is what’s is needed in order for IC to be taken seriously.

yvonneanne · April 21, 2022, 9:51am

We are not actively working on this at the moment, as we’re focussing on having features like BTC integration, SNS and HTTP calls from canisters, which will bring more value to the IC, imho.

Can you explain why you think a subnet with 200 nodes and 1s finality is necessary to be taken seriously and should take precedence?

marcio · April 21, 2022, 10:53am

I’m not saying it should take precedence. I just read a post in reddit and was wondering if there is some idea on how to continue scaling subnets in this direction. (reddit post ). I do agree that these other items are more important right now.

In words of Dom talking about badlands:

“it will benefit from the maximum conceivable level of decentralization and censorship resistance, something that is held in great esteem by the blockchain community”

Blockchain community love greater number of nodes .

It would be nice to clearly explain how secure are 13 node subnets and how much value locked do you expect them to have (plus with secure enclaves and node shuffling in the future).

yvonneanne · April 21, 2022, 12:32pm

Thanks, Marcio, that’s a very helpful link.
From our nightly experiments we know that running subnets with 55 nodes works fine.
With respect to the security offered by more nodes, imho, this is rather subjective. Therefore the problem is best addressed by a discussion in the governance category of this forum and/or a motion proposal on the NNS. Manu posted an answer in the reddit thread along these lines.

icme · April 27, 2022, 12:52am

First off, I want to say I’m super excited about this research - thanks for this on @yvonneanne!

A few questions:

In terms of the measuring the outcome of this research against various goals, what are some metrics such as query and update request throughput or latency that the foundation will hope to achieve from this research?

I understand that is still very early and that nothing is very certain, but it would be interesting to understand the types of outcomes and results that the foundation would consider a success, and to understand which scalability/performance metrics have hard limits, which metrics can theoretically be improved by a certain factor or have significant of room to improve upon, and why this is the case.

For example, when you say

Let’s say we graph out subnets increasing # of nodes vs. the metrics you mentioned. What does that look like as we scale up the size of the subnet? Do the metrics such as latency and throughput level off or do they keep scaling out linearly? If they level off or there are diminishing returns, why?

I would assume many projects have extended roadmaps, and it would be exciting to pre-emptively plan out a series of long-term project feature that would be incredibly useful. Say I knew that if a subnet had x nodes, I could support 50,000 queries a second

icme · April 27, 2022, 1:03am

Also, on a separate scalability discussion vector, I had this thought a week or so about using local replicas to clone canisters in order to scale horizontally (one can pull in a replica and then split a canister’s data in half between the canister and its clone, say through the right and left sub-trees of a Red-Black Tree).

Curious if there are any initiatives and thoughts on how feasible this would be, and how hard it would be for multiple new local replicas to be spun up (for resilience) in order to replace the local replica which was used as a clone, and to then spin up new replicas for the fresh clone.

If it takes awhile to spin up additional replicas, would love it if this functionality could be given to developers. As a canister is starting to reach capacity, start spinning up 5-10 more replicas, and then when this is complete, split the canister and its local replicas in 2 all at once, effectively creating an instant clone.

I could call “start spinning up additional replicas” at 65% capacity and then perform the split at 75% capacity, or something like that.

yvonneanne · April 27, 2022, 5:37am

Thanks for your questions, @icme !

I’d love to have time to start working on these topics, but right now the teams that would mostly be involved in this are focussing on BTC integration and HTTP calls from canisters.

Note that these Long Term Proposals are really multi-year projects and due to our limited number of people, we are not tackling all of them in parallel.

As you’re probably well aware of, the overall IC throughput of update and query calls can be increased linearly by adding more subnets and node respectively.
There are hard latency and per-subnet update call throughput limits imposed by the bandwidth and geographical distance between the DCs and the query throughput limits depend mostly on the node machine HW. These limits apply to all BFT smart contract platforms.
If you look at Internet Computer performance - Internet Computer Wiki, you’ll see that the IC features very good performance numbers already.
A detailed analysis or projection to how close we can approach the limits has not been performed, as everyone is working on other topics at the moment.

clone canisters in order to scale horizontally
Maybe @johan can chip in with information on what is being discussed in this direction or point to another forum thread?

johan · April 27, 2022, 12:07pm

Hi @icme - I’ve been working on a roadmap and plan for better support for “big data” on the IC. However, the last couple of weeks I’ve been pulled into a special project, and next week I’m going on vacation. Thus, I don’t expect to be able to give a forum update on “big data” until towards to the end of May. Thus, stay tuned, but don’t hold your breath…

Topic		Replies	Views
Motion Proposals on Long Term R&D Plans Roadmap	17	8122	June 10, 2022
Long Term R&D: Node Performance (proposal) Roadmap	6	2107	January 30, 2023
Long Term R&D: Storage Subnets (proposal) Roadmap	44	5351	January 3, 2024
Long Term R&D: Subnet splitting (proposal) Roadmap	10	2560	April 6, 2023
Long Term R&D: General Integration (Proposal) Roadmap	20	5313	May 16, 2023