The fleet problem/opportunity of the Internet Computer

Existential threat
There is a lot of talk about the superior software innovation of the Internet Computer, which I am very excited about. But we must also acknowledge that without the hardware (computational capabilities) keeping pace, it will become a bottleneck. In fact, I think it already has, and looking at our pace, the situation is much worse than one would expect. There are also worrying aspects of the current design that will become an attack vector in the future. I do want to mention that I don't intend to attack or single out anyone specific with this letter. It's a genuine worry based on what I see and know, and I wouldn't be surprised if a lot of work is happening behind the scenes. Let me first provide some context and then formulate some critical questions from it.

Let’s first look at the growth of our computational capacity (from what I have been able to gather).
The following graph shows that we haven’t added a single node provider in 2 months. I can relate, as I have been in the queue for a few months and haven’t been contacted since the beginning.


While this graph shows that we have grown our fleet to 228 nodes.

To provide some context on this magnitude, an average global tech company has tens of thousands of nodes running on cloud providers. I know, because I still work in such a traditional company, where we manage an enormous AWS fleet. And that is just one company in the ocean of the AWS cloud. My concern here is not that we are far from AWS capabilities; it's the lack of a clear plan for how to get there.

We couldn't host a killer app on the Internet Computer right now. It would be fragile and would not perform. Without an equally innovative master plan to increase our fleet size, we will not achieve the desired outcome. I have heard statements about eventually having millions of node machines, but how exactly are we going to get there? This is just wishful thinking at this point. I'm not sure about the exact size of our nodes, but worldwide we must have only a few dozen racks of servers right now. This is nothing!

Public node providers will be an easy attack vector
The whole premise of the Internet Computer is blockchain singularity - crucially important and, for me, a revolutionary point in time - meaning separating from the Traditional System and enabling an infinite digital platform for creators worldwide. It is certainly possible, but not with the current design. Our actual computational capabilities could easily be hurt by governments right now. In effect, they could take down a large part of our system, which would have catastrophic consequences. Even though the network is decentralized, publicly known node providers will inevitably be targeted. As I understand it, the only reason node providers are publicly known is to distribute our fleet evenly across geographies and identities. That's certainly important, but the current strategy also makes our system vulnerable. Imagine in five years, when the IC has grown 10x, China, Europe, the USA, or other states ordering their local providers to shut down immediately (to restrict freedom of speech). While the NNS community can vote for freedom, and despite the cryptographic innovation, this could still wipe out large parts of the IC and hurt millions of people in the community. Node providers should be pseudonymized and not geographically known to the Traditional Systems.

Onboarding new node providers is the biggest bottleneck of adoption as I see it. As I mentioned, I have been in the queue for a while. Specifically, I would be the first node provider in the Baltics/Scandinavia region, so I would surely bring further decentralization to the network. I also have a Level 3+ (high-tech/standards) green-energy datacenter as a partner, ready to get started. You cannot get any better than this, to be completely honest. And yet, nothing. I lack visibility and simply don't know what's happening. Surely a tremendous amount of work is happening inside Dfinity on this, but at the same time, I don't see it being talked about anywhere, which makes me worry.

Here are some feature requests for Dfinity and the community:
- visibility into the node provider queue and pace. I would like to know how many are in the queue, what the timelines are, and what the projected node count is going forward (based on queued entities). This info should be known at least inside the community (e.g. to II entities) and updated regularly, ideally daily.
- a streamlined, scalable procedure to onboard new node providers exponentially for the next decade. Most likely and most desirable, as far as I see, onboarding node providers should be decentralized. There is no reason known to me why this has to go through Dfinity (it can in the beginning), nor why it has to make the node providers publicly known. We should aim to obfuscate this, making the system more robust.
- we need a dedicated FLEET team and Dfinity focus with the goal to "set and follow the multi-year trajectory of IC node distribution". This is not a given and requires innovation. How do you build a decentralized fleet in a world of chip shortages, governmental pressures, and resource scarcity, and still expect to keep up with the (exponential) pace of software growth? Very hard. We shouldn't wait four years before we start thinking about it seriously. It's a fundamental part of the IC value proposition, and the current graphs and lack of visibility show that something needs to change.
- In the meantime, I do want to tag somebody so we can get going. I don't know enough, so I'll go with the one person I always see willing to help, @diegop. Please help: who is the appropriate contact person on the Dfinity side?
- Badlands is not the solution. It is an interesting idea and potentially helps with developer adoption and community building, but it won't solve the bottleneck I am referring to.

Final remarks: as a system architect, I look at the Internet Computer as the front-runner to enable the next level of software development. A scalable and tamper-proof (single) system entirely on-chain is alien tech at this point. I am absolutely fascinated by the technical stack and can already see where this could be going in the coming years. So I don't worry about the software side at all, with the self-proclaimed "biggest crypto team in the space" leading the way. But I feel the game is not won on the software side. Our sovereignty will be enabled by crypto running on hardware, and to fulfil the dream, we need an absolutely amazing worldwide fleet of node machines. We need some seriously ambitious goals and a team that might actually get us there.

Thank you for your time :pray:

26 Likes

It would be nice to bring the same transparency of the recent roadmap process to the node provider process. I hope the OP will take some comfort in recognizing that the roadmap was also in the shadows for months (plus the 3 years in development). I'd expect this to come out of the shadows eventually.

If you look at the high-traffic incident a week or so ago, there is some logic in slowing the node rollout until more of the wrinkles have been ironed out.

My understanding is that there has already been one incident where node providers have had to physically interact with machines to keep the network up. That gets harder when you increase the data center count.

We are running a 20 year marathon here: Announcing the Internet Computer “Mainnet” and a 20-Year Roadmap | by Dominic Williams | The Internet Computer Review | Medium

10 Likes

This is such a well-written informative post.
I’m glad you are highlighting the importance of the hardware.

I would love to see the majority of data centers active outside of the U.S. and Europe. I think that is going to be critical in the next few years.

I think Badlands means different things to different people. This makes it difficult to talk about. I certainly think a subnet run by community nodes would help with censorship resistance and poor market conditions. But that would not really address the problems you’ve identified.

6 Likes

This is a very well written and carefully considered post. Thank you for taking the time to express your concerns and understanding. I hope you are at the top of the list and become a node provider in the next round of applicants.

It doesn’t seem to me that there are too few node providers yet and I trust that Dfinity and ICA are carefully considering how and when to build out the capacity based on expected demand. Yet I think it is important for interested parties such as yourself to raise concerns and ask for transparency. I would like to see this happen as well.

7 Likes

I think the biggest question is:

When will the node onboarding capacity become exponential through some kind of automation, and how long will the linear (or worse) phase last?

3 Likes

Here’s another existential threat:

Is it even remotely possible or feasible to vet node providers as truly independent, and not just subsidiaries of a larger parent company?

2 Likes

Something I believe Dominic has mentioned multiple times is that the network will add node providers with demand. If we aren't filling up subnets to capacity (we aren't currently, not even close), then why add more subnets? And the ICPunks event was more an issue of burst demand, not sustained demand.

I of course want to see more transparency in the node onboarding process and would love to see thousands and then millions of nodes and subnets, but I don't think it makes sense to add empty hardware without demand to fill it. It's not like Bitcoin or Ethereum, where you keep adding nodes and harden your security but add no throughput.

7 Likes

I believe that the future of the Matrix Internet Computer is with anonymous compute providers. The only way around this, in my mind, is Secure Multi-Party Computation enabled in a special subnet that provides developers (me) with private decentralized compute, which would require a special subnet type. Unless ALL compute on the IC became SMPC, but I don't see that as useful or necessary.

I think there could be a simple solution in the Badlands concept where there are two types of nodes: verified and unverified. The verified nodes are those that go through the process as the current nodes have. The second, "Badlands" node, would be much less reliable (presumably by a number of nines, though not less than 99.9% uptime). The onboarding would be anonymous and function more like running a BTC node, likely compatible with an RPi and/or a mobile app.

4 Likes

I appreciate your response. I feel you are looking at it from a few angles, like efficiency and security, but not seriously considering my proposed problem, which is rapid scalability.

Adding nodes will not add security, but that's not the problem I am surfacing. Even if the software allows it, our ability to scale up the network is directly limited by our ability to increase the fleet size. It actually makes a lot of sense to me to grow our fleet to a much bigger state than currently needed. It's just forward-thinking, planning ahead to take over the world. We know the demand will be there, so why not prepare for it? Especially in the beginning, when we are still the only fully on-chain solution. There is no rational reason not to be much more aggressive with this. It's actually risky to optimize for current traffic. When the demand hits, it's usually a tsunami. And that's how you fuck up that particular opportunity. When the opportunity presents itself, it's too late to start ordering, building, and shipping new nodes.

We have a few hundred nodes right now; they would fit in a medium-sized room. We need much, much more once we get a killer app that explodes. At least an order of magnitude more, but really it's two orders of magnitude that we should be ready for in the coming years. The Internet Computer will get its chance soon, and right now we are not ready for it.

6 Likes

I am all for ramping the node count up as fast as is reasonable, though I'm not sure any of us know what that rate should be. I think the response from DFINITY might be that scaling up too fast without demand is one reason for the current pace.

Let’s hear from DFINITY on exactly why it’s taking as long as it’s taking and what can be done to speed it up and improve transparency.

But basically I agree, let’s get to millions of computers quickly and let’s have a plan for how to get there.

5 Likes

Perhaps we can emulate MPC with node shuffling and secure enclaves. Fully homomorphic encryption could also be an alternative to MPC in the future. All of these technologies could also work together to provide a solution.

I think the problem with MPC and FHE is that they aren’t practical yet. Enclaves and shuffling may provide an interim solution.

I agree that somehow hiding canisters from node operators could be an essential part of security.

5 Likes

I would love greater transparency on node ramp-up. Is there a certain ratio of messages/sec vs blocks/sec they are looking for, or is it just time they need to determine the node reward structure?

I agree the ramp-up should be fast. Demand will build up as more games, DeFi, and governance tokens come online.

Somewhat related, I think one of the biggest questions I have with the IC is how many nodes we need in total and per subnet to be decentralized. Is there a goal for a specific Nakamoto coefficient for subnets? Are there any quarterly goals for the total number of nodes? I assume this is part of some internal roadmap that I hope gets made public.
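For readers unfamiliar with the term, a subnet's Nakamoto coefficient can be estimated as the smallest number of node providers who together control more than the fault threshold of that subnet's nodes (1/3 for BFT-style consensus). A minimal sketch, using hypothetical provider-to-node data rather than real IC topology:

```python
from collections import Counter

def nakamoto_coefficient(provider_of_node, threshold=1/3):
    """Smallest number of providers that together control strictly more
    than `threshold` of the subnet's nodes (hypothetical illustration)."""
    total = len(provider_of_node)
    counts = sorted(Counter(provider_of_node).values(), reverse=True)
    controlled = 0
    for coefficient, count in enumerate(counts, start=1):
        controlled += count
        if controlled > threshold * total:
            return coefficient
    return len(counts)  # every provider is needed

# Made-up 13-node subnet where provider p1 runs 4 nodes.
subnet = ["p1"] * 4 + ["p2"] * 2 + ["p3"] * 2 + ["p4", "p5", "p6", "p7", "p8"]
print(nakamoto_coefficient(subnet))  # -> 2 (p1 and p2 together exceed 1/3 of 13 nodes)
```

A higher coefficient means more independent providers have to collude (or be coerced) to compromise a subnet, which ties directly back to the decentralization concerns above.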

I realize my questions are all mostly perception based and very subjective but I think it is a big part in the equation.

3 Likes

Thanks, @integral_wizard, for your detailed analysis. Essentially: don't start digging a well when you see the fire; plan for this in advance. SOL is able to provide an NFT-driven marketplace at scale, and that is one of the reasons for its spectacular visibility.

I believe that

(A) There must be enough capacity in the system to provide bursting capabilities to launch something like the ICPunks event at 100x the scale.

(B) Additionally there must be the ability to be “decentralized” so that a nation state cannot collude with like-minded nation states to shut IC down.

These are, I believe, the possible vectors to consider to achieve these two high-level objectives.

V1. Who's going to pay for this massive increase in capacity? Currently node providers get paid, essentially, in ICP to provide this service. Either there will need to be a massive inflationary regime (an increase in the ICP in circulation), or somehow a large entity subsidizes the node providers for a period of time before canisters buy (and use) the ICP to fund their activities. We MUST get away from the mindset of providing only enough nodes for steady state and towards accounting for explosive growth.

V2. Currently the node providers essentially get "slashed" for not maintaining uniformity of compute prowess, i.e. for not having uniform hardware. As long as the slashing process is transparent enough, it may not be necessary to have uniform hardware. Indeed, it may not be scalable for millions of nodes to be built on the same hardware. What happens to Moore's law? Inevitably we will be moving to a scaled architecture with different compute characteristics along the time axis, as different hardware will have different speeds.

V3. Currently node providers need to do hands-on work tending to their nodes. While we may recognize that not everyone will want to be a node operator, a uniform approach to when and how to touch a node would definitely be something to consider. GRT, in a fairly different domain, does the same with their nodes. Look into guidance from different communities instead of trying to solve this vector de novo.

V4. I have noticed that most of the nodes are in the US and Western Europe, with some in or near Singapore and Australia. This is, of course, not decentralized. There is probably a notion that it's "safer" in these countries than in, say, more authoritarian countries. But that safety should be manifest in the cryptographic algorithms, with no one able to inspect the innards of the IC on a node in any meaningful way, even if they can access it covertly or overtly. Then the notion of "safety" is no longer needed.

V5. It is vital for the community to know what limitations, in terms of both economics and technology, are preventing node providers from blossoming into a full-blown forest from the saplings we are planting.

4 Likes

One limiting factor is cost: IC node providers get paid basically node infrastructure cost + markup (remitted in ICP, accounting for one aspect of ICP inflation). They get paid this no matter the work their nodes do, to my understanding.

This is unlike e.g. Ethereum, where nodes share in total ETH transaction fees, and so decide for themselves whether to join the network up to the point where it makes economic sense for them, as ETH transaction fees are split between more and more nodes.

So Dfinity could quite reasonably be throttling node onboarding to keep down node provider payouts, in turn limiting ICP inflation.
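To make the contrast concrete, here is a back-of-the-envelope sketch (Python, with made-up numbers rather than actual reward figures): under a cost-plus-markup model the total payout, and hence this component of inflation, grows linearly with fleet size regardless of usage, while under fee sharing each node's revenue shrinks as nodes are added.

```python
def ic_style_total_payout(num_nodes, monthly_cost_per_node, markup=0.2):
    """Cost + markup per node, independent of traffic (illustrative only)."""
    return num_nodes * monthly_cost_per_node * (1 + markup)

def eth_style_payout_per_node(total_monthly_fees, num_nodes):
    """Fee sharing: per-node revenue falls as more nodes join (illustrative only)."""
    return total_monthly_fees / num_nodes

# Hypothetical numbers for illustration, not real IC or ETH economics.
print(ic_style_total_payout(228, 2_000))          # total payout grows with every node added
print(eth_style_payout_per_node(1_000_000, 228))  # per-node share shrinks with every node added
```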

That said, I completely agree with you in the main: we should definitely be prepared to scale as quickly as necessary to meet the tidal wave of demand that could come sooner than we may expect.

4 Likes

This is what the Badlands concept could solve. But accepting more non-EU/US-based nodes would be a great starting point.

2 Likes

I guess the "stable payment" to nodes is just a starting point. I don't see how it can scale as a fixed payment. Either the traffic remains small and it won't make sense for the IC to continue paying nodes, or the traffic grows significantly and the nodes will start asking for more variable pay. Over time, I assume a subnet may earn more based on the traffic it handles. I wonder how the tokenomics will evolve at scale.

2 Likes

Paying depending on traffic in the region doesn't make much sense from the node provider's point of view. The cost is the same regardless of traffic; you still need to keep the machine running 24/7. There could be some global variable in the provider payment calculation. Maybe the profits are smaller when we have more computational capacity globally than needed. This would be equal for all and a fair approach, but it still needs to be profitable.

2 Likes

We will see the future more clearly in 2022! …

Thank you for all the comments and the recognition of this thread. It gives me the motivation to keep going with it to provide more visibility into the current and future design.

Decentralization lies at the heart of web 3.0, the third wave of the Information Age. But let's not forget the core problem of current systems - not centralization specifically, but being unsustainable. We are not sustainable, collectively or individually, if we don't include all the perspectives. In my opinion, the onboarding process is an overlooked deficiency.

We want to be scalable and secure; this is clear from the messages I have seen. We also want to be deterministically decentralized. Great. But I would also add that we want to grow organically (meaning without restrictions) but sustainably (building towards an anti-fragile ecosystem).

While the current node provider onboarding is in safe hands with a vision (I trust Dfinity), it is also limited by the capabilities of that one organization, meaning it prevents organic growth. Organic growth means unrestricted growth. As an example I have brought up before, I would happily host some node machines in a partnering data center, but I'm stuck in a queue. This prevents the Internet Computer from growing. I invite us to design a system without central restrictions (a bottleneck), but with powerful supporting organizations.

I think we have enough to build automatic node provider onboarding. Some observations so far:

  1. we have the MVP of PrivIC (!), which allows collecting and controlling private information. It enables (in theory) gathering phone number, identity, and location information. This could be integrated with multiple global third-party KYC providers (or blockchains) to make it scalable.
  2. node provider requirements should be configurable, allowed to be dynamic (depending on oracle data), and automatically updated.
  3. node provider restrictions/requirements should be managed autonomously by the NNS. These should not go through a central organization (like Dfinity).
  4. consider making the payment for node providers dynamic, taking into account the utilization of the network's capabilities. That is, there has to be some minimum fixed income but also an element of change (a rough sketch of such a formula follows this list). This way we can grow organically, as everyone is free to join the network at any time, knowing the expected return on investment based on current and projected traffic. The point being: if someone wants to join and increase the capabilities of the network, they should be allowed to join. But obviously, if these capabilities are not needed or used, then it should not make much financial sense. The open market, not even the protocol, should find the equilibrium itself.
  5. If anything, we should heavily over-provision our fleet early on. Any killer app would crash the Internet Computer - a lost opportunity. That's just a fact. We have a few hundred node machines, whereas IT companies have (tens of) thousands of nodes in their (inefficient, but enormous) cloud fleets. For example, I am quite sure it is impossible for Distrikt to suddenly grow to 50M users. On AWS servers, you can build this type of exponentially growing app.
  6. I'm not an expert on the VMs, but most likely we need a better way to separate the software from the hardware. I don't think it's globally scalable if all of the machines basically have to be identical. It should be fairly easy to buy and upgrade standard equipment locally/regionally (given enough performance guarantees, which are surely possible to enforce). That's the only sustainable long-term way, afaik.
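To make point 4 above concrete, here is a minimal sketch (Python, hypothetical parameters, not an actual NNS proposal) of a reward formula with a guaranteed base plus a component that scales with global network utilization:

```python
def node_reward(base_reward, variable_budget, network_utilization):
    """Hypothetical reward: a guaranteed base covering infrastructure cost
    plus a variable share that grows with global utilization (0.0 .. 1.0)."""
    utilization = min(max(network_utilization, 0.0), 1.0)  # clamp to [0, 1]
    return base_reward + variable_budget * utilization

# Illustration only: an over-provisioned network pays close to the base,
# a saturated network pays the full variable component.
print(node_reward(1000, 500, 0.10))  # -> 1050.0
print(node_reward(1000, 500, 0.95))  # -> 1475.0
```

Under such a scheme, joining an under-utilized network would still be safe (the base covers costs) but only modestly profitable, so the market, rather than a central gatekeeper, regulates how fast capacity is added.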

We have the blockchain design and even apps, but we don't have the hardware for mass adoption. The fact that we don't have a plan after Dfinity has been around for 5+ years means that this is either overlooked, not communicated, or not focused on. Whatever the case, let's improve on what we have. Unless proven otherwise, I would consider this a fundamental bottleneck of the IC protocol right now. Open-source software can be replicated relatively easily; community and hardware, not so much. As an 8YearGang member, I would really appreciate more insight and thoughts on this.

10 Likes

“If we aren’t filling up subnets to capacity (we aren’t currently, not even close) then why add more subnets?”

Because in a universe where the adoption trends are exponential, by the time you realize that you're going to need more capacity, it's already too late?

(Fortunately, DFINITY has a very unique value proposition)

2 Likes