New node provider - specifics and philosophy

Dear Community,

This is my first post and I hope I have not mistaken the category.

I am trying to understand the “Node Provider Machine Hardware Guide”, more specifically, the following:

  • What exactly could a “Gen-2.3” generic node be? Is there a list of already tested models? Or will any server that meets the processor and RAM requirements be a fit?
  • Is there a complete list of models for all other generations?

Thanks!


Hello @vmetodiev! Putting our DM here for greater visibility: we don’t publish exact server models on purpose, so as not to create reliance on a single model or just 2-3 hardware models. You are welcome to ask our node providers which server models they are using. In fact, I believe some of them have shared this info in our Matrix Element channel https://app.element.io/#/room/#ic-node-providers:matrix.org

Best regards,
Alexander


Thank you, @alexu!
Now, another relevant question related to the above.

Is there a tool (script) that could be used to validate that a certain server configuration is compatible with IC-OS?

Best regards,
Varban

We definitely had scripts that were used to test low-level hardware parameters. I think we used ‘sysbench’ and ‘stress-ng’ for testing hardware components such as CPU, memory, and disk. In a nutshell, we want to make sure that if your hardware meets the spec in the Wiki, it is fast enough to be on par with the rest of the nodes in the subnet; otherwise, the slowest server will slow down the rest of the subnet.
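The exact scripts aren’t shown in the post, but a rough smoke test along those lines can be sketched with the two tools named above. The durations, sizes and thread counts below are arbitrary placeholders, not official IC-OS thresholds:

```shell
#!/bin/sh
# Hedged sketch of a hardware smoke test using sysbench and stress-ng.
# All durations/sizes are placeholder values, not IC-OS requirements.
run() {
    echo "+ $*"
    "$@" >/dev/null 2>&1 || echo "  (skipped: $1 not available or failed)"
}

# CPU: prime-number crunching across all cores
run sysbench cpu --threads="$(nproc)" --time=10 run

# Memory: write bandwidth
run sysbench memory --memory-total-size=2G run

# Disk: random read/write on a scratch file set
run sysbench fileio --file-total-size=1G prepare
run sysbench fileio --file-total-size=1G --file-test-mode=rndrw --time=20 run
run sysbench fileio --file-total-size=1G cleanup

# Stress all CPUs briefly and report summary metrics
run stress-ng --cpu 0 --timeout 20s --metrics-brief
```

Comparing the resulting numbers against those of other providers’ machines (e.g. via the Matrix channel mentioned above) would give a rough sense of whether a box keeps pace with its subnet.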


Good. Then comes the question of custom-assembled servers, or other models from well-known and established brands, that match all the requirements.

For example, let’s say that I get a pair of 1U chassis with redundant power supplies and all the components needed to meet the “generic” requirements for compute, storage and network resources. After installing IC-OS successfully (which is a guarantee that my hardware setup matches the requirements), I will bring them to a certified data centre and do the colocation according to all other ISP and DC related requirements.

Now, is there a chance that the community may reject me for some non-technical reason? I would then end up with perhaps more than $10k spent on new hardware and a monthly contract for data centre colocation.

Could this be proven, tackled (and even prevented) on a proposal level?


@vmetodiev, there is also another aspect to consider when it comes to bringing self-assembled machines into a DC: some DCs won’t accept them unless they come with a manufacturer’s warranty. This is because DCs use a very expensive fire-suppression method, i.e. flooding the room with gas. If your self-built server’s power supply starts producing smoke, the smoke alarm will trigger the gas release in that room, and it will be quite expensive to refill. Keep this in mind when you evaluate DCs for hosting!

which is a guarantee that my hardware setup matches the requirements

Just to emphasize: a successful IC-OS installation isn’t a guarantee that your hardware setup matches the requirements. We do have basic checks during the installation, but they aren’t a guarantee that the node will perform as expected.
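For illustration, a minimal probe of the kind such a validation step might include could look like the sketch below. The threshold is an arbitrary placeholder, not an official IC-OS number, and this is not DFINITY’s actual validation code:

```python
# Hedged sketch (not DFINITY's actual validation): a minimal post-install
# disk performance probe. The threshold is a made-up placeholder.
import os
import tempfile
import time

def disk_write_mb_s(size_mb=64):
    """Write size_mb of random data to a temp file, return throughput in MB/s."""
    buf = os.urandom(1024 * 1024)  # 1 MiB of incompressible data
    with tempfile.NamedTemporaryFile() as f:
        start = time.perf_counter()
        for _ in range(size_mb):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually hit the disk
        elapsed = time.perf_counter() - start
    return size_mb / elapsed

MIN_MB_S = 100  # placeholder threshold, NOT the real requirement
speed = disk_write_mb_s()
print(f"sequential write: {speed:.0f} MB/s -> "
      f"{'PASS' if speed >= MIN_MB_S else 'FAIL'}")
```

A real validation step would of course cover CPU, memory and network as well, and use thresholds derived from the published hardware spec.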

is there a chance that the community may reject me for some non-technical reason?

Yes, there is such a chance: for example, the community might reject your proposal if you do not bring sufficient evidence of your identity, or if there is a plausible reason to believe you aren’t being truthful about the server location or the documentation provided.

Please keep in mind that we currently do not accept any new nodes into the network, as mentioned in the Wiki. If/when we do open up the network to new node registrations, it will likely come with an updated hardware spec, as Gen2 hardware becomes more difficult to procure.


@alexu

Thanks for mentioning the point regarding hosting custom (assembled) servers in datacenters, it is indeed quite relevant!

Now, after going through some of the wiki materials, I would be glad to know what DFINITY thinks about the following future optimisations (I guess most of them are already planned or even in progress):

  • Regarding the IC-OS installation, it would be good to add a performance validation step - either at the OS level (a script or application) or at the UEFI level (an application).

  • All servers now seem to be x86-based. The IC runs WebAssembly, which is more than enough of an abstraction to be architecture-independent. There are good ARM processors, and the new RISC-V ones are also catching up.

  • Resource categorisation - wouldn’t it be good if there were categories, as part of the node provider requirements and remuneration, for “compute” (CPU, RAM, architectural diversity), “storage” (in terms of volume, speed and topology - NVMe directly attached to the PCIe root complex or NVMe connected via a PCIe switch) and “network” (incl. latency, throughput, redundancy)? Let’s not forget GPUs, FPGAs and custom (AI) accelerators as well.

  • More about the networking aspect:

  1. The Node Provider Networking Guide suggests avoiding LACP, but it should be more specific about why. It could also define requirements and remuneration based on OSI L2 and OSI L3 redundancy, link aggregation, BGP peering, etc. Only after all requirements are clear should the minimum number of IPv4 and IPv6 addresses be defined (IMHO).
  2. Domains - should all node providers have a domain? Why not use subdomains from the DFINITY domain?
  • Business model - should I participate as a private individual or as a company?

  • Infrastructure growth - provide a good answer to the question “Why would I want to spend my money on becoming a node provider instead of investing the allocated budget in buying ICP (and thus avoiding all the headaches related to server builds, setups, colocation and maintenance)?” How do I apply and wait in the queue for the next “season” of infrastructure extension?

I believe that ICP is a promising near-future alternative to the monopolised, techno-feudalistic market. And as a different kind of “crypto”, backed by ideas and infrastructure (instead of speculation), it should have fully transparent procedures for all participants who want to contribute to a better future.


Dear Varban,

While I don’t have answers to all of your questions, I’ll try to briefly answer the ones that I can. If other community members want to chip in and provide their PoV, it would be great!

  • Regarding the IC-OS installation, it would be good to add a performance validation step - either at the OS level (a script or application) or at the UEFI level (an application).

Yes, probably. I can relay this info to our Node team to see whether this is something they can tackle in the future!

  • All servers now seem to be x86-based. The IC runs WebAssembly, which is more than enough of an abstraction to be architecture-independent. There are good ARM processors, and the new RISC-V ones are also catching up.

The ICP security model is tied to AMD SEV-SNP, so it is unlikely we are going to step away from that architecture (and CPU maker) in the near- to mid-term future.

  • Resource categorisation - wouldn’t it be good if there were categories, as part of the node provider requirements and remuneration, for “compute” (CPU, RAM, architectural diversity), “storage” (in terms of volume, speed and topology - NVMe directly attached to the PCIe root complex or NVMe connected via a PCIe switch) and “network” (incl. latency, throughput, redundancy)? Let’s not forget GPUs, FPGAs and custom (AI) accelerators as well.

There were proposals to do exactly what you are mentioning here. Feel free to browse/search the forum for the state of these discussions; I don’t have the exact info.

  1. The Node Provider Networking Guide suggests avoiding LACP, but it should be more specific about why. It could also define requirements and remuneration based on OSI L2 and OSI L3 redundancy, link aggregation, BGP peering, etc. Only after all requirements are clear should the minimum number of IPv4 and IPv6 addresses be defined (IMHO).

The ICP network is designed to withstand the outage of certain nodes in each subnet. Adding more redundancy to each node means higher maintenance costs for you as a Node Provider, without obvious advantages to the ICP subnet that your nodes could be part of. LACP also masks the MAC addresses of the nodes by replacing them with a virtual one, and this can confuse our installer, which uses the MAC addresses of the network cards.
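As an illustration of the MAC point, the sketch below enumerates the per-interface MAC addresses a MAC-based identification scheme would see. This is a hedged example: the `/sys/class/net` path assumes a Linux host, and IC-OS’s actual identification logic may differ.

```python
# Illustrative only: with LACP bonding, the bond interface reports a single
# virtual MAC, so enumerating interfaces no longer maps 1:1 to physical NICs.
import os

def list_macs(sys_net="/sys/class/net"):
    """Return {interface: mac} for all network interfaces, loopback excluded."""
    macs = {}
    if not os.path.isdir(sys_net):  # non-Linux fallback
        return macs
    for iface in sorted(os.listdir(sys_net)):
        if iface == "lo":
            continue
        try:
            with open(os.path.join(sys_net, iface, "address")) as f:
                macs[iface] = f.read().strip()
        except OSError:
            pass  # interface without an address file
    return macs

print(list_macs())
```

On a bonded setup, the slaves and the bond itself can all report the bond’s virtual MAC, which is exactly the ambiguity the guide’s advice avoids.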

  2. Domains - should all node providers have a domain? Why not use subdomains from the DFINITY domain?

DNS is another form of control. Using DFINITY-owned DNS records doesn’t help with decentralization.

  • Business model - should I participate as a private individual or as a company?

This depends on your jurisdiction and the tax laws there. For some of our node providers, it makes sense to register a company; others remain individuals or register themselves as ‘sole traders’.

  • Infrastructure growth - provide a good answer to the question “Why would I want to spend my money on becoming a node provider instead of investing the allocated budget in buying ICP (and thus avoiding all the headaches related to server builds, setups, colocation and maintenance)?” How do I apply and wait in the queue for the next “season” of infrastructure extension?

There are a few reasons:

  1. Node provider rewards are fixed in XDR, so there is no issue with token valuations going up or down. You can count on your rewards to be sent to you on time, in predictable amounts, for a predictable number of months/years.
  2. Not everyone can be a node provider: this requires funding, as well as skills to maintain a stable, working server infrastructure.
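The first point can be made concrete with a toy calculation. All numbers below are made-up placeholders, not actual reward figures or exchange rates:

```python
# Hypothetical illustration: node provider rewards are denominated in XDR,
# so their fiat value depends only on the XDR exchange rate, not on the
# ICP token price. Both numbers below are invented for the example.
monthly_reward_xdr = 1_500   # placeholder monthly reward in XDR
xdr_usd_rate = 1.33          # placeholder XDR -> USD rate

monthly_usd = monthly_reward_xdr * xdr_usd_rate
print(f"${monthly_usd:,.2f} per month, regardless of ICP price")
```

The point is simply that the revenue side is predictable in fiat terms, which is what makes the hardware and colocation outlay plannable.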

Best,
Alexander


Dear Alexander,

Thank you for your detailed responses to all of my points! I changed the thread title accordingly (hope it now sounds more relevant to the discussion).

Best regards,
Varban

Dear Varban (@vmetodiev ),

Here is what we were able to find in our repo regarding hardware testing: ic/ic-os/dev-tools/hw_validation at master · dfinity/ic · GitHub. There is also ic/ic-os/dev-tools/bare_metal_deployment/benchmark_runner.sh at master · dfinity/ic · GitHub, which creates additional IO load for the benchmarks. While it is not exactly pass/fail, it should give you an indication of the workloads the nodes should be able to handle.


Dear (@alexu) Alexander,

Sorry for the delayed response and thanks for the links!

I had a brief look at the repository. I hope to have some time for experiments during the weekend. I got the installer working on a KVM virtual machine, and I will try to figure out how to make a running “sandbox” setup (installation) as well, so that I can play with IC-OS. I’m curious…
