DRAFT Motion Proposal: New Hardware specification and remuneration for IC nodes

DRAFT PROPOSAL

Below is an example of a proposal we (the DFINITY R&D team) intend to submit to the NNS in a few days/weeks, depending on community feedback as well as questions and answers. We want to hear what you all think, including any wording changes we need to make.


Summary

TLDR: As part of the work to further decentralize the infrastructure layer, we want to submit a new motion proposal to introduce a new type of hardware spec (and its corresponding remuneration) for nodes on the IC. We would like to get community feedback on this before proposing.

1. State of the world

Nodes are remunerated based on their location and node type. The Node Rewards table shows the rates per location and node type.

The current types are listed here:

https://wiki.internetcomputer.org/wiki/Node_provider_hardware

2. What we are proposing

If you vote ACCEPT, you are agreeing on two things:

  1. The IC should introduce a new node type. The new type has requirements independent of vendors (except for the CPU).

You can see the details for the new proposed type here:

  • Dual Socket AMD EPYC 7313 Milan 16C/32T 3 GHz, 32K/512K/128M
    • optionally 7343, 7373, 73F3
  • 16x 32GB RDIMM, 3200MT/s, Dual Rank
  • 5 x 6.4TB NVMe Mixed Mode
  • Dual Port 10G SFP or BASE-T
  • TPM 2.0
  2. The DFINITY Foundation will determine the expected cost of the new node type based on data from several independent vendors and propose reward rates based on an expected node lifetime of 4 years. DFINITY R&D will research various ways to construct the vendor-generic node type and propose rates for this node type.

This is a governance proposal, so if this vote passes, there will be subsequent NNS proposals to introduce the new node type and reward rates to the NNS.
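To make the proposed requirements easier to reason about, here is a minimal sketch of how the spec listed above could be encoded and checked against a candidate machine. The `Machine` fields, the `spec_violations` helper, and the choice to treat the listed quantities as minimums are illustrative assumptions, not DFINITY tooling or an official acceptance test.

```python
from dataclasses import dataclass
from typing import List

# CPU models named in the proposal (7313, optionally 7343, 7373, 73F3).
ALLOWED_CPUS = {"7313", "7343", "7373", "73F3"}


@dataclass(frozen=True)
class Machine:
    """Hypothetical description of a candidate node (field names are assumptions)."""
    cpu_model: str       # AMD EPYC model number, e.g. "7313"
    cpu_sockets: int
    dimm_count: int
    dimm_size_gb: int
    nvme_count: int
    nvme_size_tb: float
    nic_ports_10g: int
    has_tpm2: bool


def spec_violations(m: Machine) -> List[str]:
    """Return a list of human-readable problems; an empty list means the check passed.

    Assumption: the quantities in the proposal are treated as minimums.
    """
    problems = []
    if m.cpu_model not in ALLOWED_CPUS:
        problems.append(f"CPU model {m.cpu_model} is not in the proposed list")
    if m.cpu_sockets != 2:
        problems.append("spec calls for a dual-socket system")
    if m.dimm_count < 16 or m.dimm_size_gb < 32:
        problems.append("spec calls for 16x 32GB RDIMM")
    if m.nvme_count < 5 or m.nvme_size_tb < 6.4:
        problems.append("spec calls for 5x 6.4TB NVMe")
    if m.nic_ports_10g < 2:
        problems.append("spec calls for dual-port 10G networking")
    if not m.has_tpm2:
        problems.append("spec calls for TPM 2.0")
    return problems


if __name__ == "__main__":
    candidate = Machine("7313", 2, 16, 32, 5, 6.4, 2, True)
    print(spec_violations(candidate) or "no violations found")
```

A check like this only covers the headline quantities; details such as memory speed, rank, or NVMe endurance class would need additional fields.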

3. Why we are proposing this

This new node type is being introduced for two reasons:

a. The current node specifications are vendor-specific, which is an unnecessary centralisation a year after launch. Vendor-specific specs also make adding future nodes more difficult, as it is harder to buy machines with older hardware specs.

b. The current node types do not support VM memory encryption and attestation which will be needed in future features.

4. What we are asking the community

  • Read proposal
  • Ask questions
  • Give feedback

@Luis (who runs the node provider efforts at DFINITY) will take any questions!


Such proposals should ideally come with options driven by cost versus performance; otherwise there is no particular debate or discussion to be had.

The real work of figuring out what the cost of this hardware is (including hosting) vs what the potential reward would be for the different options SHOULD BE CLARIFIED UP FRONT.

Otherwise, what exactly are we discussing?


Totally agree. That’s why I wanted to explain the additional cost/rewards to performance ratio in the upcoming updates on this thread and in the subsequent motion proposal. The debate internally was: “How baked does this effort need to be for the community to be looped in?”

In this case, we opted for “Let’s let folks know our intent. Post updates as we learn more. Iterate based on feedback. Don’t get too far down the project without feedback.”


I am not exactly sure why there is not more traction on this thread from other potential node-providers. This will impact their bottom lines.

But I agree with you, @luis. I think some amount of background work is necessary for something of this magnitude. Otherwise we will come back in a year scratching our heads as to why only one option (which is really not an option within options) was covered.

For example, why limit ourselves to Milan? Why not Milan-x? Most of our current workloads likely have low L3 cache miss rates and high L3 cache coherency misses. But is this going to change in the future? I believe so, because we should be gravitating towards exploiting the data currently in our possession instead of using it just once. For instance, would a 7573X be an option?


For example, why limit ourselves to Milan? Why not Milan-x?

The Milan 7373 that we allow optionally is an X model. The reason we chose these models is simple: we want the node type 3 spec to be as close as possible to type 1, so that both node types can be used in the same subnet for a while. If this weren’t possible, we would need to onboard many new NPs before the first subnet with sufficient decentralisation could be created. That could mean these nodes would incur costs before they can provide value to the network.

would a 7573X be an option?

The 7573X would cost more than the 7373X that we would like to allow. Using an X series would currently increase the node cost by about 30-50%. The reason we chose to stay with the 7373 is that it has the same core configuration as all the other 73xx models.
It’s already a lot of work and cost to build the tests and environments needed to gain confidence that the system can handle this additional diversity in node hardware. That’s why we decided to change only as much as necessary to ensure a bigger choice of vendors (decentralisation) and thereby a safer supply chain.

I am not exactly sure why there is not more traction on this thread from other potential node-providers. This will impact their bottom lines.

A reason could be that we (DFINITY) haven’t yet talked publicly about a network growth strategy in general. We are preparing community talks about all topics connected to the growth of the network, such as NNS-driven remuneration finding, node operation DAOs and other platform decentralisation related things. As some of you already know, we are still working on the platform decentralisation roadmap from the end of last year.


DFINITY is working towards the new hardware specification. To ensure these machines work at least as well as existing IC nodes before asking the community to vote on the HW specification, DFINITY wants to run a battery of micro and macro benchmarks with synthetic and real-world workloads.

More precisely, in addition to running single-machine benchmarks, we want to evaluate these machines under circumstances that are as realistic as possible. Therefore, we propose to add two such machines to mainnet subnets. We will start with two low-traffic subnets, namely lhg73 and shefu, adding one such node to each.

Once they reside on these subnets, we want to run benchmarking experiments against them to assess the behavior under workloads exercising canister creation, query and update call processing with more and less memory-intensive workloads.

We will compare the metrics of the new hardware and observe if we can spot any anomalies compared to the first generation HW.

When these experiments have been successful, we’ll add the nodes to the following high-traffic subnets with at least one DFINITY node: eq6en, mpubz.


For visibility: Gary McElroy (@garym) is a Senior Engineer at DFINITY working on hardware.


When is the new onboarding process starting? When should people expect to be able to submit proposals to the NNS to run nodes from data centers?

Hi @Sormarler,

Edit: I crossed out the text below because I was wrong

~~I need to verify, but my understanding is that aspiring NPs can currently use the wiki instructions but the hardware spec may block certain people that cannot get access to the machines. An aspiring node provider can follow the instructions to onboard themselves.~~

~~The work to make this much more user-friendly (e.g. by using the NNS Frontend dapp) is likely coming after SNS and hardware spec.~~

@diegop I am a little confused here.
What I understood initially is that this form (Node Provider Interest) is closed, and that this proposal (DRAFT Motion Proposal: New Hardware specification and remuneration for IC nodes - #10 by diegop) needs to reach a conclusion before new nodes can be onboarded to the network.

Are you suggesting that anyone who follows the instructions in the wiki will be able to become an NP and receive rewards?


Same. I am a little confused as well. I was under the assumption the whole process stopped because a new method was in development.

@ritvick @Sormarler I am not surprised I confused you all, since it turns out I was wrong and the team corrected me when I checked in with them. The new process is still under development, and the wiki instructions are still being worked on (they could work, but have some rough areas), so they are not yet the experience the IC should have.

That being said, the hardware work (which is the main theme of this thread) is, I believe, a necessary condition, so @garym will post an update on hardware tests.

Apologies for the confusion I caused. This is not my area. I should have verified earlier.

We have executed the validation plan described previously on two ASUS machines. These meet the generic Gen 2 hardware specifications given earlier in this thread. Some abbreviated specs:

  • 2x AMD EPYC 7313 (3,00 GHz, 16-Core, 128 MB)
  • 512 GB (16x 32GB) ECC Reg ATP DDR4 3200 RAM
  • 32TB (5x 6,4TB) NVMe Kioxia SSD 3D-NAND TLC U.3 (Kioxia CM6-V)
  • Swiss price: 21’595.75 CHF

Validation results:

  • Low Level

    • Stress tests (using stress-ng) - increase confidence in the hardware configuration and these specific machine instances - Passed :white_check_mark:
    • System benchmarks (using sysbench) - Gauge performance against known Gen 1 node performance - Passed :white_check_mark:
      • About 2x performance increase for cpu and memory.
      • Disk performance ranges from equally good to better for the majority of tests
    • SEV-SNP capability - Verified BIOS and kernel support working in tandem - Pending :construction:
  • High Level

    • Method: deploy machines into subnets and ensure subnet metrics do not deviate negatively by a meaningful threshold
    • Low usage subnet deployment. Scalability benchmarks (system baseline and large memory)
      • All metrics nominal - Passed :white_check_mark:
    • High usage subnet deployment
      • All metrics nominal, except
        • Individual node checkpointing performance discrepancy of <3-6%.
          This has no impact on subnet performance, but we’re still keeping an eye on it.
      • Passed :white_check_mark:
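The high-level method above (deploy, then check that metrics do not deviate negatively by a meaningful threshold) could be sketched roughly as follows. The metric names, the sample numbers, and the 3% threshold are hypothetical placeholders; the thread does not state which metrics or threshold were actually used.

```python
from typing import Dict


def relative_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate vs. baseline (0.05 means 5% higher)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline


def regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    higher_is_better: Dict[str, bool],
    threshold: float,
) -> Dict[str, float]:
    """Return the metrics where the candidate is worse than baseline by more than threshold."""
    flagged = {}
    for name, base in baseline.items():
        change = relative_change(base, candidate[name])
        # Normalise so that a positive value always means "got worse".
        worse_by = -change if higher_is_better[name] else change
        if worse_by > threshold:
            flagged[name] = worse_by
    return flagged


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    gen1 = {"updates_per_s": 100.0, "checkpoint_s": 10.0}
    gen2 = {"updates_per_s": 101.0, "checkpoint_s": 10.5}
    direction = {"updates_per_s": True, "checkpoint_s": False}
    for metric, worse in regressions(gen1, gen2, direction, threshold=0.03).items():
        print(f"{metric}: worse by {worse:.1%}")
```

Tracking the direction of each metric separately matters here: a higher throughput is good, while a longer checkpoint time is not, so a single signed comparison would misreport one of them.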

We have updated the node provider hardware wiki to include this ASUS server configuration. An example ASUS quote and bill of materials (BOM) is available for interested community members.


What’s the plan now?

We plan on continuing validation of new Gen 2 hardware configurations and publishing the results. Many factors influence how we proceed, e.g., community input on hardware configurations/manufacturers, price, availability.

What We Are Asking The Community

Please comment on and prioritize next hardware choices (abbreviated specs):

  • Dell PowerEdge
    • 2x AMD EPYC 7343 3.2GHz, 16C/32T, 128M Cache (190W) DDR4- 3200
    • 16x 32GB RDIMM, 3200MT/s, Dual Rank 16Gb (BASE x8)
    • 5x 6.4TB Enterprise NVMe Mixed Use AG Drive U.2 Gen4 with carrier
    • Swiss price: 27’159.09 CHF
    • USA price: $26,460.32 USD
  • HPE Proliant
    • 2x AMD EPYC 7343 3.2GHz 16-core 190W Processor for HPE
    • 16x HPE 32GB Dual Rank x4 DDR4-3200 CAS-22-22-22 Registered Smart Memory Kit
    • 5x HPE 6.4TB NVMe Gen4 Mainstream Performance Mixed Use SFF BC U.3 Static Multi Vendor SSD
    • Swiss price: 27’031.83 CHF
  • Lenovo
    • 2 x ThinkSystem AMD EPYC 7343 16C 190W 3.2GHz Processor
    • 16 x ThinkSystem 32GB TruDDR4 3200MHz (2Rx4 1.2V) RDIMM-A
    • 5 x ThinkSystem U.3 Kioxia CM6-V 6.4TB Mainstream NVMe PCIe4.0 x4 Hot Swap SSD
    • Swiss price: 30’534.27 CHF
    • USA price: $28,525.54 USD

Note: Prices are provided as rough examples and don’t include tax. USA prices are provided for comparison - the hardware will be validated in Switzerland. Example quotes and BOMs for these hardware configurations are available on request.
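As a rough way to compare these quotes, the sketch below spreads each Swiss price quoted in this thread evenly over the 4-year expected node lifetime from the proposal. This is a simple straight-line assumption covering hardware purchase only: hosting, bandwidth, tax, and financing are excluded, and the result is not a proposed reward rate.

```python
LIFETIME_MONTHS = 4 * 12  # expected node lifetime stated in the proposal

# Swiss prices in CHF as quoted in this thread (rough examples, excluding tax).
SWISS_PRICES_CHF = {
    "ASUS": 21595.75,
    "Dell PowerEdge": 27159.09,
    "HPE Proliant": 27031.83,
    "Lenovo": 30534.27,
}


def monthly_hardware_cost(price: float, lifetime_months: int = LIFETIME_MONTHS) -> float:
    """Spread a purchase price evenly over the node lifetime (straight-line)."""
    if lifetime_months <= 0:
        raise ValueError("lifetime_months must be positive")
    return price / lifetime_months


if __name__ == "__main__":
    for vendor, price in SWISS_PRICES_CHF.items():
        print(f"{vendor}: {monthly_hardware_cost(price):.2f} CHF/month (hardware only)")
```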

Having the same hardware as node providers has an additional benefit: if node providers face problems, the DFINITY engineering team can reproduce and debug them independently in an identical environment. This must be done without access to machines owned by node providers.


Are variations like using Kioxia disks in Dell servers acceptable?
Or can only the specific combinations that have been validated be used?

Are variations like using Kioxia disks in Dell servers acceptable?

Yes. Some vendors may not provide components like the configurations above.
That said, performance characteristics of alternatives should be equivalent.

I have not seen any connectivity requirements.
What are the expectations per node in this respect, and how is that compensated?

For instance, if a rack has a 10Gbps dedicated connection shared by N nodes… what are the requirements and how are rewards calculated in that case?

The aim is for 10Gb connectivity per the second requirement here: Node Provider Onboarding - Internet Computer Wiki

Regarding node rewards: still under development.

EDIT: I’ll check on per-node connectivity expectations

How and where do we request this?