DRAFT Motion Proposal: New Hardware specification and remuneration for IC nodes

DM’ing me here works!

Evolving spreadsheet of BOM listings: IC Gen2 HW BOMs - Google Sheets

What are the expectations per node in this respect?

Consulted with the team. We’re aiming for 10Gb connectivity per one or two racks.
A more nuanced answer including info about rewards is coming.

Thanks! This makes sense.

More information about rewards would be useful, particularly given that DC costs are likely to keep rising due to energy costs. Also, if the reward multiplier goes down to 2.5/1.75 (as I read elsewhere), the economic equation changes quite a bit.

We have run the validation procedure on a machine from an additional vendor: Gigabyte.

Gigabyte test machine abbreviated specs:

  • Dual AMD EPYC 7413 2.64 GHz, 24 Core, 180W
  • 512GB (16x32GB) 3200MHz DDR4 RDIMM
  • 10x7.68TB NVMe SSD (exceeds minimum specs)

Validation Results:

  • Stress tests - Passed :white_check_mark:
  • System benchmarks - Passed :white_check_mark:
    • As with the ASUS, about a 2x performance increase for CPU and memory.
    • Disk performance similar to or better than previous gen hardware.
  • SEV-SNP capability - Verified BIOS and kernel support working in tandem - Passed :white_check_mark:

We have updated the node provider hardware wiki to include this configuration.

Does this mean that it is valid to use 7.68TB drives (DWPD < 3) in Gen2?

Adding this additional flexibility would be really useful!

Does this mean that it is valid to use 7.68TB drives (DWPD < 3) in Gen2?

No. See update: DRAFT Motion Proposal: New Hardware specification and remuneration for IC nodes - #31 by garym

The intent of the 5x 6.4TB recommendation was to cover minimum storage requirements while balancing cost and reliability. Using fewer drives reduces the probability that at least one drive fails, which matters because the disk layout has no redundancy.
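
The reliability argument can be made concrete with a little arithmetic. This is only a sketch: it assumes drives fail independently with the same annual failure probability, and the 1% figure is a placeholder, not a measured value for any drive model.

```python
def prob_any_drive_fails(num_drives: int, per_drive_failure_prob: float) -> float:
    """Probability that at least one of num_drives fails, assuming independence."""
    return 1.0 - (1.0 - per_drive_failure_prob) ** num_drives


# Assumed per-drive annual failure probability (illustrative only).
ASSUMED_ANNUAL_FAILURE_PROB = 0.01

for count in (5, 10):
    p = prob_any_drive_fails(count, ASSUMED_ANNUAL_FAILURE_PROB)
    print(f"{count} drives: {p:.2%} chance that at least one fails in a year")
```

With these assumed numbers, ten drives come out at roughly 9.6% versus roughly 4.9% for five. Without redundancy, any single failure takes the node's storage with it.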

More ‘functional’ guidance and specifications for SSDs and RAM are in the works.

We have run the validation procedure on a machine from an additional vendor: Supermicro.

Supermicro test machine abbreviated specs:

  • Dual AMD EPYC 7543 32-Core Processor 2.8 GHz, 225W
  • 1024GB (16x64GB) 3200MHz DDR4 RDIMM
  • 10x7.68TB NVMe SSD

Note that the configuration of each of these components exceeds the Gen2 specs.

Validation Results:

  • Stress tests - Passed :white_check_mark:
  • System benchmarks - Passed :white_check_mark:
    • As with other Gen2 configurations, about a 2x performance increase for CPU and memory.
    • Disk performance similar to or better than previous gen hardware.
  • SEV-SNP capability - Verified BIOS and kernel support working in tandem - Passed :white_check_mark:

The sysbench tool used in previous validation runs gave us some trouble and odd numbers. We switched to fio, which provided more stable and predictable results. The validation procedure is being updated and will be run again on the previous machines to maintain a fair comparison.
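
For anyone who wants to try fio themselves, here is a minimal sketch of driving it from Python and reading its JSON report. The job parameters (4k random read, queue depth 32, 60 seconds) are assumptions chosen for illustration; they are not taken from the IC validation scripts, which remain the reference.

```python
import json

# Hypothetical fio job: the parameters below are illustrative only and are
# NOT the ones used by the IC validation procedure.
FIO_COMMAND = [
    "fio", "--name=randread", "--rw=randread", "--bs=4k", "--size=1G",
    "--ioengine=libaio", "--iodepth=32", "--direct=1",
    "--runtime=60", "--time_based", "--output-format=json",
]


def summarize_fio(json_text: str) -> dict:
    """Extract read/write IOPS and bandwidth (KiB/s) per job from fio JSON output."""
    report = json.loads(json_text)
    summary = {}
    for job in report["jobs"]:
        summary[job["jobname"]] = {
            "read_iops": job["read"]["iops"],
            "read_bw_kib_s": job["read"]["bw"],
            "write_iops": job["write"]["iops"],
            "write_bw_kib_s": job["write"]["bw"],
        }
    return summary


# Trimmed sample in the shape of fio's JSON report (made-up numbers).
SAMPLE = (
    '{"jobs": [{"jobname": "randread",'
    ' "read": {"iops": 250000.0, "bw": 1000000},'
    ' "write": {"iops": 0.0, "bw": 0}}]}'
)
print(summarize_fio(SAMPLE))
```

On a real machine, the JSON text would come from something like `subprocess.run(FIO_COMMAND, capture_output=True, text=True, check=True).stdout`, run from a directory on the disk under test.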

We have updated the node provider hardware wiki to include this configuration.

When is this validated hardware expected to get final approval (if there is such a step)?
I am asking because I want to know when it is safe to place an order for hardware like this. I have checked with vendors like Dell, and they have a lead time of 6-8 months (at least here in Canada).

Is there a timeline by which we can define the minimum storage and RAM requirements for Gen2 hardware?

DFINITY strongly recommends using hardware that meets the exact specification for hardware components.

To simplify:

  • System must be one of {Dell, Supermicro[0], Gigabyte[0], ASUS[1], HPE[1]}
  • Processors must be Dual AMD EPYC Milan. Minimum model number: 7313 (higher end models are OK)
  • RAM must be 16x 32GB 3200 MT/s. All from the same manufacturer. Don’t mix and match.
  • SSDs must be 5x 6.4TB mixed mode NVMe (DWPD >= 3). All from the same manufacturer. Don’t mix and match.
  • Must have dual port 10G SFP or BASE-T
  • Must have TPM 2.0

Choice of manufacturer of SSD or RAM can be decided by the NP. These choices may be constrained by the vendor.

Deviate from this specification at your own risk. DFINITY does not provide hardware support or troubleshooting. Deviating from the spec will cause issues during installation of node software and risks losing NP rewards.
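
As a rough self-check before ordering, the simplified list above can be encoded as a small script. This is a sketch, not the check the IC-OS installer performs: the `NodeConfig` structure and its field names are hypothetical, and treating the CPU model as a plain number (with a trailing 3 marking the Milan generation) is a simplification.

```python
from dataclasses import dataclass
from typing import List

VALIDATED_VENDORS = {"Dell", "Supermicro", "Gigabyte", "ASUS", "HPE"}


@dataclass
class NodeConfig:
    """Hypothetical description of a machine; field names are illustrative."""
    vendor: str
    cpu_sockets: int
    cpu_model: int              # simplified numeric EPYC model, e.g. 7313
    ram_sticks_gb: List[int]
    ram_speed_mts: int
    ram_makers: List[str]       # distinct RAM manufacturers in the machine
    ssd_sizes_tb: List[float]
    ssd_dwpd: float
    ssd_makers: List[str]       # distinct SSD manufacturers in the machine
    ports_10g: int
    has_tpm2: bool


def spec_deviations(cfg: NodeConfig) -> List[str]:
    """Return human-readable deviations from the simplified spec list."""
    issues = []
    if cfg.vendor not in VALIDATED_VENDORS:
        issues.append("vendor is not on the list")
    # Assumption: model numbers compare numerically and a trailing 3 marks Milan.
    if cfg.cpu_sockets != 2 or cfg.cpu_model < 7313 or cfg.cpu_model % 10 != 3:
        issues.append("CPUs are not dual EPYC Milan 7313 or higher")
    if cfg.ram_sticks_gb != [32] * 16 or cfg.ram_speed_mts != 3200:
        issues.append("RAM is not 16x 32GB at 3200 MT/s")
    if len(set(cfg.ram_makers)) != 1:
        issues.append("RAM is not all from one manufacturer")
    if cfg.ssd_sizes_tb != [6.4] * 5 or cfg.ssd_dwpd < 3:
        issues.append("SSDs are not 5x 6.4TB with DWPD >= 3")
    if len(set(cfg.ssd_makers)) != 1:
        issues.append("SSDs are not all from one manufacturer")
    if cfg.ports_10g < 2:
        issues.append("fewer than two 10G ports")
    if not cfg.has_tpm2:
        issues.append("no TPM 2.0")
    return issues


# Manufacturer names, port count, and TPM flag below are placeholders.
example = NodeConfig("Dell", 2, 7343, [32] * 16, 3200, ["MakerA"],
                     [6.4] * 5, 3.0, ["MakerB"], 2, True)
print(spec_deviations(example))
```

An empty list means the configuration matches the simplified list; it says nothing about whether the machine will pass the actual validation procedure.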

You may have noticed that the previously validated configurations for Supermicro and Gigabyte deviated from the specification by either doubling the size of each RAM stick or using 7.68TB SSDs (DWPD ~=1). We don’t recommend doing this. Sorry if that caused confusion.
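
To see why the DWPD rating matters more than raw capacity here, the rating can be converted into total rated write volume. DWPD (drive writes per day) is rated over the warranty period; the five-year warranty below is an assumption, so check the datasheet of the specific drive.

```python
def rated_terabytes_written(capacity_tb: float, dwpd: float,
                            warranty_years: float = 5.0) -> float:
    """Total rated write volume in TB: capacity x DWPD x days in the warranty period."""
    return capacity_tb * dwpd * 365 * warranty_years


# Five-year warranty is assumed for both drives (illustrative).
for label, capacity_tb, dwpd in (("6.4TB, DWPD 3", 6.4, 3.0),
                                 ("7.68TB, DWPD 1", 7.68, 1.0)):
    print(f"{label}: {rated_terabytes_written(capacity_tb, dwpd):,.0f} TB written")
```

Under that assumption, the 6.4TB mixed-mode drive is rated for about 35,040 TB of writes and the larger 7.68TB drive for about 14,016 TB, i.e. the bigger drive has well under half the rated endurance.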

[0] We will re-run validation on the Gigabyte and Supermicro with components which meet the exact specification.

[1] SEV-SNP validation is still pending for the ASUS configuration. Complete validation is still pending for HPE.

@garym Is it worth adding minimum read and write sequential speeds, random read and write IOPS and latency requirements for SSDs?

We have run the validation procedure on a machine from an additional vendor: Dell.

Dell test machine abbreviated specs:

  • Dual AMD EPYC 7343
  • 16x 32GB RDIMMs, 3200 MT/s Dual Rank
  • 5x 6.4TB Mixed Mode NVMe SSDs

Validation Results:

  • Stress tests - Passed :white_check_mark:
  • System benchmarks - Passed :white_check_mark:
  • SEV-SNP capability - Verified BIOS and kernel support working in tandem - Passed :white_check_mark:

We have updated the node provider hardware wiki to include this configuration.

@ritvick Great question - I think it is worth it to clarify what’s acceptable. We’re resolving discussions internally before posting such guidance.

Update on ASUS hardware: SEV-SNP validation has succeeded! The ASUS configuration is now fully validated to run on the IC.

We had some interaction with our suppliers and ASUS. The issues we saw were solved with the latest version of the BIOS, released recently.

Up next: validating HPE, and re-validating Gigabyte and Supermicro.

We want to become a node provider, so we were asking vendors for information about hardware. They said there is no way to do RAID because it is SSD storage, will this be a problem?

They said there is no way to do RAID because it is SSD storage, will this be a problem?

No, not a problem. Hardware RAID should not be attempted. The IC-OS installer will verify there are 5x independent 6.4TB NVMe SSDs and prepare them appropriately. IC-OS uses a ‘striped’ LVM volume across all the disks (technically a software RAID 0).

What about redundancy? Replica nodes currently provide it at a higher level than disk redundancy.
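
For readers unfamiliar with striping, here is a sketch of the textbook RAID 0 address mapping. It is not IC-OS's actual layout code, and the 64 KiB chunk size is an assumption picked for illustration.

```python
CHUNK_SIZE = 64 * 1024  # assumed stripe chunk size in bytes (illustrative)
NUM_DISKS = 5


def locate(logical_offset: int, num_disks: int = NUM_DISKS,
           chunk_size: int = CHUNK_SIZE) -> tuple:
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    chunk_index, within_chunk = divmod(logical_offset, chunk_size)
    stripe_row, disk = divmod(chunk_index, num_disks)
    return disk, stripe_row * chunk_size + within_chunk


for offset in (0, CHUNK_SIZE, 5 * CHUNK_SIZE + 10):
    print(offset, "->", locate(offset))
```

Consecutive chunks rotate across all five disks, so every disk holds a slice of every large file. That is why the volume gets the combined capacity and throughput of the disks, and also why losing any one disk loses the whole volume.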

I have a question regarding 3 DWPD (3 Drive Writes Per Day) drives with varying capacities. Specifically, I wanted to know if it is necessary to use 6.4TB SSDs, or if other capacities are acceptable.

I understand that it may not always be possible to obtain an exact 6.4TB capacity SSD, as they may not be readily available in stock. Hence, I am wondering if alternative options such as 10x3.2TB or 5x7.6TB drives from reputable brands like Micron or Kioxia can be used.

As Gary stated before: “Deviate from this specification at your own risk. DFINITY does not provide hardware support or troubleshooting. Deviating from the spec will cause issues during installation of node software and risks losing NP rewards.”

And we’ve only validated hardware for the exact specifications discussed above.

That being said, the current requirement is not that every SSD must be exactly 6.4TB, but rather that each SSD must be at least 6.4TB and the total storage capacity must be at least 32TB. So 5x7.6TB drives should pass the hardware requirements, but we encourage everyone to use the exact specs; deviating risks issues during the installation of node software and the loss of NP rewards.
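
The rule as stated can be written down in a few lines. This is only a sketch of the two conditions in this post, not the check the installer actually runs; sizes are in GB so the comparison uses whole numbers.

```python
MIN_SSD_GB = 6400      # each SSD must be at least 6.4TB
MIN_TOTAL_GB = 32000   # total capacity must be at least 32TB


def meets_storage_rule(ssd_sizes_gb: list) -> bool:
    """True if every drive is >= 6.4TB and the total is >= 32TB."""
    return (all(size >= MIN_SSD_GB for size in ssd_sizes_gb)
            and sum(ssd_sizes_gb) >= MIN_TOTAL_GB)


for label, sizes in (("5x 6.4TB", [6400] * 5),
                     ("5x 7.6TB", [7600] * 5),
                     ("10x 3.2TB", [3200] * 10)):
    print(label, "->", meets_storage_rule(sizes))
```

Read this way, 5x7.6TB passes, while 10x3.2TB reaches 32TB in total but fails the per-drive minimum. Passing this rule does not make a configuration validated; only the exact spec has been validated.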

@garym Hi. If NP configures the machine model by themselves, can they conduct testing by themselves? Will the DFINITY team assist in testing? Thanks.

If NP configures the machine model by themselves, can they conduct testing by themselves?

For performance validation, yes! The validation scripts are available in the public IC project on GitHub. Exporting the results as CSV for easier comparison is on our backlog, but you can run the scripts right now. DM me once you have the results and we can compare numbers.
Our performance validation results should be made public, but they’re unorganized at the moment and we have higher priority items in our backlog.
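
Until a CSV export exists, results can be compared by hand or with a short script. The sketch below assumes a hypothetical two-column `metric,value` layout and made-up numbers; the real scripts do not produce this format today.

```python
import csv
import io


def load_results(csv_text: str) -> dict:
    """Parse 'metric,value' rows (hypothetical format) into a dict of floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["metric"]: float(row["value"]) for row in reader}


def speedups(baseline: dict, candidate: dict) -> dict:
    """Ratio candidate/baseline for every metric present in both result sets."""
    return {metric: candidate[metric] / baseline[metric]
            for metric in baseline
            if metric in candidate and baseline[metric] != 0}


# Made-up numbers, only to show the shape of the comparison.
PREVIOUS_GEN = "metric,value\ncpu_events_per_s,1000\nmem_mib_per_s,4000\n"
CANDIDATE = "metric,value\ncpu_events_per_s,2000\nmem_mib_per_s,8400\n"

for metric, ratio in speedups(load_results(PREVIOUS_GEN),
                              load_results(CANDIDATE)).items():
    print(f"{metric}: {ratio:.2f}x")
```

Ratios like these only mean something when both machines ran the same version of the validation scripts with the same settings.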
