DRAFT Motion Proposal: New Hardware specification and remuneration for IC nodes

Currently, the 5x 6.4TB SSD configuration should be treated as a firm requirement. Please stick with 3 DWPD SSDs. Some upcoming features will increase disk usage, which may cause 1 DWPD SSDs (7.6TB, for example) to wear out before the warranty period is over.
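To put rough numbers on the endurance gap, here is a minimal sketch of the usual DWPD arithmetic (rated writes = capacity x DWPD x 365 x warranty years). The 5-year warranty and the 10 TB/day write load are illustrative assumptions, not figures from DFINITY or any drive vendor.

```python
def rated_endurance_tb(capacity_tb: float, dwpd: float, warranty_years: float = 5.0) -> float:
    """Total terabytes the drive is rated to absorb over its warranty period.

    warranty_years defaults to 5, which is an assumption; check the drive's datasheet.
    """
    return capacity_tb * dwpd * 365 * warranty_years


def years_until_worn(capacity_tb: float, dwpd: float, daily_writes_tb: float,
                     warranty_years: float = 5.0) -> float:
    """Years until the rated endurance is used up at a constant daily write load."""
    return rated_endurance_tb(capacity_tb, dwpd, warranty_years) / (daily_writes_tb * 365)


# Hypothetical sustained write load per drive, in TB/day (an assumption for illustration).
DAILY_WRITES_TB = 10.0

for label, capacity_tb, dwpd in [("6.4TB, 3 DWPD", 6.4, 3), ("7.6TB, 1 DWPD", 7.6, 1)]:
    total = rated_endurance_tb(capacity_tb, dwpd)
    years = years_until_worn(capacity_tb, dwpd, DAILY_WRITES_TB)
    print(f"{label}: rated for about {total:.0f} TB written, "
          f"about {years:.1f} years at {DAILY_WRITES_TB:.0f} TB/day")
```

Under these assumptions the 3 DWPD drive outlasts a 5-year warranty, while the 1 DWPD drive reaches its rated endurance sooner, even though it has more raw capacity.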
If there are issues with supply please let us know.

Thank you very much. I have another question.

In the validated configurations, specific chassis and models have already been specified. Can I choose similar chassis or configurations? For example, can I switch from the ASUS RA2112 to a similar RS720A chassis? For Kioxia’s NVMe SSDs, is it enough to meet the 6.4 TB, 3 DWPD, and U.3 requirements without matching the exact CM6-V model, for example by using a Kioxia CM7?

Does passing the performance test mean that the machine can run IC programs and operate normally on the subnet? We want to choose ASUS. And why doesn’t ASUS need a TPM module? I can’t find the 1U AMD dual-CPU RA2112-ASEP server. Can I use another common ASUS chassis as a substitute?

Can I choose similar chassis or configurations?

Yes! What’s important are the requirements specified above.
DFINITY has validated a few configurations but we cannot cover all manufacturers and all components. It’s expected that new configurations will be used going forward.

Does passing the performance test mean that the machine can run IC programs and operate normally on the subnet?

Yes. The validation suite is designed to vet the hardware appropriately. We’re working on making this easier to use and to contribute results to one place.

And why doesn’t ASUS need a TPM module?

It does require a TPM 2.0 module. Looks like I didn’t list that on the ASUS configuration in the wiki. Will fix. Thanks for bringing it up!

I can’t find the 1U AMD dual-CPU RA2112-ASEP server. Can I use another common ASUS chassis as a substitute?

Yes. As long as the requirements listed above are met, you should be good.

Thank you very much for your patient guidance. :+1: :clap:

Hi. Help.

Has anyone tried NVMe with U.2? Do its I/O read and write speeds meet the requirements?

ASUS RS720A-E11 servers integrate a PFR FPGA as the platform Root-of-Trust solution for firmware resiliency, to prevent hackers from gaining access to infrastructure. Do we still need to configure a separate TPM, or can we use the built-in FPGA?

Hello Shuai!

Has anyone tried NVMe with U.2? Do its I/O read and write speeds meet the requirements?

Yes, U.2 meets the performance requirements. The Dell that was validated by DFINITY used U.2 NVMe drives.

Do we still need to configure a separate TPM?

No. There’s a requirement for the TPM 2.0 in the generic hardware specs for future use by IC-OS, but it’s not used yet. No extra configuration is currently necessary.

Can Node Providers use AMD Milan models with 24C or 32C?

Furthermore, do the CPUs have to have a base clock speed of 3 GHz?

Can Node Providers use AMD Milan models with 24C or 32C?

Yes. Note that choosing a higher CPU model can increase power requirements on the power supplies. Please consider this and adjust appropriately. See Wikichip.com for reference.

Furthermore, do the CPUs have to have a base clock speed of 3 GHz?

No. They just need to be 2x EPYC Milan CPUs. I have clarified the wiki to reflect this.

We have finalized validation on the Gigabyte for exact Gen2 specs :tada:

Gigabyte test machine abbreviated specs:

  • AMD EPYC 7313 (3.00 GHz, 16-Core, 128 MB)
  • 512 GB (16x 32GB) ECC Reg DDR4 3200 RAM
  • 5x 6.4 TB NVMe TLC SSD, PCIe 4.0 x4, U.3 2.5", 3 DWPD

Validation Results:

  • Stress tests - Passed :white_check_mark:
  • System benchmarks - Passed :white_check_mark:
  • SEV-SNP capability - Verified BIOS and kernel support working in tandem - Passed :white_check_mark:

We have updated the node provider hardware wiki to include this configuration.

Hello,
We are new to DFINITY and are considering running a node. When looking at the specs, we had a few questions:

  1. What is the motivation to run a 2xCPU instead of 1xCPU with a comparable number of CPU cores and possibly higher clock speeds? Is it due to the larger number of RAM slots that the two CPUs combined can handle?
  2. Has anyone tried running a 1xCPU setup instead of a dual-CPU setup?
  3. When it comes to SSDs, you require 5x 6.4TB of storage, but would 3x 12.8TB work, too? I was wondering whether the 5x setup is meant to achieve higher throughput.

Thank you for any comments or answers. We look forward to digging deeper into the setup!

What is the motivation to run a 2xCPU instead of 1xCPU with a comparable number of CPU cores and possibly higher clock speeds?

I wasn’t there for the original design of the specs, but dual-CPU systems have more memory channels available and therefore higher memory throughput.
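As a back-of-the-envelope illustration of that point, the sketch below computes theoretical peak memory bandwidth from the channel count. It assumes 8 DDR4 channels per EPYC Milan socket, DDR4-3200 DIMMs as in the validated Gigabyte machine, and a 64-bit (8-byte) data bus per channel. These are theoretical peaks only; real throughput is lower and depends on NUMA placement and DIMM population.

```python
def peak_bandwidth_gb_s(sockets: int,
                        channels_per_socket: int = 8,
                        transfer_rate_mt_s: int = 3200,
                        bus_width_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second).

    Each channel moves bus_width_bytes per transfer, at transfer_rate_mt_s
    million transfers per second.
    """
    mb_per_s = sockets * channels_per_socket * transfer_rate_mt_s * bus_width_bytes
    return mb_per_s / 1000


single = peak_bandwidth_gb_s(sockets=1)
dual = peak_bandwidth_gb_s(sockets=2)
print(f"1 socket:  {single:.1f} GB/s theoretical peak")
print(f"2 sockets: {dual:.1f} GB/s theoretical peak")
```

The second socket doubles the number of channels, so the theoretical peak doubles as well, regardless of how many cores each CPU has.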

Has anyone tried running a 1xCPU setup instead of a dual-CPU setup?

No. SetupOS will fail if it doesn’t detect 2x (EPYC Milan) CPUs.
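For anyone comparing quotes before ordering, here is a hypothetical pre-purchase checklist written as code. It is not how SetupOS performs its detection (that logic is not shown in this thread). It only encodes the requirements mentioned in this discussion: 2x EPYC Milan CPUs, 5x SSDs of 6.4TB at 3 DWPD, and a TPM 2.0 module. The class and function names are made up for illustration, and treating the SSD count as exactly five is my strict reading of the thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateServer:
    """Hypothetical description of a server a node provider is considering."""
    cpu_sockets: int
    cpu_family: str        # e.g. "EPYC Milan"
    ssd_count: int
    ssd_capacity_tb: float
    ssd_dwpd: float
    has_tpm2: bool


def spec_problems(server: CandidateServer) -> List[str]:
    """Return human-readable mismatches against the requirements discussed in this thread."""
    problems = []
    if server.cpu_sockets != 2:
        problems.append(f"needs 2 CPUs, found {server.cpu_sockets}")
    if "milan" not in server.cpu_family.lower():
        problems.append(f"needs EPYC Milan CPUs, found {server.cpu_family}")
    if server.ssd_count != 5:
        problems.append(f"needs 5 SSDs, found {server.ssd_count}")
    if server.ssd_capacity_tb < 6.4:
        problems.append(f"needs at least 6.4TB per SSD, found {server.ssd_capacity_tb}TB")
    if server.ssd_dwpd < 3:
        problems.append(f"needs 3 DWPD SSDs, found {server.ssd_dwpd} DWPD")
    if not server.has_tpm2:
        problems.append("needs a TPM 2.0 module")
    return problems


candidate = CandidateServer(2, "EPYC Milan", 5, 6.4, 3, True)
single_cpu = CandidateServer(1, "EPYC Milan", 3, 12.8, 3, True)

print(spec_problems(candidate))
print(spec_problems(single_cpu))
```

An empty list means the candidate matches everything encoded here. It does not replace running the validation suite on the actual machine.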

When it comes to SSDs, you require 5x 6.4TB of storage, but would 3x 12.8TB work, too? I was wondering whether the 5x setup is meant to achieve higher throughput.

This can be a complex discussion :slight_smile:
The goal was a balance of speed, reliability, and cost. The nodes use LVM to stripe data across all of the SSDs (effectively software RAID 0), so writes are distributed. Larger SSDs might give higher throughput per drive, but striping allows parallel writes to every SSD, so I/O to the 5x 6.4TB SSDs is very fast.
Larger SSDs are also usually disproportionately more expensive.
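To make the striping idea concrete, here is a small sketch of how a RAID 0 style layout spreads one logical write across several drives. It is a simplified model, not the actual LVM implementation, and the 64 KiB stripe size is an assumption for illustration (the stripe size used on IC nodes is not stated in this thread).

```python
from typing import List


def stripe_layout(offset: int, length: int,
                  num_disks: int = 5,
                  stripe_size: int = 64 * 1024) -> List[int]:
    """Return how many bytes of a logical write land on each disk.

    The logical address space is cut into stripe_size chunks that are
    assigned to disks round-robin, as in RAID 0.
    """
    per_disk = [0] * num_disks
    pos = offset
    end = offset + length
    while pos < end:
        stripe_index = pos // stripe_size
        disk = stripe_index % num_disks
        # This chunk ends at the stripe boundary or at the end of the write.
        chunk_end = min((stripe_index + 1) * stripe_size, end)
        per_disk[disk] += chunk_end - pos
        pos = chunk_end
    return per_disk


# A 1 MiB sequential write starting at offset 0, spread over 5 drives.
layout = stripe_layout(offset=0, length=1024 * 1024)
for disk, written in enumerate(layout):
    print(f"disk {disk}: {written // 1024} KiB")
```

Every drive receives a share of the write, so the drives can work in parallel. With fewer, larger drives the same write would be split across fewer devices.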

Node specs (Gen3?) may evolve in the long run. For now the specifications above should be followed rigidly.

Thanks for the explanation. Will then stick with the recommended setup.

Is TPM 2.0 necessary? What is the main function of this?

Is TPM 2.0 necessary? What is the main function of this?

Yes. It will support upcoming security features.

The latest Global R&D recapped the Gen 2 hardware validation status. Here is a summary of the validated hardware:

See the wiki Node Provider Machine Hardware Guide for details on the validated machine configurations.

Hello everyone, what is currently the most cost-effective equipment to buy in terms of reliability and price? If you have purchased equipment recently, please share your experience.

@andrewbattat Hello, you’ve mentioned the server models for Dell. Could you please inform me about the server models you used for Supermicro and Asus?

The specs for all the validated servers can be found on the Node Provider Hardware Guide wiki page.