AMD SEV Virtual Machine Support

Hi folks, I wanted to give a quick update about SEV-SNP. There has been a lot of interest from the community in its use and its potential to provide integrity protection and confidentiality for the replica nodes.

At DFINITY, we’ve been working for several months with the goal of rolling out SEV replicas to a subnet, and in the December 6th Global R&D, we demoed some of that work: SEV-SNP enabled replicas spinning up and joining a testnet after mutually attesting each other.

However, there is still some work to do, specifically around replica upgrades, network integration, and enabling disk and memory encryption.

For the time being, we are going to prioritize leveraging SEV-SNP to enhance the security of the boundary nodes instead of the replicas. As you may be aware, a roadmap to decentralize the boundary nodes is currently being executed. It includes splitting the boundary nodes into an API boundary node and an HTTP Gateway.

  • API boundary nodes: Provide an endpoint that handles API canister calls by routing them to the correct subnet and replica node, and provide caching and rate-limiting to protect the IC. These nodes will be run by NNS-approved node providers and managed by the NNS.
  • HTTP gateways: Provide endpoints that terminate TLS and translate user HTTP requests to API canister calls. These nodes can be run by anyone.
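
To make the API boundary node role above a bit more concrete, here is a minimal sketch of routing a call to a subnet and rate-limiting it. Everything in it is an assumption for illustration: the integer canister IDs, the subnet names, the `Router` and `TokenBucket` classes, and the limits are hypothetical and are not taken from the boundary node implementation.

```python
import bisect


class Router:
    """Maps a canister ID to a subnet using sorted, non-overlapping ID ranges."""

    def __init__(self, ranges):
        # ranges: list of (start, end, subnet) tuples; values are hypothetical.
        self.ranges = sorted(ranges)
        self.starts = [r[0] for r in self.ranges]

    def subnet_for(self, canister_id):
        # Find the last range whose start is <= canister_id.
        i = bisect.bisect_right(self.starts, canister_id) - 1
        if i >= 0 and self.ranges[i][0] <= canister_id <= self.ranges[i][1]:
            return self.ranges[i][2]
        return None  # no subnet hosts this canister ID


class TokenBucket:
    """Simple token-bucket rate limiter; the caller supplies the clock value."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now):
        # Refill according to elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


router = Router([(0, 99, "subnet-a"), (100, 199, "subnet-b")])
bucket = TokenBucket(capacity=2, refill_per_sec=1.0)
```

A call would be forwarded only if `bucket.allow(now)` returns `True` and `router.subnet_for(canister_id)` finds a subnet; caching is left out of the sketch.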

SEV-SNP can be used to improve the security of these new components:

The first phase will be to use SEV-SNP for the HTTP gateways. Users would be able to independently verify that they are querying a known version of the gateway and be confident that the gateway is not intercepting or tampering with the traffic flowing through it. This is especially beneficial now that the service worker has been removed and we rely on the gateway for certifying HTTP responses.

The second phase will be to use SEV-SNP for the API boundary nodes. Today, a lot of the metrics that power the dashboard are emitted by the boundary nodes. Running API boundary nodes on SEV-SNP increases confidence in the accuracy of the scraped metrics. Moreover, it will ensure that the content of the API calls proxied by the API boundary nodes cannot be read by node providers.

Because these components are stateless, starting with them removes a level of complexity and gives us an opportunity to vet the technology before using it on replicas.

What is the expected ETA for phases 1 and 2?

Thanks for the update @raymondk!

Will DFINITY have its own SEV-SNP enabled servers to run the HTTP gateways? What if replicas that are not part of any subnet were used instead?

I’m also curious about your findings regarding TEEs in general. What can a node provider do when the HTTP Gateway (or API BN or replica) code is run in the TEE?

We don’t have ETAs yet. The boundary node team is focused on the actual decentralization of the boundary nodes, which we expect to have completed by the end of Q2.

In parallel, we are evaluating ways to leverage SEV for the HTTP Gateways that could be easily reused by anyone who wants to host an HTTP Gateway, whether on bare metal, in confidential containers, or on cloud provider confidential computing offerings.

We’ll publish a more detailed roadmap with ETAs when we have a solid plan. As always, ideas and input from the community are welcome!

Hey @massimoalbarello - the HTTP Gateways will technically not be part of the IC and anyone should be able to run them.

DFINITY is evaluating ways to leverage SEV for the HTTP Gateways that could be easily reused by anyone who wants to host an HTTP Gateway, whether on bare metal, in confidential containers, or on cloud provider confidential computing offerings. I expect that when the time comes, DFINITY will host its gateway at least partially on its own SEV-SNP enabled servers.

The API boundary nodes will be under node provider control, and the idea is that an NP’s machine can be used as either a replica or an API boundary node. Gen2 machines are specced with SEV-SNP enabled processors, and the expectation is that those will be used.

The promise of SEV-SNP is that the hardware provider is not able to tamper with the VM image or read the memory.

In theory that is amazing, but how far away are they in practice?

If it was actually the case that they are tamperproof and can attest the image they are running, why would the consensus algorithm need to tolerate Byzantine failures instead of only fail-stop failures?

It’s a question that comes up regularly and the idea is to layer the different levels of protection and not rely on one single thing.
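
For context on what that question is weighing, these are the standard textbook bounds on the number of nodes n needed to tolerate f faulty nodes under each failure model. They are general results and say nothing specific about the IC's parameters.

```latex
\[
  n \ge 3f + 1 \quad \text{(Byzantine faults)}, \qquad
  n \ge 2f + 1 \quad \text{(fail-stop faults)}
\]
```

As a purely arithmetic illustration, n = 13 nodes tolerate f = 4 Byzantine faults but f = 6 fail-stop faults. Relying only on the fail-stop bound would mean trusting the TEE completely, which is the single point of reliance that layering is meant to avoid.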

@massimoalbarello this is an interesting thread on your question: Reddit - Dive into anything

What’s the latest update on SEV-SNP for boundary and replica nodes?

Hi!

A few updates from the Boundary Node team regarding SEV-SNP: over the last few months we’ve done a lot of exploratory work and are now targeting the HTTP Gateways as our first SEV-SNP workload.

The first fleet of HTTP Gateways will be DFINITY-owned, with that deployment serving as a reference for others to follow.

Once deployed, several prominent IC domains will be moved to these gateways, and users will be able to reproduce the entire supply chain, i.e., every node in the following tree is reproducible and can be hashed in a consistent manner, producing a measurement that can be used to verify the workload.

Once you have a measurement, you can perform remote attestation against an HTTP Gateway to get a live measurement, which you can compare with the offline-produced measurement to ensure the workload is the expected one.
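
The comparison step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual measurement procedure: in SEV-SNP the launch measurement is computed by the AMD secure processor and delivered in a signed attestation report, and verifying that signature is omitted here. The tree layout and the names `measure_tree` and `matches_expected` are hypothetical; SHA-384 is used only because the SEV-SNP measurement is a 48-byte value.

```python
import hashlib
import hmac


def measure_tree(node):
    """Hash a node and its children into one SHA-384 digest (hypothetical scheme).

    node is a dict with a "name", optional "content" bytes, and optional
    "children". Children are sorted by name so the result does not depend
    on the order in which they were listed.
    """
    h = hashlib.sha384()
    h.update(node["name"].encode("utf-8"))
    h.update(b"\x00")  # separator between the name and the content digest
    h.update(hashlib.sha384(node.get("content", b"")).digest())
    for child in sorted(node.get("children", []), key=lambda c: c["name"]):
        h.update(measure_tree(child))
    return h.digest()


def matches_expected(expected, live):
    """Compare the offline measurement with the live one in constant time."""
    return hmac.compare_digest(expected, live)


# Offline: rebuild the artifacts reproducibly and measure them.
tree = {
    "name": "guest-image",
    "children": [
        {"name": "kernel", "content": b"kernel-bytes"},
        {"name": "initrd", "content": b"initrd-bytes"},
    ],
}
expected = measure_tree(tree)

# Live: in practice this value would come from the gateway's attestation
# report; here it is recomputed locally as a stand-in.
live = measure_tree(tree)
print(matches_expected(expected, live))
```

If any artifact in the tree changes, the recomputed digest changes too, so a mismatch signals that the gateway is not running the workload you built.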

Follow-up work for this will include the NNS-managed API Boundary Nodes being transitioned to become SEV-SNP workloads as well, with replicas being targeted at a later time.

Happy to answer any follow-up questions!

Thanks for the info!

Could you give us some rough/estimated timelines for transitioning each of these?

👋 @lastmjs,

We don’t have a reliable timeline for this yet. Our current focus is on the decentralization of the boundary nodes which we’re aiming to finish in the next couple of months.
We definitely have a better handle on SEV after this last round of experiments, and we’re working on a tentative roadmap for the rollout.

Out of curiosity, what are you expecting to get out of it?

I want SEV across the entire protocol to provide an increased level of general-purpose confidential compute. The privacy story on the IC does not feel very good to me right now, nor does it when I discuss it with others, as node operators can technically see all data.

I see this as a major hindrance to many kinds of cloud computing use cases that need private computations, as on ICP the risk of leakage is multiplied by the number of independent node operators in a subnet, and there aren’t robust legal or reputational mitigations in place to prevent leakage.

Unfortunately the situation is very bad compared with traditional cloud.

I also have my own use cases I’ve described in this thread or elsewhere.

This doesn’t seem to be prioritized much by DFINITY. Before genesis, there was talk of possibly having secure enclaves at genesis or soon after.

A major weakness of the protocol right now IMO.

We are also waiting very eagerly for in-canister access to the secure enclave. It is a blocker with almost every potential client we speak with.

Hey!
@rikonor @raymondk can you please comment on this article: “AMD secure VM tech undone by DRAM meddling”?
Thanks!

Hi @pixld8ta, we’re aware of this, and the plan is to roll out the firmware fix with the next HostOS upgrade.
We don’t have concrete plans yet but it’s something we’re going to start looking into.
