We are super happy to announce a major milestone for us as the node team and for the Internet Computer: the very first SEV-SNP-enabled node, hckfw, is live and running smoothly!
A big thank you to @icarus from Icaria Systems, who successfully deployed the node last Thursday. We held off on announcing it right away to make sure everything was stable, and we are thrilled to report that the node has been running for several days now without any issues.
This is a huge step forward on the path to TEE-protected subnets, a cornerstone for increasing the confidentiality and security guarantees of the Internet Computer.
What’s Next
While this deployment is a huge achievement in itself, it is just the beginning. The next challenge will be performing the first upgrade of an SEV-SNP-enabled node.
Unlike the current upgrade process, which simply replaces the VM and reboots, the new approach is much more complex. We’ll need to:
Boot both the old and new VMs simultaneously,
Hand over key state between them,
And ensure the new VM can seamlessly continue the node’s work.
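To make the three steps above concrete, here is a small illustrative sketch. All names (VM, upgrade, export/import of key state) are hypothetical assumptions for this post; the real IC-OS and orchestrator logic is considerably more involved and, crucially, the handover happens over an attested channel between the two confidential VMs.

```python
# Hypothetical sketch of the SEV-SNP upgrade handover described above.
# None of these names come from the actual IC-OS code.

class VM:
    def __init__(self, image):
        self.image = image
        self.running = True
        self.key_state = None

    def export_key_state(self):
        # In reality this would go over an attested, encrypted channel
        # between the old and new confidential VMs.
        return self.key_state

    def import_key_state(self, state):
        self.key_state = state

    def shutdown(self):
        self.running = False


def upgrade(old_vm, new_image):
    # Step 1: boot the new VM while the old one is still running.
    new_vm = VM(new_image)
    # Step 2: hand over key state between them (mutual attestation omitted).
    new_vm.import_key_state(old_vm.export_key_state())
    # Step 3: the new VM continues the node's work; the old one stops.
    old_vm.shutdown()
    return new_vm


old = VM("guestos-v1")
old.key_state = "sealed-node-keys"
new = upgrade(old, "guestos-v2")
print(new.running, new.key_state, old.running)  # True sealed-node-keys False
```

The key difference from a plain reboot-style upgrade is step 2: because the keys never leave a confidential VM unprotected, both VMs must be alive at the same time for the handover.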
We expect this upgrade to happen later this week and will report back here in this thread on the outcome (fingers crossed).
The Road to TEE-Protected Subnets
Once we have validated that single SEV-SNP-enabled nodes operate and upgrade reliably, we will start working with node providers to (re)deploy more and more SEV-SNP-enabled nodes.
In parallel, we are also working on the final major piece needed for full TEE-protected subnets: subnet recovery. That work is well underway, though it will take a bit more time due to its complexity.
Big thanks again to Icarus (and congratulations for deploying the very first SEV-SNP-enabled node) and a huge shoutout to everyone involved in making this happen.
Thanks for the shout-out @rbirkner, and yes, it is very cool to be hosting the first TEE-privacy-enabled IC mainnet node!
All the kudos goes to the DFINITY Node team and systems engineers for making the IC-OS installation procedure and encrypted VM attestation work so smoothly.
Onwards and upwards to a full IC subnet of TEE encrypted replica nodes!
Hey @rbirkner, I think this will be explosive when fully encrypted subnets are available. There are so many use cases and so much need for this.
I have a couple of questions I’d love to get some confirmation on if you’re able and have time.
In an active SEV subnet, are data included in calls between canisters within that same subnet protected with the same level of encryption as data that simply sits in one of the canisters?
What about data that’s included in calls between canisters that both sit in different active SEV subnets (is this considered leakable or protected by a lesser form of encryption)?
Any thoughts or details you’re able to elaborate on would be very useful for planning services that take advantage of this upcoming IC feature.
Sorry for the delayed response, and thanks @Lorimer for the great question (and nudge; it was needed).
In general, each node has its own TLS key pair. The public key is part of the corresponding node record in the registry. When two nodes of the IC communicate, they will do so over TLS and verify that the other end is using the key pair “from the registry”. So, nobody can listen to what the nodes communicate.
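The check described above can be sketched as follows. This is an illustrative simplification with made-up names, not the actual IC networking code: the point is just that a node only accepts a TLS connection if the peer presents the key pair recorded for it in the registry.

```python
# Hypothetical sketch of the peer check described above: accept a TLS
# connection only if the peer's public key matches the one recorded in
# the registry for that node ID. Names are illustrative assumptions.

registry = {
    # node_id -> TLS public key from the node's registry record
    "node-abc": "pubkey-A",
    "node-def": "pubkey-B",
}

def verify_peer(node_id, presented_pubkey):
    expected = registry.get(node_id)
    # Reject unknown nodes and key mismatches alike.
    return expected is not None and expected == presented_pubkey

print(verify_peer("node-abc", "pubkey-A"))  # True
print(verify_peer("node-abc", "pubkey-X"))  # False: not the key "from the registry"
```

Because both endpoints pin each other to the registry entries like this, an eavesdropper (or a node provider swapping in their own key) cannot insert themselves into the connection.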
When a node is SEV-enabled, there is no way for a node provider (NP) to get inside the node and access the TLS key pair. If the communication is mixed, i.e. between an SEV-enabled and a non-SEV-enabled node, it is only as secure as the non-SEV-enabled node.
I hope this answers your question! Otherwise, please complain.
That’s a great idea! An update on how we propose to do recovery is long overdue. I will try to get something written up early next week and will ping you here.
A very short glimpse of what awaits you:
Recovery depends heavily on whether the orchestrator is still running and “listening” to registry changes. If it is, we can follow a similar recovery procedure as before (proposals to halt the subnet, to get SSH access to read the state, to resume the subnet on a fixed version, etc.).
If the orchestrator is no longer running, it gets tricky, because there is then no way to tell the node to upgrade, give access to the state, or do anything else. In these cases, we have two different approaches. The first is a rollback from the outside: if it succeeds, the orchestrator hopefully works again, and the subnet can be recovered as in the previous case.
If the rollback doesn’t work, it gets very tricky and needs a longer explanation. Basically, there will be a proposal to approve an alternative root file system. Once the NNS adopts the proposal, the node providers can download the image and restart the node with the approved root FS, and then we should again be in the case where the orchestrator works. Why and how the alternative root FS works, I will explain in more detail next week.
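The three recovery cases above can be summarised in a tiny decision sketch. This is purely illustrative (hypothetical function and labels, not the actual recovery tooling), but it captures the branching described in the last few paragraphs:

```python
# Illustrative decision flow for the recovery cases described above.
# Function name and return labels are assumptions for this sketch only.

def recovery_plan(orchestrator_running, rollback_succeeds):
    if orchestrator_running:
        # Case 1: the orchestrator still listens to the registry, so the
        # usual proposal-driven procedure works (halt, read state, resume).
        return "standard recovery via NNS proposals"
    if rollback_succeeds:
        # Case 2: a rollback from the outside brings the orchestrator
        # back, after which case 1 applies.
        return "rollback, then standard recovery"
    # Case 3: an NNS-approved alternative root FS, downloaded and
    # installed by the node providers, restores a working orchestrator.
    return "alternative root FS via NNS proposal"

print(recovery_plan(True, False))
print(recovery_plan(False, True))
print(recovery_plan(False, False))
```

Note how each fallback ultimately aims to get back to the first case, where the orchestrator is running and the normal proposal-based recovery applies.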