Threshold ECDSA Signatures

Hey folks,

Here is a relevant security audits for those interested:

  1. "Threshold ECDSA Integration and Bitcoin Canisters - Security Review" by Trail of Bits (third-party security audit #5)

  2. "Canister Sandbox Review" by Trail of Bits (third-party security audit #4)

5 Likes

A subnet maintains a single secret ECDSA key, and we can actually use that to derive keys many keys per canister (using BIP32). This means that we only have to maintain one secret key on a subnet, while functionally it’s as if every canister has it’s own keys.

5 Likes

Hi everybody! There have been some concerns about the 1/3rd (f+1 out of 3f+1) threshold for the ECDSA keys on the internet computer, and whether this gives sufficient security. We have taken this feedback seriously and took a step back to revisit different approaches to increase the security of threshold ECDSA.

First, some background info. It is very difficult to make threshold ECDSA signatures, much more difficult than eg threshold BLS signatures. We cannot simply set a 2/3rd (i.e., 2f+1 out of 3f+1) threshold for ECDSA signatures like we do for BLS. The best fault tolerance such that we both have liveness and safety from known practical protocols is 1/3rd (i.e., f+1 out of 3f+1). Note that this fault tolerance is actually the same as the consensus protocol ICC uses: even though we use 2/3rd threshold BLS keys, the protocol can only be proven safe and live with up to 1/3rd corruptions, which is also the theoretical optimal threshold for asynchronous consensus. Despite this, you could argue that practical attacks on tECDSA are perhaps easier to perform than attacks on ICC: for tECDSA, you’d have to steal the key material of 1/3rd of the nodes and obtain the full signing key, while for ICC you’d either need an coordinated active attack controlling 1/3rd of the nodes and some control over the network connection between nodes, or you need to steal the key material of 2/3rd of the nodes which would let you sign anything.

So how can we make things even more secure? There are many different approaches:

  1. More nodes & larger subnets. This is conceptually the simplest approach. Work is in progress on this front, and you can follow the status in this forum thread.
  2. Shuffle nodes between subnets. This has been suggested by the community and is an appealing idea: it could mean that an attacker cannot target a specific subnet as the membership constantly changes, and the only way to corrupt a subnet is to compromise a sufficiently large fraction of the overall IC nodes, meaning each subnet becomes more secure as the IC gets bigger. However, I believe that we first need more independent nodes before shuffling actually adds security.
  3. Enable AMD SEV. Enabling some form of SEV could make it much harder to break into a node. Currently, the current IC nodes are of the “Gen-1” hardware specifiation which only support SEV-ES (and is no longer available) while new nodes will have Gen-2 hardware which support SEV-SNP. Gen-2 hardware (specification) is currently available. The relevant point here is that Gen-1 hardware with SEV-ES do not provide the security guarantees we want as they do not protect memory integrity and are thus vulnerable to attack by privileged users while Gen-2 with SEV-SNP does provide memory integrity and the ability for external parties to more easily validate the integrity of the system.
  4. Increase tECDSA security threshold.While we know that we cannot increase the fault tolerance for both safety and liveness, we could sacrifice some liveness fault tolerance to increase the safety fault tolerance. A concrete example: a 34 node subnet can currently tolerate 11 corrupted nodes, meaning we can still sign, and the secret key is not compromised. We could change the thresholds to tolerate 15 corrupted nodes, making it harder for the attacker to steal the secret key. However, this means we can only tolerate 3 nodes that are unavailable. If a 4th node is offline, the subnet cannot create ECDSA signatures anymore.
    Since being unable to sign with a key is almost as bad as an attacker compromising a key, we think that this approach is not the best solution to apply on all subnets. It could be a promising solution to let smaller subnets also maintain the ECDSA key using a higher threshold (such that we can scale out the signing capabilities), while larger subnets still maintain the ECDSA key with a 1/3rd threshold. In this way, if a smaller subnet can no longer create ECDSA signatures due to too many failures, the larger subnets still hold the ECDSA key and can reshare it to other subnets.
  5. Regularly reshare the ECDSA secret key. The idea is that nodes can regularly update their encryption key used to decrypt their secret key shares in the distributed key generation protocol that maintains the ECDSA key. The ECDSA secret key shares will be reshared every time membership changes or any of the members updated their encryption key.
    The result of this approach is that if an attacker steals an ECDSA secret key share or a DKG decryption key from a node, it will only be usable for a limited amount of time, because the keys refresh regularly. This means that an attacker would have to steal the key material of 1/3rd of the subnet nodes within a small time window, which seems much more complicated to pull off.
    One caveat is that the node signing key could also be stolen, and an attacker could try to slowly steal node signing keys, and then quickly register malicious encryption keys on behalf of those nodes, such that it can still steal the ECDSA signing key. To avoid this attack, we can limit how frequently node encryption keys can be updated for nodes in one subnet. An attacker can then only slowly try to register malicious keys for nodes. However, this is a very visible attack, and nodes can complain upon seeing a malicious key registered in their name, upon which they can be removed from the subnet.

The first three options all require a bit more time, as it involves new hardware and onboarding new node providers. So while we are actively working on 1 and 3, I do not expect they will help strengthen the ECDSA security this calendar year. Option 4 is mainly helpful to allow smaller subnets to also sign, when we have larger subnets that back up the key with the standard threshold. Initially we plan to only use large subnets, where this idea does not help. Option 5 however meaningfully improves the security of ECDSA and can be implemented more quickly.

Planned next steps.

Our proposal is to implement option 5 now, and then release a “production ready” ECDSA key, which will be maintained on large (e.g., 34 node) subnets. We can further improve the security of this key when more nodes are added and increase the subnet size. However, when a new feature is added that significantly improves the security (eg, SEV enabled replicas), we can also consider creating a new ECDSA key that has always been at this level of security, and applications can then choose whether they want to migrate to this key.

Please let us know what you think!

28 Likes

It clearly states that private key are never reconstructed.
I assume that the private key that would have been generated in the first stage before being distributed as secret shares no longer exists, but how is it destroyed? I assume they are securely deleted without being reconstructed or restored, but I am curious.
I read the White Paper thinking that this applies not only to threshold ECDSA signatures, but also to threshold BLS signatures, but there is no mention of this, so I am asking.

1 Like

The private key never lives in one place, not even temporarily, hence does not need to be destroyed. There is a process called distributed key generation (DKG) which that it is generated in a distributed way, in other words it is “born distributed”. That applies to both ECDSA and BLS threshold signature keys.

9 Likes

5 doesn’t help much with the collusion attack vector which is what worries me most. The Bitcoin integration is live, and ckBTC is about to be live. Assuming the tECDSA subnet has 34 nodes (can anyone confirm?) Then 12 (11?) node operators can collude to steal everything built on top of that master tECDSA key (right?).

12 known parties, who know each other and do not rotate their membership on a regular basis, can do this.

We’re going to try storing $millions or more here?

Update: I’m trying to identify the tECDSA subnet. What is its id? I’m looking through the subnets on the dashboard and there are only a few higher-replication subnets to choose from, the NNS, the one fiduciary, and one other system subnet that is identified as the II subnet.

The unmarked system subnet Subnet: w4rem-dv5e3-widiz-wbpea-kbttk-mnzfm-tzrc7-svcj3-kbxyb-zamch-hqe - IC Dashboard only has 13 nodes, but it seems most likely to be the tECDSA subnet. Which is it?

I’ll update this new thread with the relevant information once known: tECDSA subnet id and takeover threshold

5 Likes

Fiduciary subnet pzp6e is the ECDSA signing subnet, currently consisting of 28 nodes.

3 Likes

Is that 9 or 10 independent nodes necessary for a complete takeover of the master key?

1 Like

I think it’s 10, 3f + 1 = 28, f = 9, f+ 1 = 10 necessary for takeover.

1 Like

Exactly, 10 required for takeover.

3 Likes

Not really any worse than Solana or BSC, pragmatic. I think this is safe enough as long as there is key refresh, could be made better with TEEs.

If it is just “not worse” than a traditional bridge, plus it has the added risk of a new technology never used in production, what is the advantage?

2 Likes

Is there something to compare this to apples to apples? It is certainly better than one entity(Coinbase, Binance) controlling your BTC, but are there really other non-custodial solutions that we should be using for comparison?

Getting 10 doxed companies to collude in theft is a pretty damn high bar. I’d think the other options have non-doxed which would be much less safe depending on the replication factor.

Obviously on the other end is bitcoin itself and we won’t meet that threshold of security, but what is “enough”?

1 Like

So if we had a traditioal bridge secured with 10 way multi-sig security and the 10 key distributed among 10 doxed companies, wouldn’t that provide the same level of security without the risk of using a new technology never used in production?

3 Likes

Sure. The same level of risk, but not the same level of interoperability, computability, and scalability opportunity. You’ve got to use it in production at some point to overcome your argument. Are you proposing never turning it on because there are risks? Certainly, no one should move their whole stack over on day 1. Don’t risk what you’re not willing to lose and eventually the platform will have secured X million dollars of bitcoin for Y number of days and you’ll have a new production-level risk floor.

3 Likes

I’m trying to understand what is the final beneffit once we reach the perfect production quality level.

Probably true but It is too generic. It does not say how.

Maybe the “how this new technolgy help reach the higher level of interoperability, computability, and scalability opportunity” is by eliminating the dependency on humans as key keepers? If this is correct, why not exaplaining things by focusing on this unique key benefit?

1 Like

I think the theory/protocol is very sound, but the current practical implementation limits the actual security in practice.

I am going to start feeling much more comfortable when subnet replications increase, the number of independent node operators increases, we implement node rotation, and have secure enclaves.

The best I can think of for measuring “safe enough” for subnet size is the size of Chainlink oracle networks. Chainlink is one of the most trusted and well-known live production projects in blockchain, and AFAIK secures $billions.

I’m not sure which Chainlink oracle networks are securing what value, but if we look at their data feeds the first two, ETH/USD and BTC/USD are probably the most widely used. Check them out: https://data.chain.link/

They each rely on a 21/31 assumption. So their subnets have 31 nodes, and 21 must come to agreement before a trusted answer can be created. Seems they have a fault tolerance similar to IC subnets.

But the tECDSA subnet has essentially half of the fault tolerance. So if we extend Chainlink’s security model to tECDSA, we would need a subnet of 61 nodes to ensure that it would take 21 colluding node operators to steal everything.

Combine a 61 node subnet with node rotation, secure enclaves, and a healthy level of independent node operators and I think we’re reaching quite fantastic levels of security.

I wish DFINITY would put more effort into studying and attempting to quantify sufficient levels of decentralization, no one seems to have any idea what levels are required. Their old consensus paper from 2018 had math explaining probabilities of corruption from colluding attackers based on committee and total population sizes, it was really great stuff. But that seems to have been thrown away.

5 Likes

Hasn’t Timo shown that rotation actually increases the possibility of corruption because there is a higher probability that a group will eventually get rotated than enough in any one particular group will become corrupted? I thought I saw a post on that somewhere.

1 Like

That concern has been brought up multiple times, though depending on your assumptions I don’t think the concern plays out. Myself and a few others have dug into the math as well and we can get it to work depending on the underlying assumptions.

I think most people including @Manu are on-board with rotation now, but it probably doesn’t make sense until there are many more independent node operators (I would guess that was one of the assumptions, the pool of node operators to choose from, the percent honest/dishonest, and the size of subnets).

It’s this kind of math and analysis I would like to see from DFINITY, in addition to all of the other very high quality research they publish.

3 Likes

This thread has some of the math in it: Shuffling node memberships of subnets: an exploratory conversation

Oh wait, maybe the math didn’t show what I thought it showed…see for yourself I suppose. I’m not convinced it’s a bad idea, and the fact that others whose opinions I highly regard on the matter seem to agree then I am very much still on board with node rotation.

If we have a conclusive analysis that it’s dangerous then I’ll be done with the idea, but I highly doubt such an analysis would end up that way considering the fundamental importance of shuffling in other designs e.g. Ethereum.

4 Likes