Seems like another risk is the subnet becoming corrupted somehow. If you lose the subnet, you lose the keys. If you’re going to go BIP32, what if the whole network used the same key? That way all subnets would have to fail. Maybe it is a bigger security risk because all nodes have a share of the key and collusion could be easier?
They had me at ‘consensus/crypto interface’. Seriously though, skilesare, I think a comment you made on a previous thread, that ‘present security’ applies, is apropos. So if using zero-knowledge proofs to decrypt my encryption (what’s in the box) is the heavy lifting now… ECDSA requires several more shared values which can only be used for a single signing request… A symphony of communication orchestrated by a conductor beacon begets Perpetual Public Keys (FTW). I’m sure that stress testing will be front and center for all of us. I’ll offer a comment and am happy to hear, down the road, whether your results match: I think that you can push the non-malicious threshold down to 60% from 66%. If there is such a thing as a faster time frame on this asynchronous platform (the arrow of time putting the Tok after the Tik), you might be able to go to 30%. Again, just a theory, there to be disproven. Looking forward to your seminal documentation.
Good point! Indeed, the current plan is that there will be a single master key that will be deployed in several subnets. Individual canisters will derive subkeys (as in BIP32) from this master key (we have been analyzing other derivation procedures, but for compatibility reasons, we will likely stick with BIP32). This means we can do disaster recovery as you suggest, unless all subnets on which the master key is deployed fail simultaneously. To protect the secrecy of the master key, any subnet that deploys the master key will have to meet certain security requirements (meaning, at the very least, it is “large enough”).
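To make the BIP32-style derivation concrete, here is a minimal Python sketch of hardened child-key derivation (CKDpriv from the BIP32 specification). The master key and chain code below are made-up values for illustration only; in the actual system the master key is secret-shared across nodes and never materialized in one place, and a production deployment would use an audited library rather than hand-rolled code.

```python
# Toy sketch of BIP32-style hardened child-key derivation (CKDpriv).
# Illustrative only: all input values below are made up for the example.
import hmac
import hashlib

# Order of the secp256k1 group (a public constant from the curve spec).
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def ckd_priv_hardened(parent_key: int, chain_code: bytes, index: int):
    """Derive a hardened child private key and chain code from a parent."""
    hardened_index = index | 0x80000000  # hardened indices have the top bit set
    data = b"\x00" + parent_key.to_bytes(32, "big") + hardened_index.to_bytes(4, "big")
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    left, right = digest[:32], digest[32:]
    child_key = (int.from_bytes(left, "big") + parent_key) % N
    return child_key, right  # (child private key, child chain code)

# Example: derive two distinct subkeys from one made-up master key.
master_key = int.from_bytes(hashlib.sha256(b"example-master").digest(), "big") % N
master_chain = hashlib.sha256(b"example-chain-code").digest()
canister_a, _ = ckd_priv_hardened(master_key, master_chain, 0)
canister_b, _ = ckd_priv_hardened(master_key, master_chain, 1)
assert canister_a != canister_b  # different indices yield independent subkeys
```

The relevant property for disaster recovery is that derivation is deterministic: any subnet holding (shares of) the master key and chain code can re-derive every canister subkey.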
We have longer term plans to implement other disaster recovery strategies as well, but this is the initial plan.
Shouldn’t security requirements be “large enough” AND “distributed enough”?
“Distributed enough” means that a nation-state actor cannot expose a master key deployed in a subnet, even by colluding with like-minded nation-state actors.
Yes, good point! That’s why I said “at the very least”
This is an actual proposal with target dates. The community is banking on this proposal being implemented and deployed within the rough order of magnitude (ROM) of the projected time frame. In other words, this is, hopefully, no longer theoretical.
We know that setting up nodes/node providers takes time. Therefore, isn’t it time to fix the exact security requirements?
What is the exact definition of "large enough"? 17 nodes? 29 nodes? …
What is the exact definition of “distributed enough”? Do we think that we have a requirement of having node providers in Africa/South America prior to the go-live of this feature?
Are there other such security features?
I fear that unless we fix these (and other) constraints, we will find that this proposal gets delayed by a lot. Additionally, BTC/ETH integration has this proposal as a prerequisite.
The community stands ready, I believe, to help. But we need you/DFINITY to lead with open discussions about the key security considerations, so that we can provide constructive critiques (as many have already done).
Fair enough! Part of the issue is that I’m not certain what level of detail is appropriate for this particular forum, or even for the NNS proposal. That’s mainly on me… I’m kind of a noob when it comes to these communication process issues.
In terms of the precise parameters for allowable subnet configs in this setting, I’ll get back to you on that soon.
Thanks, Victor, for your candid response.
I believe that you/DFINITY should think about the technical details embedded in the forum or the NNS proposal in the following ways (as two ends of a pendulum’s swing), especially for threshold signatures.
The theory: The algorithms are way too complicated for a regular reader of the forum/NNS proposal to even understand. Posting a couple of papers that virtually nobody will understand will cause additional FUD around threshold signatures. The community should trust us/DFINITY as the cryptographic experts on the algorithm and its implementation. Besides, we don’t want to “open source” this critical piece of tech at this point in the IC’s development, so as not to give a leg up to our competition.
The practice: Assuming the correctness of the algorithm and its implementation, it still must be deployed on the IC. Documenting the security and other key requirements of such powerful technology and engaging the community on them will take substantial time, further delaying the deployment of powerful technology. The community in any event relies on us/DFINITY to be the experts, and we can always explain post facto.
The theory: The algorithms are pretty complicated. They might have some hidden bugs/issues which, given enough eyes, might be surfaced. Besides, the community sees this technology as pretty critical for them to trust the IC with BTC/ETH. Let’s release the papers and implementation as soon as possible so that public review can start ASAP.
The practice: Assuming the correctness of the algorithm and its implementation, it still must be deployed on the IC. The community should be completely aware of the different tradeoffs (security and otherwise) that we are making in deploying this powerful technology. They will then better understand the risk they are taking on with these tradeoffs. Perhaps they can convince us/DFINITY that certain tradeoffs are not good enough, and we can change our minds. Therefore, let’s release this as well ASAP.
The truth, of course, will be in shades of gray. That said, I think that being transparent about what decision paths are being taken along these or similar dimensions would be extremely beneficial for the community at large.
Please see discussion (Threshold ECDSA Signatures - #39 by mparikh) on some additional requirements that MIGHT percolate up for system and integration testing.
Hi @mparikh !
The current plan is to enable this ability per subnet via an NNS proposal. That is why I believe the question of whether a subnet meets the criteria could be discussed separately later, since the implementation of a project of this extent will take many months.
Do we have a ballpark figure for what is large enough? I think this question will also be tied to “decentralized enough”. For example, is it a legitimate security threat if all of the 17/29/… nodes belong to the same node provider?
Also I think we are way underestimating the amount of lead time required to establish node providers in other regions of the world.
Currently the projection is that threshold signatures will be implemented in Q4 2021. If the requirement for a subnet is to be “decentralized enough” such that the subnet will need node providers in South America and Africa, I am not sure about the time frame that will take.
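One way to reason about “large enough” is the standard Byzantine fault tolerance bound: a consensus protocol of this family tolerates f faulty nodes out of n when n ≥ 3f + 1. The snippet below is only back-of-the-envelope arithmetic for the subnet sizes mentioned in this thread; the actual requirements are whatever DFINITY ultimately specifies.

```python
# Back-of-the-envelope fault tolerance for candidate subnet sizes.
# Standard BFT consensus tolerates f faulty nodes out of n when n >= 3f + 1,
# i.e., f = (n - 1) // 3. Illustrative only, not an official requirement.
def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    return (n - 1) // 3

for n in (13, 17, 29):
    print(f"n = {n:2d} nodes -> tolerates up to f = {max_faulty(n)} faulty/colluding nodes")
```

Note that this bound says nothing about *who* the f nodes are, which is why “distributed enough” (no single provider or jurisdiction controlling more than f nodes) matters independently of raw subnet size.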
Hey folks, looks like questions are still coming in. Do folks prefer we push back the NNS motion to let more questions/iterations come in?
I want to make sure we strike a balance of moving fast but letting people have their say.
At one level, the NNS motion proposal is asking for the community’s approval of the design and project of the threshold signature feature.
My previous posts on this topic have raised pointed questions about both the design and the project.
A. What is the core of the design based on? We have been promised a couple of papers, a few weeks from now, detailing the new async protocols and the relevant security analysis.
B. Nowhere is it mentioned how we (the community) will see the entire implementation of the core design, i.e., how does the implementation stack up against the design?
C. What are the exact requirements (security and otherwise) through which deployment would be handled? These should form the basis of a project plan.
D. What does an actual deployment testbed look like? How will the testbed manifest itself?
E. What are the actual test cases, from both a unit-testing and an integration standpoint, that are going to be executed?
I actually don’t expect that some of these questions will be answered prior to this NNS proposal passing. But I think reasoning out why certain questions will not be answered, even if they seem reasonable in a design and project setting, would be very useful. I have given a hand-wavy methodology for how to think about addressing these in a previous post.
I believe that given a little more effort, it may be possible to stick to the deadlines with the relevant information being passed in this topic.
It sounds like you are saying (and correct me if I am wrong):
"I would like to have the following questions answered before an NNS Motion process makes sense to me."
If so, it sounds like you ARE saying, “yes, we should move the submission so the team on this project can provide more info.”
For context, I was thinking of giving them an extra day or so (not weeks), because of the globally remote nature of communication across time zones.
Would capturing your questions and pushing the proposal by a day meet your intent @mparikh or did I misunderstand?
Clearly, we are still iterating and working with each team to get a handle on the appropriate level of detail as per Victor’s message:
Yes, the intent is to capture the questions and attempt to answer as many of them as we can (with rationale for the questions that cannot be answered). If it takes some more material time, that’s OK with me.
Makes sense. That is, after all, the entire goal of this process: improve the design + process + communication by seeing what works, what does not, how much detail is too much or too little, how much time to let conversations flow, etc…
Hello again @mparikh !
It is nice to see how much you care about the project. Let me try to answer your points as much as I can. One of the reasons behind such proposals is to see whether the community supports a project before the foundation invests further in it. The paper, which answers your question A above, is still being worked on by our cryptographers. We have an initial plan and the corresponding security proof. However, we are considering multiple optimizations on top of that, each of which requires the security proof to be reworked.
As you most probably know, every time we update the nodes, that happens through NNS proposals. What we propose currently is to invest further time into this project: in its implementation, testing plan, validation and verification. That time is also what we need to answer some of your questions, such as how the implementation needs to be tested.
Once we have an implementation, the community can decide whether to deploy it or not. In fact, specifically for this project, we can say that the deployment is not the critical moment, because according to what we propose now, ECDSA signatures won’t be supported by any subnet by default. Once the implementation is deployed, there will be separate proposals (one per subnet) to enable this feature on subnets.
Once the implementation is ready, it will end up here with the rest of the code: https://github.com/dfinity/ic . As I said, it will be completely inactive by the time it is first deployed. The unit tests would be there too. By then, we can also provide further information on integration tests too.
When the first NNS proposal to enable this feature for a subnet comes, the voters can check the corresponding subnet on the dashboard before voting: https://dashboard.internetcomputer.org/subnets . If you click on any subnet’s identifier, you can see how distributed it is. You are right that this information is especially relevant for this project.
In short, I am trying to explain in this message that any actual change to the IC goes in with other NNS proposals. This current proposal is to get feedback about whether we should further invest in this project, which would for sure involve getting the answers to the questions you asked as well.
I hope this helps!
Thank you very much for your reply. It helps the community understand your perspective better.
I had misunderstood the purpose of this proposal. Since the explicit goal was “to approve design and project”, it didn’t even cross my mind that you were actually asking whether or not to continue work on this project.
If that is the sole intent of this proposal, then, of course, some of the deployment questions I posed are irrelevant in the context of this proposal. Perhaps the initial wording of the proposal could have been different.
I think the proposal and the thinking behind it are becoming clearer as we iterate and converse, but I would like to let the conversation brew a bit more (this is clearly a subjective gut feeling I have from sensing the “clarity temperature” in the virtual room). I also want to allow the team to reword any areas that were not clear (as evident from the questions). So I am moving the NNS motion proposal to Monday, September 20.
Note: we have two NNS motion proposals being submitted in 10 hours, so I want to make sure we allow folks the time to digest the information.
After submission of our proposal for the threshold ECDSA feature, there has been a discussion about aspects such as the availability of the detailed protocol specification, the implementation matching the design, security properties, and testing. We will try to provide as comprehensive a response as possible in the given time frame, i.e., before the NNS proposal is launched on Monday, in addition to what has already been answered by Ege and others. We hope that we can provide further details for people interested in this proposal. Much of the information in this post addresses quality assurance, i.e., it goes beyond the core of this proposal, but it helps in understanding the broader picture of what we need to do to have high assurance of the correctness of the feature’s implementation.
Regarding the question about the research paper on the cryptographic protocols: we also need to perform an internal security review before we can publish the results to the wider community. Thus, we ask you for a bit more patience here, as we are, as Ege mentioned, still in the process of finalizing the proofs for the optimizations we have performed. Once the paper is released, everybody will have the opportunity to scrutinize our work on the cryptographic protocols and provide their valuable feedback.
You have asked some questions around the criteria we need to meet for deployment of the feature, as well as our approach to testing. We think those are excellent questions of high importance, so let us give some more detail now, in addition to what Ege has already said. This will also give you some insight into the engineering process at DFINITY. Further details on this feature will become available as we proceed with the implementation.
This feature is a security-critical feature of the Internet Computer as substantial value, e.g., in the form of bitcoins, will be safeguarded through it. For this reason we consider quality assurance a primary concern throughout the lifecycle of the feature. What makes this feature particularly interesting (i.e., challenging) in terms of security is that it involves highly non-trivial research work in cryptographic multi-party protocols. Let us outline next how our project plan considers security end-to-end for this feature throughout the whole process, starting with the research side of things.
- Internal peer reviews: The research on the protocol we intend to implement is being conducted in-house. We will run a first peer-review round internally with DFINITY’s cryptographers. This will be done before publishing the protocol specification to a wider audience, in order to provide our community with a result we have high confidence in.
- Review by our community: Once we have released the first version of our paper, interested cryptographers in the community can review the result and provide us with valuable feedback. This can be seen as the scientific review process, just within a completely open community of interested specialists. After the community feedback process, we expect to have a broadly accepted protocol specification of whose correctness and fitness for its intended purpose both we (DFINITY) and the community are convinced.
On the engineering side, quality assurance is of paramount importance as well. For a complex feature such as threshold ECDSA, there are multiple high-level goals to achieve, e.g.:
- Specification-conformance of the implementation;
- Service level objectives being met for the feature, e.g., throughput, latency and availability goals;
- The Internet Computer protocol stack not suffering from the integration of this new feature in unintended ways.
These relate to some of the crucial acceptance criteria from a systems perspective. Of course, there is also a set of specific security-related criteria which govern the correctness and liveness of the protocol. These will become clear from the research paper to be published in the near future.
We want to stress for everybody that it is not possible to guarantee the correctness of the protocol at the implementation level. The best we can do in engineering is to provide a sufficiently high degree of assurance that the feature has been implemented according to the specification, which will have an underlying formal security proof, and does not interfere with our protocol stack in unintended ways.
Achieving this requires (1) an effective code-review process, (2) sufficient automated-test coverage, and (3) the protocol researcher working closely with the engineers, as quality-assurance measures. For this feature, Victor, who is leading the cryptographic research for threshold ECDSA, is working closely with the engineering team to help with knowledge transfer and ensure things are implemented as intended.
Code reviews are an integral part of the engineering process at DFINITY. Both researchers and other engineers perform code reviews. Through this we attempt to obtain a high degree of assurance that the protocol specification has been implemented as intended. Critical components, like the ones for this feature, are subjected to a more stringent review process to raise the level of assurance provided. The complexity of some of our protocols, including threshold ECDSA, is considerable, and it would be extremely challenging to implement those protocols without the researcher’s close and direct support of the engineers.
Automated testing is a crucial aspect of our engineering process. At DFINITY we have four different classes of tests that we employ to gain sufficient assurance that our source code does what it is supposed to do, at different levels. Not all functionality/code is tested at all levels and with the same rigor, but we need to ensure that we have sufficiently high overall test coverage. This balance is struck based on the judgment of our engineers and researchers about where to spend our (limited) effort to obtain the most effective test suite for a given feature. Let us next outline the different levels of tests we employ for our source code.
- Unit tests:
Unit tests are the most focused of our automated tests, testing just a single file. Unit tests are, as standard in Rust, white box tests that can access internal variables and interfaces. Unit tests use stubs, fakes or mocks in order to avoid external dependencies and test the functionality under test in an isolated manner. With unit tests we will test our individual primitives making up the implementation, e.g., our cryptographic building blocks.
- Integration tests:
With our integration tests, we gain assurance of correctness of a whole module’s implementation, that is, we test the interfaces provided to other modules in a more black-box fashion.
- System and production tests:
Those classes of tests exercise the feature at the integrated level, i.e., the interplay between the different components making up the feature and the Internet Computer codebase at large. System tests are essentially Continuous Integration tests: they are part of the development pipeline and govern the merging of changes. They are the more efficient tests in terms of resource consumption and execution time, as they are executed frequently. System tests run on a single test machine and test the feature’s code in a holistic manner. Production tests are the most comprehensive and integrative tests and take substantial resources and time to run. They are Continuous Delivery tests that govern the rollout of changes to our staging and production environments. Production tests spin up their own Internet Computer instance, including an NNS, with machines distributed across data centers around the globe. The idea is to run the tests in an environment that is as close as possible to our real-world production environment.
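The stub/fake pattern mentioned under unit tests above can be illustrated with a small hypothetical example, shown here in Python rather than Rust for brevity. All names (`SigningService`, `FlakySigner`) are made up for illustration and do not correspond to actual IC components; the point is only that the retry/orchestration logic is tested in isolation from any real cryptography.

```python
# Hypothetical unit test with a fake dependency: the orchestration logic is
# tested in isolation from the real cryptographic backend. All names here
# are invented for illustration.
import unittest

class SigningService:
    """Retries signing until the backend succeeds or attempts run out."""
    def __init__(self, signer, max_attempts=3):
        self.signer = signer
        self.max_attempts = max_attempts

    def sign(self, message: bytes) -> bytes:
        for _ in range(self.max_attempts):
            result = self.signer.try_sign(message)
            if result is not None:
                return result
        raise RuntimeError("signing failed after retries")

class FlakySigner:
    """Fake signer standing in for the real crypto component."""
    def __init__(self, fail_times):
        self.fail_times = fail_times

    def try_sign(self, message):
        if self.fail_times > 0:
            self.fail_times -= 1
            return None  # simulate a transient failure
        return b"sig:" + message

class TestSigningService(unittest.TestCase):
    def test_retries_until_success(self):
        service = SigningService(FlakySigner(fail_times=2))
        self.assertEqual(service.sign(b"msg"), b"sig:msg")

    def test_gives_up_after_max_attempts(self):
        service = SigningService(FlakySigner(fail_times=5))
        with self.assertRaises(RuntimeError):
            service.sign(b"msg")

if __name__ == "__main__":
    unittest.main(argv=["signing_test"], exit=False)
```

The fake lets the test control failure behavior deterministically, which is exactly what real cryptographic or networked dependencies make difficult.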
System and production tests are the most interesting ones to discuss here, in that our test framework is specifically designed for the requirements of testing the Internet Computer, whereas unit and integration tests are more standard. For example, our test framework allows us to define how many replicas in the subnet running a specific test should behave maliciously, which we need in order to test the feature’s resilience against (Byzantine) faults in a real-world setting.
As already mentioned by Ege, we will be working out the concrete testing plan as we go along with the implementation. Here are some ideas of what we might specifically want to test in our system and production tests:
- Replicas leaving / crashing during the signing process: In such a scenario, we do not want the signing process to abort, but still be able to succeed, without the faulty node. This is a crucial property of our protocol, which differentiates it from much of the available cryptographic literature on the topic.
- Testing the feature under the boundary condition of the largest allowed number of nodes having failed or being dishonest.
- Performing a large number of signing operations, also together with high regular load on the subnet, to test the feature under heavy load conditions.
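The first property above, succeeding even when a participant drops out mid-protocol, is the hallmark of any t-of-n threshold scheme. As a toy illustration of that resilience property (plain Shamir secret sharing over a prime field, which is NOT the threshold ECDSA protocol itself, just the simplest scheme exhibiting the same behavior):

```python
# Toy Shamir t-of-n secret sharing: any t shares suffice to reconstruct,
# so losing a node does not abort the protocol. Illustration only; the
# actual threshold ECDSA protocol never reconstructs the key in one place.
import random

P = 2**127 - 1  # a prime large enough for this demo field

def make_shares(secret, t, n):
    """Split `secret` into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

secret = 123456789
shares = make_shares(secret, t=3, n=5)        # 5 nodes, threshold 3
surviving = [s for s in shares if s[0] != 2]  # node 2 crashes mid-protocol
assert reconstruct(surviving[:3]) == secret   # any 3 survivors still succeed
```

A system test for the real feature would assert the analogous end-to-end behavior: a signing request submitted while a replica is killed still completes with a valid signature.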
We hope that this gives you a more concrete view of what we are planning to do in terms of making sure the feature is implemented in a secure manner. Hope that helps!