Thanks for asking, Diego. I’ll try to keep it short.
I would not state it this way. It’s my opinion that the risk of perpetual delay will have a far greater impact on the ecosystem than the risk associated with congestion and Wasm jailbreak. Therefore, I am choosing what I perceive to be the lesser of two evils. If we look back at some previous comments it might help explain my point of view.
This may sound rational, but it tells me that the security team has not defined an acceptable level of risk. Sure, they have assessed the vulnerability, but that is different from managing risk. Basically, it sounds to me like we are making decisions based on the security team’s feelings rather than anything quantifiable. I could certainly be wrong, but this is how I perceive the situation based on the information presented here.
This seems to contradict the first statement about not seeking perfection. So findings have trickled in over the last couple of months; do we really expect that no additional findings will trickle in between now and January? As someone who also works in security, I highly doubt it. At what point do these security concerns result in perpetual delay? I would be more confident about avoiding this scenario if I wasn’t already under the impression that risk decisions were being made based on feelings.
This sounds positive. So the most important congestion issue is (tentatively) being addressed at the end of November. Considering it is November now, I’d say that leaves a pretty short window of opportunity for Dfinity to implement the original proposal, for developers to push out DeFi solutions, and for an attacker to exploit the vulnerability. Obviously there are other concerns, but if we are making incremental improvements to our security posture over the next 3-4 months, how likely is an attacker to be successful in exploiting any of these individual vulnerabilities? That’s an honest question; obviously I’m not familiar with everything that falls under the “congestion umbrella” (for good reason).
So a risk was identified, with no concrete attack vectors, and there is a mitigation in work. Personally, I am comfortable accepting this risk given the ETA of Q1 2022 for the sandboxing feature.
Obviously this is just one opinion. I am not trying to discredit Dfinity or its security team. I’m just not convinced they see the forest for the trees.
I believe that if we take Dominic’s tweet literally, he is advocating for the DFINITY Foundation to vote against the proposal to delay proposal 20588. This would be troublesome because it is the security engineers and researchers at DFINITY who are advocating to delay proposal 20588.
However, as in all things software, taking everything literally might not be ideal. I would suggest that a true security assessment (threat vs. risk, along with risk mitigation) as advocated by @LightningLad91 be done and attached to the proposal to delay. Then vote and let the chips fall where they may.
We are thinking of putting out a proposal to enable canisters to transfer ICP next week. Instead of making a community proposal to delay or not, it makes more sense to vote directly on the real proposal. If you prefer to delay, you can vote to reject the upgrade; if you don’t want to delay, you can vote to accept the upgrade.
On the security front, the canister sandboxing ETA is still end of December, and we have made good progress on the congestion issues.
Is there a more in-depth description of the risks associated with “congestion” than @diegop’s description above? What does “too late” mean exactly? Is there a theoretical example of how this could lead to loss of funds?
What does an attacker need to do in order to exploit a Wasm jailbreak, if possible? Do they need to control a single node, 2/3 of the nodes as @LightningLad91 asks, or might it be possible to craft a payload, like a malicious canister, that would then be able to compromise other canisters? (Edit: confirmed the theoretical attack is a canister escaping its runtime, so it could in theory be exploited by anyone rather than just a node operator.)
Could this feature be initially enabled only on some subnets rather than all? Possible benefits:
Signals that this is an experimental feature, and creates a kind of “testnet” style pathway by which features can be rolled out.
Could resurrect the idea of “Authorised subnet” and only allow authorised developers to deploy on subnets which allow canisters to hold ICP - this might reduce risk of WASM jailbreak attacks, and could also reduce risk of rug pulls.
Could possibly throttle messages to this subnet earlier thus reducing the possibility of DoS attacks aimed at creating congestion.
This is purely anecdotal… but it does play into the reputational concern, albeit from a different perspective. The IC is seen as a palpable threat to some extremely well financed interests. Let me emphasize the term ‘extremely well financed’ and add reckless. That does not mean folks should be afraid, but wariness is appropriate. The general fragility of some parts of the crypto ecosystem at this moment, coupled with the not insignificant probability of an orchestrated rug pull by ‘outside interests’ taking advantage of the community (which would be damaging to Dfinity’s reputation), might suggest an ultra-conservative approach is warranted. Yes, this is qualitative… but it feels like a thesis that is supported by the external environment. I respect the urgency felt by folks who want to ‘make it happen’ now; it is understandably very hard to be patient when time and effort have been invested, especially when you have conviction that you have built a better mousetrap. I hope you appreciate this commentary in the spirit with which it is intended, as someone who duly supports the IC. In summation, allocating a few additional ounces of prevention by deferring to what ‘security’ recommends seems warranted. We cannot legislate good behavior; however, we can make best efforts for good guardrails and hold ourselves accountable for the latter.
Hello. I just wanted to clarify my prior statement.
My question wasn’t really about who could perform the attack. It was about how the attack would have to be carried out. If I’m understanding it correctly, the Wasm runtime exists on each node that the replica is deployed to. So using the latest subnet numbers, I believe that is 13 nodes = 13 replicas, and 2/3 would be roughly ~9 nodes. So the malicious canister would have to break out of its runtime on 9 different nodes. That’s probably not too hard to imagine, since the exploit would probably be just as effective on one node as another; but the underlying IC-OS is where it gets tricky for me. Now the payload on each of the 9 nodes is responsible for understanding its own environment and targeting the right process that is holding the valuable assets. Does the IC-OS identify each process the same way on every node? Is this the type of information that the attacker would be able to figure out before constructing the payload, or does this have to be resolved in real time?
Maybe I’m overthinking this? Maybe the attacker just has to escape the canister and set up some sort of C2 capability that allows them to navigate all the more complex details of the underlying IC-OS. It would be great to understand how much protection we gain from the replicas in this scenario.
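As a quick sanity check on my “~9 of 13 nodes” figure, here is a back-of-the-envelope sketch. It assumes the usual BFT threshold of n ≥ 3f + 1; that threshold is my assumption for illustration, not an official statement about the IC’s consensus parameters.

```rust
// Back-of-the-envelope check of the "~9 of 13 nodes" figure, assuming the
// standard BFT setting n >= 3f + 1 (an assumption, not an official statement
// about the IC's consensus parameters).
fn agreement_threshold(n: u32) -> u32 {
    let f = (n - 1) / 3; // max faulty replicas tolerated on an n-node subnet
    2 * f + 1            // replicas that must agree to certify a result
}

fn main() {
    assert_eq!(agreement_threshold(13), 9); // 13-node subnet -> 9 replicas
}
```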
AFAIK the issue is that specially crafted code in one canister could access the state of other canisters. I guess the exact memory locations might vary from node to node, but if malicious code works on one replica it likely works on them all. Security Sandboxing - #3 by diegop
Understood. If it’s really that easy to locate a specific memory location and read it across 2/3 of the nodes at the same time then that would be good to know. I still imagine there are a lot of other moving parts that would make the attack more complex, but this is good information.
Thanks for the glimpse! I wonder if we will regret not putting the error variant in an opt, which is necessary if you want to extend the set of variant tags (e.g. errors) later in a backwards-compatible way. Is it too late to change that?
Yuck, although I just noticed that with the latest iteration of subtyping on Candid values, the idea of wrapping variants in opt to make them extensible doesn’t work: with the subtype check on the level of types, such an extended opt type will always be null, even if a known tag is observed. I will bring this up in the Candid repo.
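To make the two shapes concrete, here is a rough sketch in Rust with Candid derives. All names below are made up for illustration; they are not the ledger’s actual interface.

```rust
// Rough sketch of the two result shapes being discussed, expressed as Rust
// types with Candid derives. All names are illustrative only.
use candid::CandidType;
use serde::Deserialize;

#[derive(CandidType, Deserialize)]
enum TransferError {
    BadFee,
    InsufficientFunds,
    // A tag added here in a later version is unknown to old decoders.
}

// Shape as currently drafted: the error variant is returned directly, so an
// old client decoding a response with a newly added error tag fails outright.
#[derive(CandidType, Deserialize)]
enum TransferResult {
    Ok(u64),
    Err(TransferError),
}

// The "wrap in opt" idea: the hope is that an unknown error tag would decode
// as None on an old client instead of failing the whole decode.
#[derive(CandidType, Deserialize)]
enum TransferResultOpt {
    Ok(u64),
    Err(Option<TransferError>),
}
```

The catch raised above is that, with the subtype check done purely at the level of types, an extended `TransferError` makes the whole opt decode as null on old clients, even when the tag actually sent is one they know.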
Looks great. I see it’s a draft; where/when are the Candid methods to read the ledger and find specific transactions? For a canister to accept ICP it will also want to be able to confirm that someone specific sent it. Will that also be part of this proposal?
The version that we’ll release will not have a method to fetch blocks – we are still working on that. However, one can already do the kind of check you suggest by encoding in the destination subaccount all the information you’d want to be associated with the transaction. The subaccount would need to be agreed with the sender somehow, but then checking the balance of the corresponding address (which should be non-zero if the correct transfer was made) should ensure that all is as expected: the balance should be the expected amount and the subaccount info confirms the parameters.
Of course, once we have block fetching there will be an additional means of checking, by looking at the transaction rather than at the destination address. I hope this helps.
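For anyone wanting to experiment with the subaccount trick described above, here is a minimal sketch of what the receiving canister’s check could look like. The subaccount derivation, the helper names, and the ic-ledger-types-style ledger API are assumptions made for illustration, not the interface that will ship with the proposal.

```rust
// Minimal sketch of the subaccount-based check described above. Helper names
// and the ic-ledger-types-style API are assumptions for illustration only.
use candid::Principal;
use ic_ledger_types::{
    account_balance, AccountBalanceArgs, AccountIdentifier, Subaccount,
    MAINNET_LEDGER_CANISTER_ID,
};
use sha2::{Digest, Sha256};

/// Derive a 32-byte subaccount that encodes the parameters the receiver cares
/// about (here a hypothetical order id agreed with the sender out of band).
fn expected_subaccount(order_id: u64) -> Subaccount {
    let mut hasher = Sha256::new();
    hasher.update(b"order:");
    hasher.update(order_id.to_be_bytes());
    Subaccount(hasher.finalize().into())
}

/// Check that the agreed amount has arrived on the derived address.
async fn payment_received(receiver: Principal, order_id: u64, expected_e8s: u64) -> bool {
    let account = AccountIdentifier::new(&receiver, &expected_subaccount(order_id));
    match account_balance(MAINNET_LEDGER_CANISTER_ID, AccountBalanceArgs { account }).await {
        Ok(balance) => balance.e8s() >= expected_e8s,
        Err(_) => false,
    }
}
```

The design point is the one made above: because the subaccount itself encodes the agreed parameters, a non-zero balance of the expected amount on that address is enough to confirm the transfer without fetching blocks.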