Let's solve these crucial protocol weaknesses

With more formal research into what level of decentralization is necessary, combined with the following (I know some are pipe dreams), I would feel much better, maybe even amazing, about ICP’s security relative to major secure blockchains like Bitcoin and Ethereum:

  • Node shuffling
  • Secure enclaves
  • Anonymous node providers with ZK identity verification
  • Node staking/slashing
  • Automatic node onboarding
  • Sufficiently large subnets (need more research to know what that is)
  • Node operators shouldn’t know what subnet they’re in, what canisters they’re running, nor be able to see any of the canister data
  • zkWasm or another way for clients to verify computations beyond blindly trusting the consensus amongst a small group of node operators

And I would perhaps feel more comfortable calling this a blockchain, and apps running on it “on-chain”.

6 Likes

These make me a bit sad, as one of my biggest hopes for decentralized blockchains is that we’ll be able to do things in the future that we don’t know we want to do now, based on what happened in the past. Once we start throwing away data we lose some of that optionality.

The good news is that IC canisters can recreate this by logging to ICRC-3 logs. We can find ways in the future to offload the archives from state and bring them back with a witness scheme, so that we get the best of both worlds. Ethereum was going down this route with its “stateless” ideas back in 2018, but that seems to have lost momentum to L2 solutions.

Putting together these libraries and frameworks is a bit down my todo list, but it is on there. :)

tldr: we can offload state if we have a way to bring it back with a witness. This creates a data availability problem, but some solutions for that are emerging.
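To make the witness idea concrete, here is a minimal sketch assuming the offloaded entries are committed to with a simple Merkle tree whose root stays in canister state; the types and tree layout are illustrative only, not the actual ICRC-3 block or witness format.

```ts
import { createHash } from "node:crypto";

// Minimal Merkle-witness sketch: the canister keeps only the root hash in state,
// archived entries live off-state, and anyone bringing an entry back must supply
// a sibling-hash path that recomputes to the stored root.
type Witness = { siblingHashes: string[]; leafIndex: number };

const sha256 = (data: string): string =>
  createHash("sha256").update(data).digest("hex");

function verifyWitness(entry: string, witness: Witness, storedRoot: string): boolean {
  let hash = sha256(entry);
  let index = witness.leafIndex;
  for (const sibling of witness.siblingHashes) {
    // Concatenation order depends on whether the current node is a left or right child.
    hash = index % 2 === 0 ? sha256(hash + sibling) : sha256(sibling + hash);
    index = Math.floor(index / 2);
  }
  // Accept the restored entry only if it hashes back up to the on-chain root.
  return hash === storedRoot;
}
```

Data availability (who actually stores the offloaded entries and serves the witnesses) is the part this sketch does not solve, which is exactly the problem mentioned above.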

2 Likes

Why should the Internet Computer resemble 10-year-old blockchains like Bitcoin and Ethereum rather than newer blockchains like Sui, Aptos, and Solana, which don’t keep the entire history of the chain? Would you say those aren’t blockchains as well?

2 Likes

Looks like cost per TB is about $10 and will be half as much in 18 months. https://diskprices.com/

126 TB * $10/TB * 4x geographic replication ≈ $5,000. That actually isn’t that much considering what a subnet can do (especially given how much we’re paying subnets right now).

Given @dieter.sommer’s comment about 128 TB per year of block state, perhaps it wouldn’t be crazy to have some ‘fully replayable’ subnets? You probably don’t need to host asset canisters there, but things like tokens or governance might be worth paying the extra cycles per year to store all the data?

1 Like

That’s in the first year. Costs grow linearly (edit: was “quadratically”) over time. So in 10 years you’d have to pay 10x (edit: was “50x”) more per year. Storage costs may decrease over time, but quite possibly not at that rate.
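Concretely, using the rough figures quoted above (about 128 TB of new block state per year, ~$10/TB, 4x geographic replication, and disk prices held constant for simplicity), the growth looks like this:

```ts
// Ballpark figures from this thread, not measured costs.
const tbPerYear = 128;    // new block state per year
const dollarsPerTb = 10;  // current disk price per TB (assumed constant here)
const replication = 4;    // geographic replication factor

// Year 1: only the first year's data has to be stored.
const year1 = tbPerYear * dollarsPerTb * replication; // ≈ $5,120

// Year n: the archive has accumulated n years of data, so the annual
// storage bill is roughly n times the first year's (linear growth).
const year10 = 10 * year1; // ≈ $51,200 per year by year 10

// Total paid over the first 10 years: (1 + 2 + ... + 10) = 55 "year-1 units".
const totalOver10Years = ((10 * 11) / 2) * year1; // ≈ $281,600

console.log({ year1, year10, totalOver10Years });
```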

This data would also have to be queryable, which doesn’t come for free. Someone (likely multiple someones) may want to download a few subnets’ data for the past couple of years. And if the argument is that very few (if any) parties would be interested in that, then what’s the point of persisting and making available millions of dollars’ worth of data forever?

4 Likes

While this is technically true, what I (and I guess Jordan too) am trying to point out is that the protocol’s design often overreaches and makes many of these decisions for us. Tweaking the tradeoffs comes with tons of friction and is sometimes even impossible, because the protocol itself is being developed and optimized with the current topology as the main frame of reference.

Say I want to run my dApp on 6 nodes. Making an NNS proposal is trivial, but I have to hope it gets approved in the first place, and then the subnet has to be formed.
Eventually I find out the configuration doesn’t work as expected, so I propose creating a 5-node subnet, but the NNS is concerned there isn’t enough demand for it and the proposal is rejected; in the end I decide to compromise and use an existing 4-node subnet.
Fast forward two years: my app has become a potential target because it hosts a whistleblower’s deadman switch. I decide to prioritize security over performance and want to move it to a 100-node subnet, but it turns out the protocol can’t handle that, due to some component bottlenecking that nobody knew about, because it had never been tried before.

The current design disincentivizes experimentation, as it leads to topology fragmentation, which reduces the cost efficiency of subnets. If my 6-node subnet is approved and it doesn’t get much traction, there are only two options for the governance body: either get rid of it and screw me over, or accept the inefficiency. Neither is ideal.

And this is just for customizing the replication factor; the number of possible permutations, once you take into account all the variants of execution- and consensus-related settings, makes it impossible to satisfy everyone’s needs with the current architecture.
Subnet rental is going to be the only option in those cases, but it’ll require a significant investment that many are neither capable of nor willing to undertake.

If the IC at a fundamental level provided “just” an incentive structure to reward network participants and a protocol to run serverless services with blockchain functionality layered on top, with no assumptions about how the network itself is structured, we’d be able to experiment much more freely and find the right sets of tradeoffs for our use case more quickly.

The subnet architecture is too rigid and gets in the way of that. Ideally the dev experience should be something like: “Run this canister on N nodes with high CPU, low storage, etc., with these execution settings.”
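Something along these lines is the kind of declarative deployment experience being asked for; every name below is hypothetical, and nothing like this exists in dfx or the NNS registry today:

```ts
// Hypothetical deployment spec, purely to illustrate the wished-for dev experience.
// None of these fields correspond to an existing IC API or dfx.json schema.
interface CanisterDeploymentSpec {
  replicationFactor: number;         // how many nodes should run the canister
  cpu: "low" | "standard" | "high";
  storage: "low" | "standard" | "high";
  geographicSpread?: number;         // minimum number of distinct jurisdictions
  consensus?: {
    finalityTargetMs?: number;       // desired finality cadence
    certifyQueries?: boolean;        // sign read responses as well as updates
  };
}

// Example: prioritize security over performance for a sensitive app.
const deadmanSwitchSpec: CanisterDeploymentSpec = {
  replicationFactor: 100,
  cpu: "low",
  storage: "low",
  geographicSpread: 40,
  consensus: { certifyQueries: true },
};
```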

3 Likes

AFAIK there are no structural requirements for subnets to have 13 replicas. You can look at the end-to-end tests: they run combinations of 1-node subnets, 4-node subnets, and many others. IIRC there is some minor implementation detail that would prevent a 100-replica subnet from working, but it’s definitely not an architectural / structural constraint.

As said (and again AFAIK), the only reasons we don’t have arbitrary subnet sizes are (1) no one really had a compelling argument for them, and (2) a subnet with too low a replication factor might more easily be taken over, and we are still missing a few checks and balances that would prevent a malicious subnet from minting cycles at will (and sending them to any canister) or from flooding honest subnets with garbage. I know there were some efforts in this direction (specifically to support a “Swiss subnet”), but I have no idea what exactly they were planning to do or how far along they are.

1 Like

Hey everyone,

First of all, while I agree with some and disagree with things in this post, I wanted to comment two things:

  1. I think the tenor of the conversation is what I like in the developer forum. Reasonable people can disagree, ask for clarification, push back, etc. That is the nature of a good and healthy dialogue. I would argue that when I read something I disagree with, but respect, that is the biggest sign of how tolerant and respectful the environment is. This, after all, is a large computer science project, not a weird crypto cult :slight_smile: Let’s keep up the tone!

  2. I (and others) have escalated this post to some folks at DFINITY who may have deeper insights. I expect more info to come. I and others are actively reading and digesting what we see.

25 Likes

Thanks @diegop that’s all great to hear.

4 Likes

Agreed. However: WordPress is one of those apps which probably wouldn’t suffer from these limits.
If there were one thing that would end the discussion about the future of ICP growth, I’d pick WordPress integration. Many more people would look to the IC as a place to host their blog than as a place to build the next Netflix.

Swarm’s time is coming. Right now, no one really cares how their NFT data is stored, and the UX needs to come along a few country miles.

This and Fetch.ai are two projects which I think would totally change the IC.

1 Like

I like the idea of focusing on the application first: identifying 3 to 5 possible “killer” dapps is the way to go (and then looking at the building blocks / technology). Hopefully 1 or 2 will really work out.

Interesting idea. So there would be an open source way to build a site and start a canister that will basically contain the site. An II and a payment would be required to create and run the canister, if I understand correctly. Where would the assets be stored (large images or videos)?

I have been talking with the founder of Arweave through DM, and I am going to post excerpts from our conversation with his permission.

Some of this conversation is in response to @PaulLiu’s posts here.

These will just be excerpts, I may remove or shorten some message responses, correct typos, and the order/flow might not be exactly how they were in the DMs:

Jordan: How are you going to secure these processes running on possibly untrusted CUs?

Sam: I think this is a misperception. The CUs will be staked, so if they provide you with a false state attestation they can be slashed. This gives you a security guarantee proportionate to their stake – if you would like more stake, you can just ask more CUs

Jordan: Your claims of smart contract-level verifiable compute seem strange without explaining or figuring it out yet; I’ve been talking with your team at ETH Denver

Sam: I think the core difference between the approach of ICP and ao is that ao has dissociated compute and consensus, while ICP bundles them

Sam: Having them separated lets you do interesting things with them – like let the compute be in other types of VM, run longer, require VM extensions, etc

Sam: It also gives you a ton more flexibility on the consensus side, too: Choice of SU (giving ordering and DA guarantees) is left to the developer. You can run a trusted one (like the PoA testnet ao currently runs), or you can use a staked one (as we plan for the base mainnet release), but you could also use Bitcoin, Arweave, EigenDA, or Celestia without changing the core data protocol

Sam: So to put it in the simplest sense: Your process is secured by the availability of its inputs, because verifiability of that (with any deterministic VM) gives you reproducibility of the state. You can then ask any CU – or even just run it yourself – to calculate your state.

Sam: The staked CUs actually give you far higher guarantees on reading the state vs traditional blockchains (I think this holds for ICP, too?). Typically, a user reads the state of the network by sending an HTTP request to a gateway/indexer/RPC. In ao when you ask a CU (also via HTTP) you get a signed response from the node. If you later find that this state attestation was invalid (either by asking another CU, or running it yourself), then you can slash the CU’s stake.

Sam: You can also validate it yourself much faster, as execution is decoupled from consensus. Essentially each process is its own independently verifiable ‘blockchain’ – so you can calculate its state without generating the state of any other process. I believe if you tried to do this in ICP you would have to re-validate the entire subnet?

Jordan: I am referring to the idea that the network of CUs will be untrusted by default, as in they will be independent of the process owner, thus possibly Byzantine, and some kind of mechanism would need to be created on top to trust them. The staking is part of that mechanism it seems

Jordan: Yes but how does slashing work? Who can slash? How do you prove that a computation was done incorrectly without zk? It seems you would need some kind of consensus mechanism, ideally in real-time so that a computational result can be trusted without a lot of latency

Jordan: Sounds correct, now I wonder where the consensus on ao will come from, as it is necessary to provide security guarantees without zk…I think with zkWasm for example relying just on Arweave for consensus on inputs and outputs would be enough, correct?

Jordan: Having VM optionality and long-running computations is a definite improvement over ICP currently, still confused on how/where the consensus will come from though

Jordan: Consensus on inputs is crucial of course, but not sufficient (zkVM might change that story). I can’t just ask one CU to process something because I can’t trust that one CU, stake would help but I don’t think that’s sufficient for all use cases, some kind of consensus amongst multiple CUs sounds like what is needed

Jordan: Also running it myself is not going to be feasible for building web-scale applications for example. I’m not sure what your ambitions with AO are, but on ICP we’re building BFT web servers, databases, full-on web backends. For example we have Express.js running on ICP, you essentially deploy a BFT 13-40x replicated JS server. We have basic SQLite and hopefully soon Postgres compiled to Wasm running (PGLite). You can’t run this stuff yourself to verify it

Jordan: ICP doesn’t have automated slashing like that, but nodes can be removed by the DAO, and subnets can provide signatures on all outputs (I believe they do actually for all state change requests, not for read-only requests by default, but they can do that too). The exact verifiability I don’t remember, as in where exactly these signatures are verified

Jordan: It’s just completely impractical to expect a user/dev to verify a web-scale backend with potentially gigabytes of data, this doesn’t scale or make sense for many use cases.

Jordan: We’re going for full-blown web-scale Ghz-level compute over gigabytes of data, you can’t run this stuff on a client

Jordan: Now again, having a zkVM at the core would change the story

Jordan: Sadly I am still left unsatisfied, I don’t see where the consensus on the processes is supposed to happen, well I guess it’s supposed to happen outside of the system? But then someone needs to build this

Jordan: As a dev I want to build full web backends on AO, I want http, Postgres, etc

Jordan: I want to just run the process and have bounds under which it is provably secure, provably BFT

Jordan: I don’t see that here, I see some building blocks maybe

Jordan: For example I want an algorithm/protocol here that says as long as 2/3 CUs come to the same output with the same inputs then I am guaranteed honesty or something like that

Jordan: These are essentially the guarantees ICP, Ethereum, Bitcoin, etc give us, and these are all networks potentially full of Byzantines

Jordan: Byzantines are processes that can fail for arbitrary reasons including dishonesty, which is exactly what a CU is even if staked

Jordan: The stake just helps weed out Byzantines, but I think more than that is required

Jordan: In the end: where is the 2/3 etc real-time BFT guarantee coming from?

Sam: Ok, I think we are getting to the crux here. There are three layers:

  • CUs: You can ask as many CUs as you want in order to gain as much certainty (backed by stake) as you desire. Because ordering+DA is separated from execution, you do not need an explicit consensus mechanism at the CU level. For example, if you wanted to have the equivalent of 100% stake (on a PoS execution network) you could simply ask and pay every CU to do the computation. If they all come to the same result, you have your ‘consensus’. This is flexible to reaching 2/3, 1/3, or just a USD token equivalent quantity of stake.
  • Slashing: What if they don’t agree? You can raise a vote on the staking process, calling others to calculate the real state and vote. This is a form of BFT consensus, but only executed when necessary. The core staking process itself will use Arweave’s BFT consensus as the SU to ensure availability and ordering, without dependence on any specific node. The staking process will also allow subledgers that grant for faster votes on low-confirmation time BFT ledgers, or even simply staked SUs – but participants can always default back to the Arweave network if problems occur on these subledgers.
  • Finally, everything rolls up to availability on top of Arweave’s consensus.

You are totally right that zkWASM will improve this. You could use any untrusted CU to grab your state.

Jordan: So can I achieve this: spin up a process on 31 CUs, for every incoming message have the client/user check that at least 21/31 agree?

Jordan: How will a user/client be able to verify that the computation was performed correctly without a lot of latency?

Jordan: I’m imagining a client using HTTP, they’ll want to perform an HTTP call to a server running in a process, and in the response they’ll want to know somehow that 21/31 agreed

Jordan: Is this possible with low latency?

Jordan: And another question, do you foresee the kinds of use cases I’ve been describing as being plausible with AO? Web servers, databases, frontends, full on web scale backend applications?

Jordan: * and by low latency I mean ideally under one second, but in the worst case a few seconds would maybe be acceptable, maybe

Sam: Yep! Or any other configuration. You could just ask any number in parallel and aggregate the results (or the first x% to reply, etc).

Sam: You could build an aggregator like this, or you could just have the client speak to many nodes in parallel. Both would work, although an aggregator doesn’t exist right now

Sam: These results you collate are a big step up on traditional RPC nodes, too, because they are staked against. If the CU lied you could take your result and have the node be slashed – regardless of whether ‘you’ is a web browser, another MU, etc.
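(For illustration, the client-side “ask many CUs and require 21/31 agreement” flow being discussed could look roughly like the sketch below. The endpoints, the response shape, and the omitted signature/stake checks are all placeholders; this is not ao’s actual HTTP API.)

```ts
// Sketch: query many CUs in parallel and accept a state only if at least
// `threshold` of them return an identical attestation. Hypothetical API.
interface CuAttestation {
  stateHash: string; // the CU's claimed state (or result) hash
  signature: string; // the CU's signature over the attestation (not verified here)
}

async function queryWithThreshold(
  cuEndpoints: string[],
  processId: string,
  threshold: number
): Promise<string> {
  const attestations = await Promise.all(
    cuEndpoints.map(async (endpoint): Promise<CuAttestation | null> => {
      try {
        const res = await fetch(`${endpoint}/state/${processId}`); // hypothetical route
        return (await res.json()) as CuAttestation;
      } catch {
        return null; // an unreachable CU simply doesn't count toward agreement
      }
    })
  );

  // Tally identical state hashes; a real client would verify each signature
  // (and weigh the CU's stake) before counting it, and could submit
  // conflicting attestations as slashing evidence.
  const counts = new Map<string, number>();
  for (const att of attestations) {
    if (att) counts.set(att.stateHash, (counts.get(att.stateHash) ?? 0) + 1);
  }
  for (const [stateHash, count] of counts) {
    if (count >= threshold) return stateHash; // e.g. 21 of 31 agreed
  }
  throw new Error("No state reached the agreement threshold");
}
```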

Sam: Definitely decentralized databases and backends!

Jordan: Awesome!

Jordan: But what about latency?

Jordan: I’m thinking there will be some significant latencies on the order of seconds at least??

Sam: I think the claims of servicing HTTP requests ‘directly’ from ICP canisters are unfortunately relatively weak, because there is always a ‘gateway’ node that has to sit between the user’s browser and network. The question is how the gateway will be incentivized (given it is transferring all of the bandwidth) and trusted (given lack of good browser options for validation). I think you could do exactly the same thing with ao (with staked results in the browser, too!), but I don’t think it would be a high integrity claim we would be happy standing by to call it ‘serving’ HTTP requests from the processes. It can definitely pre-process the results, though! Is there anything I am missing on the ICP side here?

Sam: It will definitely depend on the way you write your process, but in the mode we expect to be the default (staked independent MUs, SUs, and CUs), it should be sub-1s latency. In the current setup we get roundtrip latency of ~500-800ms.

Sam: If you want to move 1b USD in a process, though, the message recipient may want to wait a period of time with the message in the buffer (even if it is received with very high stake)

Sam: Again, the approach is to allow flexibility and modularity, rather than a monolithic design where one is not needed

Jordan: There is some nuance here. ICP has a system of boundary nodes that accept plain HTTP requests and convert them into API requests that are then sent (over HTTP still, I believe) to the replicas, where they are then gossiped or otherwise communicated among themselves

Jordan: So yes, there is essentially a proxy layer, the current plan is to allow anyone to run these proxies into the network

Jordan: But you can also call the API nodes directly over HTTP…I’m not sure if you can call the replica nodes themselves directly. We are doing this to get around some authentication issues right now, where in the browser we override global fetch and intercept the dev’s requests only if a certain HTTP header is present. We then use a client-side “agent” to perform the API calls, bypassing the translation of raw HTTP requests into the API call requests
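(The fetch-interception pattern described above, reduced to a sketch. The marker header name and the agent helper are made up for illustration; Azle’s actual implementation differs.)

```ts
// Override global fetch so that requests carrying a marker header are routed
// through a client-side agent instead of the raw HTTP gateway path.
const originalFetch = globalThis.fetch.bind(globalThis);

// Hypothetical stand-in for a client-side agent call; in reality this would
// build, sign, and send an IC API request (e.g. via @dfinity/agent) and wrap
// the certified reply in a Response object.
async function callCanisterViaAgent(
  input: RequestInfo | URL,
  init?: RequestInit
): Promise<Response> {
  return new Response("agent call placeholder", { status: 200 });
}

globalThis.fetch = async (input, init) => {
  const headers = new Headers(init?.headers);
  if (headers.has("X-Use-Ic-Agent")) {
    // Hypothetical marker header: only these requests are intercepted.
    return callCanisterViaAgent(input, init);
  }
  return originalFetch(input, init);
};
```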

Jordan: Calling the claims weak might be warranted, but practically speaking the developer can write Express or other http servers in their canisters

Jordan: There are of course limitations, but the developer experience is getting better. Here’s an example of a very simple Express application that you can deploy to ICP: https://github.com/demergent-labs/azle/blob/main/examples/hello_world/src/backend/index.ts
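(For context, the style of canister code being described is essentially an ordinary Express app; the generic sketch below is not a copy of the linked Azle example, and the Azle-specific build and deploy wiring is omitted.)

```ts
import express from "express";

// An ordinary Express server. With Azle, code in this style is compiled to Wasm
// and deployed as a canister, so every replica in the subnet runs the same app.
const app = express();

app.get("/", (_req, res) => {
  res.send("Hello from a replicated HTTP server");
});

// Inside the canister, the listening mechanics are provided by the runtime
// rather than a TCP socket (sketch; details differ from a normal Node server).
app.listen();
```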

Jordan: There will always be infrastructure required to receive and translate HTTP requests, resolve DNS, and deal with networking that might be outside of the guarantees of the blockchain…perhaps?

Jordan: But in the short-term only the HTTP Gateways themselves, the ones that convert raw HTTP requests into HTTP API requests, will be outside of the ICP network’s incentivization mechanisms

Jordan: And when I talk about serving the HTTP requests, I think I’m mostly interested in as a developer being able to write an HTTP server using normal web development libraries in my process, and just deploying that to the network and having it work

Jordan: Obviously with good-enough security guarantees, but I believe ICP has pretty good guarantees on that and will be improving

Jordan: If AO can provide a similar experience, that’s what I’m most interested in knowing

Jordan: It sounds like you’re saying that it can

Jordan: If latency, cost, scalability, and developer experience are good enough, then that’s amazing

Sam: Yep – the ‘pre-processing’ (generating the HTML etc to serve) is definitely possible, and cool that it can be validated. Serving web apps is an entirely different game though. In the Arweave ecosystem we have @ar_io_network, which is building decentralized gateway infrastructure, but getting the incentives right is highly non-trivial.

Sam: Yeah – it is definitely an exciting idea. I don’t see any reason it shouldn’t work on ao, with some of the same caveats as ICP. Will probably take a little while for the compilation tooling to get there though

Sam: This conversation has actually got me thinking much more about whether CUs could essentially be gateways themselves?

Sam: You pay them for access, which solves the incentive problem, and their responses must be signed (we have a system called P3 for adding crypto auth in HTTP headers: https://arweave.net/UoDCeYYmamvnc0mrElUxr5rMKUYRaujo9nmci206WjQ…).

Sam: Even if the CU thinks that the client it is talking to is a web browser that won’t validate the signature in the headers (or re-validate the execution from other CUs), the risk of being slashed is so high that there is a very strong incentive not to lie about the results still

Sam: So in theory you could ‘serve’ the site directly from the CU, with some strong guarantees. And of course, you could swap out the CU just like you can swap out an Arweave gateway (ex. [http://sam.arweave.dev](http://sam.arweave.dev), [http://sam.g8way.io](http://sam.g8way.io), [http://sam.ar-io.dev](http://sam.ar-io.dev), etc)

Sam: Can’t imagine how you could subsidize the CU-as-gateway, though.

Jordan: My biggest concern with AO is still verifiability, and his takes validate my suspicions that AO may be lacking there

Jordan: Being proven wrong would be very interesting of course

Sam: …If so desired, many CUs in ao can sign the same state to attest to its validation. This is essentially the same as providing the chain signatures that ICP has

Sam: A number of signatures from different staked CUs can give you far better verifiability than a BLS sig from many different unstaked nodes. Lack of stake means that these subnets are highly vulnerable to ‘eclipse’ style attacks. In ICP these can affect not only state attestations to users (which, I believe, are even relayed through unsigned and unstaked gateways?) but also the passage of messages between subnets.

Sam: Somewhere in the message they also mention that no intermediate states are stored. This is again incorrect – nodes in ao already store staked snapshots on Arweave that other nodes can validate and also resume execution from. Again, joining an ICP subnet from a BLS signature of state at a block height is far riskier, as nothing at all prevents an attacker from slowly eclipsing the subnet and signing any data they like with the trusted keys.

Sam: A final point to note is that in general the entire ao architecture is flexible at its core. Processes are free to choose any requirements that they want in terms of security – they could always trust a single set of keys (PoA), require a variable stake on the messages they receive, or even delay processing for riskier messages for a period of time (the delay the author mentions in rollups). You could even implement the precise security mechanisms of ICP if so desired. Instead of being prescriptive and enforcing one model on all users+devs, ao lets you choose what makes sense for each individual use case.

Jordan: Just to check on something, CUs have flexible verifiability of messages between processes right? Process A → Process B, Process B could require X signatures from specific CUs before accepting the message?

Sam: I wouldn’t call it flexible verification of messages at the CU level (they cryptographically attest to faithfully executing the code of the process), but yes, the processes can flexibly verify the messages as they like. Process B can require any number of signatures, optionally attached to any amount of stake, or require a ZK witness of correct execution if they prefer.

Sam: Or even only accept messages from a certain set of other processes, or require a signature from a specific key
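(To illustrate the kind of per-process acceptance policy being described, here is a sketch; the types and field names are invented and do not reflect ao’s actual message format.)

```ts
// Hypothetical per-process policy for accepting an incoming message, mirroring
// the options listed above: N signatures, minimum total stake, an allow-list of
// sender processes, or a ZK witness of correct execution.
interface IncomingMessage {
  fromProcess: string;
  cuSignatures: { cu: string; stake: number }[]; // assumed already verified
  zkWitness?: string;
}

interface AcceptancePolicy {
  minSignatures?: number;
  minTotalStake?: number;
  allowedSenders?: string[];
  requireZkWitness?: boolean;
}

function acceptMessage(msg: IncomingMessage, policy: AcceptancePolicy): boolean {
  if (policy.allowedSenders && !policy.allowedSenders.includes(msg.fromProcess)) return false;
  if (policy.requireZkWitness && !msg.zkWitness) return false;
  if (policy.minSignatures && msg.cuSignatures.length < policy.minSignatures) return false;
  const totalStake = msg.cuSignatures.reduce((sum, s) => sum + s.stake, 0);
  if (policy.minTotalStake && totalStake < policy.minTotalStake) return false;
  return true;
}
```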

Sam: Have been thinking that it might be interesting to demonstrate what an ICP-style security model implemented in ao would look like. It’s actually far simpler than the PoS system that we expect most processes will employ

Sam: I also think that a reasonable way to sum it up is inline with afat’s diagram: https://us1.discourse-cdn.com/flex023/uploads/dfn/original/3X/4/c/4cf1cb80b97d881fbd5d6b515bd767ea6bd463d6.jpeg

Sam: I thought it was quite neat

Sam: The exact placement of the dots is hard to follow (why is EVM on IC further to the right than EVM on ao, when his point is that IC is essentially ‘EVM compatible’ – you could run EVM on ao+IC), but the idea to visually show that ao contains a flexible plane of different trade-offs is cool

Sam: I think that afat is right that you could run a good ‘permissioned decentralization’ SU for ao on an IC subnet. This would be great! I remain to be convinced about the trust model for IC subnets, but this would be a cool option to give people if they want it.

Sam: For CUs, however, IC would be very limited relative to ‘normal’ staked ao CUs. Having consensus at the state layer (rather than inputs) puts fundamental restrictions on what they can achieve (lower execution counts, no VM choice, VM extensions, etc)

Sam: The subnet-consensus-on-execution approach even limits you from achieving greater trust on your execution outputs if you so desire: In ao, you could ask a full ‘subnet’s worth’ of CUs about the state, or you could ask every CU about the state – or just a couple. Whatever your preference is

Sam: So for that reason, I don’t think IC CUs would be competitive.

Sam: But for SUs, it could be very cool!

8 Likes

@diegop please don’t forget to escalate this latest post to those with deeper insights as well, thanks!

4 Likes

Jordan, that is a very informative dialogue you had with Sam, and thanks for sharing it here.
However, I found it hard to follow the conversation, as your questions and his replies seem to appear out of order.
Could you edit the post to restore the flow of question and response?
Cheers!

4 Likes

Since you guys have already dived into aocomputer so much, I can probably ask my question here instead of having to look for an aocomputer forum. What are the criteria for the output of a process to be written to the AR layer?

I mean, how many CUs, or what stake-equivalent set of CUs, have to come to consensus before the data layer records the output of a computation (which could be a message to another process)? Who chooses the CUs? Do they have to be chosen by the process owner?

3 Likes

Thanks for talking to Sam! Getting his opinions is very helpful for understanding many aspects of AO. But I still hope he could clarify how optimistic challenges would work in an async setting. Suppose the following messaging sequence between processes:

A -> B -> C -> D -> E

Everyone’s input & output is recorded on the AR chain. Now suppose the output of A (which would include message A->B) is challenged and eventually proved wrong, so the CUs for A would have their stake slashed. But what happens to the CUs for B/C/D/E? They merely took messages that were recorded on AR and did their computation accordingly. Do they get slashed or not? Maybe E already launched a missile or something that can’t be easily reversed. How is slashing the CUs of A going to help?

Chains like Arbitrum would give 7 days for anyone to challenge A; if nothing happens in those 7 days, the message A → B is assumed correct, and meanwhile everybody from B onwards has to wait.

AO wants all of the processes to keep churning until something wrong is spotted.

You may want to argue that if E is capable of launching a missile, it should require 100 CUs of D. By this logic D would require 100 CUs of C, and so on. Is this really how AO works?

10 Likes

I’m pretty sure AR is the source of truth here, and the CUs for B-E did nothing wrong in their model. Hence my question: what condition gates writing output to AR?

5 Likes

No, consensus is not required. Each CU would record its own computation output on AR. That is it. It is up to the downstream CUs to decide how many outputs from upstream they want to see.

Yes. But I can also imagine CUs taking payments from whoever is interested in keeping a process live.

1 Like

So multiple outputs can be written to AR for the same computation?

1 Like