Let's solve these crucial protocol weaknesses

I appreciate this discussion, as it brings transparency and fosters trust among users and investors.

Having followed his dilemma, however, I think @lastmjs brings up a valid but uncomfortable point: that what the IC publicly promises sometimes differs from what it can physically deliver. I would hate to see the team’s amazing efforts undermined by a marketing campaign that sets up its own straw men.

4 Likes

Yes, many CUs can collectively run one process. But the primary model of AO is that a user can request N >= 1 CUs from a market to run a process (and the user will pay for it). This replication factor N has to be set (and agreed upon) so that any downstream receivers of a process know not to wait for more than N messages. Therefore, in the situation where A receives a message from B, A cannot retrospectively ask for more CUs to re-compute process B (I was explicitly told this, as it would violate an important assumption of AO: CUs should take inputs from AR as is). Also, due to dependencies, it is unclear how much security A would gain by insisting that its upstream B have 100 CUs while B’s own upstream has only one or two CUs.

When you “compose” security (for lack of a better word), you don’t get the sum or the maximum; you get the minimum.
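To make the composition argument concrete, here is a toy sketch (plain Python, not AO code; the process names and CU counts are made up):

```python
# Toy illustration of composed replication: the effective assurance of a
# result that flows through a pipeline of processes is bounded by the
# process with the fewest compute units (CUs).

def effective_replication(pipeline):
    """pipeline: list of (process_name, num_cus), from upstream to downstream."""
    return min(num_cus for _, num_cus in pipeline)

# B runs with 2 CUs upstream of A, which paid for 100 CUs.
pipeline = [("B_upstream", 2), ("A", 100)]
print(effective_replication(pipeline))  # -> 2: A's 100 CUs don't help past B
```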

PS: another goal of the IC is to run autonomous internet services. I’m sure you are aware that most other blockchains would expect the team behind a project to run their own service. Off-chain computation protocols like Ordinal would fully expect this if a team wants the maximum security. AO is no different in this regard. So you can’t really expect autonomous services on AO, because the protocol does not require (and cannot force) that a process always have at least one CU assigned. Your smart contract may get abandoned if you don’t put in at least one CU yourself. You see, being flexible has its cons too.

9 Likes

Not all algorithms or libraries can be easily chunked like this. If you pip or npm install a package and it doesn’t provide for chunking, imagine being that developer.

We can of course chunk things and use the timers feature to even do it automatically in the background, but it’s a lot of work, and some devs and libraries will just hit a practical dead end for their skill set.

Unless you’re saying that somehow the IC could pause a computation and initiate another update call in a timer automatically or something?

Could this be the case? Could the host replica just decide to interrupt a canister method during execution? I imagine this might have undefined behavior around global state as other messages come through, possibly updating the state.
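For what it’s worth, this is the general shape of the manual chunking pattern (plain Python, not the IC system API; the instruction budget and state layout are made up):

```python
# Conceptual sketch of manually chunking a long computation: do a bounded
# amount of work, persist a checkpoint, and schedule the next slice
# (e.g. via a canister timer on the IC).

import json

BUDGET_PER_SLICE = 1_000  # pretend "instructions" allowed per slice

def run_slice(checkpoint):
    """Process at most BUDGET_PER_SLICE items, then return the new checkpoint."""
    i = checkpoint["next_index"]
    end = min(i + BUDGET_PER_SLICE, checkpoint["total"])
    for item in range(i, end):
        checkpoint["accumulator"] += item  # stand-in for real work
    checkpoint["next_index"] = end
    checkpoint["done"] = end == checkpoint["total"]
    return checkpoint

state = {"next_index": 0, "total": 5_000, "accumulator": 0, "done": False}
while not state["done"]:
    state = run_slice(state)  # on the IC, each slice would be its own message
    json.dumps(state)         # the checkpoint must be serializable/persistable
print(state["accumulator"])
```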

1 Like

As a bit of food for thought, here’s a brief overview of an idea we (in and around the Message Routing team) have been throwing around to significantly increase (data and computation) throughput on the IC. It is just an idea, with a couple of big question marks and probably a bunch of pitfalls we aren’t even aware of quite yet. And if it does turn out to be feasible, it will require huge effort to implement.

Finally, we’re not claiming it is entirely our idea and that no one else has had more or less the same thoughts. These kinds of things grow organically and directly or indirectly influence each other.

The basic idea is nothing revolutionary and it has even been brought up a few times in this thread: don’t put the full payload into blocks, only its hash; and use some side channel for the data itself. That side-channel could be IPFS running within / next to the replica; the replica’s own P2P layer; some computation running alongside the replica (e.g. training an AI model on a GPU); and so on and so forth. Once “enough” replicas have the piece of data, you can just include its hash / content ID in the block, the block is validated by said majority of replicas (essentially agreeing that they have the actual data) and off you go.

You could use this to upload huge chunks of data onto a subnet; transfer large chunks of data across XNet streams; run arbitrarily heavy and long (but deterministic) computations alongside the subnet; and probably a lot more that we haven’t even considered.

The obvious issue here is that combining many such hashes into a block (or blockchain) could make it so that not enough replicas have all of them. E.g. imagine a 13 replica subnet; and a block containing 13 hashes for 13 pieces of data; each replica has 12 pieces of data and each replica is missing a different one. So going about this naively (and e.g. including all 13 hashes in the block) you’ll end up stalling the subnet until at least 9 of the replicas have collected all the data needed to proceed. This may easily be minutes of latency before the block can even start execution. Even being exceedingly cautious and saying “at least 11 of the 13 replicas must have all data” can lead to the other 2 replicas constantly state syncing, never able to catch up. And what if 3 replicas are actually down? I’m sure there’s a way forward (again, implying some trade-off or other – liveness, latency, safety), I just haven’t found it. The best I have so far is to use this as an optimization: if “enough” replicas have all the data, they propose and validate a block containing just hashes; if not, they continue including the actual data into blocks and advance with much reduced throughput (as now).
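To make the stalling scenario concrete, here is a toy model of the 13-replica example (purely illustrative Python, not replica code):

```python
# Toy model: a 13-replica subnet, a block referencing 13 payload hashes,
# and each replica missing a different payload.

NUM_REPLICAS = 13
NUM_PAYLOADS = 13
QUORUM = 9  # replicas that must hold *all* referenced payloads to proceed

# Replica r has every payload except payload r.
holdings = [{p for p in range(NUM_PAYLOADS) if p != r} for r in range(NUM_REPLICAS)]

block = set(range(NUM_PAYLOADS))  # hashes referenced by the proposed block
ready = sum(1 for h in holdings if block <= h)
print(ready, ready >= QUORUM)  # -> 0 False: the subnet stalls until the data spreads
```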

As said before, if you have ideas on how to make this work; or completely different ideas of your own that you want to talk about; please start a separate thread. I’ll be happy to chat. Just keep in mind that there’s a long way from idea to working implementation; and that there’s always – always – a trade-off and a price to pay: sometimes it’s worth it, more often than not it’s not.

11 Likes

This is the kind of radical idea I’ve been hoping for. The benefits would be enormous, solving for the instruction and memory limits, it seems.

I would love to participate in a discussion about this in another thread. Would you like to make it, @free? I am also happy to.

5 Likes

Hi, this is my first time taking part in this forum. I always read, but this time I feel I have to comment on a few things.

First off, I want to make it clear that I don’t have an advanced technical background; I’m more of an investor and entrepreneur aiming to create some startups, leveraging unique business models made possible by the promised technology of Blockchain on the IC and current infrastructure. I’m someone who’s passionate about reading and staying informed on cryptography, and I try to study all the technical aspects of the Internet Computer (IC) to at least have a basic but notable understanding.

With that said, I apologize in advance if what I’m about to say doesn’t make sense, and I hope you correct me by explaining in more detail.

As I understand it, you propose to manage data on the IC more efficiently by no longer including the data (payload) inside a block (making it on-chain, with what I understand is a 3 MB per-block limit), but instead processing it off-chain and storing it on IPFS, with only the hashes included in the blocks.

If I’ve understood correctly, this would fundamentally affect the added value of the IC: where does the ownership of your data stand? Wouldn’t this make the IC the same as Flux, Akash, and all those other projects that lack on-chain computing or storage capabilities? I’ve also researched these projects before betting on the IC, and my perception is that they do the same for data processing, using hashes and storing data off-chain. Wouldn’t we be doing the same as those projects we criticize, which store their NFTs on IPFS linked by a hash?

If this is the idea or solution for this current limitation, I want to express my discontent, as this would go against the spirit and vision of the ICP and what all of us, both investors and entrepreneurs, (maybe developers?) aim to achieve: a vision where all web 2.0 software has to be rewritten and adapted to the IC, where all data must be processed on-chain. This is what we are selling to the public. Therefore, I believe that from the community and investors’ perspective, we would not support such an idea.

Although I understand it may require a lot of resources and R&D to find a definitive solution, I think the right path is to continue revolutionizing the industry, not just innovating. We should always be at the forefront with cutting-edge technology.

I don’t think rushing to take the easier paths, as the competition has, will turn us into the crypto cloud we all dream of. You mention it could be an optimization, but I’m not sure it’s healthy to take the quick and “easy” path, even when the solution fails to meet the promised expectations of a completely on-chain world.

9 Likes

Thank you for this great thread!

For instruction limits, the query charging proposal as linked to by @jeshli (Community Consideration: Explore Query Charging - #31 by icpp) seems like an interesting next step to explore from my perspective.

For memory limits, I’m hoping that Wasm64 will be effective. I think it’d be great to understand the exact effects it’ll have in more detail though.

Exploring dedicated “High performance computing” (likely also with GPUs, e.g. for DeAI applications) and “storage” subnets seems like a good idea and would enhance the IC’s toolkit significantly.

While subnets are a great architecture choice for a scalable, decentralized cloud like the IC, I like the idea of abstracting these away in the dev experience, such that the “dfx deploy” command becomes “smarter” (allocating resources as a best fit) and the canister orchestration is adapted automatically based on usage/workloads (to balance canister demands with subnet supplies). The dev would then only specify parameters as deployment requirements and preferences (or stick to the default smart handling by dfx deploy and the network).

5 Likes

This is certainly a highlight in this great thread, thank you for it!

And I think your post would make for two great articles (“How to verify” and how AO approaches this), plus probably a third one about how the IC implements this :slight_smile:

7 Likes

This radical idea, as far as I understand, would take away the one critical differentiator that ICP has against everyone else. All on-chain is to ICP basically what 21M is to Bitcoin. Offloading computation to third parties would take away our fundamental lead on a problem nobody else is even trying to solve. I’d rather move slowly and stick to the fundamentals. There should be no intermediary in the protocol stack.

5 Likes

The data that made it onto the replicas via these side-channels would still be technically part of the block. A block with only a hash and no actual data would be more or less useless. So for the purposes of gossiping the block around and achieving consensus, the block would still have a 4 MB limit. But for the upper layers (or if you wanted to back up and preserve the full blockchain) you would have to include these additional pieces of content.

The reason why there’s a relatively minuscule limit on block size is that the block must be gossiped to all (well, most) replicas on the subnet in order to achieve consensus. And eventually to all of them in order to be executed. That means reliably sending the full block across the Atlantic and Pacific in well under one second. But if there was some way to know that all (or most) replicas already have some piece of content, all they would have to agree on is its “content ID” (i.e. hash). So there would be less data to gossip in this tiny time window. And the actual payload could have taken arbitrarily long to be uploaded to or generated by every replica independently, without holding back subnet progress.
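To put rough numbers on that time budget (the latency and bandwidth figures below are assumptions for illustration only, not measured IC values):

```python
# Back-of-the-envelope comparison: gossiping a 32-byte content ID vs a
# multi-MB payload within a ~1 s block interval.

RTT_S = 0.15            # assumed cross-ocean latency per hop
BANDWIDTH_BPS = 100e6   # assumed 100 Mbit/s effective per-peer throughput

def transfer_time(size_bytes):
    return RTT_S + size_bytes * 8 / BANDWIDTH_BPS

print(f"4 MB payload: {transfer_time(4 * 1024**2):.3f} s")  # ~0.49 s, most of the budget
print(f"32 B hash:    {transfer_time(32):.6f} s")           # ~0.15 s, dominated by latency
```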

So I fully agree with your concern and this would definitely not change anything in terms of safety / tamper resistance / trust.

10 Likes

Maybe, to some extent. But keep in mind that computation would still need to be deterministic. So it would still likely imply Wasm code compiled against the IC’s system API, rather than arbitrary code. Or you could easily run the risk of running some computation for hours on end across all replicas on a subnet only to find out that they disagree (which I understand is a common issue with e.g. HTTP outcalls).

So IMHO this is definitely not a silver bullet.

Furthermore, this is a half-baked idea that we’ve been poking at every now and then. As said, I for one have no clue how one would decide what to include into a block, so you don’t end up with 2f+1 replicas making progress and f replicas constantly state syncing; only to then stall the entire subnet as soon as one of the 2f+1 replicas has any kind of issue.

So until we figure that one out, ensuring that it degrades gracefully (i.e. throughput is reduced progressively depending on the number of replicas at consensus height rather than everything working fine up to a point and then stalling for half an hour), it’s only an idea.
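As a purely hypothetical sketch of what “degrade gracefully” could mean (this is an assumption on my part, not an actual IC design; the names and numbers are made up):

```python
# Per payload, include only the hash if enough replicas report having the
# data; otherwise fall back to inlining it in the block, trading throughput
# for liveness instead of stalling.

def plan_block(payloads, reports, quorum):
    """payloads: {payload_id: size_bytes}; reports: {payload_id: replicas_with_it}."""
    plan = {}
    for pid, size in payloads.items():
        if reports.get(pid, 0) >= quorum:
            plan[pid] = ("hash_only", 32)
        else:
            plan[pid] = ("inline", size)
    return plan

payloads = {"a": 3_000_000, "b": 2_500_000}
reports = {"a": 11, "b": 6}  # replicas (out of 13) reporting each payload
print(plan_block(payloads, reports, quorum=11))
# -> {'a': ('hash_only', 32), 'b': ('inline', 2500000)}
```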

And once we’ve got everything figured out, we’d need to put a number of things in order (as said, persistence layer, messaging, overhead) and then spend probably up to a year putting together the pieces of the puzzle.

But sure, feel free to kick off the discussion.

4 Likes

I’m not sure it’s really “third parties”, though I suppose that could be the case.

The replicas will, if honest, obtain the data, hash its contents, and place those hashes into the blocks for consensus. The replicas can store that data or discard it as they see fit; hopefully they’ll always keep the state around as well…

The IC already throws away a lot of data relative to Bitcoin and Ethereum, and my understanding is that blocks aren’t even kept around forever. So I don’t know if this is that much of a departure.

Also keep in mind that, if you take Ethereum as a good example of the ethos of decentralized blockchain networks (I do), it is well integrated into their roadmap to begin throwing away large amounts of data after consensus. In fact, EIP-4844 goes live tomorrow and will begin to do this.

State expiry in the future will get rid of even more data. This is called “the purge” in the roadmap. This puts Ethereum much more in line with the design decisions ICP has already made.

So I’m not sure it’s correct to think that this change significantly makes things not “on-chain”. ICP already isn’t “on-chain” in many ways that other blockchains are. It’s more “on-node” or “on-replica”, and only the most recent agreed-upon info.

Someone help me out if I mischaracterized the technicals of ICP here, but I believe I am materially correct.

1 Like

I wonder if data availability sampling is the solution here, like Ethereum proposes in their full dank sharding, and like I believe EigenDA and Celestia use to provide data availability to blockchains.
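As a toy illustration of why sampling helps (this ignores the erasure coding that Celestia and EigenDA actually rely on; the fractions and sample counts are just assumptions):

```python
# Probability that k random chunk samples all miss the withheld portion
# of the data, i.e. that a light sampler is fooled.

def miss_probability(withheld_fraction, num_samples):
    return (1.0 - withheld_fraction) ** num_samples

for k in (10, 20, 30):
    print(k, f"{miss_probability(0.5, k):.2e}")
# With half the chunks withheld, 30 samples miss it with probability ~9.3e-10.
```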

I will start the thread and post back here.

1 Like

I have started a new thread to discuss Hashed Block Payloads: Hashed Block Payloads

3 Likes

I need to disagree here. ICP is the only blockchain that offers an end-to-end on-chain smart contract experience.

  • The UI is downloaded from the smart contract by the browser, instead of from a cloud service as in every other blockchain.
  • Read interactions are done from the blockchain. Via certification, you can have secure yet highly efficient reads, and certified queries will provide efficient certified reads for any use case. Every other blockchain relies on public cloud or other proprietary services, such as RPC nodes like Infura, for reading data from the chain.
  • With HTTP outcalls, smart contracts can interact with Web2 without involving any other party. All other chains require oracles to connect to Web2.
  • ICP integrates with the Bitcoin network without any intermediary and uses t-ECDSA for trustless signing of transactions. Almost every other chain (ThorChain is one of very few notable exceptions) requires bridges to integrate with other networks.

ICP does discard history. This is at the core of ICP’s design, because it intends to replace the public cloud, meaning it needs much more throughput than other chains. Just to give you an intuition for why it is infeasible to keep around the complete history, even just the ingress messages: consider a subnet working at its maximum capacity with a 4 MB block per second. That alone would give a block history of around 126 TB. Per subnet. Per year.
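The arithmetic, spelled out (decimal units):

```python
# 4 MB per block, one block per second, for a full year:
mb_per_year = 4 * 60 * 60 * 24 * 365
print(mb_per_year / 1_000_000, "TB per subnet per year")  # -> ~126.1 TB
```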

To summarize, I think it’s really not fair saying that ICP is not “on-chain”. It just does not keep around all the (often useless) history. Canisters can implement their “own blockchain”, like ledgers do, in order to keep around history that’s relevant. So I’d claim that ICP is an architecture that allows for an excellent tradeoff between allowing smart contracts to keep things around, while discarding mostly useless history. In terms of doing things on chain, it is lightyears ahead of any other chain out there.

15 Likes

I suppose a definition of on-chain might help, as I love many of your points, but I feel I might be referring to properties that other communities would consider necessary to even be a blockchain.

For example, on Bitcoin, block data can always be retrieved from full nodes (that is my understanding), and this is currently the case on Ethereum as well (though that is about to change tomorrow). Anything on-chain on Bitcoin is immutably written to that blockchain forever.

ICP works nothing like this, and there is confusion around that. I think it is a major trade-off that ICP has taken, because individuals cannot independently verify anything about the ICP blockchain (maybe “anything” is strong: you can verify consensus, but you cannot detect collusion or incorrect state changes without observing their effects, which is harder than what other blockchains provide).

You cannot download blocks, execute them, and recreate state from the beginning (as you can on Ethereum). You have to trust that the nodes did not collude. You don’t need to trust that on Bitcoin or Ethereum; you can independently verify the entire history up to and including a transaction you’re about to participate in.

I feel saying ICP is completely on-chain makes it seem like it has all of the properties of the other blockchains and more, which it doesn’t.

It has chosen its set of trade-offs.

Then again, Ethereum is heading in many ways towards a similar set of trade-offs…or maybe some wouldn’t consider them trade-offs?

I may be mischaracterizing, happy to be wrong here…help me understand why it’s not a trade-off?

7 Likes

Also, “on-chain” sounds like it implies security similar to other blockchains like Bitcoin or Ethereum, and I am yet to be convinced that ICP has anywhere near the resilience of those blockchains.

Is a 13-node blockchain really a blockchain? The underlying data structures would technically say yes, if there is a chain of blocks, but I wonder if most people in the crypto space think of something much more decentralized, verifiable, and robust than ICP when they think “on-chain”.

And if the claim is that ICP is as secure as Ethereum because of “deterministic decentralization”, I want to see real, hard, formal research done to convince myself and others. So far it’s just a lot of opinions or statements that this is true or probably true, without much to back them up.

Why is 13 okay? Why not 5? Why not 40? Why not 100?

Where is the line at which we find sufficient decentralization for a use case?

We need to research this, not just assume it’s true… or at least, I wish someone would research this, especially considering that all of ICP’s security is based around t < n / 3, which is great, but how big does n need to be? Surely security breaks down with a sufficiently low n.
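For concreteness, here is the standard f < n/3 arithmetic for the n values mentioned above (this only shows the fault bound, not whether a given n is “decentralized enough”):

```python
# Largest number of faulty replicas f that n replicas can tolerate under
# the t < n/3 assumption, and the 2f+1 quorum that implies.

for n in (5, 13, 40, 100):
    f = (n - 1) // 3
    print(f"n={n:3d}  tolerates f={f:2d} faulty  quorum={2 * f + 1}")
# n=13 tolerates 4 faulty replicas; whether that is "enough" is exactly
# the open question here.
```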

3 Likes

This would be an interesting avenue to pursue. If the replica were smart enough to say, “look, you get 1_000_000 instructions this round, and then we’re going to freeze your memory and give it back to you next round to continue”, then maybe libraries that weren’t built from the ground up for the IC become viable long-term. If there are 50_000_000 instructions available per round, then this gets you 50 concurrent processes (I made these numbers up).

The big issue would be whatever memory you have read in so far: if another process changes it, you’d need to trap. (This is the biggest thing that catches people in audits: they do an await and then don’t revalidate their variables when they get the response.)
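Here is a generic sketch of that pitfall (plain Python asyncio, not canister code; the names and numbers are made up):

```python
# State read before an await may be stale by the time the await returns,
# because other messages can run and mutate shared state in between.

import asyncio

balances = {"alice": 100}

async def external_call():
    await asyncio.sleep(0)  # another message could run here and change state

async def withdraw_unsafe(user, amount):
    balance = balances[user]            # read before the await
    await external_call()
    balances[user] = balance - amount   # BUG: ignores any concurrent update

async def withdraw_safe(user, amount):
    await external_call()
    if balances[user] < amount:         # revalidate after the await
        raise ValueError("insufficient funds")  # the "trap" in canister terms
    balances[user] -= amount

asyncio.run(withdraw_safe("alice", 30))
print(balances)  # -> {'alice': 70}
```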

1 Like

I don’t think anyone cares about everything being entirely on chain. In fact, what I have seen people state is that it is unrealistic and practically impossible, as it is just too inefficient to put all data on the blockchain.

3 Likes

It seems the big conundrum is capacity vs. latency. If you get more capacity, you increase latency; if you decrease latency, you have to reduce capacity (as in the messaging changes that @free has been proposing elsewhere to increase throughput on messaging).

It seems zk can fix some of this by shrinking proof time, but if transform 2 needs the output of state 1, you still have to get all that state to the prover, which seems to reintroduce latency (unless they are co-located?).

1 Like