Hashed Block Payloads

This thread is to discuss a very nascent idea to significantly increase the data and computation throughput of ICP. This idea may help to resolve concerns around instruction and message size limits.

Instruction and message size limits are two of the crucial protocol weaknesses pointed out in this thread: Let's solve these crucial protocol weaknesses

These weaknesses preclude many use cases from being implemented on ICP. A solution would hopefully be a great boon to the protocol.

Discussion was started in this thread at this comment: Let's solve these crucial protocol weaknesses - #44 by free

7 Likes

Further context can be found in the following comments, we will continue the discussion in this thread:

Some have voiced concerns that this would make ICP less “on-chain”. @free is very concerned with the technical feasibility, and there are many unknowns and some unsolved problems.

My last comment was introducing the idea of data availability sampling as found in the Ethereum ecosystem (EIP-4844, dank sharding, EigenDA, Celestia) as a potential solution to allowing all replicas in a subnet to know that a large amount of data is available…something like that.
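The intuition behind data availability sampling can be made concrete with a back-of-the-envelope calculation. This is a generic sketch, not any specific DAS design (EIP-4844, Celestia, etc.): if a malicious block maker withholds some fraction of a blob's chunks, a replica that queries a few random chunks detects the withholding with probability that grows exponentially in the number of samples.

```python
def detection_probability(missing_fraction: float, num_samples: int) -> float:
    """Probability that at least one of `num_samples` uniformly random
    chunk queries (sampling with replacement) hits a withheld chunk.
    Purely illustrative; real DAS schemes combine this with erasure
    coding so that withholding even a small fraction breaks decoding."""
    return 1.0 - (1.0 - missing_fraction) ** num_samples
```

With erasure coding, an adversary must withhold a large fraction of chunks (e.g. half) to make the data unrecoverable, so even ~30 random samples per replica make undetected withholding astronomically unlikely.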

Let’s continue to discuss.

1 Like

Hi Jordan
You’ll be happy to hear that we started working on this a short while ago. We’re currently designing an approach that increases throughput by putting references to large ingress messages into blocks, instead of the full ingress messages, without introducing security holes. Needless to say, we want to provide value quickly, with as simple an implementation as possible. As a first step we aim to keep the same message/block limits, which should already increase throughput significantly, and then later explore allowing larger messages. We’ll report on our approach when we’ve made a bit more progress and have something to present.
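To illustrate the general idea (this is a minimal sketch, not DFINITY's actual design; all names are made up): a block can carry a fixed-size hash reference plus a declared size instead of the message bytes, and any replica that fetches the bytes out of band can verify them against the reference.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class PayloadRef:
    """Stand-in for an ingress message carried by reference: the block
    holds only this, while the full bytes are distributed to replicas
    out of band (hypothetical scheme for illustration)."""
    digest: bytes  # SHA-256 of the full message
    size: int      # declared size, so block/message limits can still be enforced

def make_ref(message: bytes) -> PayloadRef:
    return PayloadRef(hashlib.sha256(message).digest(), len(message))

def verify(ref: PayloadRef, message: bytes) -> bool:
    """A replica that fetched the bytes checks them against the reference."""
    return len(message) == ref.size and hashlib.sha256(message).digest() == ref.digest
```

The reference is 32 bytes regardless of message size, which is why keeping the existing block limits can still raise effective throughput: the block space cost of a message no longer scales with its length.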

15 Likes

I am happy to hear this! Very exciting and good luck!

To clarify, will this allow increasing the instruction limit in addition to the message size limit? Or is it focused only on the message size limit?

And will the incoming and outgoing payload of a message be increased?

Using references instead of the full messages can be seen as a basis for explorations into increasing some bounds.

Increasing the instruction limit is an orthogonal problem, imho (and I’m definitely not an expert on that matter), in particular I don’t think anyone would like to have longer rounds…

In general, the limits for ingress messages and their results as well as many other bounds have dependencies in the upper layers, since they influence what needs to be stored, execution time etc. Therefore it will require careful investigations to determine the trade-offs involved and find the right thresholds. At this point in time I cannot promise when and what improvements are ahead, but rest assured, many people at DFINITY care about and work on such restrictions.

3 Likes

I just replied wrt the instruction limit in the other thread.

2 Likes

The approach I was suggesting for longer computation does not involve increasing the instruction limit. To some extent it’s also orthogonal to putting references instead of payloads into blocks.

Similar to HTTP outcalls, every replica is instructed by a canister (via the management canister) to do some long-running computation independently, outside of the Deterministic State Machine. Assuming said computation is deterministic (which may require using canister code as opposed to arbitrary third-party binaries), the replicas reach consensus on the result and include it in a block.

Where references come in is that they would allow for arbitrarily large results (e.g. training an AI model). Although based on your earlier message (and my own idle thoughts on the matter) enforcing the same payload limits on references is a lot easier to achieve. (My own line of thinking on the matter was that references could be used under ideal conditions, e.g. when all replicas are healthy, making progress and have the payload; and you could always fall back on including the payloads into blocks when conditions are less than ideal. Which would make it harder to deal with larger payloads.)

6 Likes

@free I thought your idea would also increase instruction limits as you (seemingly) allude to in these quotes.

But now in this thread it seems like that’s not what would happen? Can you help me to understand?

Increasing the instructions limits and being able to run arbitrarily heavy and long computations are related but not exactly the same thing.

Increasing instruction limits would mean that you can run longer computations inside canisters. The idea brought up in this thread is more tailored towards other forms of computation outside of canisters (e.g. running a workload on a GPU) and once that computation is done and agreed upon (which can happen on the side while the replica keeps executing other (canister) messages) we can return the result somehow back to your canister.

For reasons that have been brought up in the other thread (and in the past), increasing instruction limits beyond the checkpoint boundaries is not an option in the foreseeable future (at least not an easy one that we can even imagine how to pull off).

4 Likes

I’m making an assumption that I’d like to verify. If we pass an ingress message via ref, the canister doing the execution still has to load the message into memory, and each additional byte uses some number of instructions. Is this correct? So there would still be some maximum dictated by the max execution limit?
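To make the question concrete, here is the arithmetic it implies. All numbers below are placeholders, not actual IC instruction pricing: if loading each byte of a payload into canister memory costs some number of instructions, the per-message instruction limit alone caps the usable message size, no matter how the bytes reached the replica.

```python
def max_message_bytes(instruction_limit: int,
                      per_byte_cost: int,
                      fixed_overhead: int = 0) -> int:
    """Upper bound on payload size implied purely by copy cost.
    Illustrative only: per_byte_cost and fixed_overhead are
    hypothetical parameters, not real IC cost constants."""
    budget = instruction_limit - fixed_overhead
    return max(0, budget // per_byte_cost)
```

For example, a 5B-instruction budget at a hypothetical 1,000 instructions per byte would cap messages at 5 MB even if blocks could reference arbitrarily large blobs.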

I think the analogy with the HTTP outcalls is really on point (if I understand correctly). Basically, in the same way that a canister “instructs” the replicas to send a request to a web2 API and then agree on the response, a canister could instruct other “worker nodes” to execute arbitrary (deterministic) logic and then agree on the result.

I’m wondering if these worker nodes should be part of the IC (in the sense of sitting next to the replicas, forming some kind of node on steroids) or if they could be part of other networks that are specialized in particular tasks.

In the latter case the IC could leverage existing (and coming soon) networks that are focused on AI training, storage, long running computation or whatever else. Is this in conflict with the 100% on-chain principle? I guess it would still be fine if this “off-chain” computation is run on multiple workers and the result agreed upon by the canister. The assumption would still be that a certain amount of the workers is honest. Any thoughts on this?

3 Likes

If you wanted to delegate work to other machines, there would be no need for said machines to be replicated at all or to use consensus. I.e., you would be doing something very close to a regular HTTP outcall. When the response materialized, you could reach consensus on it either because it’s e.g. a block signed by some other blockchain, or possibly even based on something as simple as an SSL certificate of someone you trust.

If you want the kind of replication and trust model that a subnet provides, then the work would have to be done by the nodes running the replicas themselves. Could be something as simple as a very long running computation (minutes, an hour); or you could have a subnet made up of nodes with GPUs and each of them could train an AI model independently and then (assuming the training is deterministic) agree on the result (same as they agree on the response after an HTTP outcall).

4 Likes

Why is running a computation on multiple servers (that are not replicas) and then agreeing on the result different from running it on the replica nodes?

I see three options to extend the computational capabilities of the IC (with GPUs, extra storage, or whatever other resources needed):

  • replicas become “supernodes”
  • specialized subnets
  • deep integration with other specialized networks

The third option would enable the IC to take advantage of all the progress of the rest of the ecosystem. While I agree that the IC is years ahead from many points of view, it’s difficult to believe that it will always be best at everything.

So why can’t we delegate some specific work to other machines (that might do it “better”) while maintaining a “similar” trust model to the one of the subnets?

2 Likes

Because it’s a different trust model: would you trust Bitcoin or Ethereum if it ran on 13 anonymous nodes? Also, if this is a replicated computation outside of the IC, how can one even tell that it’s replicated (as opposed to e.g. one machine with 13 IP addresses)? So it would be very much an HTTP outcall: send a request to some machine (single or part of some other blockchain network); wait for a response; induct the response onto the subnet because you trust some signature on the response.

The machine you sent the request to may do all the work itself or be part of some consensus protocol that’s unrelated to the IC, but there is no way for the IC to tell what actually happened. You, as the author of the canister that made the HTTP outcall, may choose to trust the response (e.g. because it’s an Ethereum block or whatnot), but the IC itself has no way of deciding whether it should trust an arbitrary response from an arbitrary machine / network.

4 Likes

Yes I agree, it might be as simple as sending an HTTP outcall to another network.

I’m not suggesting we run the computation on n random servers (which might actually be a single one, as you said), but on another network which, even though it works differently from the IC and has a different trust model, provides some desired guarantees (as long as the assumptions of its model hold).

A canister integrating with such a network would, in order to operate as expected, depend not only on more than 2/3 of the IC replicas being honest, but also on the assumptions of the other network holding. But I don’t think this is a deal breaker. Indeed, the fact that the trust models of Bitcoin and Ethereum differ from the IC’s doesn’t mean we shouldn’t integrate with them.

So as long as another network provides a service with some desired properties (under some reasonable assumptions), why shouldn’t we leverage it to complement the capabilities of the IC? Am I missing something?

Anyways, sorry for going off the topic of the thread…

1 Like

I’m not saying we should not integrate with other networks. Just that that would very literally be an HTTP outcall. Something you can already do today (as long as you stay under a 2 MB payload limit). It can be tweaked / optimized to make a single outgoing request instead of 13; and accept a single response (instead of 13) that is verified using some other method (e.g. a signature from the other network), but that won’t change the fact it’s an HTTP outcall as opposed to an arbitrarily long computation running on the IC (canister code or something fancier).

4 Likes

Consensus already has a system of alternate block proposal from different block proposers, ranked randomly into “slots”, slot 0 being the highest priority block maker, slot 1 the first backup, etc.

Maybe this existing system can be leveraged to mitigate the problem you describe. For example, say only the slot 0 block maker is allowed to include blobs into its proposal, slot 1 and higher only make traditional block proposals without any blobs, where “blob” means data included by reference. And notaries ignore block proposals for which they don’t have all blobs (i.e. treat them as if the block proposal hadn’t arrived). That way the chain would continue in your scenario working with slot 1 or higher blocks until enough notaries have the blobs that appear in a slot 0 block.

In this way, our consensus proofs guarantee liveness.

It can of course happen that the chain continues with 2/3 of nodes and 1/3 never catches up but that has always been the case, blobs or not.
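The proposed rule can be sketched as a simple predicate. This is my illustration of the post above, with invented names rather than actual protocol terminology: only the rank-0 block maker may include blob references, and a notary treats a blob-carrying proposal whose blobs it lacks as if it had never arrived, so the chain can always fall back on blob-free backup proposals.

```python
def should_notarize(rank: int, carries_blobs: bool, notary_has_blobs: bool) -> bool:
    """Notarization rule sketched in the post above (hypothetical names):
    - only the rank-0 proposal may carry blob references;
    - a proposal with blobs the notary lacks is treated as not received."""
    if carries_blobs and rank > 0:
        return False  # backup (rank >= 1) proposals must be blob-free
    if carries_blobs and not notary_has_blobs:
        return False  # act as if the proposal never arrived
    return True
```

Under this rule a notary never blocks on missing data: it either has the rank-0 proposal's blobs, or it notarizes a later-rank blob-free proposal, which is what preserves liveness.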

3 Likes

I believe we did discuss this option too. The downsides are that (1) given enough blobs it’s unlikely for the rank 0 block to ever be accepted; (2) each payload must still fit into the block (or risk stalling indefinitely) and (3) it’s arguable whether one can describe it as graceful degradation (if canister developers start building apps with the expectation of tens to hundreds of MB/s, falling back on 4 MB/s is a pretty serious hit).

Also, if you do end up with 1/3 of the replicas behind (and the other 2/3 using significantly more than 4 MB/s just because they can) it may be quite hard to catch up.

I’m not dissing your idea, it’s quite good. But I do think it needs quite a bit of refinement to be useful in most practical scenarios.

2 Likes

I don’t agree. As soon as the IC starts relying on third parties for something as important as computation and storage, where’s the added value of the IC?!

Really, guys, I’m not sure whether you don’t care or just don’t see things from a marketing and added-value perspective. Many people in the community, and many investors, are here because of the ON CHAIN ideology. And now, because a few people are pushing for web2 computing power on the IC immediately, we are going to pivot and, instead of improving, start going backwards?

I really hope DFINITY team members are aligned with Dominic Williams and his vision and will achieve the impossible, no matter what. I don’t want you, DFINITY team members, to be pushed so hard by a (really small) group of developers into “solving” some limitations as soon as possible. Take your time; the revolutionary product is the one that wins in the end. If we do the same as every other decentralized computing project, we will end up on the same playing field, competing with too many projects.

“It is better to have a large part of a small market with growth potential ( crypto cloud, on chain storage, on chain computing, on chain zk proof verification) than enter on a market that is initially larger ( off chain compute, off chain decentralized storage) but with too much competition”
Sam Altman

2 Likes