Let's solve these crucial protocol weaknesses

This is a bit stale after Paul’s amazing post, but I’ll post it for the record:

This is a response to @lastmjs's thoughtful post on DFINITY's Internet Computer and the comparison with #ao. (https://x.com/lastmjs/status/1766471613594083348?s=20)

tldr: The IC is awesome, DFINITY has built something amazing, it interops with ao, let's go build.

:point_down: (see below)

I’ve also spent the last week or so investigating ao to the extent that I wrote what is probably the first ANS-104 compliant data item signed with a trustless t-ecdsa key and broadcast from the IC. This is a precursor to being able to run the mu, su, and cu components of ao on the IC. (https://x.com/afat/status/1765820285700247707?s=20)

I've spent the last 3 years building on the IC for myself, for Origyn (we build a super NFT for robust, permissionable, data-rich certificates of authenticity), and shepherding a number of devs through IC development via ICDevs.org. The 3 years before that I worked alongside, and sometimes with, DFINITY in anticipation of their launch, with a focus on how the IC could be used to enable web3 for the enterprise. Now with @Pan_Industrial we're making that a reality.

I've had the pleasure of watching @lastmjs do some amazing things on the IC, including bringing TypeScript to the IC. In 2019, when I found out that JavaScript, the language of the Internet, despite all its flaws, was not going to be an initial option for building smart contracts on the IC, I was a bit disheartened and felt a huge opportunity was being left on the table for initial adoption. Jordan has almost singlehandedly closed that gap. We do differ a bit on what we think the IC will ultimately be best for, which you can read more about in this thread if that interests you: I just hit the instruction limit! - #9 by skilesare. The tldr is that Jordan and I are hacking away at different approaches to the IC, both of which I think are necessary for the IC to have a maximum surface area for success. This thread will push my agenda a bit unapologetically, as I think the best way for people in the ecosystem to grow and find new ideas is to continually discuss and reevaluate our assumptions.

To the ao engineers who will read this and cuss me for my naive understanding: please correct me where I've made poor assumptions or have a wrong understanding. I know what a feat it is to actually ship, and you all should be incredibly proud of what you've accomplished up to this point. (Same for my friends at DFINITY, where I'm still learning and not much of an expert on the actual replica.)

The first consideration to mention is that, at this point, ao is a protocol with a single reference implementation that has limited security guarantees. The IC is a functional system that has been securing $X B worth of value for almost 3 years. Currently, ao is aspirationally sophisticated but functionally inferior in almost every way. Not unexpected, as it launched a week and a half ago. Set proper expectations if you want to evaluate the two. It will be enlightening to check back in three years and see where each is.

What is the best we can hope for if we are trying to get computers across the globe to reach consensus about calculations over previously agreed-upon state? Likely it is available state (s1) + transform (t) + zero-knowledge proof of correctness (zk) = state (s2). I am unfamiliar with any solution outside of this that could deliver distributed computation faster and with more finality. The crux of a debate around ao vs the IC is that this optimal solution is inside the bounds of the declared ao protocol. It does not exist yet, and it is likely that if it did, the zkWASM prover would force latency far above that of a parallel ao implementation that made the same trade-off the IC makes. That trade-off is that, according to the IC's proof (see Achieving Consensus on the Internet Computer | by DFINITY | The Internet Computer Review | Medium), we can assume correctness given 2/3 honest nodes. There are likely more optimal trade-offs than the IC's waiting to be found, but the selected trade-off and other combinations can all exist under the ao protocol definition. Therefore it just really isn't appropriate to compare all possible ao implementations to one specific implementation (the IC's selection on this graph).
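To make that framing a bit more explicit, here is one way to write it down (my own notation, not pulled from either spec):

```latex
% The next state is the transform applied to the previously agreed-upon state,
% and it is only accepted along with evidence \pi that the transition was computed correctly:
s_2 = t(s_1), \qquad \text{accept } s_2 \iff \mathrm{Verify}(\pi,\ s_1,\ t,\ s_2) = \mathrm{true}
% where \pi is either a zk proof of execution or a BFT certificate
% (e.g., signatures from >= 2/3 of a subnet's nodes, as on the IC).
```

The whole debate is really about what that evidence is allowed to be and how much latency producing it costs.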

There are other graphs with different axes that are just as important, but I think this frames the debate the best. I'll propose a theory later as to why I think the IC and the EVM on the IC might be able to shift right out of the ao space, but for now let's assume that all these networks sit inside the possible ao protocol network space.

Now that we have a frame I’ll get back to Jordan’s comments.

My concern with the approach Jordan has taken is that his goal is an "unbounded scalable virtual machine." Barring a quantum revolution, I do not think this is achievable without making a concession somewhere along the continuum of latency, storage, liveness, or security. Trying to shove either of these systems into that standard will result in disappointment.

What the Internet Computer has achieved is an "unbounded and scalable strata for actor-based computation." Ironically, this is also what ao seems to be claiming to be, with its focus on the actor-based model of computation. Classic web2 does not fit into that actor paradigm. It has evolved into a microservices architecture where each service has unrestricted, fast access to unbounded data stores. We just won't get this on the IC or ao. Your actors will have to be selective about what data they compute over. You scale by replicating these actors across 'subnets' where each actor operates independently. This takes a fundamentally different data storage and query architecture than classic web2. It isn't more efficient (and by definition likely can't be), but it is the pattern the world uses to create anti-fragile, sovereign entities that burst into the diverse universe we have around us. The trade-off for moving away from the fast microservices architecture backed by centralized, monolithic data structures is a fundamental shift in ownership, the movement of value to the edges of the network, and the disruption of centralizing entities that too often take advantage of their power to extract excessive value. Actors are slower from a latency perspective but better from a parallelization perspective. An EVM has nice atomic properties, but it is a monolith, and the person holding the keys to scheduling the transactions gets the same kind of centralized power (MEV extraction) that we're pushing back against in web2.

The entire current L2 diaspora is much less about decentralization than it is about value extractors trying to convince you that you should let them order your transactions, extract the MEV, and fund their investors' kids' trust funds. Actor networks do their own ordering, run their own markets, and keep their value at the edge of the network.

The belabored point here, which I bring up often when Jordan pushes on the tech to try to deploy web2 tech to the IC, is, first, that I'm rooting for him because I too want many of those conveniences, but second, that the ability to do those web2 things should be viewed as a gateway drug and not an endgame. Because of the speed of light, the cost to produce computational proofs, and developers' ability to always push a tech just a bit further than it was designed to support, no platform will ever out-compete web2 at web2. You just can't get to consensus and a trustless state as fast as they can get to proprietary state. As long as we try to make the IC about replacing existing things we don't like with something on the IC with THE SAME ARCHITECTURE, we're doomed to fail. The future for ao or the IC is all blue oceans, and if we manage to replace some old web2 stuff, the replacement will be alien to how we see those services today. Twitter on the IC just isn't going to be better than Twitter at twittering. Ever. And GraphQL on ao will never be as fast, performant, or cheap as GraphQL running on web2 infrastructure.

I'll now go through the deficiencies of the IC he mentioned and discuss how many of those are features, not bugs, for the long-term viability of not just the IC but ao and any system ultimately trying to bring web3 values to a broad audience. And I'll argue that that is ok.

  1. Instruction limit of a few seconds

This is an actor model issue, and I don't see how ao would handle this any differently than the IC. Because your function is s1 + t + (c | zk) = s2, your actors are generally single-threaded (c being consensus). Maybe there is a multithreaded actor model out there, but if not, even with a single node where zk provides the correctness, you have to process the messages in order, so you can't get to s4 without going through s1, s2, and s3 somewhere in the network. If a machine needs s5, it has to wait for someone to deliver the proof of s1, s2, and s3. If s2 takes 5 minutes, your actor isn't doing anything else for 5 minutes. If you try to do something in s5 that s2 changed out from underneath you, your s5 will trap and your state won't change (i.e., at the end of s2 you send tokens, so they aren't there anymore when your s5 transaction gets there).

How do you counteract this? The same way databases do this kind of thing: you lock what you expect to have rights to change and process over many rounds. You can do this today with tools like ICRC-48 (icrc48.mo/standard.md at main · PanIndustrial-Org/icrc48.mo · GitHub) that split tasks across rounds so that other processes can run (as long as they don't run into your locks). ao doesn't get the benefit of the doubt here that they'll just be able to pull something out that magically lets them parallelize many transactions. The spec specifies that transactions must be processed in the order assigned by the su. In fact, I think a cu that was trying to do this would have to issue itself messages routed through the mu/su to split this work into chunks. If these messages have to settle to Arweave before they can be properly sequenced and consumed by the cu, then it is likely fundamentally impossible for ao to outperform the IC without resorting to significantly superior hardware. Thus, with the su built into the IC via crypto math and the insistence on high machine specs, I predict that many IC applications exist outside the possible ao network space due to the internet latency of message passing. If the cu and the su are the same process and the cu looks ahead to see if it can continue uninterrupted, then maybe… but then you have a single-node cu/su and I'm not sure how you give that any kind of security and trust guarantees (maybe zk?).
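To make the lock-and-chunk pattern concrete, here's a minimal TypeScript sketch of the shape of it (my own names and a stand-in scheduler, not the ICRC-48 API; on the IC the "next round" hop would be a self-call or timer):

```typescript
// A conceptual sketch of splitting a long job across consensus rounds:
// lock everything you expect to change up front, process a bounded chunk per
// round, and hand the remainder to the next round so other messages can run
// in between (as long as they don't hit your locks).

type Rec = { id: number; value: number; locked: boolean };

const store: Map<number, Rec> = new Map();

// Hypothetical scheduler: plain setTimeout keeps this sketch runnable anywhere.
const nextRound = (work: () => void) => setTimeout(work, 0);

const CHUNK_SIZE = 100; // sized to stay well under the per-round instruction limit

function startJob(ids: number[], transform: (r: Rec) => void, done: () => void): void {
  // Claim every record the job will touch so interleaved messages see the lock
  // and back off (or trap) instead of racing us across rounds.
  for (const id of ids) {
    if (store.get(id)?.locked) throw new Error(`record ${id} is already locked`);
  }
  for (const id of ids) {
    const rec = store.get(id);
    if (rec) rec.locked = true;
  }
  processChunk(ids, transform, done);
}

function processChunk(ids: number[], transform: (r: Rec) => void, done: () => void): void {
  const chunk = ids.slice(0, CHUNK_SIZE);
  const rest = ids.slice(CHUNK_SIZE);

  for (const id of chunk) {
    const rec = store.get(id);
    if (!rec) continue;
    transform(rec);
    rec.locked = false; // release each record as it is finished
  }

  if (rest.length > 0) {
    nextRound(() => processChunk(rest, transform, done)); // continue next round
  } else {
    done(); // all chunks processed across however many rounds it took
  }
}
```

ICRC-48 defines its own semantics for this; the point of the sketch is just that the long job yields between chunks, so other messages can interleave as long as they respect the locks.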

Unless the ao guys weigh in differently here, I don’t think it is as simple as “CUs can run whatever computations they are capable of running. If a process specifies that it needs a certain amount of computation, only CUs willing to perform that should pick up that process.” The CUs are still bound by the ordering of the su and must finish message 4 before moving to 5. Your scalable global computer is blocked on a thread until it finishes and can’t serve other requests. This is going to be new software built from the ground up whether you are using ao or the IC. Unfortunately, no free lunch here when trying to create a new Internet with web3 properties.

  2. Message limit of 2/3 MiB
  3. High latencies

These two issues are a speed-of-light issue when you want nodes across the globe to agree on consensus. Even with zk you'll need to ship the proof and the state diff many miles, which leads to latency. Maybe a pure zk solution where you only need one node to agree could get around this? But if you want your data to be part of the proof, it takes time to process and distribute.
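A back-of-the-envelope sketch of that floor (rough numbers of my own, nothing network-specific):

```typescript
// Back-of-the-envelope floor on globally distributed consensus latency.
const KM_PER_MS_IN_FIBER = 200;     // light in optical fiber ≈ 200,000 km/s
const HALF_GLOBE_KM = 20_000;       // roughly the distance between antipodal nodes
const ROUND_TRIPS_PER_DECISION = 2; // BFT-style agreement needs multiple exchanges

const oneWayMs = HALF_GLOBE_KM / KM_PER_MS_IN_FIBER;     // ≈ 100 ms
const floorMs = ROUND_TRIPS_PER_DECISION * 2 * oneWayMs; // ≈ 400 ms

console.log(`latency floor before any execution or proving: ~${floorMs} ms`);
```

However clever the protocol, those round trips happen before any execution or proving even starts.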

There are cryptographic schemes (file hashes) that let you skip some of this, and none of them are precluded on the IC. You can just as easily ship the IC an IPFS/Arweave hash and then serve that file from a gateway to avoid uploading the whole thing. The key here is that if you do that, you can't compute over the content. ao will suffer from the same thing. Unless you give it a file hash in your message, the cu doesn't have access to your file to process over it. According to the ao spec, if you do give it a file hash, it has to load in the file before it can process the message. I can't imagine this is quick unless the cu keeps all possible files cached. When you upload a file to the IC you're putting every byte through consensus. This is unnecessary for most cases. I love having the bytes in my canister because it opens things up for composable third-party services down the road, but we aren't quite there yet.
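As a toy illustration of that trade-off (hypothetical names, not any real SDK): the stored record only keeps a hash and a gateway URL, so the contract can later verify bytes handed to it against the commitment, but it can't compute over content it never stored.

```typescript
import { createHash } from "crypto";

// "Ship a hash, not the bytes": the on-chain record is tiny, but the contract
// can only *verify* content supplied to it later; it cannot compute over
// bytes it never stored.
type StoredRef = { sha256: string; gatewayUrl: string };

const refs: Map<string, StoredRef> = new Map();

// Register only the commitment plus a pointer to where the bytes actually live.
function registerFile(key: string, bytes: Uint8Array, gatewayUrl: string): void {
  const sha256 = createHash("sha256").update(bytes).digest("hex");
  refs.set(key, { sha256, gatewayUrl });
}

// Later, anyone presenting the full bytes can be checked against the commitment.
function verifyFile(key: string, bytes: Uint8Array): boolean {
  const ref = refs.get(key);
  if (!ref) return false;
  return createHash("sha256").update(bytes).digest("hex") === ref.sha256;
}
```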

Most query latency on the IC is a function of certifying that the data coming out of the query was agreed upon by the smart contract at some point. This system is still evolving, but as far as I can tell, the ao paper makes no suggestion about how the cu is going to certify a response to a requestor. Each cu is going to have to roll its own way to tell a client that it isn't lying. There will either be a signature of some kind, a slashing mechanism (which theoretically requires significant latency, on the order of days in the Optimism world) before you can rely on the result, or a zk proof of the calculation output by the cu, which will add significant processing time.

The IC does need better patterns for serving data and information that doesn't need this certification and agreement. But that is more of a UX and architecture issue than a failing of the IC. You're getting a guarantee that the contract ran correctly and that all nodes agree that the query you just got back in <400ms is valid. If you don't need that, route around it; but if you do need it, I'm not sure how you get it with less latency. More processing power will drive it down a bit, but eventually you run into the speed of light.
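To illustrate the "route around it" point, here's a hedged client-side sketch (the Reader interface and both methods are hypothetical, not a real agent API): take the fast, uncertified path for display-only reads and pay the certification latency only when the answer has to be trusted.

```typescript
// Hypothetical read interface: one fast path that skips certification and one
// slower path whose result carries a certificate the client can check.
interface Reader<T> {
  fastRead(key: string): Promise<T>;                                           // low latency, uncertified
  certifiedRead(key: string): Promise<{ value: T; certificateOk: boolean }>;   // consensus-backed
}

// Choose the path based on what the answer is for, not on one global default.
async function readBalance(
  reader: Reader<bigint>,
  account: string,
  forPayment: boolean,
): Promise<bigint> {
  if (!forPayment) {
    // Display-only: an uncertified answer is fine and comes back faster.
    return reader.fastRead(account);
  }
  // Anything that moves value should wait for the certified answer.
  const { value, certificateOk } = await reader.certifiedRead(account);
  if (!certificateOk) throw new Error("certificate check failed; refusing to trust the response");
  return value;
}
```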

As an aside, the architectural solution for uploading large files is to upload the chunks in parallel to different subnets. This helps with serving the chunks as well if you use a client-side aggregator. The solution is UX + architecture, but in a new, ground-up design.
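A minimal sketch of that chunk-and-fan-out idea (uploadChunk and the bucket list are placeholders, not a real storage API):

```typescript
// uploadChunk stands in for "send this chunk to this storage canister/subnet";
// the point is the parallel fan-out, not a particular SDK.
type UploadChunk = (bucket: string, index: number, chunk: Uint8Array) => Promise<void>;

const CHUNK_BYTES = 1_900_000; // stay safely under a ~2 MiB per-message limit

async function uploadInParallel(
  file: Uint8Array,
  buckets: string[],        // e.g., storage canisters living on different subnets
  uploadChunk: UploadChunk,
): Promise<void> {
  if (buckets.length === 0) throw new Error("need at least one storage bucket");
  const uploads: Promise<void>[] = [];
  for (let i = 0; i * CHUNK_BYTES < file.length; i++) {
    const chunk = file.subarray(i * CHUNK_BYTES, (i + 1) * CHUNK_BYTES);
    const bucket = buckets[i % buckets.length]; // round-robin across subnets
    uploads.push(uploadChunk(bucket, i, chunk));
  }
  await Promise.all(uploads); // chunks travel concurrently instead of serially
}
```

The client-side aggregator does the reverse: fetch the chunks from those same buckets in parallel and reassemble them by index.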

  4. High costs

Arweave's price for data uploads is significantly higher than the IC's (ArDrive quotes $3-8/GB, but I think I saw a tweet mentioning $30 the other day). You theoretically get 200 years for that, and items under 100kB are free (most ao messages). Thankfully storage tends to follow a Moore's-law curve, and this will decline over time. I know storage subnets and static storage are also being considered for the IC, which may help bring costs down. I guess it depends on your perspective on costs: ~$1/GB to upload and ~$5/GB/year is so comically less than any other web3 platform out there that has compute over content that I've always considered it absurdly low cost. You probably don't want to try to build the next YouTube, but it is pretty cheap to host a few videos for a business.

  5. Rigid network architecture

In 2019 I saw, and had discussions with DFINITY about, plans for dial-up, dial-down subnets. With the movement on the Swiss-only subnet and discussions around AI on the IC, I think it is only a matter of time before we see much more flexibility here. In the meantime, the two choices are likely an ergonomic and UX issue while the network is growing. I don't think there is actually anything keeping you from standing up a 4-node subnet via the NNS if you can get the votes. With UTOPIA and the 'badlands' discussions a couple of years ago, there is obviously flexibility available. I wholeheartedly agree that a clearer path on how to get from here to there would be great, but I also recognize that is very hard to do when these markets change as much as they do.

I will note that while "It's very permissionless and flexible; there is no need for a central authority like the NNS DAO," this begs the question: if a DAO is not providing assurances that the network can meet the demands of users, then who is? The largest complaint in web3 has been usability due to wildly volatile gas/transaction fees. Heterogeneity is great as long as the interface presented to the service consumer is reliable, consistent, and performs as expected. ao is going to run headfirst into this problem with every solution deployed, while the IC already has the answer of reverse gas deployed.

Going back to our diagram, there are a number of architectures that ao could be built on, and the IC is in that blue shaded area as one that would work inside the ao universe. In fact, I'm not betting on one emerging soon that works technically better than the IC for ao. That doesn't mean that ao won't become the standard because, as we've all seen, this industry is insanely memetic, and Arweave and ao seem to have pretty good market traction and are viewed in a pretty positive light. Basically, if ao does become the dominant world-computer protocol, the IC will likely be the best way to deploy on that protocol because all the batteries needed to reach trustless execution of a mu, su, and cu are included.

  6. Centralizing DAO governance

Jordan's concerns about the NNS are valid at present. I've thought a lot about how things are set up, and I'm firmly in the 'I trust DFINITY for now' camp, but I see the need for eventual decentralization and understand that this is currently driving projects away from the platform.

Fortunately, this is a political issue and not a technical one, and one can currently draw a line from today to a universe where DFINITY does not have majority voting power on anything. I wish that pathway were better illuminated today, but I also recognize that the community is NOT ready to take on all the things that go on in the NNS. We should start getting ready now, though. I think it will take at least one, possibly two, other organizations as well funded as DFINITY to step up and become experts in the network topology and replica development (to the point of probably having replicas in other languages to diversify away code bugs), and to have enough operating capital for full-time employees focused on these things. We seem a long way from these entities emerging. With enough TVL and enough value delivery, they will emerge. This leads me out of the IC's deficiencies and into its advantages.

One of my biggest disappointments and frustrations over the last three years is how much of this debate has ended up being memetic. Of course I should have known this, but it still surprised me. Attention economics end up being as important as tokenomics. For whatever reason, and with hindsight, we can see that the memes the IC launched with didn't hit the market. It probably isn't helpful to look backwards too much.

We are extraordinarily fortunate that DFINITY was funded to the extent it was, because despite memetic whiffs (which are damn hard to get right) on the "let's replace AWS" narrative and the "nodes on the corporate cloud are dangerous" narrative, which no one seemed to care about despite it being a serious issue, they have shipped and shipped and shipped. HTTPS outcalls, t-ecdsa, improved key sharing, and on and on.

We find ourselves at a point where the Internet Computer has an immense amount of superior tech ready to be deployed to a web3 world. One only needs to spend a few minutes wandering around the floor of EthDenver to realize how many entire enterprises and L2s have been launched to attempt to solve one of the 50 problems that the IC has already solved to a threshold of sufficient security and comparatively unreal performance. If we are being memetic, why aren't we leaning into these memes? The latest meme seems to be AI. One day the IC may provide the most secure and trustworthy AI solution, and everyone is excited about that, but in the next 5 years I doubt anyone is going to out-AI OpenAI, Google, Amazon, Tesla, and Apple. We have a long way to go for public proof on that meme.

Right now we have built the best BTC bridge with sufficient security. We have most of the best ETH L2 in place with sufficient security. We can build the best ao mu/su/cu pipeline with sufficient security while ao is still on testnet. Let's do that and launch those memes. Seven-day optimistic withdrawal? Here is a 1-minute solution. $100 gas for a zk rollup? Here is a reverse gas model that costs a couple of cents. You have a data availability problem? Here is a compute + data solution that solves your problem with self-funding of eternal data storage. Need a crypto-secure scheduler? Ours just works, up to 500 tx/second per subnet.

Yes, sufficient security isn't sufficient in the long run, and we need better decentralization. We need an illuminated pathway to get there (the market seems to be fairly forgiving, as many of the L2s operate on laughable multisig schemes). It is likely time to say, "Look… due to VC, funding, and market pressure we had to ship before the vision was complete… we aren't finished yet… here is the pathway, and as soon as we're finished with the tech we'll activate it, but we are going to build this tech, and if you need beyond-sufficient decentralization before that, come and take it (an invitation, not a threat)…" If "they" can't say this for some regulatory or securities reason, then we need to scream it from the rooftops for them. At some point the rubber will meet the road, and it will either happen gracefully or it won't. Yes, for now there will be a segment that demands pure permissionlessness from the jump, and this is going to be a non-starter for them. They have other avenues.

What if, when the rubber meets the road and we need to move beyond sufficient security, the foundation doesn't go where you want it to go? Say in 2030 the new King of Switzerland and Lesser Europe is holding a literal gun to Dom's head and demanding the figurative keys to the IC be handed over to a {insert your second least favorite form of government} government. (An exaggerated scenario, because DFINITY has shown a remarkable amount of good faith to this point, even if they've had to navigate some regulatory, political, and memetic hoops with a bit less grace than would have been preferred.) What do we do? How many people is this scenario keeping from leveraging the insane tech the IC offers today?

The answer likely lies in architecture and proactive game-theory planning. External t-ecdsa-based UTOPIAs enabling social recovery and forkability? Contracts that regularly checkpoint and publicly stash state in forkable schemes? Co-execution between ao and the IC? All kinds of options are on the table, but I hate to disappoint the community… this is 100% on us. DFINITY is finishing the base layer. In the meantime, we're underfunded, unorganized, in over our heads, and completely reliant on ourselves to figure it out. Easy-peasy. LFG.
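For what it's worth, the checkpointing idea could look something like this (everything here is hypothetical; publishToArchive is a placeholder for whatever forkable store you trust, whether that's Arweave, another subnet, or something else):

```typescript
import { createHash } from "crypto";

// Hypothetical sketch of "checkpoint and publicly stash state in a forkable scheme".
// publish is a placeholder for pushing bytes to a store a community could fork from.
type PublishToArchive = (label: string, payload: Uint8Array) => Promise<void>;

interface Checkpoint {
  sequence: number;   // monotonically increasing so a fork knows where it branched
  stateHash: string;  // commitment to the serialized state
  takenAtMs: number;
}

const checkpoints: Checkpoint[] = [];

async function checkpointState(
  serializeState: () => Uint8Array, // contract-specific serializer (assumed to exist)
  publish: PublishToArchive,
): Promise<Checkpoint> {
  const blob = serializeState();
  const stateHash = createHash("sha256").update(blob).digest("hex");
  const cp: Checkpoint = {
    sequence: checkpoints.length,
    stateHash,
    takenAtMs: Date.now(),
  };
  // Stash both the commitment and the full blob so a fork can rebuild from it.
  await publish(`checkpoint-${cp.sequence}`, blob);
  checkpoints.push(cp);
  return cp;
}
```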

I'm going to keep encouraging Jordan to push on the IC for as much performance as we can squeeze out of physics. I'm going to keep focusing on actor-based infrastructure to drive a web3+ world that catapults humanity forward. We should all look to better understand the decisions and priorities DFINITY is pushing, as they've done a damn good job on the platform so far. We will only move beyond the web2 world together if we marshal platforms that deliver 10x value to users. If ao can help, let's use it. No one is better positioned to accelerate what they are trying to accomplish, because we already have 70% of it, and it's working. In prod. :rocket::rocket::rocket:

#TICADT - The Internet Computer Already Does That

Edit: fixed an image
