Building planet scale apps on ICP

I’ve been following the web3 space for a few years, but only started looking into it more seriously in the last month. My interest has always come from the vision of actually running the web in a decentralized fashion, with micropayments replacing primarily ad-driven monetization models and protocols owned by their users. So I’m glad I’ve come across ICP, which so far seems like the most realistic approach to a general-purpose execution layer for the web.

I’ve been thinking a bit about how ICP could actually be the foundation for building scalable systems with billions of users someday. I figured I should share my ideas here to get feedback and maybe learn about other projects and ideas that I haven’t come across yet. It ended up being a bit of a long post, but I hope it makes for an interesting discussion :slight_smile:

Overview

Juno seems like a great piece of tech that’s very much going in the direction I imagine, so I’ll use that as a reference point in this post. While Juno doesn’t seem to have any auto-scaling beyond a single canister yet, it does have the basic ingredients you need in any scalable app:

  • Datastore

  • Stateless functions

  • Storage

  • Authentication

Notably, you want to avoid stateful backends, because anything that has state is incredibly hard to scale. Instead, we’ll solve those problems once in the data store and only use stateless functions that can easily scale horizontally. In the following, I’ll go through all components and describe ideas for making them scalable beyond a single canister.

Datastore

I think Firebase’s NoSQL data store is a good starting point. Let’s represent our data in collections that map a UUID to an arbitrary document (BSON seems like a reasonable data format to me). We also support subcollections, so you might have a document like /posts/[postId]/comments/[commentId].

Scaling

So far this is similar to Juno’s datastore (plus subcollections, which I don’t think are too hard to implement). But what if we wanted to store terabytes of data in this datastore, or serve millions of queries per second?

The easiest approach seems to be allowing collections or subcollections to be sharded across different canisters. For example, if we wanted to shard the /posts collection over 8 instances, the root datastore node would just store the canister IDs of the 8 instances, rather than the data of the posts collection. Assuming random UUIDs, looking at the last three bits will give you reasonable load balancing and tell you which node is responsible.
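As a minimal sketch of that routing logic (the canister table and function name are hypothetical, and this assumes random v4 UUIDs):

```ts
// Hypothetical lookup table kept on the root datastore node.
const shardCanisters: string[] = [
  "shard-0", "shard-1", "shard-2", "shard-3",
  "shard-4", "shard-5", "shard-6", "shard-7",
];

function shardFor(uuid: string): string {
  // The last hex digit of a random UUID; its low 3 bits pick one of 8 shards.
  const lastNibble = parseInt(uuid[uuid.length - 1], 16);
  return shardCanisters[lastNibble & 0b111];
}
```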

What if your collection of posts is growing too big for 8 canisters? You could start by replicating the data to 16 new canisters in the background. While the initial replication is in progress, you ensure all writes are replicated to the new canisters as well, and once that’s done you can mark the 16 new canisters as the source of truth. In order to avoid every operation having to go through the root node, you also want some TTL that guarantees a node is authoritative for a subset of documents until a specified time.
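A routing entry with such a TTL lease could look roughly like this (all names are hypothetical):

```ts
// A lease handed out by the root node: the given canister may be treated
// as authoritative for all UUIDs whose trailing bits match keyPrefix,
// but only until leaseExpiresAt, after which clients must re-resolve.
interface RouteLease {
  keyPrefixBits: number;  // how many trailing bits of the UUID this shard owns
  keyPrefix: number;      // the value of those bits, e.g. 0b101
  canisterId: string;
  leaseExpiresAt: number; // unix millis
}

function isLeaseValid(lease: RouteLease, now: number = Date.now()): boolean {
  return now < lease.leaseExpiresAt;
}
```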

You could also use the above mechanism to create read replicas, although that might not be needed if you’re using query calls that don’t need consensus anyway, as there would be easier ways to scale those. (On a more general note, could you make ICP nodes sign query calls and stake funds to guarantee correctness? If nodes kept a history of state for a brief period of time, they could then check responses randomly. I guess it’s too expensive to do that in general, but it could certainly work for this data store as long as you keep old values for a few seconds.)

Embedded Functions

You do want the ability to embed small pieces of code directly on your datastore canister for tasks that should be executed synchronously and atomically:

  • Authorization rules

  • Triggers

  • Stored procedures

Juno currently supports Rust and JavaScript compiled into the canister for this. I think allowing you to change these at runtime would be important for a great developer experience. You wouldn’t want to restart Postgres just to change a trigger, would you? I guess Juno could do that as it’s already interpreting the JavaScript anyway, although I would imagine CEL or a whole wasmi interpreter (supporting AssemblyScript) would be good options as well.
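To illustrate what a runtime-swappable rule might look like, here’s a hedged sketch; the rule shape and context fields are assumptions, not an existing API:

```ts
// Context the datastore canister would hand to a rule on each write.
interface RuleContext {
  caller: string;                                    // principal of the caller
  collection: string;
  doc: { owner?: string } & Record<string, unknown>; // the document being written
}

// "Only the owner may write" as a pure function that could be uploaded
// and interpreted at runtime, rather than compiled into the canister.
const canWrite = (ctx: RuleContext): boolean =>
  ctx.doc.owner === ctx.caller;
```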

Offline-first support

Since the datastore is decentralized, writes will always be slower than in a centralized system. That gives you even more reason to follow an offline-first paradigm, where clients keep a local copy of a subcollection that they read from and write to, and where the sync happens in the background.

I created a prototype of a data store in Motoko that supports this sync out of the box by keeping a revision counter for each document. That is also useful for avoiding conflicts when multiple clients attempt to write to the same document in parallel (otherwise the last write to the server always wins, even if the client intended to update a much older version of the document). (Note that I hadn’t come across Juno when I wrote that prototype; maybe I could have built it on top of that…)
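The core of that conflict check is small. A minimal sketch, with an in-memory map standing in for the canister’s document store:

```ts
interface Doc {
  data: unknown;
  revision: number;
}

const store = new Map<string, Doc>();

// A write must carry the revision the client last saw; a mismatch means
// someone else wrote in between, so we reject instead of letting the
// last write silently win. The client can re-fetch, merge, and retry.
function setDoc(key: string, data: unknown, expectedRevision: number): Doc {
  const revision = store.get(key)?.revision ?? 0;
  if (revision !== expectedRevision) {
    throw new Error(`conflict: expected rev ${expectedRevision}, found ${revision}`);
  }
  const next = { data, revision: revision + 1 };
  store.set(key, next);
  return next;
}
```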

Token integration

To me the appeal of web3 isn’t just decentralization, but also the ability to easily integrate micropayments and token transfers to build protocols and apps in which incentives are aligned better than in the web2 world (where you use stuff for free, and in return your data is sold and you have to see ads). Therefore, a web3 data store should also have native support for that!

I guess you could just track balances in the data store and then use triggers to implement things like “inserting a new document in collection X costs Y tokens”. But I’m sure there are ways to make this super easy in the data store, e.g. integrations with the ICRC-1 standard. You just need to be mindful that if you call into other canisters during a write operation, you’ll probably want a lock on the document level to handle competing writes gracefully.
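A hedged sketch of what that could look like; `icrc1Transfer` stands in for a real inter-canister ledger call, and the lock is a simple in-memory set, so this is the idea rather than production code:

```ts
// Assumed helper wrapping an inter-canister call to an ICRC-1 ledger.
declare function icrc1Transfer(from: string, to: string, amount: bigint): Promise<void>;

const locks = new Set<string>();

async function insertWithFee(docKey: string, caller: string, fee: bigint): Promise<void> {
  if (locks.has(docKey)) throw new Error("document busy, retry");
  locks.add(docKey);
  try {
    // While the transfer is in flight, the lock keeps competing writes
    // to the same document from interleaving.
    await icrc1Transfer(caller, "datastore-treasury", fee);
    // ...insert the document here once payment has settled...
  } finally {
    locks.delete(docKey);
  }
}
```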

Serverless Functions

A huge class of apps can already be built with the above data store alone. The serverless functions I’m talking about here are comparable to Cloud Functions (Firebase) or Edge Functions (Supabase). It’s essentially just a higher-level abstraction that makes it easy to run scalable code on top of ICP, so you don’t have to worry about which canister your code is running on.

I imagine you could have an executor pool, which is an auto-scaling pool of canisters that executes jobs. A job would be specified through the hash of the wasm code file (which the executor can fetch from storage if needed), a calldata blob, as well as an optional authentication context (e.g. executing on behalf of a specific user or with specific permissions). Jobs could be triggered in a myriad of ways, e.g. through direct invocation by a client, triggers from the data store, calls from a different canister, recursive function calls, through timers, or even through events on a different blockchain that we listen to with Chain Fusion.
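In types, a job description might look roughly like this (all field names are assumptions):

```ts
interface Job {
  wasmHash: string;        // content hash of the code; fetched from storage if not cached
  calldata: Uint8Array;    // opaque input blob passed to the function
  auth?: {
    onBehalfOf: string;    // principal the job acts for
    permissions: string[]; // scopes granted for this execution
  };
}
```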

I imagine just running this code sandboxed in wasmi, but if you run the same function often with the same authentication context, you could deploy it as a native canister automatically to speed things up (authentication is a bit tricky assuming the code is untrusted, but should be possible). In the interpreted version you could also overcome ICP’s 40 billion instruction limit by just pausing execution briefly.

By default, costs for the execution (not just cycles; this could also include costs for transactions on different chains) should be paid by the caller sending gas. But similar to the embedded functions in the data store, you could also have embedded functions in the executor pool that check whether the execution may be billed to a different account (based on what code is executed by whom).

It would also be great to have an integration with zero knowledge pools and TEE pools to offload computations where possible. Although those won’t allow you to do things like HTTP outcalls or signing transactions.

Storage

I’m surprised that ICP hasn’t embraced IPFS more. IMO it solves caching and verification once and for all by having immutable objects that clients can verify themselves. I think there’s a good chance major browsers will add IPFS support at some point in the form of DNSLink (maybe specifying a preferred trustless gateway in a DNS record as well), as it’s quite easy to implement and does increase security.

Of course, actually getting data onto IPFS from a browser without a central party has been a pain so far. Theoretically you can run a local node in the browser, connect to IPFS with WebSockets and then use Filecoin to pay for pinning, but I haven’t been able to get that to work.

However, ICP makes storing data without a central party trivial. The IPFS standard already splits files into 1MB blocks that you could easily upload into a canister and hash. Scaling storage shouldn’t be too hard; just spawn more canisters when needed. The trickiest bit seems to be finding the canister that stores a specific block in a scalable way, but you could just reuse the data store described above to map block hashes to canister IDs.
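A minimal sketch of that lookup path, where `datastoreGet` and `readBlock` are assumed helpers rather than real APIs:

```ts
// The datastore holds /blocks/[cid] -> { canisterId } entries.
declare function datastoreGet(collection: string, key: string): Promise<{ canisterId: string }>;
declare function readBlock(canisterId: string, cid: string): Promise<Uint8Array>;

async function fetchBlock(cid: string): Promise<Uint8Array> {
  const entry = await datastoreGet("blocks", cid); // which canister holds this block?
  return readBlock(entry.canisterId, cid);         // then read the raw bytes from it
}
```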

You could also build IPFS nodes and gateways that communicate directly with ICP API boundary nodes to retrieve blocks and files from ICP again. At that point you should also be able to use existing IPFS gateways to access files, or transfer the files to cheaper storage protocols like Filecoin, depending on what your storage needs are. I wonder if this has been considered before?

Authentication

I haven’t looked too much into Internet Identity, but it seems like a good starting point. If I were to build a developer platform, I’d probably follow Supabase’s model and allow other authentication providers besides Internet Identity, e.g. anonymous logins, passkeys, sign-in with web3 wallets, etc. That’d certainly be useful for apps that should work well cross-chain. I think OAuth-like flows would also be possible, e.g. allowing user X to modify a subset of the data store on behalf of user Y.

For the above infrastructure to work, you’d also need authentication of canisters, e.g. to verify that a canister is actually part of your trusted executor pool. I imagine this could be solved with a central canister in your “canister cloud” that signs certificates stating “canister with ID XYZ is trusted”.
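As a sketch, such a certificate might carry no more than this (the record shape is an assumption):

```ts
// Attestation the "canister cloud" root would sign, e.g. with a threshold key.
interface CanisterAttestation {
  canisterId: string;
  role: "executor" | "datastore" | "storage";
  expiresAt: number;      // unix millis; forces periodic re-attestation
  signature: Uint8Array;  // signature by the root canister over the fields above
}
```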

Summary

Due to its subnet architecture and the ability to spawn new canisters programmatically, ICP seems to be the first blockchain that allows you to build applications that could automatically scale to more transactions than a single machine could handle. I’m sure the devil is in the details, but it seems to me the fundamental building blocks are already in place. Or am I missing any crucial bottlenecks?

I should note that I’m in no way claiming horizontal scaling to be the current bottleneck of web3 adoption. Right now the bottleneck seems to be building convincing user experiences that actually add value over web2. But it’s still important to think ahead, and I think some of the abstractions I outlined above would also make for a better developer experience that would allow web2 devs to easily build decentralized apps without learning Solidity, Motoko, Rust or similar.

6 Likes

I agree, this is the problem that no one seems to have solved.

Those of us who choose to use this web3 tech are doing so from some sense of idealism, or personal principles we hold. But to convince an average user to use it, it needs to be practical in some meaningful way to them. This is tricky.

Or at least, no one has convinced enough people they solved it!

1 Like

As someone playing with the tech, the real challenge as a web2 dev is the lack of resources to learn from.

There is almost nothing on how to architect an application to handle scaling:

  • how to split update calls efficiently
  • how to organize indexes/buckets
  • how to handle cycle top-ups for dynamically created buckets
  • how to split the load between the frontend (with multiple requests) and the backend (with composite queries)

And more concrete problems, like handling filtering without a centralized database.

Moreover, if you want to scale, you need to think about this before you start coding, or the refactoring will be extremely complicated.

I think a concrete official training ground could go a long way towards boosting adoption (like a tutorial for an infinitely scalable online shop and an infinitely scalable social media app).

4 Likes

It would take only one transposition to make ICP an acronym for Inter Planetary Computer.

I like the idea of the Inter Planetary Filesystem but performance-wise I wasn’t happy with it when I was using it. If that could be improved, this could be a marriage made on chain! :wink:

We are actually trialling a blob storage service to hold images and videos served directly to front-ends. It’s not a mature service; we’re checking what really enables builders before spending a lot of time making a beautiful solution.

In the past, we’ve more often made the mistake of building something for millions of users before doing very rapid iteration to find product-market fit. If you make an app that gets a few hundred thousand users, I’d say it’s very likely that you’ll get the resources to make it scale.

2 Likes

Multiple login methods do work. With Internet Identity you can log in in a variety of ways, including e.g. Gmail, and you can also use e.g. Plug Wallet.

I did actually look into OAuth authentication some years ago. It’s so obviously attractive. But the crypto there relied on symmetric cryptography, so it’s not really suitable for ICP, or indeed any really robust blockchain. But other standardized ways of logging in should be supported, IMHO. I should add the caveat that this was some years ago; OAuth might have more modes now and I might not be remembering all the details perfectly.

If you have specific issues, please do reach out. Time allowing, I’d be happy to go through your architecture with you and we can feed back the learnings into the training documentation.

1 Like

We are looking into how to accept micropayments as well. Attaching cycles works fine when a business just wants to receive payment for the cycles spent, but if you want to make a profit and charge payments in cryptocurrency, you need a bit more. Having spoken with several groups that need to charge, I reached the conclusion that the payment API needs to be consistent for customers but quite configurable for businesses. I made a library called PAPI that deals with high-value payments in any ICRC-2 token, such as ckUSDC. It works well at the moment, but it clearly needs work to reduce overhead for micropayments. Completely doable, but it needs some engineering time.

3 Likes

Concerning adoption by web2 developers, the main problem is that it’s trivial for a single-canister architecture, but when one wants to learn how to build the next Facebook, it’s a whole new world.

The actor model is totally alien to the majority of web2 devs. I have 10 years of experience in pure web dev (with some projects with complicated infra and hundreds of millions of requests/day) and I have never encountered it in real life, and with the way stable memory works it quickly becomes difficult to know the best way to architect things for the problem at hand. The documentation can give cues on how to handle it, but there is no demonstration from start to finish highlighting the different caveats developers can run into. I play with a pet project at night for fun because I love the tech, but I had to rewrite it several times to finally understand how to scale.

Web2 is full of people coming from bootcamps (myself included), and when they search for how to do something, they will have the choice between a tech with hundreds of complete examples, or doing something on ICP, where the only examples are either complete projects on GitHub (unreadable when you’re just starting) or reading everything in the official documentation and praying you assemble the pieces in the right order.

I think concrete examples of complete projects (multi-canister, multiple indexes, dynamically created buckets, maintenance canister), with step-by-step guides, could go a long way towards attracting web2 developers, especially if it’s done with a familiar tech stack (JS/Python).

For my specific problem, I created a thread some days ago that hasn’t received any responses (it’s my only missing piece of the puzzle):
How to handle filtering/pagination at scale - Developers - Internet Computer Developer Forum

4 Likes

Thanks for starting the thread, @marceljuenemann!

I don’t have much technical input to add here, a lot of great points have already been shared! But since you took Juno as a reference point and it’s my project, just wanted to quickly share the following.

From day one, my vision has always been to let developers someday scale beyond the all-in-one container model. You can find old announcement posts and videos where I mention this, but to summarize:

We start with satellites, and over time Juno will hopefully enable developers to build space stations :satellite::ringed_planet::sparkles:.

5 Likes

Welcome :folded_hands:

Great thoughts and observations. I too feel like ICP is the closest and seems like the only complete full-stack solution. From the earliest days since launch, I viewed ICP as a crypto cloud. And lately I like to call ICP an Internet Cloud Protocol (the next Internet protocol, planet scale), solving fundamental common cloud problems at the protocol level. There is nothing like this.

And now with Caffeine, give it a year to mature, and it can be used to build public services, small business websites, school projects, etc. - all on-chain. It removes the bottleneck for 99% of services. For the complex 1%, you need hardcore technical people, like you.