Technical Working Group: Scalability & Performance

Hey @abk @dsarlis, and others in this working group. To put forward a topic for next month’s working group session, I’d like to suggest subnet scalability learnings from the past year, as well as a focus on the past month or so of increased compute activity.

Ideally, the session would bring data on how different subnets are handling load and how that impacts subnet limits; all attendees could then apply this data to a discussion of how teams can best scale different types of canister architectures, and what order-of-magnitude scalability increases are reasonable to expect on immediate to 1-year timelines.

5 Likes

Recording can be found here

Great idea! I think we’ll bump the meeting to next week and then we can cover this.

2 Likes

Hi scalability team,

I am really happy that the 0.5 billion parameter LLM is now running stably in a 32-bit canister, and it is giving very good answers.

You can try it out at https://icgpt.icpp.world/

You will notice right away that the token generation is slow, and the main reason for that is the instructions limit on update calls.

This LLM can generate 10 tokens within the instructions limit.

The way we work around it is by doing a sequence of update calls and using the prompt caching that is a standard feature of llama.cpp. We store the prompt cache for every conversation of every user, identified by their principal ID.

This works very well, but a speedup would be very welcome.
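A minimal sketch of that sequence-of-update-calls workaround, with hypothetical names and an in-memory stand-in for the canister (not the actual ICGPT code): each call generates up to the per-call token budget, and a per-conversation cache lets the next call resume where the previous one stopped.

```typescript
// TOKENS_PER_CALL stands in for the ~10 tokens that fit within one
// update call's instruction limit.
const TOKENS_PER_CALL = 10;

// Hypothetical per-conversation prompt cache: conversation id -> tokens so far.
const promptCache = new Map<string, string[]>();

// Stand-in for one update call to the LLM canister. Here the "answer"
// is just the prompt split into words, to keep the sketch self-contained.
async function generateChunk(
  conversationId: string,
  prompt: string,
): Promise<{ tokens: string[]; done: boolean }> {
  const cached = promptCache.get(conversationId) ?? [];
  const full = prompt.split(" ");
  const next = full.slice(cached.length, cached.length + TOKENS_PER_CALL);
  const all = [...cached, ...next];
  promptCache.set(conversationId, all);
  return { tokens: next, done: all.length >= full.length };
}

// The frontend drives a sequence of update calls until generation finishes.
async function generateAll(conversationId: string, prompt: string): Promise<string[]> {
  const out: string[] = [];
  let done = false;
  while (!done) {
    const chunk = await generateChunk(conversationId, prompt);
    out.push(...chunk.tokens);
    done = chunk.done;
  }
  return out;
}
```

Each loop iteration corresponds to one update call going through consensus, which is why the per-call token budget directly bounds generation speed.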

Would it be possible to further increase the instruction limit?

Doubling it would literally double the token generation speed.

3 Likes

Scaling Canisters for Large User Base: Need Advice

We are developing a group wallet solution on ICP and are considering a one-canister-per-user (and per user group) setup. However, we haven’t fully implemented this yet, and recent discussions have made us unsure if this is the best approach for scalability and efficiency.

Key objectives:

  • Design a backend architecture for more than 50k users.
  • Ensure low latency (<2s) for critical operations like crypto transactions.
  • Allow acceptable latency (6-8s) for non-critical operations like system settings.

Specific questions:

  1. Can we stick with one canister per user/group?
  2. How do subnets handle horizontal scaling for thousands of canisters?
  3. What are the best practices for load balancing across subnets?
  4. What strategies are recommended for managing thousands of canisters?
  5. How can we optimize cross-subnet communication?
  6. How can the European subnets ensure GDPR compliance?
  7. Are subnets regionalized, and are there plans for more regional/national subnets?
  8. Should we assign canisters to a particular subnet for better performance?

Any insights, even partial, would be greatly appreciated!

1 Like

Have you considered asynchronous workflows that make awaited self-calls back to the same canister? That way you can bypass the instruction limit every time you make a new inter-canister call back to your own canister. As a trade-off, it does add some additional latency for each inter-canister call, however.
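A toy model of why this works (illustration only, not real canister code): each message gets a fresh instruction budget, and an awaited self-call ends the current message, so the continuation runs as a new message with a new budget.

```typescript
// Toy per-message limit; not the real subnet value.
const INSTRUCTION_LIMIT = 100;

class ToyCanister {
  messagesUsed = 0;

  // `work` is total "instructions" needed. Each message can spend at most
  // INSTRUCTION_LIMIT; an awaited self-call starts a fresh message
  // (in a real canister this also commits state at that point).
  async process(work: number): Promise<number> {
    this.messagesUsed++;
    const step = Math.min(work, INSTRUCTION_LIMIT);
    const remaining = work - step;
    if (remaining === 0) return this.messagesUsed;
    // Awaited self-call: continuation runs with a fresh budget.
    return this.process(remaining);
  }
}
```

A job needing 250 toy instructions completes in 3 messages instead of hitting the 100-instruction limit, at the cost of the extra round trips.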

1 Like

Thank you for this first useful hint. We will study the possibilities related to “asynchronous messaging” and isolated canister state, which can result in “loose coupling” between different canisters and subnets.

Link: https://internetcomputer.org/how-it-works/architecture-of-the-internet-computer/#asynchronous-messaging

A few things that might help with the consideration. This is just my personal feedback (I do not work at DFINITY):

Curious, what language are you using/considering for the wallet?

What types of data and how much of it do you expect to store per user? You can do this all in a single canister (I recommend starting out with just a single canister). IIRC Internet Identity has over a million identities, and that’s all in a single canister.

You can do ~2-4 second transactions if you go directly from the FE to a ledger canister (not through a backend canister). If you perform ICP ledger transactions through a backend canister, it’s 2-4 seconds (call to your canister) + ~6 seconds (call to the ledger canister on a different, fiduciary subnet).

There are limits on how many canisters a subnet can handle, but this also depends on the activity & compute in each of those canisters. A subnet with 10-20k canisters that perform a lot of compute is under much more load than a subnet with 50-90k canisters with lighter load & compute.

Use an Index/Factory pattern with child canisters that the index spins up and controls. You can also look at OpenChat’s code for how they do this with multiple levels of index canisters, which allows them to manage massive numbers of canisters across subnets.

Do less of it if possible, or parallelize your inter-canister calls as much as possible (don’t send them in sequence). Don’t send too much data per inter-canister call. I believe there’s a 2-4MB per-block limit on cross-subnet ingress per subnet?
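The sequential-vs-parallel point can be sketched like this (`callCanister` is a mocked stand-in for an awaited remote call, not a real agent API; only the ordering pattern matters):

```typescript
// Stand-in for one cross-subnet round trip, in milliseconds.
const ROUND_MS = 50;

function callCanister(id: string): Promise<string> {
  return new Promise((resolve) =>
    setTimeout(() => resolve(`reply from ${id}`), ROUND_MS),
  );
}

// Sequential awaits: total latency grows as n * ROUND_MS.
async function fetchSequential(ids: string[]): Promise<string[]> {
  const out: string[] = [];
  for (const id of ids) out.push(await callCanister(id));
  return out;
}

// Parallel dispatch: all calls are in flight at once,
// so total latency stays around one ROUND_MS.
async function fetchParallel(ids: string[]): Promise<string[]> {
  return Promise.all(ids.map(callCanister));
}
```

Both return the same results in the same order; only the wall-clock time differs.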

1 Like

I don’t understand this. Is there no instruction limit when doing a self call?

Programming Languages

Thank you, icme.

We use the following languages for our wallet:

  • JavaScript: 90.5% (React JS)
  • Motoko: 9.1%
  • Other: 0.4%

Data Storage Requirements

For processing payment approvals, attaching documents is an option. Such attachments could include photos, scanned documents, voice messages, or simple PDF files. Groups also have a group chat feature, where attachments can be exchanged as well. A group might represent an average company with dozens of office workers conducting a dozen transactions every day. Assuming an average file size of 2MB, and allocating 50GB for their repository, storage should last for about five years. To ensure at least three years of storage even for groups with heavy file usage, we plan to reserve the full canister capacity of 200GB for each user group. Accordingly, it makes sense to grant either each user or each user group its own canister.
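A back-of-envelope check of the storage estimate above (all figures are the rough assumptions from this post, not measurements):

```typescript
// Assumptions from the post: ~2MB average attachment,
// "a dozen transactions every day" per group, 50GB repository.
const avgFileMB = 2;
const txPerDay = 12;
const mbPerDay = avgFileMB * txPerDay; // ~24 MB/day if each tx has one attachment

const repositoryMB = 50 * 1024; // 50GB in MB
const daysUntilFull = repositoryMB / mbPerDay; // ~2133 days
const yearsUntilFull = daysUntilFull / 365; // ~5.8 years
```

That lands in the "about five years" range; heavier chat attachment traffic is what pushes the estimate down toward the three-year floor mentioned above.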

Computing Requirements

The application doesn’t perform any operations that are very demanding in terms of computing power since there is no video streaming or 3D rendering. However, we do need to maintain low latency to ensure a good user experience. For example, when setting up a transaction that requires approval, fetching the spending power and available balance should be fast. Similarly, simple tasks like updating a profile shouldn’t take too long. Although calling data from a ledger canister on a different, fiduciary subnet takes about six seconds, we can potentially preload some data in the front-end to minimize the perceived waiting time while awaiting final confirmation.

Parent-Child Canisters

Yes, we plan to use an Index/Factory setup, at least as I understand it. One canister will store the user-user group matrix along with access levels, and other canisters will serve as parents for users and user groups. We will look at OpenChat’s code for how they manage multiple levels of index canisters, as this indeed sounds like a very promising approach.

Inter-Canister Calls

Here is where things start to get tricky. The front-end needs to look up the index to find the ledger canister, which may take up to six seconds if the front-end, index, and ledger canisters are not on the same subnet. I admit I still don’t fully understand the relationship between canisters and subnets, but I will now further dive into this topic. I will also explore the options for asynchronous requests and consult people with more experience than myself.

Thank you again for the great support.

1 Like

Every time you make an inter-canister call back to an API on your canister, it makes a state commit point in your canister and resets the instruction limit (at least it does this in Motoko; I’m pretty sure it works for any canister).

Pretty sure instruction limits only apply to synchronous processing on your canister, and reset once you make an asynchronous request.

See this thread

1 Like

Are these Monthly Active Users (MAU)? If so, you will get something like 50 users every 30 minutes, which may be around 1 transaction per second. One canister can easily handle that. During peak traffic you may need more. The current canister-to-ledger limit is 50 tx/s, but with the upcoming ICRC-4 it will be a lot more.

When it comes to latency: if the tokens are in your canister’s custody, your canister is the authority, so 2s latency to a send confirmation isn’t unachievable.

1 Like

Each subnet can only hold ~750GB of state currently, and there are around 15-20 application subnets that you can deploy to. So you might need to rethink your storage requirements. Some apps with big data, like Yral, store their videos off-chain in order to store up to millions of short-form videos.

Until we get storage subnets, I’d recommend rethinking storing massive amounts of file data, not just because of GB limits, but so you can cover your hosting costs of storing that data on-chain.

1 Like

When you say “it makes a state commit point in your canister and resets the instruction limit”, isn’t that the same as what happens when I call multiple times from the frontend?

It is going through consensus in both cases.

In a way, doing multiple calls from the frontend leads to a better user experience, because tokens are sent back each time and can be displayed to the user while the frontend is making the next async call to the backend LLM canister.

Potentially. But this requires the frontend to be part of the coordination and constantly engaged. If the user disconnects or leaves the page, the response is lost (less reliable).

If you wanted to kick off an async job with a user request (return a requestId to the user quickly while crunching the work on the backend) then you could perform async AI inference and use self-calls to get around the instruction limit with DTS, meaning your model computations could go as long as they need.
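That requestId pattern might look like this (names and shapes are illustrative, not a real canister API): the update call registers a job and returns an id immediately, the heavy work continues asynchronously, and the result is read later.

```typescript
type JobStatus = { state: "running" } | { state: "done"; result: string };

// In-canister job registry: requestId -> status.
const jobs = new Map<string, JobStatus>();
let nextId = 0;

// Stand-in for long-running inference driven by self-calls under DTS.
async function runInference(prompt: string): Promise<string> {
  await new Promise((r) => setTimeout(r, 20)); // pretend work
  return `answer to: ${prompt}`;
}

// "Update call": returns a requestId right away while work continues.
function startJob(prompt: string): string {
  const id = String(nextId++);
  jobs.set(id, { state: "running" });
  runInference(prompt).then((result) => jobs.set(id, { state: "done", result }));
  return id;
}

// "Query call": the client polls this until the job is done.
function getJob(id: string): JobStatus | undefined {
  return jobs.get(id);
}
```

The client gets its id back in one round trip; the model computation is no longer tied to the client staying connected.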

In certain cases people might want to kick off a few requests and just have the canister crunch in the background. I’m sort of thinking of the ChatGPT o1-preview experience, where the model takes a bit longer to return but the output is much better than 4o.

1 Like

Great points. It is indeed less reliable right now with the frontend as orchestrator, and only one conversation can run at a time…

Thank you, this is very informative. I have a couple more questions:

Is it possible to manage data across multiple subnets within a single application on the Internet Computer (ICP)?

If so, what strategies can be employed to optimize cross-subnet communication for large-scale applications?

Additionally, are there any upcoming improvements or innovations planned in the realm of subnet scaling and performance optimization on the ICP platform?

Thank you for these helpful answers. I look forward to getting such storage subnets. If scaling and cross-subnet communication for large-scale applications lead to very high latency and costs that cannot be optimized, we will need to consider interim solutions.

It would be possible to get the best of both approaches by having the frontend make a single update call, which the canister itself continues via self-calls, while the frontend periodically polls via query calls to get the latest state.
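The client-side half of that approach could be a simple polling loop over cheap query calls (names are illustrative; `query` stands in for a query call to the canister):

```typescript
// Polls the canister's state via query calls until it reports completion.
// Each intermediate state can be rendered to the user as tokens arrive.
async function pollUntilDone(
  query: () => Promise<{ tokens: string[]; done: boolean }>,
  intervalMs: number,
): Promise<string[]> {
  for (;;) {
    const state = await query();
    // Render state.tokens to the user here.
    if (state.done) return state.tokens;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}
```

Query calls don’t go through consensus, so this polling is much cheaper and faster than the frontend driving a sequence of update calls itself.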

4 Likes

Note that the instruction limits aren’t artificial - they are determined by how much we can execute within the time available to hit the target block rate. So if we significantly increased the limits, you would see more tokens generated within each round, but rounds would also take longer.

1 Like