Recently, responses from our canister have slowed down or stopped returning

I don’t know if it’s because of my canister or the subnet, but recently many update requests have not returned a response.

canister: ojpsk-siaaa-aaaam-adtea-cai
Below is a screenshot from the application. The yellow ellipsis icons are update requests that have not yet returned.

Currently, we have 3-6K new users registering every day, but the system success rate is relatively low, which greatly affects business development.
Questions:
How should we observe the current throughput and load of the canister?

What should we do if the load is too high?

We know how to handle this on traditional servers, but we are still quite unclear about operating and maintaining things on the IC. We hope to get some help!

Thank you!

How should we observe the current throughput and load of the canister?

I don’t think the protocol offers a good way to do this for update calls at the moment. You would need to add some metrics in your canister to see how many requests you’re receiving.
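
A minimal sketch of what such metrics could look like in a Rust canister (assuming ic-cdk; the method names `register_user` and `metrics` are just placeholders):

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    // Call counters keyed by method name, kept on the heap (reset on upgrade).
    static CALL_COUNTS: RefCell<HashMap<&'static str, u64>> = RefCell::new(HashMap::new());
}

fn record_call(method: &'static str) {
    CALL_COUNTS.with(|c| *c.borrow_mut().entry(method).or_insert(0) += 1);
}

#[ic_cdk::update]
fn register_user(name: String) {
    record_call("register_user");
    // ... actual registration logic ...
    let _ = name;
}

#[ic_cdk::query]
fn metrics() -> Vec<(String, u64)> {
    // Expose the counters so an off-chain dashboard can poll them periodically.
    CALL_COUNTS.with(|c| c.borrow().iter().map(|(k, v)| (k.to_string(), *v)).collect())
}
```

Polling the `metrics` query from off-chain at a fixed interval gives you a rough requests-per-second figure without adding meaningful load to the canister.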

What should we do if the load is too high?

You can consider some form of rate limiting so that you don’t get too many requests that time out before they are processed (which is what seems to be happening based on the screenshot you shared). Additionally, if you have that many users, you should consider how to scale your application. I’m guessing you’re using a single canister currently? Perhaps you can design things so that you use multiple canisters and spread the load a bit better. Some pointers on potential architectural solutions are given here.
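
For illustration, here is a rough sketch of a fixed-window rate limit inside a Rust canister (again assuming ic-cdk; the one-second window and the 500-call limit are made-up values you would tune to your own load):

```rust
use std::cell::RefCell;

const WINDOW_NANOS: u64 = 1_000_000_000; // 1-second window (illustrative)
const MAX_CALLS_PER_WINDOW: u64 = 500;   // illustrative limit, tune to your load

thread_local! {
    // (window_start_time, calls_seen_in_window)
    static WINDOW: RefCell<(u64, u64)> = RefCell::new((0, 0));
}

fn check_rate_limit() -> Result<(), String> {
    let now = ic_cdk::api::time(); // nanoseconds since the Unix epoch
    WINDOW.with(|w| {
        let (start, count) = *w.borrow();
        if now.saturating_sub(start) > WINDOW_NANOS {
            // New window: reset the counter.
            *w.borrow_mut() = (now, 1);
            Ok(())
        } else if count < MAX_CALLS_PER_WINDOW {
            w.borrow_mut().1 = count + 1;
            Ok(())
        } else {
            // Reject early instead of letting the request expire in the queue.
            Err("Too many requests, please retry shortly".to_string())
        }
    })
}

#[ic_cdk::update]
fn register_user(name: String) -> Result<(), String> {
    check_rate_limit()?;
    // ... actual registration logic ...
    let _ = name;
    Ok(())
}
```

Rejecting early returns a clear error to the client, which is friendlier than letting ingress messages time out unanswered.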

Hello, thank you for your reply.

Question 1: If the IC does not provide a monitoring method, we can only leave it for now, because adding our own metrics would increase the burden on the system. And if the cause of the problem is clear, there should be no need to monitor anyway.

Question 2: We currently use a multi-canister architecture, but some functions, such as user registration and verification, must go through a single unified entry point to guarantee global consistency.
Here is our architecture diagram:

Given this situation, we hope to get your effective support!

If you don’t have a better solution, we have two suggestions and ideas:

  1. Allow opening a single-node canister on the ICP (I know subnets generally have 13 nodes now), used to process hot data and high load, with the processing results then written to the target persistent canister as needed. The main reason we cannot keep up is waiting for the 13 nodes to reach consensus. This is just a hopeful idea: you would only need to offer this type of canister and remove the update delay (I think the delay is simulated, at least in the local development environment).

  2. We build a load gateway outside the canister ourselves and submit data to the IC in batches as needed. For example, if we submit once per second, then no matter how many users register, only 60 update calls are executed per minute. But this feels absurd: ICP’s on-chain computing becomes a gimmick, reduced to lightweight computing plus storage.

If you cannot adopt the first suggestion or offer another approach, we can only solve it ourselves with the second method.

I hope to get your attention and an effective reply, thank you!

A few thousand registrations a day is a very small load for an ordinary server; a $10-a-month Docker container can handle it.

Hi Haida

Option 1 is not safe, so it is not going to happen. You should consider something like your option 2 while taking into account the architecture advice that dsarlis shared.

If you go with an off-chain gateway, you can balance, queue, and if necessary limit the request rate. But there is no need to go as low as one request per second; the IC can handle much more than that. If you use the IC’s computation model correctly, it is not a gimmick. But I can see how it is difficult to come up with an architecture that is suitable for your use case and growth expectations. I think it is actually fortunate that your architecture is the bottleneck, not the system, because that means you can probably fix the problem!
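
For example, the gateway could queue sign-ups and flush them to a batch endpoint on the canister every second or so. A rough sketch of what the canister-side endpoint might look like (ic-cdk and Candid assumed; the `Registration` type and the method name are illustrative):

```rust
use candid::{CandidType, Deserialize};
use std::cell::RefCell;
use std::collections::BTreeMap;

#[derive(CandidType, Deserialize, Clone)]
struct Registration {
    username: String,
    verification_code: String,
}

thread_local! {
    static USERS: RefCell<BTreeMap<String, Registration>> = RefCell::new(BTreeMap::new());
}

// One update call processes a whole batch, so a queue of, say, 200 sign-ups costs
// a single ingress message instead of 200 separate update calls.
#[ic_cdk::update]
fn register_batch(batch: Vec<Registration>) -> Vec<Result<(), String>> {
    batch
        .into_iter()
        .map(|r| {
            USERS.with(|u| {
                let mut users = u.borrow_mut();
                if users.contains_key(&r.username) {
                    Err(format!("username {} already taken", r.username))
                } else {
                    users.insert(r.username.clone(), r);
                    Ok(())
                }
            })
        })
        .collect()
}
```

The per-item results let the gateway report failures (e.g. duplicate usernames) back to individual users even though their registrations were submitted together.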

You could also go with additional entry-point canisters that reside on different subnets. That is more difficult to build, but ultimately scaling via more subnets is the only sustainable way of scaling on the IC. Furthermore, this would be much nicer because users would not have to trust the off-chain gateway from the other option.
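
As a sketch of the idea (not a complete design): an entry-point canister could hash the username to pick a backend canister and forward the call, so several thin entry points on different subnets can feed the same set of backends. Everything below (the shard list, the hashing, the `register_user` method on the backend) is an illustrative assumption:

```rust
use candid::Principal;
use std::cell::RefCell;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

thread_local! {
    // Backend registration canisters; assumed to be filled in during setup.
    static SHARDS: RefCell<Vec<Principal>> = RefCell::new(Vec::new());
}

fn pick_shard(key: &str) -> Principal {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    SHARDS.with(|s| {
        let shards = s.borrow();
        // Assumes at least one shard is registered.
        shards[(hasher.finish() as usize) % shards.len()]
    })
}

#[ic_cdk::update]
async fn register_user(username: String) -> Result<(), String> {
    let shard = pick_shard(&username);
    // Forward via an inter-canister call; the entry point itself stays thin.
    let (result,): (Result<(), String>,) = ic_cdk::call(shard, "register_user", (username,))
        .await
        .map_err(|(code, msg)| format!("call to shard failed: {:?} {}", code, msg))?;
    result
}
```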

Could you share some more details of your canister fleet?

  • Do you deploy all of your canisters to the same subnet?
  • How many canisters are there?
  • Does the number of canisters scale with the number of users?
  • How many human users do you have and how are they expected to use the service?

  • Do you deploy all of your canisters to the same subnet?
    Yes, they are all in one subnet at the moment. We manually deploy a main canister and the other subcanisters are automatically generated.

  • How many canisters are there?
    There are about 19 canisters at the moment, but only 1-3 are hot.

  • Does the number of canisters scale with the number of users?
    Yes, account and ledger canisters are created automatically to scale horizontally.

  • How many human users do you have and how are they expected to use the service?
    Currently close to 100,000 people use it through the app, and the plan is to serve more than 1 billion users.
    Currently, the IC processing capacity cannot keep up.

I hope you can help us, either through a real-time communication tool or by arranging Chinese-language support.

User registration must go through a single unified canister, because we need to cache verification codes and guarantee the uniqueness of IDs across multiple account canisters. If new users were registered in different canisters, it would be harder to coordinate these relationships, which might make the canisters even busier. Since every registration is a new user, and a new canister is only created once the current canister’s storage reaches its threshold, opening more canisters cannot solve this problem.

Thanks for the details.

Yes, account and ledger canisters are created automatically to scale horizontally.

But not directly, because you only have 19 canisters, right? In what way are they “created horizontally”? If three hot canisters have to handle the load of tens of thousands of users, that can be a significant bottleneck (depending on the access patterns and request types).

Currently, the IC processing capacity cannot keep up.

The subnet is not under significant load, so rest assured it is an architectural problem, not a platform problem. If you can distribute your load to a bigger number of canisters somehow, you’ll see much more performance.

Unfortunately I cannot debug your access patterns and architecture from here. Perhaps there are community members who can, however. E.g., @bobdotfun has experience with scaling, perhaps they have advice?

opening more canisters cannot solve this problem

If this is true, then you also have the option to share your repository and let the community micro-optimize your code. But I believe you’d have more success rethinking your approach. Sorry I cannot offer more concrete advice in that direction.

I think my first suggestion is feasible.
Perhaps you misunderstood what I meant.
We only need this single-node canister to handle the hot temporary data and heavy load of the registration process. Its main job is computation and absorbing load, so there is no need to worry much about data persistence and security.
And it is not just us; I think many scenarios have this kind of demand.

Actually, I had the idea for this proposal a long time ago, before I ran into this trouble.

There are a number of UUID-type libraries that will give you random IDs. They can be synced to a central ID canister in batches from different subnets if you are getting that many sign-ups per second (but each round should be able to support a significant number of sign-ups even on one canister).
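
To sketch the idea (with made-up names, not a specific library): each account canister can mint IDs locally from its own principal, the time, and a counter, and then push the new IDs to a central ID canister in batches, e.g. from a timer. The central canister's `record_ids` method below is hypothetical:

```rust
use candid::Principal;
use std::cell::RefCell;

thread_local! {
    static LOCAL_COUNTER: RefCell<u64> = RefCell::new(0);
    static PENDING_SYNC: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

// IDs combine this canister's principal, the current time and a local counter,
// so each canister can mint unique IDs without contacting a central canister first.
fn new_id() -> String {
    let count = LOCAL_COUNTER.with(|c| {
        let mut n = c.borrow_mut();
        *n += 1;
        *n
    });
    let id = format!("{}-{}-{}", ic_cdk::api::id(), ic_cdk::api::time(), count);
    PENDING_SYNC.with(|p| p.borrow_mut().push(id.clone()));
    id
}

// Flush the pending IDs to the central ID canister in one call instead of one
// call per sign-up. In this sketch a failed call drops the batch; a real
// implementation would re-queue it.
async fn flush_to_central(central: Principal) -> Result<(), String> {
    let batch: Vec<String> = PENDING_SYNC.with(|p| std::mem::take(&mut *p.borrow_mut()));
    if batch.is_empty() {
        return Ok(());
    }
    ic_cdk::call::<_, ()>(central, "record_ids", (batch,))
        .await
        .map_err(|(code, msg)| format!("sync failed: {:?} {}", code, msg))
}
```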

Here is a ULID library I made, which also comes with generator code. It can produce fixture ULIDs as well if needed.
