Canister Output Message Queue Limits, and IC Management Canister Throttling Limits

icme · October 19, 2022, 12:15am

This is a two-part question on 1) Canister Output Message Queue Limits, and 2) IC Management Canister Throttling Limits

First, some context.

I recently ran into the “Canister trapped explicitly: could not perform self call” issue when trying to asynchronously and concurrently query the canister status of over 500 canisters at a single time.

I understand this is due to limitations of the the canister output queue.

1)

What is the canister output queue, where are its limits/parameters defined in the code, and is there any way I can push this limit or navigate around these limits (if I need to constantly contact potentially thousands of canisters from a single canister)? Is one shot execution preferred?

2)

Additionally, given that from a canister on the IC (not an ingress HTTP call) I am calling the IC Management Canister’s canister_status method for thousands of canisters, I am assuming there a point at which the management canister will throttle my canister’s requests? If so, what is this throttle limit? How does the IC Management “Canister” manage load?

Finally, a “nice to have” feature would be if instead of making many individual requests to the canister_status method of the Management Canister, a canister could just make one requests to the Management Canister with a list of canister ids in order to then have it then retrieve and return a list of each canister id’s status.

paulyoung · October 19, 2022, 4:34am

For #1, a quick search revealed this:

github.com

dfinity/ic/blob/a52bb83dd6fc99072db070ccc905478f5ce4bee4/rs/replicated_state/src/canister_state/queues.rs#L30-L30


      
          pub const DEFAULT_QUEUE_CAPACITY: usize = 500;

I also spotted this, which seems like it could further limit the number of messages:

github.com

dfinity/ic/blob/a52bb83dd6fc99072db070ccc905478f5ce4bee4/rs/replicated_state/src/canister_state/queues.rs#L1325-L1330


      
          /// Returns the memory required to push `req` onto an input or output queue.
          /// This is the maximum of `MAX_RESPONSE_COUNT_BYTES` (to be reserved for a
          /// response) and `req.count_bytes()` (if larger).
          pub fn memory_required_to_push_request(req: &Request) -> usize {
              req.count_bytes().max(MAX_RESPONSE_COUNT_BYTES)
          }

roman-kashitsyn · October 19, 2022, 7:43am

Is that the exact error message that you get? If so, it tells you that the canister sent too many messages to itself.

The system keeps track of queues between pairs of canisters. When your canister sends messages to thousands of canisters, the system creates a separate queue for each destination. Each canister also has a queue for messages it sends to itself.

The DEFAULT_QUEUE_CAPACITY constant that @paulyoung dug up applies to a single queue.

If you see could not perform self call, your canister might perform nested self calls exhausting the message queue limit. You might be able to fix the issue by restructuring the code. Is the code public?

Your canister can interact with as many canisters as it wants simultaneously. There is no limit on the number of queues, only the size of each queue and the total memory consumption.

dsarlis · October 19, 2022, 8:02am

Roman’s answer is accurate, I wanted to provide some extra clarification for the management canister question.

Based on the general limit on the queue to a canister as already mentioned by the others, you can have up to DEFAULT_QUEUE_CAPACITY messages from a single canister to the management canister. So, if you want to get the status of thousands of canisters in the same message execution, I expect that you’ll hit that limit. But as long as you space them out you should be fine.

Finally, a “nice to have” feature would be if instead of making many individual requests to the canister_status method of the Management Canister, a canister could just make one requests to the Management Canister with a list of canister ids in order to then have it then retrieve and return a list of each canister id’s status.

Nice idea but it might be harder to implement than you might initially think if we want to allow for an arbitrarily long list of ids. A response is limited to 2MiB so if we need to return thousands of canister statuses in a single one we might be close to hitting that limit and then we’d need to support some form of pagination for the response (arguably you can still fit a few thousand canister statuses before you actually hit the response limit but it needs to be taken into account if we want a general solution).

bogwar · October 19, 2022, 9:41am

Further to @dsarlis 's point regarding canister_status: these calls are passed to the subnetwork hosting the canister so batching canister_status would not work w/o a lot of abstraction breaking and low level machinery and bookkeeping (i.e. parsing the message, constructing messages for the different subnetworks and then reconstructing the reply from the replies we would get)

cryptoschindler · October 19, 2022, 6:17pm

Maybe this is a good time to bring up the thread mentioned above again. I’m wondering where in this repl there is a self call

I’m also not able to grasp how the platform behaves in the case of the repl if we imagine that I awaited the calls instead of ignoring them.

To my understanding ignoring them is like fire and forget. I fill up the queue until the canister traps with the “could not perform self call” error, and all the messages that have been in the queue until this point still get executed but I don’t know which ones those were because everything gets rolled back after the trap. Is this correct?

If I await them, the queue can’t fill up because I only issue a new call after the old one returned. What I don’t understand here is how this works under the hood. Imagine I have 100 inter canister calls each of them taking more than 2 seconds to return. Does that mean the IC can’t process new blocks/messages for the next 200+ seconds (let’s assume the cycles limit isn’t reached for the message)? I thought spanning multiple rounds of execution is something that DTS would enable

I’m probably mixing some things up here and there but I hope it’s clear enough

icme · October 20, 2022, 5:43am

I am not sending calls to myself. I’m executing code similar to that mentioned in this post, where I send off a bunch of async calls in parallel, and then after all the async calls are in flight, I collect the awaited results.

In my specific case, all of these asynchronous calls are hitting the IC Management canister, so then according to your description there is a single queue between my canister and the IC managment canister and that queue is filling up (hitting the 500 limit).

I’m currently able to get around the 500 limit by batching the calls (batches of ~300 or so) at a time, but wonder what might happen if I now want to do this same batching but now with a second, third, or nth canister, where canister A is batching calls to both canister B, C … all the way to canister N? At some point, would the message queue to canister A hit a cap due to all of the incoming asynchronous messages from canister B to N?

icme · October 20, 2022, 6:59am

Yep, this essentially a much simpler example of the same issue I was hitting. Maybe the error message can be updated to something like “Canister output queue limit exceeded”?

icme · October 20, 2022, 7:05am

This makes sense, thanks @dsarlis and @bogwar for the additional context and explanation!

Instead of calls coming from a single canister, what if I had 500 canisters that are all batching calls (300 at a time) to the IC Management canister. What if this number of canisters making batch calls was raised to 10,000 canisters?

I guess what I’m trying to get at is, does the IC Management canister have any load limitations? I’ve heard that technically the IC Management canister is not a “canister”, so I’m curious about how it balances or queues up load.

dsarlis · October 20, 2022, 7:48am

@icme It’s not that different than if you wanted to hit some other canister with this load. The queues are between pairs of canisters as already mentioned earlier in the thread. This means that if you have N canisters trying to hit the management canister then you’ll have N queues on the management canister each to hold the incoming messages from each of the N canisters. We do not have a limit on N, but as we’ve said each queue has a default capacity of 500 messages. The next limit you might hit then is the subnet message memory capacity.

Now, there are some more technicalities if you want to go deeper (I’m unclear how much of this is theoretical or you have actual use cases in mind). E.g. if your target canisters are on different subnets, then hitting the management canister means that eventually the messages are routed to each subnet hosting the target canister, so you get some more capacity because of that (basically you take up the queue for the management canister on different subnets). Also, if you are doing install_code messages, we apply an extra rate limit on them if they’ve consumed too many instructions.

cryptoschindler · October 21, 2022, 3:19pm

@roman-kashitsyn

with the REPL I linked here I can’t even get close to 500 messages. Can you explain why that is (and maybe answer my other questions )?

cryptoschindler · October 21, 2022, 3:36pm

And when running this REPL with 200 as an argument, the following error appears.

Server returned an error:
Code: 400 ()
Body: Specified ingress_expiry not within expected range:
Minimum allowed expiry: 2022-10-21 15:32:10.019462710 UTC
Maximum allowed expiry: 2022-10-21 15:37:40.019462710 UTC
Provided expiry: 2022-10-21 15:32:08.119 UTC
Local replica time: 2022-10-21 15:32:10.019464171 UTC

How do I interpret that?

nomeata · October 24, 2022, 8:25am

It looks as if the agent signs the signature request for the state read (i.e. for polling the response) when it makes the call, and when the call takes a while (e.g. a loop around await) does not extend the expiry in the read request, and after a while it expires.

TL;DR: possibly a bug in the agent

cryptoschindler · October 24, 2022, 9:31am

Thanks Joachim! I can confirm that calling it from dfx indeed works. Tagging @kpeacock because I think he’s the owner of the agent.

kpeacock · October 24, 2022, 4:11pm

Oh, that’s an interesting theory - I’ll see if I can reproduce this with a new test

bitbruce · November 1, 2022, 1:57pm

When developing complex projects, a number of internal asynchronous functions may be refactored to achieve functional reusability. For example

private func fun1() : async Nat{
    await ledger.transfer(..)          //cross-canister call
   //...
};
private func fun2() : async Nat{
    // ...
    await fun1();
};
private func fun3() : async Nat{
    // ...
    await fun2();
};
private func fun4() : async Nat{
    // ...
    await fun3();
};
private func fun5() : async Nat{
    // ...
    await fun4();
};
public shared func run() : async Nat{
    ignore fun5();
    //...
};

The problem is that when 100 users call the canister at the same time, the number of Output Message Queues may exceed the limit of 500, and some await calls may be trapped.

My views:

The await call is classified as outcall and innercall, and the innercall should not be so restrictive.
Optimise the await role of private function calls. As in the above example, it is sufficient to make await ledger.transfer(…) act asynchronously, ignore other await.
An effect like this.

private func fun1() : Nat{
    await ledger.transfer(..)           //cross-canister call
   //...
};
private func fun2() : Nat{
    // ...
   fun1();
};
private func fun3() : Nat{
    // ...
   fun2();
};
private func fun4() : Nat{
    // ...
    fun3();
};
private func fun5() : Nat{
    // ...
    fun4();
};
public shared func run() : async Nat{
    fun5();
    // ....
};

OR,

private func fun1() : inner async Nat{
    await ledger.transfer(..)        //cross-canister call
    //...
};
private func fun2() : inner async Nat{
    // ...
    inner await fun1();
};
private func fun3() : inner async Nat{
    // ...
    inner await fun2();
};
private func fun4() : inner async Nat{
    // ...
    inner await fun3();
};
private func fun5() : inner async Nat{
    // ...
    inner await fun4();
};
public shared func run() : async Nat{
    ignore fun5();
    //....
};

Allows futures to be returned in a synchronous function and saved in a global variable. For example:

private var f : ?Future<Nat> = null;
private func fun() : Nat{
    // ...
    f :=  ?ledger.transfer(..);          // no await
    //...
};

bitbruce · November 7, 2022, 1:41am

There is a new solution: add the countAwaitingCalls() method inside the ExperimentalInternetComputer Module so that Canister can limit the entry of new requests.

PaulLiu · November 7, 2022, 8:33am

In your original example, each call uses await, which means it will schedule an outgoing self call message, and end the current call. So the total number of outstanding messages do not increase.

It is only when you use ignore fun2() for example, it will schedule more than one outgoing calls.

Edit: I should add that calls like await fun2() will also reserve resources (e.g. place in the input queue) to make sure when fun2() returns a value, it will be processed. So nested await calls do consume more resources than a single one.

bitbruce · November 7, 2022, 9:29am

Yes, Output messages are accumulated whenever an ignore funN() is present in the call chain. This becomes uncontrollable when many users access it at the same time.

derlerd-dfinity1 · November 7, 2022, 1:57pm

Hello everyone,

I talked to some people today to collect some recommendations on how to handle issues with too many outstanding messages filling up queues. Two general recommendations to be followed when aiming for scalable dapps that call other canisters came up quite consistently:

Make sure that the dapp maintains a counter or something similar on how many outstanding requests it has, and explicitly handle the situation where too many calls would be in flight at the same time. @roman-kashitsyn agreed to follow up with details how this is (plannded to be) done for the ckBTC canister.
If the design of the dapp allows to batch together some of the calls to external canisters, aim to batch them together. For example, if there are multiple calls to the ledger involving the same account, it might be possible to batch them together and only do one transfer.

There are also certain things that the IC protocol could do differently. The things identified here seem to be in line with the suggestions already brought up earlier in this thread. However, I want to stress that these measures will not really help with scalability as these would only bump limits by an order of magnitude or even less, but limits would still be easily hit as soon as, say, 1000 instead of 100 users try to do something. These things are:

Investigate whether it can be made easier to make nested function calls in Motoko without accumulating reservations, or whether there is an alternative pattern one could use. @PaulLiu already provided some pointers above, and @claudio agreed to follow up on details on this and what could be done.
Calls to self are currently treated in the same way as calls to other canisters. This means that there is a reservation for the response made in the input queue and in the output queue, which means that for calls to self effectively only have half the queue capacity available. This item is already in our list of backlog tasks and we will look into whether this can be picked up soon.

Topic		Replies	Views
Understanding Canisters Live on the IC Developers	2	59	January 3, 2025
[Discussion] Approaches for preventing canisters from hitting memory limits Developers Discussing	5	610	October 5, 2022
Signature queue for key ecdsa:Secp256k1:key_1 is full Developers	16	256	August 2, 2024
What happens when 1000 users call the same canister method concurrently on the Internet Computer? Rust Discussing	4	122	May 21, 2025
Can some one do a YouTube video on optimizing cross canister calls? Developers	3	565	February 2, 2023

Canister Output Message Queue Limits, and IC Management Canister Throttling Limits

Related topics