How many ingress and inter-canister messages can a subnet process?

How many ingress and inter-canister messages are subnets processing per block and per second right now?

This obviously varies by subnet (fiduciary, NNS, application, European), but I’m designing a new service that will receive intra-subnet messages (i.e. canister-to-canister calls made on the same subnet), and am trying to get a ballpark of how many messages could be sent before performance would start to degrade.

My intention with asking this question is to make sure that this service doesn’t use up significant subnet resources (at most 1-5%).


Subnet fuqsr is processing up to 1.4k updates per second (most of them heartbeats, but from the point of view of execution throughput, this is not fundamentally different from canister messages). Subnet bkfrj OTOH is processing a maximum of about 140 updates per second and it’s also running at full tilt (notice how the block rate drops every now and then).

So if you’re looking at raw throughput in terms of near-zero-instruction messages, you can expect a subnet to execute over 1k updates per second. If your messages are doing a lot of work (e.g., taking the extreme example of an infinite loop), it’s possible for a subnet’s throughput to be lower than 1 message per round: you have 4 virtual CPU cores, but a single message execution can take 5(?) rounds to complete, so 4 such messages completing every 5 rounds works out to 0.8 messages per round.

So on the one hand, if you want to limit the load you put on a subnet to 1-5%, you should limit yourself to 15-70 messages per second. But OTOH you also don’t want to be using more than 1-5% (and probably less) of the 7-8B instructions per round.
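To put concrete numbers on that budget, here’s the back-of-the-envelope arithmetic as a small sketch (the 1.4k updates/s and 7-8B instructions/round figures are the ones quoted above; treating a round as roughly one second is an assumption):

```rust
// Rough capacity budget for a service meant to use at most 1-5% of a subnet.
// Figures are the observed values quoted above; one round ≈ one second is
// an assumption that matches those numbers.
fn main() {
    let max_updates_per_sec = 1_400.0; // observed ceiling for near-zero-work updates
    let instructions_per_round = 7.5e9; // ~7-8B instructions per round

    for share in [0.01, 0.05] {
        println!(
            "{:.0}% budget: ~{:.0} msgs/s, ~{:.1e} instructions/round",
            share * 100.0,
            max_updates_per_sec * share,   // 14 .. 70 messages per second
            instructions_per_round * share // 7.5e7 .. 3.8e8 instructions per round
        );
    }
}
```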


Let’s say I have a canister that receives messages from other canisters on that same subnet (simple computation) and then appends them to a log.
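Concretely, something like this minimal sketch (hypothetical method name and in-memory log, assuming ic-cdk’s attribute macros):

```rust
// Minimal sketch of the receiver: an update method that other canisters on
// the same subnet call to append an entry to an in-memory log.
use std::cell::RefCell;

thread_local! {
    static LOG: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

#[ic_cdk::update]
fn append(entry: String) {
    LOG.with(|log| log.borrow_mut().push(entry));
}
```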

Are there any subnet scalability improvements that would increase the rate of messages that can be received and appended by an order of magnitude (from 1k to 10k msgs/second)? For example, could hashed block payloads have any impact on this via the block rate?

Hashed block payloads have nothing to do with the scenario you describe. For one, all those canister messages would be local to the subnet, so they don’t need to go through a block. (Which is why bandwidth between local canisters is orders of magnitude higher than cross-subnet, for the time being.)

But more importantly, the bottleneck here is message execution, not induction (getting messages into a canister’s input queues). There is overhead associated with each message execution:

* the message is popped from the input queue;
* a call context is looked up, or created if necessary;
* cycles are reserved;
* memory usage and available queue slots are checked;
* the message is serialized and sent to the respective canister sandbox process;
* the sandbox process deserializes it, processes it and serializes the output;
* the output is deserialized by the replica;
* the call context is updated or dropped;
* the response is enqueued;
* cycle fees are computed and applied;
* changes to the canister are persisted or rolled back.

Actually, I guess the biggest overhead is the fact that in order for it to be possible to roll back a message, the canister state must be cloned beforehand: rolling forward then means keeping the mutated state; rolling back means keeping the original state.
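To illustrate that last point, here’s a heavily simplified sketch of the clone-for-rollback mechanic (all types and names are hypothetical stand-ins, not the replica’s actual code):

```rust
// Hedged illustration of why every update execution pays a state-cloning cost.
#[derive(Clone)]
struct CanisterState {
    heap: Vec<u8>,
    // ... queues, call contexts, cycle balance, etc.
}

enum Outcome {
    Committed,
    RolledBack,
}

fn execute_update(
    state: CanisterState,
    run_message: impl Fn(&mut CanisterState) -> Result<(), String>,
) -> (CanisterState, Outcome) {
    // Clone *before* executing, so a trap can be rolled back.
    let mut candidate = state.clone();
    match run_message(&mut candidate) {
        // Roll forward: keep the mutated copy.
        Ok(()) => (candidate, Outcome::Committed),
        // Roll back: drop the copy, keep the original.
        Err(_trap) => (state, Outcome::RolledBack),
    }
}
```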

Regardless, simply executing one update is expensive. Which is why the highest throughput I’ve seen is a bit over 1k messages per round across 4 virtual cores, or just over 300 messages per core. And since a canister executes its messages sequentially, on a single core at a time, a single canister can’t even process 1k messages per round.

One approach I’ve seen for a scalable ledger is to have a bunch of frontend canisters for load balancing. These canisters batch together a reasonable number of transactions and send them as a single request to the ledger. The ledger then has fewer messages to execute, so the overhead is significantly lower and it can spend a larger proportion of the round’s instructions actually doing useful work, rather than setting up and tearing down an execution for every individual transaction.
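As a minimal sketch of what such a frontend might look like (assuming the classic ic-cdk `call` API; the `Tx` type, the `apply_batch` method name, the placeholder ledger id and the flush threshold are all made up for illustration, not the HPL’s actual interface):

```rust
// Batching frontend ("aggregator") sketch: buffer incoming transactions and
// forward them to the ledger as a single inter-canister message.
use std::cell::RefCell;
use candid::{CandidType, Deserialize, Principal};

#[derive(CandidType, Deserialize, Clone)]
struct Tx {
    from: Principal,
    to: Principal,
    amount: u64,
}

const BATCH_SIZE: usize = 256; // illustrative flush threshold

thread_local! {
    static BUFFER: RefCell<Vec<Tx>> = RefCell::new(Vec::new());
}

#[ic_cdk::update]
async fn submit(tx: Tx) {
    // Cheap local validation would go here; then buffer instead of forwarding.
    let flush = BUFFER.with(|b| {
        let mut b = b.borrow_mut();
        b.push(tx);
        b.len() >= BATCH_SIZE
    });
    if flush {
        let batch: Vec<Tx> = BUFFER.with(|b| b.borrow_mut().drain(..).collect());
        // Placeholder id; in practice this would be the ledger canister's principal.
        let ledger = Principal::from_text("aaaaa-aa").unwrap();
        // One inter-canister message carries the whole batch, so the per-message
        // execution overhead is paid once per BATCH_SIZE token transactions.
        let _: Result<(), _> = ic_cdk::call(ledger, "apply_batch", (batch,)).await;
    }
}
```

The ledger’s `apply_batch` would then iterate over the whole batch within a single message execution, paying the setup/teardown cost once.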


If you’re referring to the high performance ledger project, I believe that scalability comes from the aggregator (load balancing) canisters being on different subnets, more so than from having multiple aggregators on the same subnet, although I guess you could raise the compute allocation and reserve 3 cores at once.

When talking about a single subnet, due to the number of cores per subnet I believe you’d want canisters to send messages to a small number of canisters (1-4) on that subnet, right? For the sake of this example, let’s say 1k canisters are sending a message to one canister on the same subnet. That one canister is already loaded into the sandbox, so it only needs to be spun up once and can then process all the messages in its input queue.

So in this example, the message receiver canister would be fairly efficient and not contribute significant load, but the individual canisters doing work and sending out the calls to the receiver canister would be putting significant load on the system, on account of all of the various bottlenecks & overhead you mentioned (sandbox processes, cloning of canister state, etc.). Or do these same overheads apply to processing many messages in the same execution round, such as needing to re-clone the state after every input-queue message is processed?


The scalability comes from batching the transactions. It doesn’t really matter whether there’s one aggregator or 4 per subnet (although presumably many more than 4 per subnet wouldn’t really help). It is, after all, the single ledger canister that is the bottleneck.

Yes, they do. Every single message execution is a separate transaction (with quite a bit of state changes besides the heap, plus checks and validations, on the side). Which is why the aggregators are required for the HPL. It’s not that they do a lot of work (maybe some basic validation), they simply batch token transactions so the canister transaction overhead is only “paid” for a batch of token transactions, not to every individual token transaction.