Fixing incorrect message memory fee

TL;DR

Message memory is currently not appropriately charged for. We propose a charging model that fixes this while only minimally impacting typical workloads.

Background

Whenever a canister produces a message for another canister, i.e., a canister-to-canister message, it has to be stored in the replicated state until it is delivered and picked up for processing by the recipient. Message memory is a shared resource among all canisters on a subnet. It is different to other canister related types of memory – like heap or stable memory – because messages are ephemeral by their nature, and, in the current implementation of the protocol, messages are held in RAM at all times and serialized/deserialized upon every checkpoint. Also messages in subnet-to-subnet streams have to be certified every round. Naturally, this means that message memory is more resource intensive than other canister related types of memory. The pricing for message memory, however, is currently the same as for other canister related types of memory.

The problem

As described above, the reasoning behind the current pricing for message memory has a major flaw: message memory should be considered much more valuable than other types of subnet memory because it is scarcer and additionally has a bigger relative impact on replica and subnet performance.

The proposed fix

Similarly to the previous case of fixing the incorrect compute allocation fee, we would like to address the flaw in message memory pricing.

We target a model that should have minimal impact for typical workloads, while still leading to a reasonable fee for utilizing big chunks of message memory for prolonged periods of time.

Concretely, we propose to charge 0.125 cycles per byte per second (or approximately 250K cycles per message per second).

The impact

For subnet-local inter-canister calls, which complete within the same or the next execution round, the impact would be minimal. For actual cross-subnet calls that complete within 10 seconds (the median cross-subnet roundtrip time is 5 seconds), the additional cost would be around 2.5M cycles, which is the same as would be charged for an additional 2.5 KB of payload.

The rollout plan

If the NNS proposal to elect a (to be implemented) replica binary revision which uses the new price for message memory and charges accordingly is accepted, the change will be rolled out to all subnets via subsequent upgrade proposals.

11 Likes

Admin Note:

I pinned the topic at the user’s request and because this would affect developer experience

6 Likes

This is effectively a 10x increase on subnet-to-subnet calls relative to the current pricing model. The scaling pattern for applications on the IC requires subnet-to-subnet calls.

For reference, the way Quark was designed, according to the suggested scaling patterns, this would translate to a 10x increase in overhead. The IC is an actor based system. Messaging must be cheap.

Aside: We decided not to ship Quark out of fear something like this could happen. And, a lack of critical tooling (reporting, logging).

  • What was the line of thinking with the existing pricing model? Following this change, are you sure ingress messages are priced accurately?

  • Developers need to understand better why this is presently causing issues. Or, at least how this could cause issues in the future. Are canisters maliciously saturating this memory pool today?

  • A potential unintended consequence of this I see if punishing applications for “working” in periods of high load. What is the max time a message will wait in the outbox? Do you have plans to provide mechanisms to canisters to introspect on the current load level? For example, consider how this could be used by a larger project to stifle competition from a much smaller project.

  • Similarly, how can developers expect this pricing to scale on different subnets.

I won’t be following up here. But, this is a huge change I couldn’t ignore. I hope this creates come curiosity in the community and the conversations continues.

David, I hope the foundation treats you to a nice dinner for drawing the short stick on this one :slight_smile: .

9 Likes

Thanks a lot for the constructive feedback. Let me try to answer your questions below.

I guess this 10x is neglecting the xnet_byte_transmission_fee and the update_message_execution_fee. So when solely considering the call fee it would be roughly a 10x increase, yes. However, note that the caller has to pay the update_message_execution_fee twice (once for executing the message that triggers the call and once for the response; ~ 1.2M) and the transmission fee of 1000 cycles per byte which can go into the billions for larger payloads. When also considering these fees, it is a 2x increase in the worst case for tiny messages. For messages with bigger payload the increase will be more like a fraction of a percent. Subnet-local messages (most of the messages that we currently see on mainnet) will be affected even less than that.

The initial model was based on experiments using a synthetic workload. It correctly captures that transmitting is expensive but unfortunately misses that reserving/consuming the memory for a period of time is also expensive. This is what we propose to fix here.

Based on our current understanding, yes. Ingress messages are different in nature as they will time out after 5 minutes if they are not consumed by the canister. So there is a hard upper bound based on the maximum block size and the time to live.

Currently there is no indication for abuse but we propose to address this before it actually becomes an issue. If a resource is not appropriately priced this could potentially be abused in the future.

There is a timeout for requests that are still sitting in output queues (i.e., weren’t routed into subnet-to-subnet streams because of congestion) which is currently 5 minutes. If this doesn’t sufficiently address the problem there are various options one can consider, e.g., the one you suggest where users can somehow directly or indirectly check the current load.

Also remember that subnet-local messages will very likely not be affected by this change too much even in periods of high load.

Currently the plan is that it would linearly scale with the number of nodes in the subnet (similar to the other fees).

9 Likes

For context, over the past 24 hours just under 170M messages were executed. Of those, around 143M were canister messages (requests and responses). Of these 143M, 130M were routed directly by Execution to local canisters, without making it into subnet streams. An additional 3.5M were routed to the loopback stream (also subnet-local). Leaving some 9.5M actual cross-subnet messages. So only about 6.6% of canister messages were cross-subnet.

Furthermore, of the 13M messages that got routed into (loopback or remote) streams, the median roundtrip time (including scheduling and execution) was around 6.5 seconds; the 90th percentile was just over 8 seconds. Which would mean 6.5 * 2e6 * 0.125 = 1.625M cycles median charge for the reservation; and 2M cycles 90th percentile charge.

Considering that the median canister message payload is around 90 bytes (and ignoring execution costs), it currently costs 260K + 2 * 590K + 2 * 90 * 1K = 1.52M to send such a message; for the 90th percentile payload size of 360 bytes, the current cycle cost is just over 2M cycles.

So it is actually the case that for cross-subnet messages the costs would approximately double for the average (small) canister message. But, as per the above, this only covers 6.6% of all canister messages. For the remaining 90%+, the additional costs would either be zero (if the call completes in the same round) or 250K cycles (if executed in the next round). That’s an increase of 0-16% for the average subnet-local canister call.

10 Likes

This likely has a significant effect on ORIGYN in a couple of ways:

  1. When users publish media to stage NFTs they go through a gateway canister that holds all the permission logic and is relayed to a storage canister. This canister can be on another subnet by design. It would be good to know the additional cost of uploading 2GB of data that has to cross a subnet boundary as this is the functional unit we are negotiating with our clients on and we already have pending contracts based on current experiments and pricing.

  2. We’re a good way through building an IC-wide generic event messaging system. I’m guessing this is going to make things a lot more expensive. We’ve discussed trying to get ‘satellites’ on each subnet so that the message only has to cross a subnet boundary once, but we were hoping to wait for v2 to do that. This will likely delay things further as broadcasting a message 100 times across a subnet boundary just isn’t going to work with the increased pricing. This would be/will be much easier when we can procure a canister on any subnet of our choosing.

4 Likes

According to my computations the byte transmission fee (which we already have today) should be the dominant part in the workflow you describe. Transmitting 2GB of data will cost you roughly 2T cycles (1000 [transmission fee per byte] * 2 * 1000^3 [number of bytes]). Assuming (pessimistically) that the 2GB will be reserved for the entire time and that it takes you 1000 seconds to transfer these 2GB to another subnet then the cost for message memory will be roughly 0.25T (0.125 [cycles/B/s] * 2 * 1000^3 [number of bytes] * 1000 [time in s]), likely even below because not the entire memory will be reserved for the whole duration and it will likely also be a bit faster. This would be an increase of 10% in the worst case.

Also note that this back of the envelope calculation is neglecting the costs for ingress messages for simplicity. If we’d also account for these then we’d be quite a bit below the 10% in the worst case (around 3%, I think).

More generally speaking (but somewhat out of scope of this thread) it would seem to be a good idea to consider alternatives to sending so much data via XNet. Given that there is a client application doing the upload involved anyways it would be more efficient if routing of the big ingress messages to the canister that stores it is done on the client side and the XNet calls are limited to negotiating the access rights etc.

For this case our analysis above should apply: for tiny calls 2x in the worst case; for bigger ones much less. I agree that what you describe as v2 sounds even better in that respect.

Finally please also note that we are still discussing whether we can further restrict the situations where the protocol charges the new fees to situations that are particularly expensive for the system. If we are able to come up with something this would likely minimize the impact even further.

3 Likes

This makes sense…thank you for laying it out like that.

This is on our road map for sure…just trying to keep things simple for now.

Great…good to know. Thanks for all the updates and explanations.

3 Likes