Increasing subnet storage capacity and introducing resource reservation mechanism

Subnet storage capacity increase

The replica version 76fd768b increases the subnet storage capacity from 450GiB to 700GiB. This is the first milestone in a larger effort to push the subnet storage capacity into the terabytes.

At Genesis, IC nodes had 3.2TB of NVMe space available. Since a node may store multiple versions of the replicated state, the subnet capacity was conservatively limited to 450GiB. In addition, further optimizations in the state manager were needed to scale beyond that limit.

Recently, the node providers, in collaboration with DFINITY, upgraded the nodes to increase the available NVMe space by 10x, from 3.2TB to 32TB. In parallel, DFINITY engineers have been working on performance optimizations in the state manager. This work made it possible to increase the storage capacity to 700GiB. Further state manager optimizations, such as rewriting the storage layer to replace the XFS reflink operation during checkpointing, would allow additional increases of the storage capacity.

Resource reservation mechanism

We took the newly added 250GiB as an opportunity to start tackling a long-standing problem with the storage charging mechanism. Currently, canisters pay for storage dynamically every few minutes, following the “pay-as-you-go” model. Such fine-grained payment is convenient for developers, but it does not handle spiky usage patterns well.

Consider a scenario where someone allocates the entire subnet storage for a few hours and pays only for those hours. During those hours, the operation of other canisters on the same subnet might be disrupted, as they might fail to allocate new storage. The problem is that the cost of such spiky usage is low under the “pay-as-you-go” model.

As a mitigation, the newly added 250GiB will be subject to a new resource reservation mechanism designed to encourage consistent, long-term use and discourage sudden, large spikes in storage usage. The mechanism is enabled only on application subnets; verified application and system subnets do not need such protection because malicious actors cannot deploy canisters there.

The reservation mechanism works as follows:

  • As long as the subnet remains under the previous limit of 450GiB, the storage payment remains the same as before, following the “pay-as-you-go” model. This is the case for all subnets as of this writing.
  • When the subnet grows above 450GiB, the new reservation mechanism activates. Every time a canister allocates new storage bytes, the system sets aside some amount of cycles from the main balance of the canister. These reserved cycles will be used to cover future payments for the newly allocated bytes. The reserved cycles are not transferable, and the amount of reserved cycles depends on how full the subnet is; it may cover days, months, or even years of payments for the newly allocated bytes (see the sketch below). Note that the reservation mechanism applies only to newly allocated bytes and does not apply to storage already in use by the canister.
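The exact formula is part of the replica implementation; the following Rust sketch is only an illustration of the idea, using the parameters described in the details section below (the reservation starts at 0 above 450GiB of subnet usage and grows linearly to roughly 10 years' worth of storage fees at 700GiB). All names and the storage fee constant are assumptions made for the sake of the example, not the actual code.

```rust
// A minimal sketch of the reservation computation; constants and names are
// illustrative assumptions, not the actual replica code or fee schedule.

const GIB: u128 = 1024 * 1024 * 1024;
const ACTIVATION_THRESHOLD: u128 = 450 * GIB; // reservation starts above this usage
const SUBNET_CAPACITY: u128 = 700 * GIB;      // current subnet storage capacity
const MAX_RESERVATION_PERIOD_SECS: u128 = 10 * 365 * 24 * 3600; // ~10 years at full capacity

// Assumed storage fee in cycles per byte per second (illustrative value only).
const STORAGE_FEE_PER_BYTE_PER_SEC: u128 = 1;

/// How many seconds of future storage fees to reserve for newly allocated bytes,
/// growing linearly from 0 at 450GiB of subnet usage to ~10 years at 700GiB.
fn reservation_period_secs(subnet_usage: u128) -> u128 {
    if subnet_usage <= ACTIVATION_THRESHOLD {
        return 0; // below the old limit: plain pay-as-you-go, nothing is reserved
    }
    let above = subnet_usage - ACTIVATION_THRESHOLD;
    let range = SUBNET_CAPACITY - ACTIVATION_THRESHOLD; // the newly added 250GiB
    MAX_RESERVATION_PERIOD_SECS * above / range
}

/// Cycles moved from the canister's main balance to reserved_cycles when it
/// allocates `new_bytes` while the subnet is at `subnet_usage`.
fn reservation_cycles(new_bytes: u128, subnet_usage: u128) -> u128 {
    new_bytes * STORAGE_FEE_PER_BYTE_PER_SEC * reservation_period_secs(subnet_usage)
}

fn main() {
    // Example: allocating 1GiB when the subnet is at 575GiB (halfway between
    // 450GiB and 700GiB) reserves roughly 5 years' worth of storage fees.
    let cycles = reservation_cycles(GIB, 575 * GIB);
    println!("cycles reserved: {cycles}");
}
```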

The goal of the reservation mechanism is to discourage the spiky usage pattern by making it more expensive while at the same time keeping costs the same for long-term users. As explained above, the reservation mechanism activates only above the previous subnet storage limit of 450GiB.

Details of the changes

  • A new field named reserved_cycles is added to the canister state. It keeps track of the reserved cycles that were set aside from the main balance of the canister and have not been spent yet. The new field appears in the result of the canister_status call to the IC management canister.
  • A new field named reserved_cycles_limit is added to canister settings. It serves as an upper limit on reserved_cycles. Setting the limit to 0 disables the reservation mechanism for the canister. Such opted-out canisters cannot allocate from the newly added 250GiB, which means they will trap if they try to allocate storage while the subnet usage is above 450GiB. In other words, such canisters effectively operate under the previous subnet storage capacity of 450GiB. The default value of reserved_cycles_limit is 5T cycles.
  • Storage allocation operations such as memory.grow, stable_grow, stable64_grow, and setting memory_allocation are adjusted to move cycles from the main balance to reserved_cycles. The amount of cycles depends on how full the subnet storage is: the amount of cycles per byte grows linearly with subnet usage, starting from 0 at 450GiB usage and reaching ~10 years' worth of storage fees at 700GiB (as in the sketch above). These config parameters are open for discussion and may change based on community feedback.
  • The periodic charging for storage is adjusted to first burn cycles from reserved_cycles and to start using the main balance only once the reserved cycles reach 0 (see the sketch after this list).
  • The freezing threshold computation is also updated to take reserved_cycles into account. This means that even if the main balance is below the freezing threshold, the canister may still be functional if it has enough reserved_cycles.
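A minimal sketch of the last two points, assuming hypothetical types and one possible reading of the updated freezing rule (combined balances covering the threshold); the real replica code may differ.

```rust
// Illustrative only: how periodic charging and the freezing threshold could
// take reserved_cycles into account. Names and logic are assumptions based on
// the description above, not the actual implementation.

struct CanisterBalances {
    main_balance: u128,
    reserved_cycles: u128,
}

impl CanisterBalances {
    /// Charge `amount` cycles for storage: burn reserved_cycles first and fall
    /// back to the main balance only once the reserved cycles are exhausted.
    fn charge_for_storage(&mut self, amount: u128) {
        let from_reserved = amount.min(self.reserved_cycles);
        self.reserved_cycles -= from_reserved;
        let remaining = amount - from_reserved;
        self.main_balance = self.main_balance.saturating_sub(remaining);
    }

    /// Assumed reading of the updated rule: the canister stays unfrozen as long
    /// as its combined balances cover the freezing threshold, so enough
    /// reserved_cycles can keep it functional even if the main balance alone is
    /// below the threshold.
    fn is_frozen(&self, freezing_threshold_cycles: u128) -> bool {
        self.main_balance + self.reserved_cycles < freezing_threshold_cycles
    }
}

fn main() {
    let mut c = CanisterBalances { main_balance: 1_000, reserved_cycles: 500 };
    c.charge_for_storage(600); // burns all 500 reserved cycles, then 100 from the main balance
    assert_eq!((c.main_balance, c.reserved_cycles), (900, 0));
    assert!(!c.is_frozen(800)); // combined 900 cycles still cover an 800-cycle threshold
}
```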

Next steps

Normally, protocol changes first go through community discussion, then an NNS proposal, and then roll out to the mainnet. In this case, because this protocol vulnerability could have been abused, we followed the approach used for security fixes, where the fix is deployed first and discussed afterwards. Note that nothing changes for the first 450GiB that were available before; the new mechanism activates only above that threshold.

The plan:

  1. Implement and deploy the reservation mechanism on the mainnet [done].
  2. Implement the canister status and settings changes in dfx and agents [WIP]. A new version of dfx will be released soon.
  3. Discuss the reservation mechanism and other solutions [we are here now].
  4. Address issues raised by the community.
  5. Submit an NNS motion proposal.
  6. If the proposal is rejected, then we need to find and implement an alternative solution.
  7. If the proposal is accepted, then update the protocol specification.

What does it mean for you?

  • If your canisters are on a verified application or system subnet, then you don’t have to do anything because the reservation mechanism is disabled on such subnets.
  • If you want to opt out, you can set the new reserved_cycles_limit canister setting to 0 for all your canisters (see the example after this list). In that case, the previous limit of 450GiB will apply to your canisters.
  • Discuss the problem and the reservation mechanism. Your feedback would be very much appreciated.
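As a sketch of the opt-out, here is how a controller canister written with ic-cdk could set reserved_cycles_limit to 0 via the management canister's update_settings call. The argument types below are hand-written to mirror the interface; treat the exact field layout as an assumption and check the interface spec. dfx also exposes this setting (see the reply about dfx 0.15.1-beta.1 further down in the thread).

```rust
// A sketch, assuming the caller is a controller of the target canister.
use candid::{CandidType, Nat, Principal};

#[derive(CandidType)]
struct CanisterSettings {
    reserved_cycles_limit: Option<Nat>,
    // Other optional settings (controllers, compute_allocation, ...) are
    // omitted here; candid treats missing opt fields as null.
}

#[derive(CandidType)]
struct UpdateSettingsArgs {
    canister_id: Principal,
    settings: CanisterSettings,
}

/// Opt a canister out of the reservation mechanism by setting its
/// reserved_cycles_limit to 0.
#[ic_cdk::update]
async fn opt_out(canister_id: Principal) {
    let args = UpdateSettingsArgs {
        canister_id,
        settings: CanisterSettings {
            reserved_cycles_limit: Some(Nat::from(0u64)),
        },
    };
    let _: () = ic_cdk::call(Principal::management_canister(), "update_settings", (args,))
        .await
        .expect("update_settings failed");
}
```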
  2. Implement the canister status and settings changes in dfx and agents [WIP]. A new version of dfx will be released soon.

This is available in dfx 0.15.1-beta.1. Thanks @ericswanson for implementing it.

Looking forward to even more memory 💪

Possibly a stupid question: verified application and system subnets are those listed on the dashboard, right?


The subnet overview on the dashboard says which subnets are which type. This list is exhaustive.

IIUC this means ‘yes’ to your question 🙂


I received one piece of feedback internally about the following scenario:

  1. A canister allocates N bytes
  2. The canister deallocates N bytes
  3. The canister allocates N bytes again

The second allocation is going to reserve cycles regardless of the cycles reserved by the first allocation. In other words, the system does not keep track of individual allocations and their reservation times (which would be inefficient and memory-consuming). Each allocation is considered independently from previous allocations, deallocations, and reserved cycles.

Note that currently there is no easy way to deallocate storage except for uninstalling and re-installing the canister, so this is more of a corner case than a normal case.
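To make this concrete, here is a small worked example with made-up numbers, assuming (per the charging description above) that the cycles reserved by the first allocation are not released on deallocation but only burned by future storage payments, and that subnet usage is roughly unchanged between the two allocations.

```rust
fn main() {
    // Made-up numbers for illustration only.
    let n_bytes: u128 = 1 << 30; // the canister allocates 1GiB
    let fee_per_byte: u128 = 4;  // assumed reservation fee at the current subnet usage

    let first_reservation = n_bytes * fee_per_byte;  // reserved on the first allocation
    // Deallocating the bytes does not hand the already reserved cycles back.
    let second_reservation = n_bytes * fee_per_byte; // reserved again on re-allocation

    // The canister ends up with roughly twice the reservation while using only N bytes.
    assert_eq!(first_reservation + second_reservation, 2 * n_bytes * fee_per_byte);
}
```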


The NNS motion proposal: Proposal: 126094 - ICP Dashboard


The spec change: Specify the resource reservation mechanism by ulan · Pull Request #257 · dfinity/interface-spec · GitHub

It looks like https://dashboard.internetcomputer.org/subnet/3hhby-wmtmw-umt4t-7ieyg-bbiig-xiylg-sblrt-voxgt-bqckd-a75bf-rqe is on pace to hit the 450GB resource reservation mechanism threshold within the next 30 days.

Is this threshold going to stay at 450GB, or is it going to be raised as subnet capacity increases?
(capacity was previously 750GB, is now 1TB I believe)