Subnet capabilities for scaling database services

As multi-canister scaling systems become more popular, I have some concerns about subnet allocation. When a canister creates another canister, how is the subnet assigned? Is the spawned canister always on the same subnet? For systems that need fast communication between canisters, having the deployed canisters on the same subnet is a must, since cross-subnet calls can take many seconds to complete, which would badly hurt query speed for a cross-canister DB. Could someone provide some clarity, or links to documentation, on how canisters are allocated to subnets, and on whether there is a way to ensure a newly deployed canister lands on the same subnet? Another question: if canisters deploy to the same subnet where possible, what happens when that subnet is at max capacity?


I think when a subnet reaches max capacity it just splits in 2.

If this is the case, there is no actual guarantee that a given multi canister setup would continue to work at optimal same subnet speeds, right?


Yep. Hopefully before that time comes there will be cross subnet queries.

Is there any documentation or reference you can provide for this?


From the link JaMarco posted above:

“The Internet Computer also ensures the transparency of subnets in other ways. The NNS can split and merge subnets, for example, in order to balance load across the overall network. This is also transparent to the hosted canisters.”

(Accompanied by this image)

@diegop (pinging you b/c I’m not sure who to ask on this one)

Do you know whether the functionality mentioned above has actually been implemented?

The reason I ask is that, for applications that need to auto-scale themselves, subnet splitting would definitely impact the performance of inter-canister calls between that application’s existing canisters. Why not just spin up a new subnet instead of splitting an existing one?

One of the features I’ve been pondering is the ability to seamlessly scale beyond a subnet: for a canister on subnet A to be able to create new canisters on subnet B, continue creating on subnet B until that fills up, then use subnet C, and so on…
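To make the idea concrete, here is a minimal sketch of the spill-over behaviour I have in mind. It is a plain Python model, not IC code: `Subnet`, `place_canister`, and the per-subnet canister count are all hypothetical names and simplifications of mine (a real subnet would fill up by memory, not by a fixed number of canisters).

```python
from dataclasses import dataclass, field


@dataclass
class Subnet:
    """Hypothetical model of a subnet with room for a fixed number of canisters."""
    name: str
    capacity: int
    canisters: list = field(default_factory=list)

    def is_full(self) -> bool:
        return len(self.canisters) >= self.capacity


def place_canister(subnets, canister_id):
    """Place a canister on the first subnet that still has room (A, then B, then C...)."""
    for subnet in subnets:
        if not subnet.is_full():
            subnet.canisters.append(canister_id)
            return subnet.name
    raise RuntimeError("all subnets are full")


subnets = [Subnet("A", 2), Subnet("B", 2), Subnet("C", 2)]
placements = [place_canister(subnets, f"canister-{i}") for i in range(5)]
print(placements)  # ['A', 'A', 'B', 'B', 'C']
```

The point is only that the creating canister keeps a preferred order of subnets and falls through to the next one when the current one is full, so the application never has to stop growing at a subnet boundary.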

The requirement that the canister which is instantiating the creation of new canisters be located on the same subnet is a bit limiting, and I’m not 100% sure why this was put in place in the first place. We already have the random “round-robin” approach of picking a subnet to create a canister on documented here, so I’m not quite sure why this restriction exists in terms of letting a canister do this directly (both have the ability to pick a random subnet, and to choose the exact subnet).

If DFINITY is concerned about people deploying to “private subnets” like the Bitcoin integration, the dev testing subnet, or to “private” subnets reserved for larger, specific applications, then IC management canister related actions like create_canister on those subnets can be gated by the principal id of the developer or of the canister (i.e. if it is already on that subnet).

I’m making the argument for this feature, because if it exists, then it will become 10 times easier to scale applications past a single subnet, and DFINITY can more easily say/advertise that applications on the IC are infinitely scalable, or at least until they fill up all the node memory on the IC :sweat_smile:.



On a separate note, how much total node/subnet memory capacity currently exists on the entire IC?

It looks like there are 35 subnets in total, of which:

  • 1 subnet is the NNS → 78.21 TB
  • 1 subnet is the SNS → 66.48 TB
  • 2 subnets are “System” subnets, of which one is 25.42TB and the other (System II) is 54.75TB
  • The 31 remaining subnets are all “application” subnets, which are the type of subnet available to all of us devs. All application subnets look to have 25.42TB of memory
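Adding up the figures listed above gives a rough total. This is just arithmetic on the dashboard numbers quoted in this post; the variable names are my own, and the total says nothing about what the dashboard figure actually measures.

```python
# Dashboard figures quoted above, in TB. Names are illustrative only.
nns_tb = 78.21
sns_tb = 66.48
system_tb = [25.42, 54.75]
app_subnet_count = 31
app_subnet_tb = 25.42

application_total_tb = app_subnet_count * app_subnet_tb
total_tb = nns_tb + sns_tb + sum(system_tb) + application_total_tb

print(f"application subnets: {application_total_tb:.2f} TB")  # 788.02 TB
print(f"all 35 subnets: {total_tb:.2f} TB")  # 1012.88 TB
```

So by these numbers the network as a whole shows roughly 1,013 TB, of which about 788 TB sits on application subnets.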

25.42TB in an application subnet? But I thought each subnet has a canister memory capacity of 300GB…?

Does anyone have an answer to why this is?


300 GB is the stable memory capacity limit.

You can scale your dapp by creating canisters on a random subnet instead of on the same subnet.

Do you know whether the functionality mentioned above has actually been implemented?

No, this functionality is not implemented. We’re still in the design phase when it comes to subnet splitting, so it will take a while longer until it’s available.

The requirement that the canister which is instantiating the creation of new canisters be located on the same subnet is a bit limiting, and I’m not 100% sure why this was put in place in the first place. We already have the random “round-robin” approach of picking a subnet to create a canister on documented here, so I’m not quite sure why this restriction exists in terms of letting a canister do this directly (both have the ability to pick a random subnet, and to choose the exact subnet).

There are some historical reasons for this, mainly caution before launch: it was easier to limit canister creation to the same subnet. We’ve had discussions in the past about extending the create_canister API to allow choosing a subnet, but there were concerns about an API where you set a specific subnet id, because that would make subnets a first-class concept, which goes against our efforts to “hide” that concept from end users as much as possible (e.g. in the interface spec we rarely, if at all, talk about subnets). As such we didn’t reach an agreement and, given there was no real use case yet, we left things as they are. Given what you describe, it might be a good idea to pick up this discussion again.

If DFINITY is concerned about people deploying to “private subnets” like the Bitcoin integration, the dev testing subnet, or to “private” subnets reserved for larger, specific applications, then IC management canister related actions like create_canister on those subnets can be gated by the principal id of the developer or of the canister (i.e. if it is already on that subnet).

I don’t think this is a concern anymore. There is a mechanism to disallow canister creation on special subnets if needed (the authorized subnetwork proposals you might have seen from time to time).

25.42TB in an application subnet? But I thought each subnet has a canister memory capacity of 300GB…?

The limit was indeed 300GiB, but we increased it to 350GiB not so long ago. You can always consult this to see the latest value.

As for what’s shown in the dashboard: it’s the total available disk space on the subnet, which is why you see different numbers for subnets of different sizes (I mean subnets with different numbers of nodes). The disk space allocated on the nodes is larger, but because of current protocol limitations (which we are also working to lift some time in the future) we need to enforce the lower limit of 350GiB. The dashboard figure can be misleading (apparently, since you’re asking), so it’s probably a good idea to clarify what is shown there. I can take it up with the team.
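To put the two numbers side by side, here is a rough conversion. It assumes the dashboard’s “TB” means decimal terabytes (10^12 bytes), which is an assumption on my part, and it uses the 25.42TB application-subnet figure quoted above.

```python
GIB = 2**30   # bytes in one gibibyte
TB = 10**12   # bytes in one decimal terabyte (assumed dashboard unit)

state_limit_bytes = 350 * GIB          # enforced limit mentioned above
dashboard_disk_bytes = 25.42 * TB      # figure quoted for an application subnet

limit_in_tb = state_limit_bytes / TB
share_of_disk = state_limit_bytes / dashboard_disk_bytes

print(f"350 GiB is about {limit_in_tb:.3f} TB")
print(f"that is about {share_of_disk:.2%} of the dashboard figure")
```

In other words, the enforced limit is a small fraction (on the order of 1 to 2 percent) of the disk space the dashboard shows, which is why the two numbers look so far apart.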


But the application would be slow if it has to make cross-subnet calls, right?

Yes. If you can be sure the call will be processed successfully and you don’t need to read the callback result, you can ignore the result of the update call so the canister doesn’t wait on it, which speeds up processing.
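As an analogy only (this is Python’s asyncio, not IC or canister code, and `cross_subnet_call` is a made-up stand-in for a slow inter-canister call), the difference between waiting for the reply and ignoring it looks roughly like this:

```python
import asyncio

events = []


async def cross_subnet_call(name, delay=0.05):
    """Hypothetical stand-in for a slow cross-subnet update call."""
    await asyncio.sleep(delay)
    events.append(f"{name} finished")
    return "ok"


async def handler_awaiting():
    # Waits for the reply, so the handler is blocked for the full round trip.
    await cross_subnet_call("awaited call")
    events.append("awaiting handler returned")


async def handler_fire_and_forget():
    # Starts the call but does not wait for (or read) its result.
    task = asyncio.create_task(cross_subnet_call("ignored call"))
    events.append("fire-and-forget handler returned")
    return task


async def main():
    await handler_awaiting()
    task = await handler_fire_and_forget()
    await task  # only so this demo exits cleanly


asyncio.run(main())
print(events)
```

The second handler returns before its call finishes. The trade-off is that it never learns whether the call failed, which is why this only fits cases where you don’t need the result.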

I’m not sure I understand.

For this subnet, can you explain what the difference is between “State” and “Memory”? Is “State” simply the total heap + stable memory of all canisters in that subnet?

Also, does SUBNET_MEMORY_CAPACITY refer to stable memory only, or does it also refer to wasm heap?

Sorry for the late reply, I was on vacation.

For this subnet, can you explain what the difference is between “State” and “Memory”? Is “State” simply the total heap + stable memory of all canisters in that subnet?

Yes, “State” is essentially what is consumed by the canisters. It includes the heap and stable memory, but also wasm modules, global variables and messages in the canisters’ queues. “Memory” is the sum of all disk space on the nodes of the subnet. As I said in an earlier message, this can be misleading (and honestly not particularly useful anyway), and I’ve already contacted the team that maintains our dashboard to get it removed, as it doesn’t add much value.

Also, does SUBNET_MEMORY_CAPACITY refer to stable memory only, or does it also refer to wasm heap?

Nope, it’s not just stable memory, it refers to the things I mentioned about “State” earlier and it’s the max value that the “State” can ever get to.
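As a rough illustration of that accounting (a sketch only: the function names and the example sizes are made up, and the 350GiB value is the limit quoted earlier in this thread, which may have changed since), the check amounts to summing every component of “State” and comparing the total against the capacity:

```python
GIB = 2**30

# 350 GiB is the limit quoted earlier in this thread; the live value may differ.
SUBNET_MEMORY_CAPACITY = 350 * GIB


def canister_state_bytes(heap, stable, wasm_module, globals_, queued_messages):
    """Sum the components that count toward "State" for one canister."""
    return heap + stable + wasm_module + globals_ + queued_messages


def subnet_has_room(existing_state_bytes, new_canister_bytes):
    """True if adding the new canister keeps total State within the capacity."""
    return existing_state_bytes + new_canister_bytes <= SUBNET_MEMORY_CAPACITY


# Made-up numbers for a single canister.
one_canister = canister_state_bytes(
    heap=2 * GIB,
    stable=8 * GIB,
    wasm_module=2 * 2**20,
    globals_=4096,
    queued_messages=64 * 2**10,
)

print(subnet_has_room(300 * GIB, one_canister))  # True: about 310 GiB in total
print(subnet_has_room(345 * GIB, one_canister))  # False: would exceed 350 GiB
```

So a canister with a small stable memory footprint can still push a subnet toward the limit through its heap or queued messages, since all of them count.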
