Voting is now live for a new proposal for the scaling of the world-computer

levi · November 19, 2024, 1:40am

Hi world-computer community,

A new proposal for the scaling of the protocol is now live!

Dashboard link: https://dashboard.internetcomputer.org/proposal/134203.
NNS link: https://nns.ic0.app/proposal/?u=qoctq-giaaa-aaaaa-aaaea-cai&proposal=134203.

The proposal summary contains the specifics. More background and conversations are in these forum posts: Path forward for subnet splitting and protocol scaling and Suggested measures to reduce latency and improve ICP scalability - #3 by levi.

Let’s vote.

:Levi.

skilesare · November 19, 2024, 2:41am

Cross-canister calls can always take up to the time of a cross-subnet canister-call: 3 seconds. Sometimes the protocol might be able to do it faster but from a canister-creator’s point of view, a canister-call can always take up to 3 seconds.

This is a non-starter and I’ll vote to reject. This can in some instances represent a 22x slowdown of existing applications. Why would we nerf the system this way? I’ve personally spent hundreds of thousands of dollars developing systems that are only possible if I’m able to operate on a know subnet with other canisters that participate in the specified protocol. Cross subnet is handled at the software level.

I appreciate the thoughtfulness but disagree with prejudice on the conclusions.

I’ll put together some more thoughtful comments when I have time, but they will mostly revolve around maintaining the premissionless nature of being able to put what I want where as long as I’m will ing to pay the carrying cost. Small bootstrapped operations need to be able to operate with some wiggle room in peek performance scenarios to demonstrate viability. Once they catch on it should be expected that they will pay full freight and rent the amount of a subnet that they will consume.

levi · November 19, 2024, 6:02am

Can you share a specific existing application and its architecture?

Can you share which specific systems you have in mind that wouldn’t be possible and what part of it wouldn’t work here? And how do you plan to handle the fact that if anyone can install and run canisters on any specific subnet at any time, then any subnet that you try to run your systems on can always fill up and get overloaded at any time?

Being able to take up space that others are in the middle of using is not a positive quality.

This proposal makes sure that each canister is stable in its availability and liveness, no matter what other canisters are doing. For the security of a canister’s-space.

tiago89 · November 19, 2024, 6:50am

I also voted reject.

Levi, a subnet is a blockchain of itself you want to split it and merge like it was “easy” load balancing?

I am not that technical, but I doubt it is that easy / computationally cheap. I think I prefer the option of the developer to “easily” migrate their specific canister(s) to other subnet. Or have an option and side budget that allows to make that “migration” automatic when performance degrades.

Because, that need and cost, might not exist in other canisters, and they may value more to stay in the same subnet as other canisters.

Also, what about spikes, you would have all the cost of splitting, to a few days later be doing all the cost of merging back?

Also the subnet topology and risk mitigations that each have, if you are auto splitting and merging to the next most available ones, won’t they risk to not be “guaranteed”?

I personally, and from the limited information/context I have, prefer to mitigate this problem in other ways, namely adjusting price, more subnets, easy migration and copying to other subnets (blockchains). Which I feel is already what Dfinity is going into.

Lorimer · November 19, 2024, 7:39am

Subnet splitting has been planned for a while (but it’s complicated). This particular proposal seems to make too many assumptions, and suggests that developers should have no control over the exact subnet being used. Why can’t that control just be optional (to benefit from better auto-scaling, where feasible based on the dapp in question).

I think this proposal is a valuable contribution to the discussion, but I’m not sure why it needed to be submitted as a motion at this stage.

levi · November 19, 2024, 9:17am

Subnet splitting is already implemented in the protocol, it currently requires some manual steps, this proposal makes it automatic.

A subnet-split is when the subnet moves some of its canisters to a new subnet. The chain of the blocks of a subnet doesn’t split.

This proposal is the lowest network computational cost and most efficient way to balance the load of the subnets within the protocol.

Canisters need to move when a subnet gets overloaded. This proposal is that the protocol will handle where canisters move to and when. It will be much lower cost on the network resources than if you let canister controllers choose which subnet to move to at any time, since the protocol can control and coordinate the best way to move the canisters to make sure every canister has enough space.

The new subnets can stay for later load-spikes, making sure that canisters stay available/live during load-spikes. Merging subnets can be done later if there’s a need for it.

This proposal states that the canisters will stay in their chosen subnet-type, so they will always keep the same subnet security parameters.

The proposal states why that control cannot be optional:

If an application gets popular, and other applications that integrate with it try to be on the same subnet as the popular application, then the popular application might not be able to find a subnet with enough space to function if other applications keep following it around and right away filling up any subnet that the popular application tries to move to.

As the network grows, when a subnet gets full, all the applications on it would look for empty subnets to move to. They would all be looking for the same low-load subnet qualities and there’s a high chance they would all target the same few emptiest subnets to move to, and then the new subnets could get full fast since many applications are moving to the same few subnets at the same time, and some of the applications might have to move again, causing inefficiencies.

Letting canister controllers migrate their canisters to specific subnets would also make it easy for someone to make any application stop working by spamming that application’s subnet (denial-of-service), since anyone would be able to target any specific subnet. If however the protocol stays in charge of which canisters live on which subnets and when, then the protocol will spread any load throughout the whole network, and it will not be possible to target a specific subnet or specific application. As the network grows with more subnets, it gets more and more safe for canisters, and it gets harder to try to target a specific subnet or application.

tiago89 · November 19, 2024, 9:42am

Great feedback, thanks Levi. Wasn’t aware of a few things there.

I am now thinking that your proposal depends a lot on this “efficient” allocator of canisters, that will decide which canisters stay on which splitted subnet.

I think there is a 80-20 rule case then. I argue that all that we really need is an efficient/protocol level router of these canisters. That is used by default when creating new canisters, but also when subnets pass a certain threshold and becomes optimal to migrate a certain number of canisters to other subnets.

If you can have that, then a Dev even wants to activate it by default, so that their “group” of canisters, be automatically moved around if performance degrades.

What you seem to want (and I too) is that this “automatic router / migration” tool exist. That is smart enough to understand dependencies / inter canister calls and optimize for it to be on the same subnet (as much as possible).

I think, and agree, that should be the priority. But the rest of the changes, like not allowing to set subnet or forcing to move around or bigger usage of splitting/merging, maybe can be discussed after the auto router / migration tool is a reality.

Just my 2 cents ok?

hpeebles · November 19, 2024, 10:45am

This proposal would destroy OpenChat since we heavily make use of co-location for fast updates and composite queries.

For example, on every subnet we have a LocalUserIndex which knows a minimal set of details about every user. Then when a user wants to join a Group or Community they send the request to the LocalUserIndex for that Group/Community, the LocalUserIndex is able to validate the caller then forwards the request on to the Group/Community. So the Group/Community just needs to ensure the request came from the LocalUserIndex, simples.
As it stands, joining a chat takes ~2 seconds because everything can execute in a single round, if this involved cross-subnet calls then it would take > 10 seconds.

Similarly, when requesting chat updates, we group the chats we want updates for by subnet, then send the request for each subnet to the LocalUserIndex which uses composite queries to fan the requests out to the various chat canisters. Previously we would sometimes make 50+ queries and it would be really slow, whereas now we just make a single query per subnet.

There are many other scenarios in which we take advantage of co-location but I just wanted to give one example which involved updates and one for composite queries.

marcio · November 19, 2024, 1:20pm

I still believe we need a way to unify @levi’s idea with our current approach. Pricing the ability to co-locate might be a good starting point. As far as I know, public subnets can be filled without restrictions for free, making them cheaper than renting your own subnet. If all subnets are empty, the co-location price could logically be set at $0. On the other hand, if subnets are completely full, the co-location price should approach infinity.

marcio · November 19, 2024, 2:29pm

Expanding on this idea: for instance, I wouldn’t mind paying $1,000 to co-locate my single canister dApp close to my ledger. On the other hand, Yral might not be willing to pay the same amount to co-locate millions of canisters. Additionally, we’d likely need some form of ongoing co-location tax or fee to maintain your canisters in the subnet of your choice. Something that makes you think twice if you really want to stay in a busy subnet.

skilesare · November 19, 2024, 2:38pm

I’ll mostly just echo what @hpeebles said. It is a common architecture. ICRC-72 generalize the same concept for anyone to use for pub/sub.

And how do you plan to handle the fact that if anyone can install and run canisters on any specific subnet at any time, then any subnet that you try to run your systems on can always fill up and get overloaded at any time

This is the more important question as I’ll just take as a lemma that we want to support more architectures/application types than less. The answer is that if my subnet is attacked I want to be able to move or pay for my share. There is no free lunch. We’re making a free lunch for a while, but any significant economic activity that can be attacked will be attacked and the general rule for systems on the IC is that you must eventually be able to support the compute you require because you will eventually be required to support it. It is awesome that we spread things out for folks to bootstrap, but if your economic model and architecture isn’t taking into account paying to reserve the compute you need then you need to rework it. There are other alternative ways of setting this up that we should absolutely look at, and I think some of the ideas around segmenting the compute into different models is a healthy thing to look at.

Being able to take up space that others are in the middle of using is not a positive quality.

You are making an irrelevant statement here because this is not the case on the internet computer. There are currently multiple ways to reserve space and compute that others cannot take from you if you reserve it first.

Case: @rbole was rightly frustrated that his app stopped working on the EU subnet, but the reality was that he could have paid to reserve that space. Unfortunately the app, at this stage of development, couldn’t support paying for the space it wanted. Is this poor design? I don’t think so because he’s trying to build something awesome and needs to bootstrap the utility to get it to a point where it can support that level of financial commitment from an Open Internet Service DAO. This unfortunate circumstance is likely a case of the EU subnet being unique and not having alternatives. As there are more EU subnets this hopefully becomes less of an issue, but we’ll never get away from the fact that if you want your canister to always be able to respond in 1-3 seconds you must reserve 100% compute and that costs like $3,500 a month(Hopefully subject to moore’s law over time). There are software solutions to this issue, like the fact that you can up your compute in one call, then make your call, then downgrade your compute and it will be much less than $3,500. You’ll only pay at that rate for a short period of time. It is likely that personal wallet canisters will likely have to take this into consideration and have this feature built in so it ‘just works’.

I’d like to see the ability to place a bid with my ingress call to have the target canister prioritized in the next round. This eliminates reverse gas, but it increases flexibility and is not required. If the subnet becomes busy these fees could get very high and we’ll see issues just like every other blockchian and we’ll burn tons of cycles. If you pair this with a rational canister migration methodology(which you nicely layout) that makes it push-button easy to migrate when you want to we’re getting to a place where most of these issues are mitigated by the choices the network makes. There will be times where I will want to mark my canisters as ‘do not move’ or ‘keep with canister xxxxxxxx’ and the algo needs to take these into account. Some hybrid of what you’ve proposed and users choice will be where we likely end up.

I’m just seeing @marcio’s post. Her is the documentation on how to exactly what you are suggesting today on the IC:

See compute_allocation and memory_allocation. If you set these no one can take your processing power…you must pay for them though.

marcio · November 19, 2024, 2:58pm

Yes, that’s a step toward pricing co-location, but it’s not exactly what I was suggesting since co-location itself doesn’t currently have a direct price. The cost of co-location is effectively paid by the dApps already in the subnet, as their performance degrades. My suggestion can be seen as improving these two functions to address the issues we’ve observed and @levi aims to fix.

skilesare · November 19, 2024, 4:25pm

A co-location fee makes sense, although I think it would be nice if it could be abstracted to a group. I’d like to allocate compute to 100 over a group of canisters and let them get more priority…or something like that. More fine-grained control would be nice. Once we have the knobs we can write software to auto manage them.

marcio · November 19, 2024, 6:30pm

Yes, more flexible in general. Maybe compute and storage allocations should be dynamic, adjusting based on the load and more granular. A bit closer to Solana/Ethereum fees.

lastmjs · November 19, 2024, 8:39pm

I really love the overarching theme of this proposal, which is to abstract away scaling complexities.

Unfortunately subnets themselves have too many issues that have not been sufficiently abstracted away from the developer. I’m not sure this proposal should be implemented until those issues are fixed.

I believe that we should work towards developers being able to be completely agonistic to the subnets they are deploying to. Instead they should simply choose the properties they want their canister to have, the most obvious example being the replication factor.

I think at least the following problems would need to be resolved first:

Cross-subnet latency is high and different than same-subnet latency
Canister id’s shouldn’t change across subnets, a canister should just be a canister and have the same id wherever it’s deployed

lastmjs · November 19, 2024, 8:43pm

We’ve got to resolve these kinds of issues at the protocol level.

Developers must be allowed to deploy canisters and have them simply work, with consistent latency and automatic scaling.

The fact that subnets have been exposed as a concept is very unfortunate.

marcio · November 19, 2024, 9:38pm

I think you need to change the whole architecture for that and invent something new. I don’t know of a blockchain that can do that. Maybe the best solution is something like Solana, but the low-level stuff is transparent to smart-contract developers.

levi · November 20, 2024, 3:18am

Here the user can call the group/community directly without going through a LocalUserIndex while still showing authentication. Create a root authenticator canister that can sign messages using either canister-signatures (like internet-identity) or threshold ecdsa/ed25519, with a constant public-key seed. When a new user is created, the root canister signs a message that contains the user-canister-id and it’s authorization status, something like: “This canister is a legitimate OpenChat user canister.”. The message and the root-canister’s signature on that message is then saved in the user-canister as the user’s credentials. The root canister-id and its constant public-key are known to all canisters beforehand. When a user wants to join a group/community, the user calls the group/community canister, sending the user’s credentials in the request. The group/community canister checks the user’s credentials and makes sure it is signed by the root authenticator canister. This way you don’t have to go through the LocalUserIndex canisters, and the group/community canisters don’t have to keep track of every new LocalUserIndex canister.

For this one, instead of grouping queries by subnet, you can create chats-cache canisters. Each chats-cache canister can hold the latest rolling 5Mb of the chat-data of some 20,000 chat canisters. When a chat canister receives some chats, it can start a periodic timer for every 3 seconds. In the timer, if there are new messages, then push the latest messages to the chat’s chats-cache canister. In the timer, if there are no messages to push, cancel the timer. Using a timer here can save many push-calls for busy chats that have many chat-messages come in at the same time. When requesting chat updates, group the chats you want updates for by chats-cache canister, and query the chats-cache canisters. User’s that consistently log on will always have the same fast loading. If a user doesn’t log on for a while, and some of the chats it hasn’t seen yet have already been taken out of the chats-cache canisters, then that one time the user will query the chat canisters to catch up. This way works no matter what subnet any canister is on, and will give you control and stability of how many chats and which chats can be in a chats-cache canister.

Thank you for being specific here.

What do you think about these solutions?

skilesare · November 20, 2024, 3:38am

a little girl is asking why not both while standing in a kitchen .

There are a number of architectures today where a user can be agnostic. If you are deploying a dapp asset canister and it talks to one backend canister, you don’t care what subnet they are on.

If you are a dapp on an application subnet and you call the ICP canister, you are agnostic to the fact that it is on another subnet, because from all subnets it requires crossing a subnet boundary.

This works and works fine for many applications and configurations.

There are other configurations where you don’t want to be agnostic. You don’t want your archive canister being on a different subnet than your ledger because under high load you may fall behind and have your ledger grow and potentially consume many more cycles depending on your storage and look-up algo.

I understand why these are not the first concepts you want to introduce a new developer to, but it hardly seems rational to nerf the system for the advanced applications.

Moore’s law scales exponentially for memory and compute.
The speed of light does not scale.
Compression and transmission may scale at best logarithmically.

There will always be a massive advantage for canisters operating on the same subnet and sharing memory and compute space. The problem NEVER goes away. Just like programs taking up all your memory, there will always be a NEXTTECH that needs that advantage run and exist.

Of course we should keep driving the latency of cross-subnet interactions down, but it will never ever ever be on par with the shared space. Maybe subnets eventually give us so much memory and compute that we don’t care, but that is a different problem, and a good one to have.

(And as we’ve discussed in the past there may be some cool zk solutions here that achieve some magic, but I doubt they will solve all the issues)

yvonneanne · November 21, 2024, 5:13pm

After reading this thread and seeing a majority of the communit voted againsty (13.66% NO vs 1.44% YES) DFINITY will reject the proposal. The main reason is the fact that slowing down communication for canisters on the same subnet artificially, will harm many applications.
However, DFINITY is supportive of the underlying intent to improve the developer experience. To this end, DFINITY will keep working on alleviating performance bottlenecks and support canister migration, among other improvements.

Topic		Replies	Views
Path forward for subnet splitting and protocol scaling Developers	19	370	October 17, 2024
Suggested measures to reduce latency and improve ICP scalability Developers	48	1280	November 4, 2024
Long Term R&D: Subnet splitting (proposal) Roadmap	10	2564	April 6, 2023
Subnets with heavy compute load: what can you do now & next steps Developers	174	4116	November 26, 2024
LAMENT: A tale of constant struggle of what it's like trying to scale on ICP Developers	73	3551	November 3, 2024

Voting is now live for a new proposal for the scaling of the world-computer

Related topics