Long Term R&D: Canister Migration (proposal)

1. Summary

This is a motion proposal for the long-term R&D of the DFINITY Foundation, as part of the follow-up to this post: Motion Proposals on Long Term R&D Plans (Please read this post for context).

This project’s objective

This project is about migrating canisters from one subnet to another. The goal is to take load off a subnet by moving popular or offending canisters to separate subnets, in a way that is transparent to users, developers, and other canisters sending messages.

2. Discussion lead

David Derler

3. How this R&D proposal is different from previous types

Previous motion proposals have revolved around specific features and tended to have clear, finite goals that are delivered and completed. They tended to be measured in days, weeks, or months.

These motion proposals are different and are defining the long-term plan that the foundation will use, e.g., for hiring and organizational build-out. They have the following traits and patterns:

  1. Their scope is years, not weeks or months as in previous NNS motions
  2. They have a broad direction but are active areas of R&D so they do not have an obvious line of execution.
  3. They involve deep research in cryptography, networking, distributed systems, language, virtual machines, operating systems.
  4. They are meant to match the areas where the DFINITY Foundation’s expertise is best suited.
  5. Work on these proposals will not start immediately.
  6. There will be many follow-up discussions and proposals on each topic when work is underway and smaller milestones and tasks get defined.

An example may be the R&D for “Scalability” where there will be a team investigating and improving the scalability of the IC at various stages. Different bottlenecks will surface and different goals will be met.

3. How this R&D proposal is similar to what we have seen

We want to double down on the behaviors we think have worked well. These include:

  1. Publicly identifying owners of subject areas to engage and discuss their thinking with the community
  2. Providing periodic updates to the community as things evolve, milestones are reached, proposals are needed, etc.
  3. Presenting more and more R&D thinking early and openly.

This has worked well for the last 6 months so we want to repeat this pattern.

4. Next Steps

Developer forum intro posted
1-pager from the discussion lead posted
NNS Motion proposal submitted

5. What we are asking the community

  • Ask questions
  • Read 1-pager
  • Give feedback
  • Vote on the motion proposal

Frankly, we do not expect many nitty-gritty details because these are meant to address projects that go on for long time horizons.

The DFINITY Foundation’s only goal is to improve the adoption of the IC, so we want to sanity-check the projects we see as necessary for growing the IC by having you (the ICP community) tell us what you think of these active R&D threads.

6. What this means for the existing Roadmap or Projects

In terms of the current roadmap and proposals executed, those are still being worked on and have priority.

An intellectually honest way to look at these long-term R&D projects is to see them as the upstream or “primordial soup” from which more baked projects emerge. With this lens, these proposals are akin to asking, “what kind of specialties or strengths do we want to make sure the DFINITY Foundation has built up?”

Most (if not all) projects that the DFINITY Foundation has executed or is executing are born from long-running R&D threads. Even when community feedback tells the foundation, “we need X” or “Y does not work”, it is typically the team with the most relevant R&D area that picks up the short-term feature or project.


Please note:

Some folks have asked if they should vote to “reject” any of the Long Term R&D projects as a way to signal prioritization. The answer is simple: “No, please, ACCEPT” :wink:

These long-term R&D projects are the DFINITY Foundation’s thesis on the R&D threads it should have across years (3 years is the number we sometimes use internally). We are asking the community to ACCEPT (pending the 1-pager and more community feedback, of course). Prioritization can come as a separate step.

Hello everyone! I am David. I am the team lead of the message routing team, and, as Diego mentioned, I will be available for discussions related to the proposal mentioned above. I will follow up with the 1-pager providing more details on the proposal in the next few days.


Canister Migration (1-pager)

The Internet Computer (IC) blockchain is composed of multiple subnet blockchains that can host dapps consisting of canister smart contracts. With the help of chain key cryptography those canisters can transparently communicate with each other by sending messages back and forth irrespective of the subnet they are hosted by. The IC can scale out its computational capacity by adding more subnets. However, currently there is no easy way to move a canister smart contract from one subnet to another for load balancing and co-location purposes.

Objective

This motion proposal is about efforts related to enabling the migration of canister smart contracts between different subnet blockchains of the IC. The objective of canister migration is to provide a tool for load balancing on the IC by allowing developers to move canisters or groups of canisters from one subnet to another. For example, one could envision moving a popular canister away from an otherwise heavily loaded subnet so that the popular dapp can meet its service level objectives again.

Why is this important

Currently, the only way to move a canister smart contract from one subnet to another is for the canister developer to implement an API that allows downloading the canister’s state, download the state via this API, create a new canister, upload the state to the new canister, and delete the old canister. Obviously this has quite a few downsides, including but not limited to:

  1. Every user of a dapp that is migrated this way would need to trust the entity who downloads and uploads the state to not modify the state before uploading it again.
  2. Currently canisters address other canisters using the canister ID. A newly created canister will have a new ID, meaning that currently all canisters that want to talk to the migrated canister will have to be updated to talk to the right canister.
  3. One needs to take care that the state of the canister to be migrated is not modified after its state has been downloaded and/or to implement measures to keep the “old” and the “new” canister in sync.
  4. There is no easy way for users who were using the old canister to find the new canister after a canister has been migrated this way.
  5. When users interact with canisters using Internet Identity (II) they do so via a principal identifier that is, among other things, specific to the canister they interact with. If the canister ID changes upon migration, a canister that uses II will no longer be able to link interactions of the same user with the “old” canister and the “new” canister.

Obviously this is quite a complicated process that requires a lot of effort and where things may easily go wrong. The high-level goal of this project is to make things less cumbersome for dapp developers while at the same time also reducing the required trust assumptions, and in doing so making it easier to scale dapps.
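To make the manual process above concrete, here is a minimal in-memory sketch of the four steps (download, create, upload, delete). Everything in it is hypothetical: `Subnet`, `manual_migrate`, and the toy ID allocator are illustration only and do not correspond to any IC API. It shows two of the downsides listed above: the canister ends up with a new ID, and users can only trust the result if someone compares state hashes before and after.

```python
import hashlib
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # toy ID allocator (assumption); real canister IDs are assigned by the IC


@dataclass
class Subnet:
    name: str
    canisters: dict = field(default_factory=dict)  # canister ID -> state bytes

    def create_canister(self) -> str:
        canister_id = f"canister-{next(_next_id)}"
        self.canisters[canister_id] = b""
        return canister_id


def state_hash(state: bytes) -> str:
    return hashlib.sha256(state).hexdigest()


def manual_migrate(source: Subnet, target: Subnet, old_id: str):
    """Copy a canister's state to a fresh canister on `target`.

    Returns (new_id, hash_before, hash_after).
    """
    exported = source.canisters[old_id]   # 1. download the state via a custom API
    hash_before = state_hash(exported)
    new_id = target.create_canister()     # 2. create a new canister (it gets a new ID)
    target.canisters[new_id] = exported   # 3. upload the state
    del source.canisters[old_id]          # 4. delete the old canister
    return new_id, hash_before, state_hash(target.canisters[new_id])


subnet_a, subnet_b = Subnet("A"), Subnet("B")
old_id = subnet_a.create_canister()
subnet_a.canisters[old_id] = b"ledger: alice=10, bob=5"
new_id, before, after = manual_migrate(subnet_a, subnet_b, old_id)
```

In this toy run `new_id` differs from `old_id`, so every caller would have to be updated, and nothing except the hash comparison stops the operator from altering `exported` between steps 1 and 3.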

With a growing number of subnets and canisters, a higher degree of automation is required as well, as load balancing tasks can no longer be handled manually.

Key milestones

Four main milestones are currently envisioned in this project. The first milestone is mostly about improving existing tooling and is less research heavy, while the remaining milestones still require quite some research regarding how a feature like this could be realized at all, and how things can be done so that they are operationally convenient for dapp developers and users. The milestones are as follows:

  1. Extend the tooling to make it easier for dapp developers to download the state of their canister and upload it to a different subnet. Note that the first milestone is just about making what is already possible more convenient for developers.
  2. Enable migration of canisters via a proposal in a way transparent to dapp developers in the sense that the other canisters that sent messages to the migrating canister before migration will not have to be adapted to send to the new canister. Do note, however, that there will likely be some downtime of the canister during migration and there will be potentially increased latency when talking to the migrated canister via canister-to-canister calls due to the canister migrating to another subnet.
  3. Enable migration of canister groups so that dapps that are composed of multiple canisters can migrate as a whole.
  4. Further automate canister migration so that canisters (or groups of canisters) can be automatically migrated based on various parameters of the system like the load on a subnet, for example.
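As an illustration of what the automation in milestone 4 might look like, here is a deliberately simple greedy sketch. It is an assumption for discussion, not a proposed algorithm: the load metric, the `threshold` parameter, and all function names are hypothetical. It picks the most loaded subnet and, if it exceeds the threshold, moves its heaviest canister that still leaves the target less loaded than the source was.

```python
def subnet_loads(placement, canister_load, subnets):
    """Sum the load of all canisters placed on each subnet."""
    loads = {subnet: 0.0 for subnet in subnets}
    for canister, subnet in placement.items():
        loads[subnet] += canister_load[canister]
    return loads


def pick_migration(placement, canister_load, subnets, threshold):
    """Return (canister, source, target) for one greedy rebalancing step, or None."""
    loads = subnet_loads(placement, canister_load, subnets)
    source = max(loads, key=loads.get)
    if loads[source] <= threshold:
        return None  # no subnet is overloaded
    target = min(loads, key=loads.get)
    # Only consider moves that leave the target below the source's current load,
    # otherwise the hot spot is merely shifted to another subnet.
    movable = [
        canister
        for canister, subnet in placement.items()
        if subnet == source and loads[target] + canister_load[canister] < loads[source]
    ]
    if not movable:
        return None
    heaviest = max(movable, key=canister_load.get)
    return heaviest, source, target


placement = {"a": "s1", "b": "s1", "c": "s2"}
canister_load = {"a": 60.0, "b": 30.0, "c": 20.0}
step = pick_migration(placement, canister_load, ["s1", "s2"], threshold=80.0)
```

A real policy would also have to weigh migration downtime, the extra cross-subnet latency mentioned in milestone 2, and whether canisters belong to a group that should stay co-located.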

Input regarding prioritization or organization of the milestones from the community is highly appreciated.

Discussion leads

The motion proposal is driven by @derlerd-dfinity1. @Manu, @johan, @yvonneanne, @akhilesh.singhania, @free, and other team members will also be available for discussion.

Why the DFINITY Foundation should make this a long-running R&D project

This project is important because currently dapp developers may hit scalability issues because of their colocation with other dapps on the same subnet. Without working on this motion proposal dapps may not be able to meet their service level objectives anymore as soon as subnets start to fill up, making it impossible for an application to grow without becoming spread across multiple subnets and incurring extra latency.

Skills and Expertise necessary to accomplish this

Realizing canister migration will require developing new schemes and protocols that enable authentically transferring canisters from one subnet to another, and adapting the existing schemes and protocols to guarantee that the migration process is transparent to canisters and users. Clearly this is a broad R&D effort with many open questions that will take years to complete and requires the involvement of many teams all across the IC stack. In addition, it requires broad input from the community to guarantee that the resulting solution meets the needs of both canister developers and users.

Additional open research questions

While the goals outlined in the previous sections already directly pose many of the research questions that need to be answered, there are also some questions that are less explicit. In this section additional research questions are listed:

  • Currently canisters are addressed by their ID and routing happens via the so-called routing table. The routing table maps ranges of canister IDs to the hosting subnets. Migrating individual canisters will lead to fragmentation of the routing table. The open question is how to avoid this fragmentation while still allowing canisters to migrate. Initial directions include the introduction of a DNS-like system that would enable a migrated canister to take a new canister ID while its “alias” remains constant and can be used to look up the current ID of the canister.
  • How can migration be done in a way compatible with II, i.e., so that each user interacting with a canister via II before migration will be identified as the same user after migration? As hinted at before, not changing the canister ID upon migration is one way to achieve this. However, not changing the canister ID is in conflict with the previous research question. So the goal of this question is to find out how to get the best of both worlds, i.e., potentially allowing the canister ID to change while also not breaking II.
  • What is the downtime of a canister that is migrated and how can the downtime upon migration be minimized?
  • Does migration require putting canisters into the stopped state before migration or can this be avoided?
  • What are suitable canister placement and re-allocation algorithms?
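The routing table fragmentation mentioned in the first question can be illustrated with a small sketch. It assumes integer canister IDs and a plain sorted list of inclusive ranges; the real routing table and canister ID encoding differ, and `RoutingTable` and its methods are hypothetical names. The point is only that moving a single canister out of the middle of a range turns one entry into three.

```python
from bisect import bisect_right


class RoutingTable:
    """Maps inclusive, non-overlapping ranges of integer canister IDs to subnets."""

    def __init__(self, ranges):
        # Each entry is (start, end, subnet); kept sorted by start.
        self.ranges = sorted(ranges)

    def _index(self, canister_id):
        starts = [start for start, _, _ in self.ranges]
        i = bisect_right(starts, canister_id) - 1
        if i < 0 or not (self.ranges[i][0] <= canister_id <= self.ranges[i][1]):
            raise KeyError(canister_id)
        return i

    def lookup(self, canister_id):
        return self.ranges[self._index(canister_id)][2]

    def migrate(self, canister_id, new_subnet):
        """Reassign one canister, splitting the range that contained it."""
        i = self._index(canister_id)
        start, end, subnet = self.ranges[i]
        pieces = []
        if start < canister_id:
            pieces.append((start, canister_id - 1, subnet))
        pieces.append((canister_id, canister_id, new_subnet))
        if canister_id < end:
            pieces.append((canister_id + 1, end, subnet))
        self.ranges[i:i + 1] = pieces


table = RoutingTable([(0, 999, "subnet-a"), (1000, 1999, "subnet-b")])
table.migrate(500, "subnet-b")
```

After this single migration the table holds four entries instead of two: `(0, 499)`, `(500, 500)`, `(501, 999)`, and `(1000, 1999)`. With many individually migrated canisters the table grows roughly with the number of migrations, which is what motivates the alias idea above.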

Examples where community can integrate into project

As mentioned before, in the initial phase of this motion, community input on refining the scope and priorities of this project is highly appreciated. In addition, many technical discussions with the community are anticipated as the motion and the research and development of potential technical solutions move forward.

What we are asking the community

  • Review comments, ask questions, give feedback
  • Vote accept or reject on NNS Motion
  • Participate in technical discussions as the motion moves forward

Technically not true. The identifier is tied to the hostname, which may or may not contain the canister id. Canister IDs in hostnames are clearly not viable mid-term, and I assume we will have nice host names soon. (I still wonder why we don’t have it already now, with CNAMES, but that’s off topic here). At that point, migrating a frontend from one canister to another won’t affect the user anymore.

Good point. What is meant here is that it is currently tied to the canister ID because the canister IDs are at the moment part of the hostname.


Proposal is live!

https://dashboard.internetcomputer.org/proposal/35674


Any movement on this?

We are working on subnet splitting as a first step. This will allow us to split overloaded subnets by creating a new subnet to take on half (or part of) the canisters of the original subnet.

The state splitting logic and verification have been merged to master over the past few weeks. We are now working on the orchestration / runbook and expect to have a working process within a month or two.

Most of the logic implemented for subnet splitting should be reusable for more general canister migration (where the canisters are not migrated off to a brand new subnet, but to an existing one). But I’m guessing that the next step in this direction after basic subnet splitting is implemented will be to speed up the process. The implemented process requires manual work, e.g. downloading and uploading states; doing a state sync from a single replica, etc. It’s all deterministic and verifiable, but will likely be quite slow if it involves splitting a subnet with a 500 GB state.
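For readers who want a mental model of "deterministic and verifiable" splitting, here is a toy sketch. It is not the logic that was merged to master: it assumes integer canister IDs, in-memory byte strings as canister state, and a split at a single ID boundary, and all names (`manifest`, `split_subnet`, `verify_split`) are hypothetical. Verification here means that the two halves are disjoint, together cover every original canister, and leave every per-canister state hash unchanged.

```python
import hashlib


def manifest(states):
    """Per-canister SHA-256 hashes of the given states."""
    return {cid: hashlib.sha256(state).hexdigest() for cid, state in states.items()}


def split_subnet(states, first_migrated_id):
    """Split states by ID: IDs below the boundary stay, the rest move."""
    stay = {cid: s for cid, s in states.items() if cid < first_migrated_id}
    move = {cid: s for cid, s in states.items() if cid >= first_migrated_id}
    return stay, move


def verify_split(original_manifest, stay, move):
    """True if the halves are disjoint, complete, and no state was modified."""
    if stay.keys() & move.keys():
        return False
    combined = {**manifest(stay), **manifest(move)}
    return combined == original_manifest


states = {1: b"state-1", 2: b"state-2", 3: b"state-3", 4: b"state-4"}
original = manifest(states)
stay, move = split_subnet(states, first_migrated_id=3)
ok = verify_split(original, stay, move)

tampered = dict(move)
tampered[3] = b"modified"
tampered_ok = verify_split(original, stay, tampered)
```

In this run `ok` is true and `tampered_ok` is false. Hashing a 500 GB state this naively would be slow, which is in line with the remark above that speeding up the process is the likely next step.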
