Canister backup and restore [Community Consideration]

Summary

We would like to announce that we have started looking into the canister backup/restore problem. We are currently in the design phase and are looking for your input on:

  • Importance of the problem for you and your use cases. This would help us to prioritise this relative to other problems.
  • Your thoughts on the potential snapshot-based solution presented here.
  • Your suggestions of alternative solutions.

Background & Problem statement

Currently, there is no easy way on the Internet Computer to back up data so that it can be restored to a previous state after corruption or data loss. Developers must manually implement a way to serialise the state of the canister, download it off-chain, and then manually upload it if they need to restore the data. This approach is error-prone, expensive, does not scale, and cannot easily be done in a reasonable amount of time.

To address this issue, the IC should provide a way for canisters to take snapshots of their state and restore them when necessary. Additionally, it would be ideal if the data could be exported and imported to and from a local environment, but this would require further engineering effort and can be implemented in future iterations.

By solving this problem at the protocol level, controllers can fix a broken canister by rolling back to a previously saved snapshot in the event of a bug. This situation is common, especially when upgrading a canister.

Potential solution

At this stage, our current idea is to concentrate solely on developing a prototype that offers endpoints for capturing on-chain snapshots of canister states and loading the snapshots to canisters. A snapshot will naturally consist of both the stable memory and the heap.

Of course, the initial iteration will have a few constraints, which will help to make the problem as straightforward as possible while providing an improved developer experience.

  • Only controllers have the authority to take a snapshot and restore it.
  • While it is not explicitly enforced, we recommend creating a snapshot only after stopping the canister. This follows the same principle as upgrading a canister: a running canister may have outstanding callbacks, and making sense of them after the snapshot is loaded may not be possible.
  • During the first iteration, only one snapshot per canister will be allowed, and taking a new snapshot will replace the old one. In later iterations, we may expand this feature to enable multiple snapshots per canister.

Below, you can find an API sketch for interacting with the IC when there is a need to take a snapshot or recover the state from a snapshot identified by snapshot_id.

// Note: the `snapshot_id` and `canister_id` types are not defined in this sketch.
type timestamp = nat;
type bytes = nat;

type snapshot = record {
    id: snapshot_id;
    taken_on: timestamp;
    label: opt text;
    total_size: bytes;
    // Checksum for correctness verification.
    checksum: blob;
};

service: {
    // Takes a snapshot of the given canister's state.
    take_snapshot: (canister_id, label: opt text) -> (snapshot_id);

    // Loads the snapshot to the canister identified by `canister_id`.
    load_snapshot: (snapshot_id, canister_id) -> ();

    // Lists the snapshots of the canister.
    list_snapshots: (canister_id) -> (vec snapshot);

    // Deletes the snapshot with the given ID.
    delete_snapshot: (snapshot_id) -> ();
}
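
To make the intended semantics easier to discuss, here is a minimal in-memory sketch in Python of how the four endpoints and the first-iteration constraints could behave. It is only a model, not the IC implementation: the class names (`Canister`, `Snapshot`, `SnapshotService`), the explicit `caller` argument, the integer snapshot IDs, the nanosecond timestamp and the use of SHA-256 for the checksum are all assumptions made for illustration.

```python
import hashlib
import itertools
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Canister:
    """Toy canister: controllers plus the two memories named in the post."""
    controllers: Set[str]
    stable_memory: bytes = b""
    heap: bytes = b""


@dataclass(frozen=True)
class Snapshot:
    id: int                  # integer IDs are an assumption
    canister_id: str
    taken_on: int            # time.time_ns(); the unit is an assumption
    label: Optional[str]
    total_size: int
    checksum: bytes          # SHA-256 is an assumption
    stable_memory: bytes
    heap: bytes


def _checksum(stable_memory: bytes, heap: bytes) -> bytes:
    return hashlib.sha256(stable_memory + heap).digest()


class SnapshotService:
    """Hypothetical model of the proposed endpoints (not the real interface)."""

    def __init__(self) -> None:
        self.canisters: Dict[str, Canister] = {}
        self._snapshots: Dict[int, Snapshot] = {}
        self._next_id = itertools.count(1)

    def _require_controller(self, caller: str, canister_id: str) -> Canister:
        canister = self.canisters[canister_id]
        if caller not in canister.controllers:
            raise PermissionError(f"{caller} is not a controller of {canister_id}")
        return canister

    def take_snapshot(self, caller: str, canister_id: str,
                      label: Optional[str] = None) -> int:
        canister = self._require_controller(caller, canister_id)
        # First-iteration constraint: a new snapshot replaces the old one.
        self._snapshots = {i: s for i, s in self._snapshots.items()
                           if s.canister_id != canister_id}
        snap = Snapshot(
            id=next(self._next_id),
            canister_id=canister_id,
            taken_on=time.time_ns(),
            label=label,
            total_size=len(canister.stable_memory) + len(canister.heap),
            checksum=_checksum(canister.stable_memory, canister.heap),
            stable_memory=canister.stable_memory,
            heap=canister.heap,
        )
        self._snapshots[snap.id] = snap
        return snap.id

    def load_snapshot(self, caller: str, snapshot_id: int, canister_id: str) -> None:
        canister = self._require_controller(caller, canister_id)
        snap = self._snapshots[snapshot_id]
        # Verify the stored data against its checksum before restoring.
        if _checksum(snap.stable_memory, snap.heap) != snap.checksum:
            raise ValueError("snapshot failed checksum verification")
        canister.stable_memory, canister.heap = snap.stable_memory, snap.heap

    def list_snapshots(self, caller: str, canister_id: str) -> List[Snapshot]:
        self._require_controller(caller, canister_id)
        return [s for s in self._snapshots.values() if s.canister_id == canister_id]

    def delete_snapshot(self, caller: str, snapshot_id: int) -> None:
        snap = self._snapshots[snapshot_id]
        self._require_controller(caller, snap.canister_id)
        del self._snapshots[snapshot_id]
```

In this model, a controller would call `take_snapshot` before a risky upgrade and `load_snapshot` afterwards if the upgrade misbehaves; taking a second snapshot of the same canister silently discards the first, which mirrors the one-snapshot-per-canister constraint listed above.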

Future outlook

In the future, it should be possible to incorporate support for more features on top of the existing proposal. Potential additional features that could be implemented include:

  • Endpoint for downloading a snapshot of a canister to the local environment.
  • Endpoint for uploading a snapshot from a local environment to the IC.
  • Ability to create new canisters based on snapshots.

These new features are useful in scenarios such as downloading snapshots to a local environment for debugging or backup purposes, restoring canister states from an off-chain backup or creating new canisters with the data from previously taken snapshots.

However, implementing these features would require considerable engineering effort, including developing tools for manipulating local snapshots, debugging and inspecting data from a snapshot.

What we are asking the community

Please let us know if the problem of backup/restore is important for you. It would be great if you could share your use case and requirements. We would also like to know if the snapshot-based solution would work for you.

We welcome any alternative proposals that you may have. Thank you for taking the time to share your thoughts with us.

Adding people involved or interested in the discussions: @ielashi, @bogdanwarinschi, @dsarlis, @Manu, @ulan, @roman-kashitsyn.

Adding a link to a Motoko library with on-chain backup/restore functionality: ZenVoich/backup on GitHub, an on-chain backup system for Motoko.

@hpeebles, @saikatdas0790, @skilesare you expressed interest in this topic before. Do you have any feedback? We would appreciate hearing what you think.

This looks great. We would definitely be using this.

Would the snapshot be limited to the same subnet or are there plans to also support backups to a different subnet than the source?

@Alexandra,

First, thanks to you and the DFINITY Leadership for finally doing something to alleviate this huge problem, one that will affect any successful DAPP.

I would ask you to add two things to your prototype:

  • An easy template or API call to put a canister into maintenance mode, with a corresponding web page that alerts people that this is about to happen and when it will be over.
  • An API or web-based scheduler for backup and restore operations, so that the backup and corresponding maintenance can be done during hours when it would affect the fewest users of the DAPP.

Thanks, and please do consider me for beta testing this feature.

Joseph

Thanks for this Motoko-based backup solution, it helps for sure!

Being able to pull down our wasm+heap would be a huge quality of life improvement. My current process for things like our governance canister is to query out all the records so that I have them in case the unimaginable happens. If I actually lost everything, we’d have a huge hydration issue on our hands and would likely be down for a while. This would save us time and give us peace of mind.

Hello everyone,

Firstly, I’d like to express my gratitude to the team and the community for continually pushing the boundaries to make development on the Internet Computer easier and more secure every day. Your work is highly appreciated.

I am the mind behind b3wallet, a decentralized wallet with a unique focus on multi-signer and multi-chain functionalities. Our platform aims to provide a user-friendly, secure, and trustless environment for digital assets across various blockchains. If you’re interested in experiencing these features, feel free to test the wallet at b3wallet.live.

Why Backup is Crucial in Decentralized Wallets

In a fully decentralized ecosystem like b3wallet, trust is not bestowed but earned through cryptographic proof and consensus. One of the most sensitive aspects is the management of wallet controllers and signers, particularly in shared or multi-user wallets. Having a reliable backup and restoration mechanism is not just a nice-to-have but a crucial component that can significantly enhance user trust and system reliability.

My Initial Idea

Initially, my approach involved the system canister saving essential wallet configuration details like signers and controllers. After saving this data, the system canister would create the wallet canister and set its initial signers and controller. If a wallet has a single signer, the controller would be the user themselves, and they can take care of the restoration process if a disaster happens. However, if multiple signers are provided, the controller would be the system canister. This design allows for a streamlined restoration process wherein the system canister can authenticate and verify the signers before restoring the wallet to its original state, where multiple signatures are needed to confirm a transaction, just like before!

After Seeing This Feature Proposal

After reading about your proposed snapshot-based backup and restore functionality, I am thrilled. This feature could simplify the backup and restoration process dramatically for projects like mine and many other developers facing similar challenges. It would allow us to automatically take a snapshot of critical wallet information at the time of creation and use that for any future restoration needs.

Your proposed API sketch aligns well with what I’d need. The feature to list snapshots would be particularly useful for administrative purposes. The constraints you’ve laid out also make sense for an initial implementation.

Closing Thoughts

I look forward to seeing this feature go from proposal to implementation. It has the potential to solve a common pain point in decentralized application development on the IC. I also welcome any community thoughts on how to expand or refine this feature further.

Thank you for taking the time to consider my input.

Best regards,
Behrad

I just want to be able to add a property to an existing type and be able to set a default parameter when I upgrade and not lose all my data.

Currently I try to avoid making data structure changes, and ultimately efficiencies are lost.

Snapshot would be great for when things go wrong.

This can be a great help and it is a critical work item on our product roadmap, given that data integrity holds immense significance for platforms like RuBaRu.

Our use case involves taking regular, automatic data snapshots and archiving them for future use in case of data recovery needs. This is a part of Disaster Management (DM), ensuring that we have a backup strategy in place.

Will evaluate how much we can utilize out of the box.

I would like to see this as a system call, maybe on the management canister, which only a canister controller can call.

Under the hood the backup function will:

  1. Call preupgrade on a canister (do not save to the actual canister)
  2. Return a blob snapshot of the entire stable memory (which is probably populated with extra stuff from step 1)

Restore function:

  1. Controller uploads snapshot blob data to management canister
  2. Management canister loads this data to stable memory
  3. Management canister calls postupgrade on a canister

Everything related to serialization/deserialization will already be implemented in the postupgrade/preupgrade functions, so this will require minimal effort from developers.

Backups at regular intervals, off-chain backups, third-party services, etc. can be implemented on top of these functions.
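
As a rough illustration of the backup and restore steps listed above, here is a toy model in Python. It is only a sketch under assumptions: `ToyCanister`, `backup` and `restore` are hypothetical names, JSON stands in for whatever serialization a real canister uses in its upgrade hooks, and copying the canister object stands in for "do not save to the actual canister".

```python
import copy
import json


class ToyCanister:
    """Hypothetical canister with heap state and upgrade hooks."""

    def __init__(self):
        self.heap = {}            # live state
        self.stable_memory = b""  # survives upgrades

    def pre_upgrade(self):
        # Serialize the heap into stable memory (JSON is an assumption).
        self.stable_memory = json.dumps(self.heap, sort_keys=True).encode()

    def post_upgrade(self):
        # Rebuild the heap from stable memory.
        self.heap = json.loads(self.stable_memory.decode())


def backup(canister: ToyCanister) -> bytes:
    # Step 1: run preupgrade on a scratch copy so the real canister is untouched.
    scratch = copy.deepcopy(canister)
    scratch.pre_upgrade()
    # Step 2: return the resulting stable memory as a blob.
    return scratch.stable_memory


def restore(canister: ToyCanister, blob: bytes) -> None:
    # Steps 1-2: load the uploaded blob into stable memory.
    canister.stable_memory = blob
    # Step 3: let postupgrade rebuild the heap from it.
    canister.post_upgrade()
```

The appeal of this shape is that `backup` and `restore` contain no serialization logic of their own; they reuse whatever the upgrade hooks already do.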

Thank you for the feedback! At the current stage, we limit it to the same subnet. But as I mentioned in the post, there is a lot of work that can be added on top of this first iteration, so there is a lot of potential.

Thank you for the feedback! The potential solution that I presented to the community gives developers control over when the backup and restore happen. So you will be able to decide when you need to do a backup or when you want to restore the previously saved data. We do recommend stopping a canister before starting this process.

Thanks, and please do consider me for beta testing this feature.

I appreciate your help; we will keep you informed.

Can we get rid of this footgun of developers having to manually stop/start canisters during upgrades/backups?

If it’s inadvisable, why have it at all?

@saikatdas0790 We recommend stopping because that’s the safest, but if you have a canister designed to have zero downtime, you can take a backup without stopping first, similar to upgrading. So it’s up to the developer to decide based on the canister or use case they have.

This problem is orthogonal to the topic presented here. For example, to allow the different use cases and make it safer for the average case, we could push this restriction at the tooling level (not at the protocol level), and one has to explicitly specify that the canister should not stop. However, this will not be handled by this feature.
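
To illustrate what a tooling-level default could look like, here is a small Python sketch. Everything in it is hypothetical: `FakeManagementClient` is a stand-in that only records calls, and `take_snapshot_safely` with its `keep_running` flag is an invented helper, not part of any existing tool or of this proposal.

```python
class FakeManagementClient:
    """Hypothetical stand-in for a management-canister client; records calls."""

    def __init__(self, running=True):
        self.running = running
        self.log = []

    def stop_canister(self, canister_id):
        self.running = False
        self.log.append("stop")

    def start_canister(self, canister_id):
        self.running = True
        self.log.append("start")

    def take_snapshot(self, canister_id, label=None):
        self.log.append("take_snapshot")
        return 1  # placeholder snapshot id


def take_snapshot_safely(client, canister_id, label=None, keep_running=False):
    """Stop the canister around the snapshot unless the caller opts out."""
    if keep_running or not client.running:
        # Explicit opt-out, or the canister is already stopped.
        return client.take_snapshot(canister_id, label)
    client.stop_canister(canister_id)
    try:
        return client.take_snapshot(canister_id, label)
    finally:
        # Restart even if taking the snapshot failed.
        client.start_canister(canister_id)
```

With this shape, the safe path is the default and a zero-downtime canister has to pass `keep_running=True` explicitly, while the protocol-level endpoint itself stays unrestricted.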

@ZenVoich The API from this post will be added to the management canister interface and is only available to the controllers. So it covers the requirements that you are asking for.

However, the second part of your message describes one way of downloading and uploading data that could be integrated with the canister but will not be included in the first iteration. It’s part of the future outlook.

Thank you, everyone, for taking the time to share your feedback with us! We are glad to hear that adding this feature improves the developer experience. The team will keep you updated!