Efficient ways to clone a canister’s data

I’m looking into ways to build an auto-scaling solution for canisters based on a canister’s current memory utilization. Part of this involves being able to fully clone or partially replicate the data stored in the canister, and then repartition that data among the original canister and its clone(s).

The only option I can think of right now (and the most expensive one) is to use inter-canister update calls, but with 1-2 GB of data this would be a fairly slow process, based on what I’ve seen from others. I could also maintain multiple canisters from the start, but that is then 2X the cost.

Is there any way to pull out a canister replica for this purpose via a clone() type of functionality? If not, are there any other current solutions or ongoing work/roadmap items that might help IC Devs clone or repartition the data in canisters?

Also see discussion here: Canister backup - #19 by skilesare

This would be great. I know that there is work underway for downloading the current wasm state, but it would also be great if the replica could just copy it and start running it under a different canister id.

Thanks for linking that discussion.

This clone/copy and delete methodology is exactly what I had in mind. Fully copying a canister through inter-canister calls is expensive and would take a significant amount of time. It also introduces a whole set of distributed systems problems, such as what to do if a 1-2 GB canister data copy fails midway (since the copy would presumably be split across multiple update calls).
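To make the mid-copy failure concern concrete, here is a minimal Python simulation of a chunked copy that can resume after failures. It is not IC code: `Destination`, `put_chunk`, and `copy_resumable` are hypothetical names, the chunk size is tiny for illustration, and reading `dest.chunks` stands in for asking the target canister which chunks it already holds. The idea is that each chunk write is idempotent and keyed by index, so a failed pass only has to resend what is missing.

```python
import hashlib


class Destination:
    """Hypothetical stand-in for the target canister: stores chunks by index."""

    def __init__(self, fail_once=()):
        self.chunks = {}
        self._fail_once = set(fail_once)  # indices that fail on first attempt

    def put_chunk(self, index, data):
        if index in self._fail_once:
            self._fail_once.discard(index)
            raise ConnectionError(f"chunk {index} rejected")
        # Idempotent: writing the same index again leaves the same state.
        self.chunks[index] = data


def copy_resumable(source, dest, chunk_size=4, max_passes=3):
    """Copy `source` in chunks, resending only the chunks still missing."""
    total = (len(source) + chunk_size - 1) // chunk_size
    for _ in range(max_passes):
        missing = [i for i in range(total) if i not in dest.chunks]
        if not missing:
            break
        for i in missing:
            try:
                dest.put_chunk(i, source[i * chunk_size:(i + 1) * chunk_size])
            except ConnectionError:
                pass  # leave the chunk for the next pass
    missing = [i for i in range(total) if i not in dest.chunks]
    if missing:
        raise RuntimeError(f"copy incomplete, missing chunks: {missing}")
    # Verify the reassembled data against the source before trusting it.
    assembled = b"".join(dest.chunks[i] for i in range(total))
    return hashlib.sha256(assembled).digest() == hashlib.sha256(source).digest()


state = b"hello canister state"
dest = Destination(fail_once={1, 3})
ok = copy_resumable(state, dest)
```

Even in this toy form, the source has to stay frozen (or be versioned) while the copy runs, which is part of why a replica-level clone would be so much simpler.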

While this works great for backups, you have to spin up that extra canister beforehand. This doesn’t work if you’re auto-scaling reactively, and on top of that it doesn’t address data that is unevenly distributed across a particular primary key.
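As a rough illustration of the uneven-distribution problem, the sketch below picks a split point by cumulative bytes rather than by key count. It is plain Python, not canister code, and `split_by_size` and the sample sizes are made up for the example.

```python
def split_by_size(sizes):
    """Split sorted keys into two groups at the point where the running
    byte total first reaches half of the overall total."""
    keys = sorted(sizes)
    total = sum(sizes.values())
    running = 0
    for pos, key in enumerate(keys):
        running += sizes[key]
        if running * 2 >= total:
            return keys[:pos + 1], keys[pos + 1:]
    return keys, []


# Hypothetical per-key storage in bytes: one key dominates.
sizes = {"alice": 900, "bob": 50, "carol": 30, "dave": 20}
stay, move = split_by_size(sizes)
```

Here a split by key count would put "alice" and "bob" (950 bytes) on one side and leave 50 bytes on the other, while the size-based split isolates "alice". A single key that is larger than half the data still cannot be balanced this way.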

The intermediate solution that a few teams have come up with is to pick an entity that they do not expect to approach the 4GB canister limit anytime soon (say a user, or a message chat on OpenChat) and spin up a new canister for each of those entities. I think this will most likely be my approach in the intermediate term, since I’m certain that a deeper IC-based solution (either the replica copy solution or another one) is the way to go in the long term.
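A minimal sketch of that canister-per-entity pattern, assuming an index that maps each entity to its own canister and creates one on first use. `CanisterRegistry` and `fake_create_canister` are hypothetical; on the IC the factory would be a call that actually provisions a canister, which this simulation does not attempt.

```python
import itertools


class CanisterRegistry:
    """Hypothetical index mapping each entity (e.g. a user) to its canister."""

    def __init__(self, create_canister):
        self._create = create_canister  # factory that provisions a canister
        self._by_entity = {}

    def canister_for(self, entity_id):
        # Create a canister the first time an entity is seen, then reuse it.
        if entity_id not in self._by_entity:
            self._by_entity[entity_id] = self._create()
        return self._by_entity[entity_id]


_ids = itertools.count(1)


def fake_create_canister():
    # Stand-in for real provisioning; returns a made-up identifier.
    return f"canister-{next(_ids)}"


registry = CanisterRegistry(fake_create_canister)
first = registry.canister_for("user-1")
second = registry.canister_for("user-2")
again = registry.canister_for("user-1")
```

The registry itself becomes the piece that has to scale, but it only stores one small entry per entity.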

Building off of this, I’m assuming these clones would then be bounded by the individual subnet of a particular application - which would then be the next bottleneck.

@diegop From a brief glance I didn’t see this on the 2022 Sneak Preview RoadMap. Are there any Roadmap items that might tackle this problem or engineers that might be able to comment on the feasibility/work required to perform efficient canister data cloning?

+1, this would be very useful.

Even a feature that lets canister owners download/upload a canister’s state would essentially allow us to clone a canister. Last I checked it was on the roadmap, but I don’t know the current status.

@akhilesh.singhania, do you happen to know? Thanks!