As work is done on taking, downloading, and cloning snapshots, to what extent will we get and/or have information recorded about these snapshots? My understanding is that they are an amalgamation of Wasm, memory, and perhaps some configuration. From what I can see, the only real information we currently get about a snapshot is an ID from the replica (it appears to just be incremental), a date, and a size.
This hardly seems deterministic, although I guess we could assume some integrity from the replica. Would it be possible to get the hash of the non-configuration parts of the snapshot?
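A minimal sketch of what such a hash could look like, in Python. It assumes for illustration that the non-configuration parts are the Wasm module, the heap memory, and stable memory. The function name `snapshot_digest` and this three-way split are hypothetical, not part of any IC API. Each part is length-prefixed so that bytes cannot shift from one part to another without changing the digest:

```python
import hashlib
import struct


def snapshot_digest(wasm_module: bytes, wasm_memory: bytes, stable_memory: bytes) -> str:
    """Hypothetical digest over the non-configuration parts of a snapshot.

    Configuration (freezing threshold, allocations, canister ID) is
    deliberately left out, so two snapshots with identical code and
    memory hash the same regardless of settings.
    """
    h = hashlib.sha256()
    for tag, part in ((b"wasm", wasm_module), (b"heap", wasm_memory), (b"stable", stable_memory)):
        h.update(tag)                            # label the part
        h.update(struct.pack(">Q", len(part)))   # 8-byte big-endian length prefix
        h.update(part)
    return h.hexdigest()
```

Because of the length prefixes, `snapshot_digest(b"\x00asm", b"heap", b"")` and `snapshot_digest(b"\x00asm", b"hea", b"p")` produce different digests even though the concatenated bytes are identical.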
I’m guessing the configuration is basically the freezing threshold, compute allocation (and other config vars), plus the canister ID.
The use cases I’m considering here are:
If in the future we are able to download them and an open service wanted to make its snapshots available, people could verify that they were valid snapshots and hadn’t been altered (and thus were valid to use for testing or other purposes).
As a kind of checksum for restoring snapshots so that we could later verify that the proper snapshot was used.
Another question: from a canister’s point of view, is there any lifecycle event that is triggered when a snapshot is restored? If a canister is migrated and its subnet changes, and the canister needs to know its subnet, it would be nice to have a chance to capture this change and/or somehow record the subnet change.
I understand the use cases, but I’m not sure if a hash of the snapshot does what you want. It seems like what you really want is a certification that a given snapshot was actually taken from a canister (maybe also with a timestamp?).
Yes, I think that would work. A subnet signature would be verifiable, I think.
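A rough sketch of such a certificate, assuming it binds the canister ID, snapshot ID, snapshot digest, and a timestamp. A real subnet would use a threshold signature; since the Python standard library has no such scheme, HMAC-SHA256 stands in for it here. That means, unlike a real signature, verification needs the same secret key. All names and the key are hypothetical:

```python
import hashlib
import hmac
import json

# Hypothetical stand-in for the subnet's signing key. HMAC is a swapped-in
# technique: a real subnet signature would be publicly verifiable.
SUBNET_KEY = b"hypothetical-subnet-key"


def _canonical(payload: dict) -> bytes:
    # Sorted keys give a stable byte encoding for signing.
    return json.dumps(payload, sort_keys=True).encode()


def certify(canister_id: str, snapshot_id: int, digest: str, timestamp_ns: int) -> dict:
    """Bind a snapshot digest to the canister it came from and a timestamp."""
    payload = {"canister_id": canister_id, "snapshot_id": snapshot_id,
               "digest": digest, "timestamp_ns": timestamp_ns}
    payload["signature"] = hmac.new(SUBNET_KEY, _canonical(payload), hashlib.sha256).hexdigest()
    return payload


def verify(cert: dict) -> bool:
    """Recompute the MAC over every field except the signature and compare."""
    payload = {k: v for k, v in cert.items() if k != "signature"}
    expected = hmac.new(SUBNET_KEY, _canonical(payload), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert.get("signature", ""))
```

Changing any certified field, for example swapping in a different digest for a manually modified snapshot, makes `verify` return `False`.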
If we get to the place where we can upload snapshots, I think this would be required; otherwise someone could modify the memory manually and then restore it. While there might be use cases where you’d want to do this, I’d imagine it would break integrity, particularly in cases where there is an expectation of public transparency, verifiability, and deterministic reproduction of all state changes.
Snapshots currently don’t have hashes associated with them, mainly because computing them is non-trivial when a block is executed every second. A canister can hold as much as 500 GB of data, and hashing all of that is very different from taking a snapshot without a hash, which is quick for all canisters.
These problems are solvable of course, but the current thinking is to leave it as a future improvement.
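One common way to make hashing large state tractable, offered here as an assumption rather than what the team plans, is to hash fixed-size chunks and combine the chunk hashes in a Merkle tree. If the chunk hashes are cached, only chunks written since the last hash need to be rehashed. The chunk size and function names below are illustrative:

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative chunk size, not an IC parameter


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Leaf hashes, one per chunk; 0x00 prefix separates leaves from inner nodes."""
    leaves = [hashlib.sha256(b"\x00" + data[i:i + chunk_size]).digest()
              for i in range(0, len(data), chunk_size)]
    return leaves or [hashlib.sha256(b"\x00").digest()]  # empty input still yields one leaf


def merkle_root(leaves: list[bytes]) -> bytes:
    """Combine leaf hashes pairwise until one root remains."""
    level = list(leaves)
    while len(level) > 1:
        nxt = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node is carried up unchanged
        level = nxt
    return level[0]


def rehash_dirty(leaves: list[bytes], data: bytes, dirty: set[int],
                 chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Recompute only the leaves whose chunk index is in `dirty`.

    Assumes the data length is unchanged since `leaves` was computed.
    """
    for i in dirty:
        start = i * chunk_size
        leaves[i] = hashlib.sha256(b"\x00" + data[start:start + chunk_size]).digest()
    return leaves
```

Recomputing the root from cached leaves still touches every leaf hash, but that is 32 bytes per chunk rather than the chunk itself; caching the inner nodes as well would reduce each update to O(log n) hashes.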
At the moment, there isn’t any canister method that is executed upon loading a snapshot, but the team has already planned to implement support for one. I can’t promise an ETA at the moment, though.
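Until such a method exists, one workaround is to keep the last-seen subnet in canister state and compare it on each call. This sketch assumes the canister can learn its current subnet ID somehow (how it does so is left open). Because that field is captured in the snapshot, the first call after a restore on a different subnet detects the change. `SubnetTracker` is a hypothetical name:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SubnetTracker:
    """Hypothetical piece of canister state, persisted (and snapshotted) with the rest."""
    last_subnet: Optional[str] = None
    history: list = field(default_factory=list)  # (old_subnet, new_subnet, time_ns) entries

    def check(self, current_subnet: str, now_ns: int) -> bool:
        """Record and report a subnet change since the last recorded subnet."""
        changed = self.last_subnet is not None and self.last_subnet != current_subnet
        if changed:
            self.history.append((self.last_subnet, current_subnet, now_ns))
        self.last_subnet = current_subnet
        return changed
```

Once a post-restore method is available, the same `check` could be called from it instead of lazily on the next call.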
That is actually a decent safety feature that we might want to have. Perhaps the snapshot should not be restorable if the state or memory has changed? Or maybe you at least have to do something extra? Obviously, one nice thing to have would be the ability to repair a botched migration, so you likely don’t want to make it impossible.
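A sketch of such a safety check, assuming a digest was recorded when the snapshot was taken. The restore is refused if the snapshot no longer matches that digest, unless the caller passes an explicit override, which keeps deliberate repairs such as fixing a botched migration possible. `guarded_restore`, `RestoreRefused`, and `allow_modified` are hypothetical names:

```python
import hashlib


class RestoreRefused(Exception):
    """Raised when a snapshot no longer matches its recorded digest."""


def guarded_restore(snapshot_bytes: bytes, recorded_digest: str, *,
                    allow_modified: bool = False) -> bytes:
    """Refuse to restore a modified snapshot unless explicitly overridden."""
    actual = hashlib.sha256(snapshot_bytes).hexdigest()
    if actual != recorded_digest and not allow_modified:
        raise RestoreRefused(
            f"digest {actual[:12]}... does not match recorded {recorded_digest[:12]}...")
    return snapshot_bytes  # stand-in for actually loading the snapshot
```

The "something extra" here is the keyword-only `allow_modified` flag: a modified snapshot can still be restored, but only on purpose.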
I think the canister reset option can be extremely valuable, but we need to understand a) the limits within which it can be used without creating serious issues such as duplicated or destroyed values, and b) whether it is sometimes desirable to reset despite those consequences.