How can I verify that two asset canisters serve the same content (i.e. same assets)? Is there a root hash that I can compare? If so, how?
There is a root hash. It’s harder to access than it probably should be. But it also doesn’t say as much as you’d expect.
Right now the asset canister doesn’t have a get_root_hash function. But you can extract the root hash from every response to http_request. If you e.g. request /index.html through a call to http_request, you receive back a response with an IC-Certificate header. In that header there is a tree section. From the spec: "tree: Base64 encoded string of self-describing, CBOR-encoded bytes that decode into a valid hash tree as per certificate encoding." The root hash can be reconstructed from this tree.
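A minimal sketch of pulling the tree bytes out of the header, assuming the name=:base64: field layout (e.g. certificate=:...:, tree=:...:). The function name and the example header value are made up for illustration. Decoding the resulting CBOR bytes needs a CBOR library, which is left out here.

```python
import base64
import re


def parse_ic_certificate_header(header_value: str) -> dict:
    """Split an IC-Certificate header into its base64-decoded fields.

    Assumes fields of the form name=:base64: separated by commas.
    Fields without the colon delimiters (such as version=2) are skipped.
    """
    fields = {}
    for name, b64 in re.findall(r"(\w+)=:([^:]*):", header_value):
        fields[name] = base64.b64decode(b64)
    return fields


# Made-up header value; a real one comes from an http_request response.
example = (
    "certificate=:" + base64.b64encode(b"cert-bytes").decode() + ":, "
    "tree=:" + base64.b64encode(b"tree-bytes").decode() + ":"
)
fields = parse_ic_certificate_header(example)
tree_cbor = fields["tree"]  # CBOR-encoded hash tree, still to be decoded
```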
If two root hashes match then the two asset canisters serve the same content. If they don’t match it could mean any of the following:
- At least one asset doesn’t match
- At least one asset’s headers don’t match
- At least one asset’s available encodings don’t match
- The certification tree implementation is differently structured (AFAIK the asset canister’s tree should be fine)
- The canisters don’t handle aliases the same way
- The canisters don’t serve the same certification versions
If you don’t want to go the root hash route, then your best bet is listing all assets. The asset canister has a list function that shows what assets are available. If you then trust the canister to not hide any assets, you can compare the assets one by one.
An example of using this header and checking the hash tree can be found in the @dfinity/assets package: agent-js/packages/assets/src/index.ts in the dfinity/agent-js repository on GitHub.
Just double-checking if there have been updates since.
I see there is a query function certified_tree but I don’t know if it is new or if it has always been there. It returns a “certificate” and a “tree”. I suppose I can also get the root hash from either of those, right?
git blame says it’s been there for 2 years
No idea why I didn’t see it last time I looked.
You’re right, the tree field is the complete hash tree with all certified hashes included, and the data that’s certified in the certificate is the root hash of the same tree.
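For reference, a sketch of reconstructing the root hash from a decoded hash tree, following the reconstruction rules in the interface spec's certificate encoding section. It assumes the tree has already been CBOR-decoded into nested lists; the function names are mine.

```python
import hashlib


def _h(domain: str, *parts: bytes) -> bytes:
    """SHA-256 over a length-prefixed domain separator followed by the parts."""
    d = domain.encode()
    return hashlib.sha256(bytes([len(d)]) + d + b"".join(parts)).digest()


def root_hash(node) -> bytes:
    """Reconstruct the root hash of a hash tree given as nested lists.

    Node shapes after CBOR decoding: [0] empty, [1, left, right] fork,
    [2, label, subtree] labeled, [3, value] leaf, [4, hash] pruned.
    """
    tag = node[0]
    if tag == 0:
        return _h("ic-hashtree-empty")
    if tag == 1:
        return _h("ic-hashtree-fork", root_hash(node[1]), root_hash(node[2]))
    if tag == 2:
        return _h("ic-hashtree-labeled", node[1], root_hash(node[2]))
    if tag == 3:
        return _h("ic-hashtree-leaf", node[1])
    if tag == 4:
        return node[1]  # a pruned node already carries its subtree's hash
    raise ValueError(f"unknown hash tree tag: {tag}")
```

Because a pruned node stands in for its subtree's hash, the full tree from certified_tree and a pruned tree from an IC-Certificate header should reconstruct to the same root hash if they come from the same canister state.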
certified_tree will run into issues if there are too many assets in the canister because there is a max response size and this function does not care about that. But for most cases this should work.
Does the certificate contain the root hash?
Is the certificate exactly the same as the one in the IC-Certificate header? I mean does it have the same encoding or is it a different encoding?
Where can I read about the structure and encoding of it?
Yes. It is as described in this section. I think it would be the same as IC-Certificate
I’m trying to verify assets from an assets canister using the response from the certified_tree endpoint, and I’m hitting a problem when comparing root hashes.
Here’s my setup:
- I have an assets canister that has been updated with new assets several times.
- I created a brand new assets canister.
- I installed the same set of assets on both canisters.
- When I call the canisters to get the “evidence” string (asset certification), I get the same evidence string for both.
- But when I check the certified_tree response, the root hashes differ.
What I see in the certified_tree response:
- The tree field contains two main branches: "http_assets" and "http_expr".
- The "http_assets" branch visually contains the same data and values in both canisters, but the internal tree balancing is slightly different.
- The "http_expr" branch in the first canister still contains entries for old assets that are no longer present in the new one.
The certificate["tree"] field (which I initially thought was the proof tree) contains many pruned entries. Some entries have identical values between canisters, but I still don’t understand how to meaningfully compare them.
Our intention: We want to give users a way to prove that the assets installed on our assets canister have not been modified after deployment.
Are you referring to the compute_evidence function? That function hashes a set of changes to the asset canister. It doesn’t prove that the same contents are/will be present, only that the diff that will be applied is identical.
Yes, unfortunately the cert tree depends on the insertion order. If the two canisters don’t have the same history then you won’t get the same root hash. I’d like to get that improved at some point, but I don’t see it happening anytime soon…
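A toy illustration of why the shape matters, not the asset canister's actual tree code: the same three leaves combined into two differently shaped fork trees produce different root hashes. The helper names are made up, and the hashing follows the fork and leaf rules of the hash tree encoding.

```python
import hashlib


def h(domain: bytes, *parts: bytes) -> bytes:
    """SHA-256 over a length-prefixed domain separator followed by the parts."""
    return hashlib.sha256(bytes([len(domain)]) + domain + b"".join(parts)).digest()


def leaf(value: bytes) -> bytes:
    return h(b"ic-hashtree-leaf", value)


def fork(left: bytes, right: bytes) -> bytes:
    return h(b"ic-hashtree-fork", left, right)


a, b, c = leaf(b"a"), leaf(b"b"), leaf(b"c")

# Same leaves in the same left-to-right order, but balanced differently.
left_heavy = fork(fork(a, b), c)
right_heavy = fork(a, fork(b, c))

shapes_differ = left_heavy != right_heavy
```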
http_assets is used for certification v1, and http_expr is used for certification v2. They should contain roughly the same things, but in different structures.
Stale http_expr entries for assets that no longer exist: not good. If you can reproduce that, please tell me how. That would be a pretty bad bug.
The only purpose of the certificate’s tree field is to prove that this specific response is part of the certified responses. Any info beyond that is pruned, so I don’t think it’s very useful to compare anything there.
In general, the easiest way to do this is to go the SNS route - only the SNS can adopt changes because nobody else has Commit rights. Then all changes are logged via governance. You can also put a different canister in control of that if you don’t have an SNS.
Another option is to use list. It shows all assets the canister currently has, along with the asset hashes. That list should be relatively easy to compare. The asset hashes can in theory be spoofed by the uploader, but then the assets will fail asset certification and the boundary nodes will block the requests.
I have run into similar problems in the past. The root hash might not be so easily reproducible. For example, it might depend on the order in which the assets are uploaded to the canister. If you have a lot of assets, dfx might upload them with multiple update calls in parallel, and you can’t predict the order in which they will land in blocks.
For asset verification I ended up using a different approach. The list query function returns a list of all assets, their filenames and individual hashes. Then you can compare that list to your locally produced assets. You don’t have to upload them to a brand new canister.
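A rough sketch of that comparison. It assumes you have already turned the output of list into a dict mapping each asset key (e.g. "/index.html") to the hex SHA-256 of its uncompressed content; that mapping step and both function names are my own and not part of any library.

```python
import hashlib
from pathlib import Path


def local_hashes(build_dir: str) -> dict:
    """Map asset keys like "/index.html" to the hex SHA-256 of the local file."""
    root = Path(build_dir)
    return {
        "/" + p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_assets(local: dict, remote: dict) -> dict:
    """Report keys missing on either side and keys whose hashes differ."""
    shared = local.keys() & remote.keys()
    return {
        "only_local": sorted(local.keys() - remote.keys()),
        "only_remote": sorted(remote.keys() - local.keys()),
        "mismatch": sorted(k for k in shared if local[k] != remote[k]),
    }
```

An empty result in all three lists would mean the canister reports exactly the files from the local build, with matching hashes.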
Any approach, root hash or list, of course relies on the build process (e.g. npm run build) being deterministic. Not sure how reliable that is in practice.
The list command also returns individual modification timestamps for all files. If all you want to prove is that assets have not been modified, then you don’t even need to reproduce the hashes. These timestamps should be enough. They don’t tell you, though, whether assets have been deleted.
It would be very helpful if the asset canister had a function that returned a global timestamp of the last modification of any of its assets. Does the certified_tree contain a timestamp (@Vivienne)?
Yes, the cert that gets returned along with the tree contains the timestamp. From the spec:
The certificate is a blob as described in Certification that contains the values at path /canister/<canister_id>/certified_data and at path /time of the system state tree.
If you want to look at it, you need to decode hex to CBOR.
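Once the certificate is decoded and you have the raw bytes of the /time leaf, a sketch like this turns them into a readable timestamp. It assumes the leaf is an unsigned LEB128 number of nanoseconds since the Unix epoch, which is how I read the interface spec; the function names are made up.

```python
from datetime import datetime, timezone


def decode_leb128(data: bytes) -> int:
    """Decode an unsigned LEB128 byte string into an integer."""
    value = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * index)
        if byte & 0x80 == 0:  # high bit clear marks the last byte
            break
    return value


def time_leaf_to_datetime(leaf: bytes) -> datetime:
    """Interpret the /time leaf as nanoseconds since the Unix epoch (UTC)."""
    nanos = decode_leb128(leaf)
    return datetime.fromtimestamp(nanos / 1_000_000_000, tz=timezone.utc)
```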
I guess at path /time I get the time of the whole state tree which will continuously increase. What we would need is a timestamp of when the state of an individual canister last changed.
Oh right, my bad. No, certified_tree does not return any last-changed-at timestamp or anything like that. /time in the certificate refers to the last round when the subnet reached consensus on anything (not necessarily affecting this canister).