Recommended usage of CertifiedData

I read about CertifiedData in the Internet Computer Interface Specification.

The spec doesn’t talk about the actual usage of this piece of info, and I don’t see much in the docs either. Is there a recommended use case / reference implementation somewhere?

My takeaway is that CertifiedData is meant to be the source of verifiability at the canister level. It could be a Merkle root, I guess? Or maybe the hash of the latest “block” of a canister-owned blockchain?

Hope some light can be shed on the topic. I’d like to add that I’m most interested in the recommended implementation of such a mechanism in the most cycle-efficient way. Currently I don’t see any material about estimating cycle consumption.

Yes, documentation and best practices are a bit scarce on this. We’ll have more eventually (also talks etc.).

You can see some existing applications of certified data.

My takeaway is that CertifiedData is meant to be the source of verifiability at the canister level. It could be a Merkle root, I guess? Or maybe the hash of the latest “block” of a canister-owned blockchain?

Yes! The canister can choose, and as you see in the examples above, both are valid approaches.
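
For the “latest block hash” style, the canister-side code boils down to very little. Here is a minimal sketch in Rust using ic_cdk; the block and hashing logic is just an illustrative assumption, not a reference implementation:

```rust
use ic_cdk_macros::update;
use sha2::{Digest, Sha256};

#[update]
fn append_block(block: Vec<u8>) {
    // Placeholder logic: a real canister would chain this hash to the
    // previous block hash, or maintain a Merkle tree instead.
    let digest = Sha256::digest(&block);

    // Certified data is limited to 32 bytes and can only be set in update
    // calls; the subnet then certifies it as part of the canister's state.
    ic_cdk::api::set_certified_data(&digest);
}
```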

I’d like to add that I’m most interested in the recommended implementation of such a mechanism in the most cycle-efficient way. Currently I don’t see any material about estimating cycle consumption.

I don’t know if I know the most cycle-efficient way; at this point I am already happy when it works :slight_smile:

Unless you have very special needs, I’d recommend using the same hash tree representation as the system itself (see the Interface Spec). This is also used by all the above applications except the ledger, and you get to re-use some libraries, e.g. agent-rs/hash_tree.rs at next · dfinity/agent-rs · GitHub and internet-identity/src/certified_map at main · dfinity/internet-identity · GitHub (which I personally hope will eventually be available separately, with a more liberal license, and on crates.io).

Using the rbtree, or a similar data structure (e.g. a Patricia trie, as in the case of the Motoko library), will only recompute a few hashes as you change the data structure, and will give you decent cycle consumption. There can always be more optimizations, of course.
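
To make the incremental-hashing point concrete, here is a rough sketch of that pattern. It assumes the certified_map code linked above (or an equivalent red-black tree with insert and root_hash) is available as an ic_certified_map module; the key/value scheme is just an example:

```rust
use ic_certified_map::{Hash, RbTree};
use ic_cdk_macros::update;
use sha2::{Digest, Sha256};
use std::cell::RefCell;

thread_local! {
    // Certified key-value store: each key maps to the SHA-256 hash of its value.
    static TREE: RefCell<RbTree<String, Hash>> = RefCell::new(RbTree::new());
}

#[update]
fn set_entry(key: String, value: Vec<u8>) {
    TREE.with(|t| {
        let mut tree = t.borrow_mut();
        // Updating one leaf only recomputes the hashes on its path to the
        // root, not the whole tree.
        let value_hash: Hash = Sha256::digest(&value).into();
        tree.insert(key, value_hash);
        // Re-certify the new root hash.
        ic_cdk::api::set_certified_data(&tree.root_hash());
    });
}
```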

Can anyone speak to the motivation for providing a certified response?

I understand that verifiability is good, but I’m trying to understand its relevance on a tamperproof platform. Is this a “trust but verify” mechanism?

I’m interested in understanding why it’s optional rather than always required, or at least the default via some provided boilerplate code.

My guess is that:

  • the Chain Key is tamperproof;
  • the historical states are not stored (for scaling reasons), so we can’t verify all state changes from “genesis” natively;
  • boundary nodes are hosted by independent data centers (afaik), to maintain privacy on the data (compared to Ethereum & co., which let you run your own node and verify);
  • so the connection “Chain Key nodes” <=> boundary nodes <=> users is protected by (mutual) TLS, and that key is (likely issued by a central certificate authority, i.e.) not connected to the Chain Key (which is decentralized and tamperproof); the chain of trust is broken there, which is why you’d need to verify against the Chain Key to confirm you are retrieving trusted data.

I think in the Ethereum world, if you use some remote JSON-RPC servers (e.g. Infura or co.), the providers could start acting maliciously and serve you tampered on-chain data.

As opposed to an uncertified one?

Query calls are executed by a single replica. Our trust model for the IC is that some replicas may be compromised, but not many. So a query call may hit a malicious replica, which could fake any response to you. Certification protects against that. If you trust all replicas, or have other ways to verify the data, you don’t need to bother with certification.

It’s optional because there is no “uniform” way to provide certification. For every application, and every set of possible queries, you have to hand-craft your certification logic on both ends.
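
As an illustration of what the canister-side half of that hand-crafted logic tends to look like, here is a sketch continuing the RbTree example from earlier in the thread; the response type and its field names are made up, and the client still has to verify the certificate and witness against the IC root key:

```rust
use candid::CandidType;
use ic_cdk_macros::query;
use serde::Serialize;

// Hypothetical response type: the caller needs the certificate produced by
// the subnet plus a hash-tree witness tying the entry to the certified root.
#[derive(CandidType)]
struct CertifiedEntry {
    value_hash: Option<Vec<u8>>,
    certificate: Vec<u8>,
    witness: Vec<u8>,
}

#[query]
fn get_entry_certified(key: String) -> CertifiedEntry {
    // TREE is the thread-local RbTree from the earlier sketch.
    TREE.with(|t| {
        let tree = t.borrow();
        // The certificate is produced by the subnet, not the canister; it
        // covers the canister's certified_data (our root hash) and is only
        // available inside query calls.
        let certificate = ic_cdk::api::data_certificate()
            .expect("data_certificate is only available in query calls");
        // The witness is a pruned hash tree revealing this key's leaf and
        // hashing back up to the same root that was certified.
        let witness = tree.witness(key.as_bytes());
        let mut serializer = serde_cbor::Serializer::new(Vec::new());
        serializer.self_describe().unwrap();
        witness.serialize(&mut serializer).unwrap();
        CertifiedEntry {
            value_hash: tree.get(key.as_bytes()).map(|h| h.to_vec()),
            certificate,
            witness: serializer.into_inner(),
        }
    })
}
```

The client-side half then checks that the certificate is validly signed for the subnet, that it commits to the canister’s certified_data, and that the witness’s root hash matches that certified_data before trusting the returned value.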

One could imagine a way to certify query calls on a lower level of the stack (e.g. ask more replicas and aggregate signatures), but it would come at a performance cost. I expect that we’ll get that feature eventually (because the current certification scheme is just too inaccessible and also fundamentally can’t handle all use cases), but it’s non-trivial tech and there are always so many different things to do.

I have written up some documents on the limitations on certified variables and on certified queries, but I am not sure if I am allowed to share those.

I think this is the key thing I was missing. I was thinking that if you couldn’t trust a replica to begin with, you wouldn’t trust it telling you that a response was certified.

Ah, right! A certificate is equivalent to a signature by the whole subnet: it’s not the single node that certifies, but the subnet, and the node just passes the certificate through.
