RFC: Standardising how smart contracts expose state

We don’t have a super easy way to make a replicated query call, but you can remove the `query` annotation from a method in the IDL and have the call go through consensus. I will add designing a nice interface for that to the agent-js backlog for myself.

3 Likes

A big advantage of the HTTP endpoint is that you get a certified result for free. With the query method, how do you verify the response? Certified data is only 32 bytes, and there is no generic way to verify the data.

There is also a consistency problem: in replicated execution, the method call is processed in the middle of a block, when the new state tree has not been generated yet. Returning the latest available state tree would also make the data out of date.

1 Like

Overall in favor, although so far I haven’t seen a solution that is obviously desirable for all callers.

If you use this method from a canister, it will be executed in the “middle” of a block. Do you even have a hash tree of the state then to return a pruned response?

Actually, if you return no certificate, there is no need for the hash tree to have a certain root node. Therefore no need to have any pruned tree nodes. In other words, you can just create a fresh tree consisting of just the requested data.

But at this point the question is: why even return an unwieldy data structure like a hash tree when there is no certification around? Why not something easier to digest, like Candid?

And continuing this train of thought leads again to roughly where we are now.

Hmm, the more I think about this, the more I believe we need to solve this at a more fundamental and general level. It should be possible to describe the canister state in a high-level way (Candid), and then get all that we need. And next time we add something to the state, everything about data formats and certification follows. This scheme and associated tooling would then also help canister developers, who face the same issue.

What would that entail:

  • (Possibly) pick a subset of Candid types, call them certifiable.
  • Define a (simple) query language for them, similar to read_state, to select fragments of such a value.
  • Define a mapping from these abstract Candid values to our hash tree (a minimal sketch follows this list). This needs to be compatible with the above query language (i.e. a certificate reveals only the query result and that it is indeed the right result for this query, including the negative case).
  • In replicated calls, simply return Candid as now.
  • In non-replicated calls, return the hash tree representation of the same candid value, plus certificate.
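
To make the third bullet concrete, here is a deliberately naive sketch (in Rust, all names hypothetical) of how a flat Candid record could be laid out as an IC-style hash tree so that individual fields can be revealed and certified independently. Nested values, vectors and negative proofs are ignored; it only illustrates the shape such a mapping could take.

```rust
// Hypothetical sketch only: a minimal IC-style hash tree and a naive mapping
// from a flat Candid record (field name -> Candid-encoded value) onto it.
enum HashTree {
    Empty,
    Fork(Box<HashTree>, Box<HashTree>),
    Labeled(Vec<u8>, Box<HashTree>),
    Leaf(Vec<u8>),
    // Pruned(hash) omitted for brevity.
}

/// Each record field becomes a labeled subtree whose leaf holds the
/// Candid-encoded field value; fields are combined with forks, so a
/// certificate could reveal one field while pruning the others.
fn record_to_tree(fields: Vec<(&str, Vec<u8>)>) -> HashTree {
    fields
        .into_iter()
        .map(|(name, encoded_value)| {
            HashTree::Labeled(name.as_bytes().to_vec(), Box::new(HashTree::Leaf(encoded_value)))
        })
        .fold(HashTree::Empty, |left, right| {
            HashTree::Fork(Box::new(left), Box::new(right))
        })
}
```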

If we can pull this off the whole replicated vs. certified distinction disappears on the application level. But it’s not trivial, unfortunately.

But if we don’t do this, and important functionality is suddenly no longer reachable via our common high-level interop system, we are again weakening the coherence vision of the IC…


Minor wording nits:

Not quite true: some parts of the state tree are only accessible with suitable authentication (in particular, ingress call status).

Do you mean its parameters, i.e. “state tree paths”? Or rather the internal “input”, e.g. which data structure it reads from?

4 Likes

Certainly in favor of the certified Candid types. But why pick a subset? Would something like https://dl.acm.org/doi/abs/10.1145/2578855.2535851 work for all Candid types?

Just saying “possibly” to indicate that I wouldn’t consider it a blocker if we can’t support everything.

Based on the abstract, the paper certainly sounds interesting. Not open access though, and it’s too late today for my brain to process papers anyway. Did we discuss this paper before, when designing certified variables and/or Candid?

Certified variables on the Candid layer? · Issue #1814 · dfinity/motoko · GitHub is also relevant, and may provide another attack vector for this problem: define the various data accesses as plain Candid-returning query methods, and (somehow) have a generic mechanism to certify these.

1 Like

Replicated queries are essentially calling https://sdk.dfinity.org/docs/interface-spec/index.html#http-query via an ingress message. If you call it as a query, you do not have any certification that you can validate. But if you call it as an ingress message, you get the result in the ingress status (via read_state), for which you now have a certification.
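
For illustration, these are the state-tree paths a client would fetch via read_state to obtain that certified result; the path layout follows the interface spec’s request_status subtree, while the helper function itself is made up for this sketch.

```rust
// Sketch: the state-tree paths a client requests via read_state to obtain the
// certified outcome of an ingress message. `request_id` is the 32-byte request
// id of the submitted message; the helper name is hypothetical.
fn request_status_paths(request_id: &[u8; 32]) -> Vec<Vec<Vec<u8>>> {
    vec![
        // "replied", "rejected", "processing", ...
        vec![b"request_status".to_vec(), request_id.to_vec(), b"status".to_vec()],
        // the Candid-encoded reply, once the status is "replied"
        vec![b"request_status".to_vec(), request_id.to_vec(), b"reply".to_vec()],
    ]
}
```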

canister_status returns private information about the canister. This is information that the canister does not want to expose to the rest of the world, only to its controllers. The set of a canister’s controllers has always been public. This proposal is not attempting to change any restrictions. The public data will remain public and the private data will remain private.

Precisely, thanks for stating clearly what I meant to say in the original proposal.

Indeed, this is the usual point on which we keep getting stuck.

I suppose, conceptually, what the proposal is suggesting is that read_state return the following struct:

(HashTree, Option<Certificate>). The hash tree is always returned, and when executing as a non-replicated call, the certificate is additionally returned.

Instead would something like the following make sense?

Result<CandidStruct, (HashTree, Certificate)>. Now when executing a replicated call, you get an easier-to-digest struct, and when executing a non-replicated call, you get the HashTree and the certificate.
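
Writing the two shapes out as (entirely hypothetical) Rust types may make the comparison easier; HashTree, Certificate and CandidStruct are placeholders, not existing types:

```rust
// Placeholder types; the concrete shapes are left open by the proposal.
struct HashTree;      // pruned tree over the requested paths
struct Certificate;   // signature over the tree's root hash
struct CandidStruct;  // the same data as a plain Candid value

// Shape proposed above: the tree is always returned, the certificate only
// when the call executes in non-replicated mode.
type ResponseV1 = (HashTree, Option<Certificate>);

// Alternative: Candid in replicated mode, tree plus certificate otherwise.
type ResponseV2 = Result<CandidStruct, (HashTree, Certificate)>;
```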

3 Likes

Something like that, with a generic (not application specific) way to relate the hash tree to the payload in the CandidStruct. So that, for example, Candid UI or ic.rocks can validate such responses from arbitrary canisters.

Hmm, that seems to be a good criterion for evaluating solutions: is it expressive and general enough to be compatible with such canister-agnostic tools?

(Or we just do certified queries on the system level (threshold signatures on query call responses, independent of the main chain, internal link), and save sooo much complexity and effort in upper layers…)

Could also be a good opportunity to definitively decide what information about a canister we would like to be private or public by default? There have been earlier discussions about this.

I know that there are solutions like setting the blackhole canister as a controller, but as a developer that just seems cumbersome to me (now needing to manage the cycles of the blackhole canister as well just to expose this information).

I know I have seen @wang @stephenandrews and myself express a preference for making the currently private information public by default and optionally private. I’ve seen @Sherlocked say that he doesn’t think the cycle balance of a canister is too much information. I’ve seen @Levi say that cycle balance is too much information.

I would love to hear if other community members have an opinion about this as well.

3 Likes

@Fulco : good points. I agree these need to be addressed as well. This will significantly increase the scope of this RFC. Based on the discussions so far, I think I need to go back to the drawing board a bit on replicated vs. non-replicated queries. I have some vague ideas that I want to write down first.

3 Likes

I was thinking about the above proposed API for read_state and how I was not happy with it. I ended up writing down my thoughts on why I am not happy with it. I am posting them here to keep the conversation going. No precise proposal yet. I think more discussions and design is needed here.

Replicated vs. non-replicated queries

There are two modes of execution possible on the IC. Replicated mode is when all honest nodes on the subnet perform the execution, and non-replicated mode is when just a single node performs the execution.

There are two types of functions that a canister can have. Update functions are those whose state changes are preserved (amongst other capabilities), and query functions are those whose state changes are discarded (amongst other capabilities).

Since update functions modify the state of the canister, all honest nodes on the subnet need to execute them; in other words, update functions can only run in replicated mode.

On the other hand, state changes from query functions are discarded, so they are fine to run in the non-replicated mode. When we originally designed the IC, we asserted that, in terms of capabilities, query functions are strictly less capable than update functions, so it should be fine to execute query functions in replicated mode as well, only with limited capabilities.

The above decision had important ramifications for improving developer experience. It enables update functions to call query functions, which then execute in replicated mode. This has the benefit that canisters do not have to provide duplicate function definitions: one callable from update functions and one callable by users in query calls.
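
As a small illustration (a hedged sketch assuming a recent ic-cdk; the counter canister and its `get` method are made up for this example), an update function can call a method that the callee declares as a query, and that query then runs in replicated mode as part of the update’s execution:

```rust
use candid::Principal;
use ic_cdk::update;

// `get` is declared as a *query* method on the counter canister, but because
// it is invoked from an update function here, it executes in replicated mode
// on all nodes of the callee's subnet.
#[update]
async fn read_counter(counter_canister: Principal) -> u64 {
    let (value,): (u64,) = ic_cdk::call(counter_canister, "get", ())
        .await
        .expect("inter-canister call failed");
    value
}
```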

Then we implemented support for data certification. When a query function is executing in the non-replicated mode, it can use this feature to return a certificate for data that the caller can validate. As the certificate is only needed when executing in the non-replicated mode, it is not available in the replicated mode. This means a query function running in the non-replicated mode has different capabilities than a query running in the replicated mode. The certificate is also not available to update functions (as they can only run in replicated mode), so it is no longer the case that query functions are strictly less capable than update functions.
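
A minimal sketch of this asymmetry with the existing certified-data API (assuming a recent ic-cdk and the sha2 crate; the method names and stored value are made up for illustration):

```rust
use ic_cdk::api::{data_certificate, set_certified_data};
use ic_cdk::{query, update};
use sha2::{Digest, Sha256};

// Update calls (replicated mode) may write the 32-byte certified data...
#[update]
fn set_value(value: Vec<u8>) {
    // ...persist `value` in the canister's own state elsewhere, then certify
    // its hash so that a later query can prove what was stored.
    set_certified_data(Sha256::digest(&value).as_slice());
}

// ...but a certificate can only be obtained in non-replicated mode.
#[query]
fn get_certificate() -> Option<Vec<u8>> {
    // Returns None when this query executes in replicated mode, e.g. when it
    // is called from an update function or submitted as an ingress message.
    data_certificate()
}
```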

In a comment above, I made a proposal for how functions executing in replicated mode would have different capabilities than functions executing in non-replicated mode. In particular, I proposed the following function signature:

fn read_state(…) → Result<CandidStruct, (HashTree, Certificate)>

When a function executing in the replicated mode calls this function, it gets back a CandidStruct and when a function executing in the non-replicated mode calls this function, it gets back a (HashTree, Certificate).

This API is not very nice because an update function will always have to handle the Err case even though we know that the Err case is impossible.
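
Concretely, with such a signature an update function would end up with code like the following (purely illustrative; `read_state` is the hypothetical function above and the path argument is made up):

```rust
// Inside an update function the Err arm is dead code, yet the return type
// still forces the author to write it out.
#[ic_cdk::update]
async fn controllers() -> CandidStruct {
    match read_state("/canister/controllers").await {
        Ok(candid_value) => candid_value,
        // Never reached: certificates are only produced in non-replicated mode.
        Err((_tree, _certificate)) => unreachable!("no certificate in replicated mode"),
    }
}
```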

What we are seeing above are examples of the different capabilities that query functions have when executing in the replicated and non-replicated modes. More specifically, update functions and query functions in non-replicated mode have different sets of capabilities, and query functions in replicated mode have a subset of the capabilities of the two.

We can come up with better names for the two classes of queries: queries only callable in non-replicated mode (Q1); and queries callable in both replicated and non-replicated modes (Q2).

We can then say that update functions can only call Q2 queries, and in the future, when we support inter-canister queries, Q1 queries will also be able to call Q2 queries. We can also refine the proposal for read_state to instead define two different functions:

fn certified_read_state(…) → (HashTree, Certificate)

fn non_certified_read_state(…) → CandidStruct

certified_read_state is only callable from Q1 and non_certified_read_state is callable from both update functions and Q2.
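
For the sake of discussion, the split could look roughly like this (all types and names are placeholders taken from this thread, not an existing API):

```rust
// Placeholder types; the concrete encodings are left open.
struct StatePath(Vec<Vec<u8>>);
struct HashTree;
struct Certificate;
struct CandidStruct;

trait ReadState {
    /// Callable only from Q1 queries (non-replicated mode): a pruned hash
    /// tree of the requested paths plus a certificate over its root hash.
    fn certified_read_state(&self, paths: &[StatePath]) -> (HashTree, Certificate);

    /// Callable from update functions and Q2 queries (replicated mode): the
    /// same data as a plain Candid value; no certificate is needed because
    /// the result itself goes through consensus.
    fn non_certified_read_state(&self, paths: &[StatePath]) -> CandidStruct;
}
```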

4 Likes

Good analysis! This bifurcation, making query methods no longer a restricted form of update methods, has always bothered me (certified variables, but also inter-canister calls). But just giving up and letting the developer deal with this even increased complexity isn’t a satisfying answer either. Isn’t a goal to hide the complexities of blockchain and crypto from the user? I hope we can hide it in lower levels of abstraction that normal developers don’t have to deal with.

Like we have reasonably successfully hidden differences between ingress update calls and inter-canister calls from the developer (e.g. polling only needed in one of them; certification only involved in one of them). We should try hard to maintain that level of abstraction, also for query calls. After all, they are “just” an optimization…

4 Likes

I agree, the goal should always be to not expose unnecessary complexity to the user. However, we should also avoid the temptation of premature abstraction. In particular, in this case, maybe the complexity can be hidden by some language extension or a library. I would still like to experiment with the raw APIs without adding support for the abstractions in the system. Upgrading systems while having to maintain backwards compatibility is a chore.

3 Likes

I morally agree with @nomeata’s point that we should not make developers’ lives harder than necessary. But I also think that by using abstractions that do not quite fit, we did this already. The API for certification is one pretty good example. But there are more (meetings with @akhilesh.singhania are often insightful …), like:

  • It is unsafe to call untrusted canisters or canisters on untrusted subnets, which may never return, thus not allowing the calling canister to stop.
  • The semantics of calls to other canisters are not quite what one may expect, as state changes prior to the call are persisted. That means developers must be very careful not to unexpectedly leave state inconsistent during the call or when potentially trapping after it. Which in turn means that one has to understand intricacies of the platform even when working with “nice” abstractions.

To me it seems that sometimes abstractions that don’t quite fit may be worse than no abstractions at all.

2 Likes

Both good reasons to develop in the pure actor model and ditch this convenient but dangerous async stuff :slight_smile:

6 Likes

I don’t think that’s him.

Just want to chime in that once Enable canisters to make HTTP(S) requests is implemented, one can use it to call the read_state endpoint on the IC itself. Not exactly a satisfying solution, but it will be possible.

1 Like

If a piece of private state is only meant to be read by controllers, then the security of a call must ensure that the caller is genuine, not just that the result is certified. But of course this is more of a question about how non-replicated query calls are going to work.

That’s a fantastic idea. Certifying data is a huge pain and doesn’t scale well. The only way out of this misery is automation.

Amen. I wrote two async runtimes for Rust canisters, and I still hate all things async passionately. State machines are the only true way to specify and build distributed software.

Also, I wish I had learned about TLA+ 3 years ago :slight_smile:

4 Likes