I was thinking about the above proposed API for read_state
and how I was not happy with it. I ended up writing down my thoughts on why I am not happy with it. I am posting them here to keep the conversation going. No precise proposal yet. I think more discussions and design is needed here.
Replicated vs. non-replicated queries
There are two modes of executions possible on the IC. Replicated mode is when all honest nodes on the subnet perform the execution and non-replicated mode is when just a single node performs the execution.
There are two types of functions that a canister can have. Update functions are when the state changes made by the functions are preserved (amongst other capabilities) and query functions are when the state changes made by the functions are discarded (amongst other capabilities).
Since, update functions are modifying the state of the canister, all honest nodes on the subnet need to execute it, in other words, update functions can only run in replicated mode.
On the other hand, state changes from query functions are discarded so they are fine to run in the non-replicated mode. When we originally designed the IC, we asserted that in terms of capabilities, query functions are strictly less capable than update functions so it should be fine to execute query functions as replicated mode as well only with limited capabilities.
The above decision had important ramifications for improving developer experience. It enables update functions to call query functions which then execute in replicated mode. This has the benefit that canisters do not have to provide duplicate function definitions: one callable from update functions and one callable by users in query calls.
Then we implemented support for data certification. When a query function is executing in the non-replicated mode, it can use this feature to return a certificate for data that the caller can validate. As certification validation is only necessary when executing in the non-replicated mode; it is not available in the replicated mode. This means a query function running in the non-replicated mode has different capabilities than a query running in the replicated mode. The certification is also not available for update functions (as they can only run in replicated mode) so now it is no longer the case that query functions are strictly less capable than update functions.
In comment, I made a proposal for how functions executing in replicated mode will have different capabilities than functions executing in non-replicated mode. In particular, I proposed the following function signature:
fn read_state(…) → Result<CandidStruct, (HashTree, Certificate)>
When a function executing in the replicated mode calls this function, it gets back a CandidStruct
and when a function executing in the non-replicated mode calls this function, it gets back a (HashTree, Certificate)
.
This API is not very nice because an update function will always have to handle the Err
case even though we know that the Err
case is impossible.
What we are seeing above are examples of the different capabilities that query functions have when executing in the replicated and non-replicated modes. More specifically, update functions and query functions in non-replicated mode have different sets of capabilities and query functions in replicated mode have a subset of capabilities of the two.
We can come up with better names for the two classes of queries: queries only callable in non-replicated mode (Q1); and queries callable in both replicated and non-replicated modes (Q2).
We can then say that update functions can only call Q2 queries and in the future when we support inter-canister queries Q1 queries can also call Q2 queries. We can also refine the proposal for read_state to instead define two different functions:
fn certified_read_state(…) → (HashTree, Certificate)
fn non_certified_read_state(…) → CandidStruct
certified_read_state is only callable from Q1 and non_certified_read_state is callable from both update functions and Q2.