Proposal: Making Variant Fields Optional for Schema Evolution

Hi everyone,

We’d like to share a proposal to update certain data structures in the management canister interface to make them more future-proof.

Currently, some records are defined in ways that limit their ability to evolve. For example:

type change = record {
    timestamp_nanos : nat64;
    canister_version : nat64;
    origin : change_origin;
    details : change_details;
};

with change_details defined as a variant type:

type change_details = variant {
    creation : record {
        controllers : vec principal;
        environment_variables_hash : opt blob; 
    };
    code_uninstall;
    code_deployment : record {
        mode : variant { install; reinstall; upgrade };
        module_hash : blob;
    };
    load_snapshot : record {
        canister_version : nat64;
        snapshot_id : snapshot_id;
        taken_at_timestamp : nat64;
        source : variant {
            taken_from_canister : reserved;
            metadata_upload : reserved;
        };
    };
    controllers_change : record {
        controllers : vec principal;
    };
};

The problem

Candid variant types are closed enums. Once you define a set of branches, you cannot safely:

  • Rename branches

  • Add new branches without creating a breaking change

In Candid, a decoder does not ignore variant branches it does not know. Decoding with the legacy type fails as soon as it encounters a branch that exists only in the new type. Evolving a type like change_details therefore risks breaking existing canisters.
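To make the failure mode concrete, here is a minimal Python sketch of a legacy decoder. It is a toy model under stated assumptions, not the Candid wire format or any real library: variant values are modelled as (tag, payload) tuples, the origin field is left out for brevity, and the helper names (decode_variant, decode_change, DecodeError) as well as the settings_change branch are hypothetical.

```python
# Toy model only: NOT the Candid wire format or any real library API.
# A variant value is modelled as a (tag, payload) tuple.

LEGACY_BRANCHES = frozenset({
    "creation", "code_uninstall", "code_deployment",
    "load_snapshot", "controllers_change",
})


class DecodeError(Exception):
    """Raised when a value does not fit the type the decoder expects."""


def decode_variant(value, known_branches):
    """Decode a (tag, payload) pair against a closed set of branches."""
    tag, payload = value
    if tag not in known_branches:
        # A closed variant has no rule for skipping unknown branches.
        raise DecodeError(f"unknown variant branch: {tag}")
    return tag, payload


def decode_change(record):
    """Decode a `change` record whose `details` field is a bare variant."""
    return {
        "timestamp_nanos": record["timestamp_nanos"],
        "canister_version": record["canister_version"],
        "details": decode_variant(record["details"], LEGACY_BRANCHES),
    }


old_style = {"timestamp_nanos": 1, "canister_version": 7,
             "details": ("code_uninstall", None)}
# `settings_change` is a hypothetical branch added by a newer interface.
new_style = {"timestamp_nanos": 2, "canister_version": 8,
             "details": ("settings_change", {})}

print(decode_change(old_style)["details"])
try:
    decode_change(new_style)
except DecodeError as err:
    # The error is not confined to `details`: the whole record is lost.
    print("whole record rejected:", err)
```

In this model the unknown branch does not only hide the details field: the legacy reader loses the entire change record, including the timestamp and version it could otherwise have understood.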

The limitation does not just affect change_details. Other types that use variants—such as source and globals in ReadCanisterSnapshotMetadataResponse—face the same issue. Both are variant-based and therefore cannot evolve safely in the current form. For the same reasons as with change_details, these fields also need to become optional so that they can be extended in the future without introducing breaking changes.

Proposed solution

To preserve flexibility and forward compatibility, we propose making the fields optional:

type change = record {
    timestamp_nanos : nat64;
    canister_version : nat64;
    origin : change_origin;
    details : opt change_details;
};

Similarly, in ReadCanisterSnapshotMetadataResponse, the source and globals fields will also be migrated to optional fields:

type read_canister_snapshot_metadata_response = record {
    source : opt variant { /* ... */ };
    globals : vec opt variant { /* ... */ };
    // ... other fields remain unchanged ...
};

This way:

  • Future changes can introduce new variant branches without breaking existing consumers.

  • We give ourselves the ability to evolve the schema gradually, rather than being locked into today’s definition.
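The first point relies on how Candid treats opt, as we understand its subtyping rules: when the value inside an opt field cannot be decoded at the type the reader expects, the reader sees null instead of an error. The Python sketch below models that rule. It is a toy model, not the Candid wire format; the helper names and the settings_change branch are hypothetical, and the origin field is omitted for brevity.

```python
# Toy model only: NOT the Candid wire format or any real library API.
# A variant value is modelled as a (tag, payload) tuple; null is None.

KNOWN_BRANCHES = frozenset({
    "creation", "code_uninstall", "code_deployment",
    "load_snapshot", "controllers_change",
})


class DecodeError(Exception):
    """Raised when a value does not fit the type the decoder expects."""


def decode_variant(value, known_branches):
    """Decode a (tag, payload) pair against a closed set of branches."""
    tag, payload = value
    if tag not in known_branches:
        raise DecodeError(f"unknown variant branch: {tag}")
    return tag, payload


def decode_opt(value, decode_inner):
    """Decode an `opt t` field: a payload that does not fit `t` becomes None."""
    if value is None:
        return None
    try:
        return decode_inner(value)
    except DecodeError:
        # The opt wrapper absorbs the mismatch instead of propagating it.
        return None


def decode_change(record):
    """Decode a `change` record whose `details` field is `opt change_details`."""
    return {
        "timestamp_nanos": record["timestamp_nanos"],
        "canister_version": record["canister_version"],
        "details": decode_opt(
            record["details"],
            lambda v: decode_variant(v, KNOWN_BRANCHES),
        ),
    }


known = {"timestamp_nanos": 1, "canister_version": 7,
         "details": ("code_uninstall", None)}
# `settings_change` is a hypothetical branch added by a newer interface.
unknown = {"timestamp_nanos": 2, "canister_version": 8,
           "details": ("settings_change", {})}

print(decode_change(known))
print(decode_change(unknown))  # details is None, the other fields survive
```

The legacy reader still cannot interpret the new branch, but it keeps the rest of the record and gets an explicit null where the unknown information would have been.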

Why this matters

Core management canister records are long-lived pieces of state that must remain stable over time. As features evolve (e.g., new settings, new operations, or extended snapshot metadata), these records will need to be extended. Without this adjustment, each such change risks breaking compatibility with existing tooling and canisters.

By making these fields optional, we ensure long-term stability while still allowing future extensibility.

Is there a reason you cannot put this at an endpoint with _v2 (or an alternative) at the end?

If people are adding new functionality, they are, by definition, needing to upgrade canisters, and this would force them to migrate to the new _v2 function to use the new variants.

I know I have deployed code that uses the snapshot functionality. I’m guessing this change would break it.

I appreciate your input. We considered introducing a _v2 endpoint, but decided it’s better to update the existing API for a few reasons:

  • Introducing multiple versions of the same endpoint would fragment the API. This makes maintenance harder and increases the risk of inconsistencies between versions.

  • In the case of snapshot metadata, outdated endpoints would eventually have to return an error if the new metadata cannot be encoded in the legacy response type.

  • The change_details record is tightly coupled with settings changes. Using a single endpoint ensures that all changes are reflected explicitly in the canister’s history. With multiple versions, users of an outdated endpoint might miss information, and the gap would be neither obvious nor easy to understand. With optional fields, users must actively and explicitly handle null values, which is safer.
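As a rough illustration of what handling null explicitly could look like on the client side, here is a small Python sketch. The record layout (a dict with canister_version and details, with null modelled as None) and the describe_change helper are assumptions made for this example, not part of the management canister interface.

```python
# Hypothetical client-side helper; the record layout is an assumption.
# `details` is either a (tag, payload) tuple or None (Candid null).

def describe_change(change):
    """Render one history entry, making the null case visible to the user."""
    version = change["canister_version"]
    details = change["details"]
    if details is None:
        # Either the field was absent or it held a branch this client
        # does not know; say so rather than dropping the entry.
        return f"version {version}: change not understood by this client"
    tag, _payload = details
    return f"version {version}: {tag}"


history = [
    {"canister_version": 7, "details": ("code_uninstall", None)},
    {"canister_version": 8, "details": None},
]

for change in history:
    print(describe_change(change))
```

The point of the sketch is that the unknown entry still shows up in the history, so the user can see that something happened even if this client cannot say what.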

Ok. Degrade away I guess.

You caught me on a chippy morning. This stuff isn’t supposed to break. It just creates mountains of work and testing when it does, and just absolutely destroys credibility and trust that you can build going forward. I get that someone screwed up, as null variant has been the best practice since genesis, even when it looks odd and doesn’t seem to make sense… and I know from experience that it is difficult to stay disciplined with it, but breaking it to fix the glitch doesn’t seem very immutable.

Hi Austin, thank you for your input. As Alexandra already mentioned, we considered the options internally and we felt that the proposed one is the best despite the downsides you are pointing out. For completeness, Alexandra prepared a response to your last comment that I will post below on her behalf as she’s on vacation for the next 2 weeks and I don’t want the thread to remain unanswered until she’s back. Here’s her reply:

I see your concern: stability is critical, and breaking changes undermine trust. The quote “cannot later revoke or degrade” is a powerful one, and it’s tempting to extend it to protocol immutability. But I think it’s important to look at the context of that quote in the article. In the full passage, he was describing the vision for open internet services that run on top of the protocol. The article itself makes a distinction between protocol-level guarantees (tamper-proof, scalable infrastructure) and service-level design principles (composable APIs that others can depend on). What he was articulating is a design principle for building open internet services: once you expose an API, you should design it in a forward-compatible way so you don’t have to revoke or degrade it. Canisters can evolve, but by using patterns like opt fields, they can evolve without breaking the canisters that depend on them. The spirit of the quote is about how to design APIs for open internet services so they can safely interoperate and build network effects.

Coming back to our problem: yes, introducing a *_v2 endpoint would prevent breaking the existing clients, preserving them now as they are. But it does so by fragmenting the API. The old version becomes harder to maintain and inevitably starts to degrade, since it cannot represent new state. That is exactly the kind of silent degradation the quote warns against.

By contrast, making the fields optional keeps a single, stable API surface while still allowing us to extend the schema safely. Existing clients can continue to decode what they need to understand, while new clients gain access to richer information. In practice, opt is the mechanism that lets us evolve APIs without revocation — whereas *_v2 avoids breakage in the short term but creates long-term degradation.

We recognize that this change introduces work for developers, so we intend to provide time for canister upgrades. The goal is to resolve the issue permanently and strengthen stability going forward.

Thanks for the response.

While feeling a bit less chippy this morning, it might be worth considering a warning in the compiler. I’ve made this mistake too…like should ALL variants just be opt variants by default?

There isn’t really a natural way to learn this odd property of Candid other than by shooting yourself in the foot once, and the solution has an odd code smell to it. Guiding devs to an understanding of why you might want your properties to be opt properties and variants to be opt variants might help everyone. cc Motoko guys @ggreif @claudio

Conjecture: We will reduce future compatibility and interop problems if you provide a warning whenever a public shared endpoint exposes a variant/property that is not an opt variant/property and will therefore break compatibility if a new variant/property is added to the structure.
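To sketch what such a warning might check, here is a small Python example that walks a type description and reports every variant not wrapped directly in opt. This is purely illustrative: the tuple-based type encoding and the find_bare_variants helper are invented for this example and do not correspond to anything in the Motoko compiler or the Candid tooling.

```python
# Hypothetical lint sketch. Types are encoded as tuples (an assumption):
#   ("prim", name), ("opt", t), ("vec", t),
#   ("record", {field: t}), ("variant", {tag: t or None})

def _check_branches(variant_ty, path):
    """Recurse into the payload types of a variant's branches."""
    found = []
    for tag, payload_ty in variant_ty[1].items():
        if payload_ty is not None:
            found += find_bare_variants(payload_ty, path + (tag,))
    return found


def find_bare_variants(ty, path=()):
    """Return dotted paths of variants that are not wrapped directly in opt."""
    kind = ty[0]
    found = []
    if kind == "opt":
        inner = ty[1]
        if inner[0] == "variant":
            # Wrapped in opt: no warning here, but still check the payloads.
            found += _check_branches(inner, path)
        else:
            found += find_bare_variants(inner, path)
    elif kind == "variant":
        found.append(".".join(path) or "<root>")
        found += _check_branches(ty, path)
    elif kind == "record":
        for name, field_ty in ty[1].items():
            found += find_bare_variants(field_ty, path + (name,))
    elif kind == "vec":
        found += find_bare_variants(ty[1], path + ("[]",))
    return found


MODE = ("variant", {"install": None, "reinstall": None, "upgrade": None})
DETAILS = ("variant", {
    "code_uninstall": None,
    "code_deployment": ("record", {"mode": MODE,
                                   "module_hash": ("prim", "blob")}),
})
CHANGE = ("record", {"timestamp_nanos": ("prim", "nat64"),
                     "details": DETAILS})
CHANGE_OPT = ("record", {"timestamp_nanos": ("prim", "nat64"),
                         "details": ("opt", DETAILS)})

for warning in find_bare_variants(CHANGE):
    print("warning: variant not wrapped in opt at", warning)
```

A real lint would also have to decide whether nested variants such as mode deserve a warning at all, and whether record fields should be checked in the same way; the sketch flags nested variants and ignores plain fields.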
