ICRC-16 - CandyShared - Standardizing Unstructured Data Interoperability

ICRC-16 CandyShared

Context

The proposed ICRC16 CandyShared standard defines a Candid interface for unstructured data that canisters can use to exchange document-style data in a standardized way. This standard aims to facilitate the exchange of unstructured data between canisters and improve interoperability between different systems.

Data details

  • icrc: 16
  • title: ICRC16 CandyShared
  • author: Austin Fatheree - austin dot fatheree at gmail dot com or @afat on twitter
  • status: Draft
  • category: ICRC
  • requires: None
  • created: 2023-Mar-10
  • updated: [Current date]

Summary

The ICRC16 standard proposes a Candid interface for unstructured data to facilitate data exchange between canisters in a standardized way.

Introduction

The proposed standard describes the Candid interface for unstructured data that canisters can use to exchange data in a flexible and interoperable way. This interface is built upon the Candid serialization format and defines a set of types that can be used to handle various types of unstructured data.

Goals

The main goals of this standard are to:

  • Establish a standard interface for exchanging unstructured data between canisters
  • Facilitate the development of standard libraries in Rust, Motoko, Azle, and Kybra that can convert unstructured data into optimized objects
  • Improve the interoperability of different systems by enabling a standardized approach to unstructured data exchange
  • Simplify the certification and serving of unstructured data, such as JSON data that needs to be served from an Internet Computer canister

Candid Interface Definition

The ICRC16 CandyShared standard defines a Candid interface for unstructured data that includes the following type:

type CandyShared =
  variant {
    Array: vec CandyShared;
    Blob: blob;
    Bool: bool;
    Bytes: vec nat8;
    Class: vec PropertyShared;
    Float: float64;
    Floats: vec float64;
    Int: int;
    Int16: int16;
    Int32: int32;
    Int64: int64;
    Int8: int8;
    Ints: vec int;
    Map: vec record {
      CandyShared;
      CandyShared;
    };
    Nat: nat;
    Nat16: nat16;
    Nat32: nat32;
    Nat64: nat64;
    Nat8: nat8;
    Nats: vec nat;
    Option: opt CandyShared;
    Principal: principal;
    Set: vec CandyShared;
    Text: text;
};

This type defines a set of variants that can be used to represent different types of unstructured data, including arrays, blobs, booleans, bytes, classes, floats, integers, maps, naturals, options, principals, sets, and text.
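
For readers working in Rust, the variant above can be mirrored roughly as follows. This is a minimal sketch using only the standard library, not the API of any published crate. The names `sample_document` and `get_property` are hypothetical, `int`/`nat` are narrowed to `i128`/`u128`, principals are kept as text, and the `PropertyShared` record shape follows the description given later in this thread.

```rust
// Hypothetical Rust mirror of the Candid type above, standard library only.
// Simplifications (assumptions): Candid `int`/`nat` are unbounded but are
// narrowed to i128/u128 here, and principals are kept in textual form.
#[derive(Debug, Clone, PartialEq)]
pub enum CandyShared {
    Array(Vec<CandyShared>),
    Blob(Vec<u8>),
    Bool(bool),
    Bytes(Vec<u8>),
    Class(Vec<PropertyShared>),
    Float(f64),
    Floats(Vec<f64>),
    Int(i128),
    Int16(i16),
    Int32(i32),
    Int64(i64),
    Int8(i8),
    Ints(Vec<i128>),
    Map(Vec<(CandyShared, CandyShared)>),
    Nat(u128),
    Nat16(u16),
    Nat32(u32),
    Nat64(u64),
    Nat8(u8),
    Nats(Vec<u128>),
    Option(Option<Box<CandyShared>>),
    Principal(String),
    Set(Vec<CandyShared>),
    Text(String),
}

// Assumed shape: a named value with an immutable flag.
#[derive(Debug, Clone, PartialEq)]
pub struct PropertyShared {
    pub name: String,
    pub value: CandyShared,
    pub immutable: bool,
}

/// Builds a small illustrative document as a `Class` value.
pub fn sample_document() -> CandyShared {
    CandyShared::Class(vec![
        PropertyShared {
            name: "productName".to_string(),
            value: CandyShared::Text("A green door".to_string()),
            immutable: false,
        },
        PropertyShared {
            name: "tags".to_string(),
            value: CandyShared::Array(vec![
                CandyShared::Text("home".to_string()),
                CandyShared::Text("green".to_string()),
            ]),
            immutable: false,
        },
    ])
}

/// Looks up a property by name in a `Class`; returns None for other variants.
pub fn get_property<'a>(value: &'a CandyShared, name: &str) -> Option<&'a CandyShared> {
    match value {
        CandyShared::Class(props) => props.iter().find(|p| p.name == name).map(|p| &p.value),
        _ => None,
    }
}
```

Calling `get_property(&sample_document(), "tags")` returns the nested `Array` value, while the same call on a non-`Class` value returns `None`.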

Complementary standards

This standard can be used by other ICRC standards that require metadata or unstructured data exchange, such as:

  • ICRC-12 - Event Publishers can specify that their data - vec Nat8 - is ICRC16 compliant and can be deserialized using from_candid.
  • ICRC-14 for game stats - The Value type is already very close to CandyShared.
  • ICRC-7 for NFTs and other token standards, for metadata. By using ICRC16, these standards would make themselves future compatible.

Possible Extensions and Use Cases

  • ICDevs has developed a Motoko library that uses CandyShared and unshares these values into useful structures that can improve data access and conversion for various types. These values are stable and can survive upgrades without having to implement pre or post upgrade. GitHub - icdevs/candy_library at 0.2.0
  • The Origyn_NFT standard uses this format for its metadata. It allows the NFT creator maximum freedom in defining the fields they want in their NFT metadata fields. see origyn_nft/test_utils.mo at f3d50ec079ec113932d8f67450d67da5df9993fd · ORIGYN-SA/origyn_nft · GitHub for an example.
  • Zhenya Usenko has the beginning of a library for querying the data structures called CandyPath which could become an addon standard. We should propose an ICRC called CandyPath to codify this language and it would be helpful if it was as close to GraphQL as possible. GitHub - ZhenyaUsenko/motoko-candy-utils
  • We should propose an ICRC called CandySchema that helps define a schema for candy structures so that validation libraries can be written to easily form and validate structures.

Implementation

The ICRC16 standard can be implemented in any language that supports Candid serialization, such as Rust, Motoko, Azle, or Kybra. Implementers can use the standard type and service method to handle unstructured data in a consistent and efficient way. The ICRC16 standard also encourages the development of standard libraries that can convert unstructured data into optimized objects, such as the Candy_Library example provided in the use case section.

Rationale

The need for a standard Candid interface for unstructured data arises from the fact that unstructured data is ubiquitous in many systems, including the Internet Computer. Unstructured data can come in many forms, such as JSON, XML, YAML, or even binary data, and can be used for various purposes, such as exchanging documents, files, or metadata. However, the lack of a standardized approach to unstructured data exchange can create interoperability issues and make it difficult for developers to handle unstructured data in a consistent and efficient way.

By defining a Candid interface for unstructured data, the ICRC16 standard aims to provide a common ground for canisters to exchange unstructured data in a flexible and interoperable way. This standard defines a set of types that can be used to represent and access different types of unstructured data, including arrays, blobs, maps, and text. The standard also complements other Candid-related standards, such as ICRC-12 for Candid extensions, and can be used by other ICRC standards that require metadata or unstructured data exchange.

Security Considerations

The ICRC16 standard defines a Candid interface for unstructured data that can be used to exchange data between canisters. However, care should be taken to ensure that the exchanged data is secure and does not pose a security risk to the system. In particular, canisters should validate the data they receive from other canisters to ensure that it conforms to the expected format and does not contain malicious code or data.

Implementers of the ICRC16 standard should also consider the security implications of their implementation and follow best practices for secure software development. This includes using secure coding practices, validating user input, sanitizing data, and following the principle of least privilege. Implementers should also consider the potential impact of denial-of-service attacks or other forms of attacks that can exploit vulnerabilities in the system.

In particular, the size of a CandyShared object could be used in an attack. Depending on your use case, you may want to check the size of the object before storing or processing it to make sure it stays within reasonable bounds.
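
One way to approach such a size check is a bounded recursive walk that gives up as soon as a budget is exceeded. The sketch below is an illustration under stated assumptions: the enum is a trimmed, hypothetical subset of CandyShared, `estimate_size` is a made-up name, and the byte counts are a rough estimate (one tag byte per node plus payload), not the exact Candid wire size.

```rust
// Trimmed, hypothetical subset of CandyShared; the remaining variants would
// follow the same pattern.
#[derive(Debug, Clone, PartialEq)]
pub enum CandyShared {
    Array(Vec<CandyShared>),
    Bytes(Vec<u8>),
    Map(Vec<(CandyShared, CandyShared)>),
    Nat64(u64),
    Text(String),
}

/// Rough size estimate: 1 tag byte per node plus its payload bytes.
/// Returns None as soon as the running total would exceed `limit`, so an
/// oversized or deeply nested value is rejected without walking all of it.
pub fn estimate_size(value: &CandyShared, limit: usize) -> Option<usize> {
    let size = match value {
        CandyShared::Nat64(_) => 1 + 8,
        CandyShared::Text(t) => 1 + t.len(),
        CandyShared::Bytes(b) => 1 + b.len(),
        CandyShared::Array(items) => {
            let mut total = 1usize;
            for item in items {
                // Each child only gets the budget that is still left.
                total += estimate_size(item, limit.checked_sub(total)?)?;
            }
            total
        }
        CandyShared::Map(entries) => {
            let mut total = 1usize;
            for (key, val) in entries {
                total += estimate_size(key, limit.checked_sub(total)?)?;
                total += estimate_size(val, limit.checked_sub(total)?)?;
            }
            total
        }
    };
    if size <= limit {
        Some(size)
    } else {
        None
    }
}
```

Because every node costs at least one byte, the budget also bounds the recursion depth, which helps against values that are small in payload but nested very deeply.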

Conclusion

The proposed ICRC16 CandyShared standard defines a Candid interface for unstructured data that canisters can use to exchange data in a flexible and interoperable way. This standard aims to simplify the exchange of unstructured data and improve interoperability between different systems. We believe that this standard will be useful for developers who need to handle unstructured data in a consistent and efficient way and that it will facilitate the development of standard libraries and tools that can work with unstructured data.

We welcome feedback and contributions from the community to help refine and improve this standard.

edit: Added Ints and CandySchema and CandyPath ICRC references.

A couple of questions:

What is the PropertyShared type?

Why the need for Bytes, Floats, and Nats if you can do an Array? If so, why not something like Ints too?

It seems weird that a Map key isn’t a subset of CandyShared, but I guess technically it might be OK. It might be tedious for implementors to have to handle all those cases for equality, but I guess structured types make that easier.

A Property is a member of a class that has an immutable flag. Basically {name: Text; value: CandyShared; immutable: Bool}. @quint did some work with this a while back and has some cool update functions for piecemeal updates. See: candy_library/properties.mo at 0.2.0 · icdevs/candy_library · GitHub.

Bytes, Floats, and Nats came from practical work with image manipulation where you need to ship bytes or transforms in a way that is easily parseable from a cycle perspective from one canister to another.

There is a bit of intent tied up in these types that dictates how they are intended to be used. An Array is odd because you can have [#Int(6), #Text("hello")], whereas with Nats you know that they are all Nats and don’t have to parse each item.
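
The difference shows up on the receiving side. In the hedged Rust sketch below (a trimmed, hypothetical subset of the type, with `to_nats` as a made-up helper name), a `Nats` payload can be taken as-is, while an `Array` has to be inspected item by item and may turn out to be mixed.

```rust
// Trimmed, hypothetical subset of CandyShared for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum CandyShared {
    Array(Vec<CandyShared>),
    Int(i128),
    Nat(u128),
    Nats(Vec<u128>),
    Text(String),
}

/// Extracts a list of naturals from a value.
/// `Nats` is homogeneous by construction, so no per-item check is needed.
/// `Array` must be checked element by element and fails on the first non-Nat.
pub fn to_nats(value: &CandyShared) -> Option<Vec<u128>> {
    match value {
        CandyShared::Nats(ns) => Some(ns.clone()),
        CandyShared::Array(items) => items
            .iter()
            .map(|item| match item {
                CandyShared::Nat(n) => Some(*n),
                _ => None,
            })
            .collect(),
        _ => None,
    }
}
```

A mixed array such as `[Int(6), Text("hello")]` yields `None` here, which is exactly the case the typed `Nats` variant rules out up front.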

I’ve added hash and equality functions in the candy_library already: There are a few assumptions we should talk through, but once we build the libraries in the other three languages no one should have to mess with those: eqShared: candy_library/types.mo at 5e54dd27a27b1319daa0a9a5db236c6c567c69cf · icdevs/candy_library · GitHub

hashShared: candy_library/types.mo at 5e54dd27a27b1319daa0a9a5db236c6c567c69cf · icdevs/candy_library · GitHub

By unsharing the values into a Candy type these will be much faster as well because they take advantage of Map and Set under the covers.
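
To illustrate the idea of unsharing (this is not the candy_library's eqShared or hashShared code), here is a rough Rust sketch. It assumes a trimmed, hypothetical subset of the type without float variants, since `f64` does not implement `Eq` or `Hash` in Rust; `lookup_shared` and `unshare_map` are made-up names.

```rust
use std::collections::HashMap;

// Trimmed, hypothetical subset of CandyShared. Deriving Eq and Hash gives
// structural equality and hashing for keys of any variant.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum CandyShared {
    Map(Vec<(CandyShared, CandyShared)>),
    Nat(u128),
    Text(String),
}

/// Lookup on the shared form: a linear scan over the vector of pairs.
pub fn lookup_shared<'a>(map: &'a CandyShared, key: &CandyShared) -> Option<&'a CandyShared> {
    match map {
        CandyShared::Map(entries) => entries.iter().find(|(k, _)| k == key).map(|(_, v)| v),
        _ => None,
    }
}

/// "Unshares" a Map into a hash map for repeated lookups.
/// If a key appears more than once, the last entry wins.
pub fn unshare_map(map: &CandyShared) -> Option<HashMap<CandyShared, CandyShared>> {
    match map {
        CandyShared::Map(entries) => Some(entries.iter().cloned().collect()),
        _ => None,
    }
}
```

The shared form stays simple to send over the wire, while the unshared form pays a one-time conversion cost in exchange for hashed lookups afterwards.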

@DunkanMcLoud is also working on a rust implementation: GitHub - IT-Union-DAO/candy-rs: Candy library implementation in Rust

Following up on the latter part of Gekctek’s question - why not add a vector Ints type too?

Alright. I’ve been thinking about this a little more and I’m just going to write down some thoughts.
That being said, I think that this concept is needed and I think your Candy work is great, but something is just not sitting well with me.

  1. A lot of these, like the primitives, seem redundant
  2. Array vs Floats vs Bytes vs etc… seems redundant
  3. Array, Set, and Map not being generic (for Candid reasons) seems odd. Having an array of different types I can maybe see, but the set and map seem more of a stretch. My guess is that they are intended to hold values of the same type, but that is left open, so code either has to handle weird cases, or people will have to check that the types match, or just assume they do.
  4. Set and Array are the same thing, but with different constraints
  5. I’m not sure what the immutable flag on the properties does. If these are used for data transfer, that shouldn’t matter, but maybe if the data is manipulated and sent back, you want to indicate that it has not changed? So another constraint of some sort.
  6. A Map and a Class are the same thing, except there is a constraint on the key to be text

So lots of weirdness, but a solution is more helpful than just ranting.
A common pattern is constraints, but the constraints are not enforced via Candid but rather are trying to convey information to the client.
One thought would be that if two systems are going to communicate, then they have to adopt a contract so that they both understand what the inputs and outputs will be. A lot of the time I see it’s just based on documentation, and the developers have to assume that to be true. But that doesn’t help if we want a standard to be commonly used.

So maybe there is a way to build metadata about constraints on top of the already existing Candid types. That could be more than just adding things like map and set; it could also constrain different data types, such as int value ranges or max lengths or anything you would want to convey.

So to just throw out ideas without too much thought, there is the JSON Schema route, where you would completely separate the schema from the actual data itself, such as

Data

{
  "productId": 1,
  "productName": "A green door",
  "price": 12.50,
  "tags": [ "home", "green" ]
}

Schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/product.schema.json",
  "title": "Product",
  "description": "A product from Acme's catalog",
  "type": "object",
  "properties": {
    "productId": {
      "description": "The unique identifier for a product",
      "type": "integer"
    }
  },
  "required": [ "productId" ]
}

Where the data could be validated against the schema at runtime
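
As a rough illustration of that separation, the Rust sketch below keeps a tiny, hypothetical `Schema` type apart from the data and checks one against the other at runtime. Nothing here is a proposed CandySchema design: `Schema`, `validate`, and `product_schema` are made-up names, and the data type is a trimmed subset of CandyShared.

```rust
// Trimmed, hypothetical subset of CandyShared.
#[derive(Debug, Clone, PartialEq)]
pub enum CandyShared {
    Array(Vec<CandyShared>),
    Class(Vec<PropertyShared>),
    Float(f64),
    Nat(u128),
    Text(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct PropertyShared {
    pub name: String,
    pub value: CandyShared,
    pub immutable: bool,
}

/// Hypothetical schema language, kept separate from the data it describes.
#[derive(Debug, Clone, PartialEq)]
pub enum Schema {
    Float,
    Nat,
    Text,
    ArrayOf(Box<Schema>),
    Class { fields: Vec<(String, Schema)>, required: Vec<String> },
}

/// Returns true when `value` conforms to `schema`.
pub fn validate(value: &CandyShared, schema: &Schema) -> bool {
    match (value, schema) {
        (CandyShared::Float(_), Schema::Float) => true,
        (CandyShared::Nat(_), Schema::Nat) => true,
        (CandyShared::Text(_), Schema::Text) => true,
        (CandyShared::Array(items), Schema::ArrayOf(inner)) => {
            items.iter().all(|item| validate(item, inner))
        }
        (CandyShared::Class(props), Schema::Class { fields, required }) => {
            // Every required name must be present.
            let required_ok = required
                .iter()
                .all(|r| props.iter().any(|p| &p.name == r));
            // Declared properties must match their schema; undeclared ones
            // are allowed, as JSON Schema does by default.
            let fields_ok = props.iter().all(|p| {
                match fields.iter().find(|(n, _)| n == &p.name) {
                    Some((_, s)) => validate(&p.value, s),
                    None => true,
                }
            });
            required_ok && fields_ok
        }
        _ => false,
    }
}

/// Schema loosely modelled on the product example above.
pub fn product_schema() -> Schema {
    Schema::Class {
        fields: vec![
            ("productId".to_string(), Schema::Nat),
            ("tags".to_string(), Schema::ArrayOf(Box::new(Schema::Text))),
        ],
        required: vec!["productId".to_string()],
    }
}
```

A document with a `productId` of type `Nat` passes, while one that omits `productId` or supplies it as `Text` fails.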

OR

Something more inline such as you are doing but without declaring each different constraint as a type but include optional information with each type

type CandyShared =
  variant {
    Vec: record { data: vec CandyShared; constraints: VecConstraints };
    Bool: record { data: bool; constraints: BoolConstraints };
...
};

Where the data and the constraints are specified but have to be validated at runtime

Again, just throwing out ideas and there are issues with each one

Yep…makes sense. I’ve really only used Ints when forced to. :grinning: I’ll add it when I get a chance.

I’ll respond more thoroughly in a bit, but yes, we’ve looked at doing schemas for these. It makes a lot of sense.

You can think of a class as a “record” where immutable indicates whether the member is var or not. In fact, maybe we should call it record instead.

I don’t think Candid lets you imply generics, does it? Map could certainly be (type, type, (data,data)) but it gets a bit verbose. And one thing that this supports that I haven’t played with is collecting data from different devices and holding data…imagine holding a set of NFT metadata from different collections as a key and mapping it to a balance.

You’ve done more than most in processing and parsing this stuff, so the more feedback the better.

More soon

Ya, no generics, but that should be ok with some schema and validation

I’d propose an ICRC X - Candy Schema and ICRC Y - Candy Path. I really think Candy Path can follow GraphQL as closely as possible. I know @ZhenyaUsenko has done a good bit with Candy Path already.

I’ve just published v0.2.0 of Candy Library, which is compliant with the ICRC-16 standard proposal.

It lets you keep both CandyShared and Candy types in stable memory in Motoko.

There is also a Rust crate at https://crates.io/crates/ic_candy. Using these two libraries should help interoperability between Motoko and Rust canisters, particularly where you have extensible data.

A quick update: ICRC-16 CandyShared - Standardizing Unstructured Data Interoperability · Issue #16 · dfinity/ICRC · GitHub

I’ll need to update candy to v0_3, but I may wait to do this until we get the standard a bit more nailed down.

This update lets us be a supertype of the metadata being used in the transactions in icrc_3.

We may want to call this icrc_16_mini as it is a byte-minimizing subset of the full ICRC16.

I’ve published CandyLibrary 0.3.0-alpha that changes Map to ValueMap and adds Map<Text, Value>. This makes it a superset to the Value type we are considering for ICRC3.

One other type I’m considering is #Variant<Text, Value>.

@Gekctek has some Candid and CBOR libraries. I think using the motoko_candid library is the only real way to do any kind of reflection on variant types (which is maybe a good discussion point for the Motoko team @luc-blaeser, @matthewhammer, @claudio, @ggreif, @aterga, @rvanasa, @kentosugama). Why do we need a #Variant vs a single #Map with one value? I think intent would be the main reason.

What does the developer want to do here? If a candy value goes in, can we spit out the proper type? Maybe it is not needed because the result would have to be hand-tailored without any kind of reflection. Coming from a Candid type to a Candy you might want to know if the item was a variant or not.

So at this point I’m thinking of adding it, but wanted to throw it open for discussion.

I really want 0.3.0 to be able to fully represent a Candid data structure (without functions…unless maybe there is a reason to do that as well).

If we can get a final 0.3.0 that can represent any kind of Candid data then we can start looking at schemas and transformation libraries.

Isn’t any Candy value a variant inherently?
Let’s say we have 3 Candy User classes

User { id: Principal }
User { id: Text }
User { id: Nat }

They could be completely different Candy classes, or they could be a single Class with id being a Variant of type (Principal or Text or Nat). Usually we don’t care, unless we want to validate the structure. And, it happens that we already have such ability, we can validate id being a variant with the Candy Schema part of my Candy Utils library, specifically the #OneOf validation.

Sure, if we really want to give unique names for our variants, we will need an additional type (or a singleton Map as you suggested).

Let me know if all of the above makes sense to you.

With all those changes, do we really need the #Map variant? Looks like it’s the same as #Class. The only difference being that #Class supports mutability/immutability. Wouldn’t it be better to leave only #ValueMap?

Class, which is a [Property], is nice because it lets you store data with intent. If there is a piece of your structure that shouldn’t change, you have the immutable flag. We use this extensively in the ORIGYN_NFT to flag the portions of metadata that are not supposed to be mutable. It gives your smart contract something to grab onto that indicates some of the behavior it should have.

For example:

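// Fragment: a record whose metadata field is a #Class of properties.
{metadata = #Class([
    // immutable = true marks members that must never change
    {name = "id"; value = #Text(token_id); immutable = true},
    {name = "primary_asset"; value = #Text("page"); immutable = false},
    {name = "preview"; value = #Text("page"); immutable = true},
    {name = "experience"; value = #Text("page"); immutable = true}
])}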

Here we are making promises to the user that the preview, id and experience will never change.

This raises the question: what about Maps, Sets, etc., which are objects that you can get a reference to? They won’t honor these immutable flags if your contract gets hold of the objects. Ultimately it is up to the contract to enforce the implementation details.
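
As a rough picture of what contract-side enforcement could look like (this is not the ORIGYN_NFT or candy_library code), here is a Rust sketch in which every write goes through one function that checks the flag. `update_property`, `UpdateError`, and `sample_metadata` are hypothetical names, and the type is a trimmed subset of CandyShared.

```rust
// Trimmed, hypothetical subset of CandyShared.
#[derive(Debug, Clone, PartialEq)]
pub enum CandyShared {
    Class(Vec<PropertyShared>),
    Text(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct PropertyShared {
    pub name: String,
    pub value: CandyShared,
    pub immutable: bool,
}

#[derive(Debug, PartialEq)]
pub enum UpdateError {
    NotAClass,
    NotFound,
    Immutable,
}

/// Replaces the value of a named property unless it is flagged immutable.
/// The flag is only data; it protects nothing unless code like this checks it.
pub fn update_property(
    target: &mut CandyShared,
    name: &str,
    new_value: CandyShared,
) -> Result<(), UpdateError> {
    let props = match target {
        CandyShared::Class(props) => props,
        _ => return Err(UpdateError::NotAClass),
    };
    let prop = props
        .iter_mut()
        .find(|p| p.name == name)
        .ok_or(UpdateError::NotFound)?;
    if prop.immutable {
        return Err(UpdateError::Immutable);
    }
    prop.value = new_value;
    Ok(())
}

/// Metadata shaped like the example above; the token id is supplied by the caller.
pub fn sample_metadata(token_id: &str) -> CandyShared {
    let prop = |name: &str, value: &str, immutable: bool| PropertyShared {
        name: name.to_string(),
        value: CandyShared::Text(value.to_string()),
        immutable,
    };
    CandyShared::Class(vec![
        prop("id", token_id, true),
        prop("primary_asset", "page", false),
        prop("preview", "page", true),
        prop("experience", "page", true),
    ])
}
```

With this in place, updating `primary_asset` succeeds while updating `id` is rejected with `UpdateError::Immutable`. Code that mutates the underlying vector directly would bypass the check, which is the point made above.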

My question was more about the existence of #Map rather than the #Class.

Do we really need the #Map variant if we can rewrite each #Map as a #Class with all entries being mutable?

I agree, but the ICRC working groups have settled on having Map be <Text, Value> and if we want to stay a supertype, we need to support it.

I pushed for it to be <Value,Value> but was overruled in the name of simplicity. :grinning:
