Idea for CRUD types - feedback appreciated

Hi there. What I’m trying to achieve is a way of describing a database entity, but wrapping the native Types in classes. This is the code I have so far -

public type Type<T> = {
  get : () -> T;
  set : T -> ();
  sanitise : () -> T;
  validate : () -> Bool;
};

public class FloatType() {
  public var value : Float = 0;
  public func get() : Float { value };
  public func set(v : Float) {};
  public func sanitise() : Float { value };
  public func validate() : Bool { true };
};

public class IntType() {
  var value : Int = 0;
  public func get() : Int { value };
  public func set(v : Int) {};
  public func sanitise() : Int { value };
  public func validate() : Bool { true };
};

and I’d like to use these with a hierarchy of custom functions -

public func Area()        : Type<Float> { FloatRange(0.001, 1e6) };  // m^2,  
public func Density()     : Type<Float> { FloatRange(0.001, 1e6) };  // kg/m^3  
public func Distance()    : Type<Float> { FloatRange(0.001, 500) };  // m
public func Friction()    : Type<Nat>   { Percent() };               // % 
public func Hardness()    : Type<Nat>   { NatRange(1, 15) };         // H (Mohs + 5 extra fantasy level)
public func Mass()        : Type<Float> { FloatRange(0.001, 1e6) };  // kg   
public func Opacity()     : Type<Nat>   { Percent() };               // %  
public func Resonance()   : Type<Nat>   { Percent() };               // % (fantasy world concept)
public func Temperature() : Type<Nat>   { IntRange(-200, 10,000) };  // °C    
public func Velocity()    : Type<Float> { FloatRange(0.01, 500) };   // m/s 
public func Volume()      : Type<Float> { FloatRange(0.01, 1e6) };   // m^3 

and the Type itself would look like :

public class Base() = {
  var created =        Types.Time();
  var lastModified =   Types.Time();
};

// Ability
public type AbilityID = Types.ID;
public class Ability() = {
  // var _base:     Base = ???;
  // fields
  var name = Types.EntityName();
  var description = Types.Description();
  var something = Types.FloatRange(0.01, 500);
  // relations
  //icon        : IconID;
};

I wondered if anybody had any feedback on this approach. Also is there inheritance or composition in the language yet or is it planned?

Thanks again!

2 Likes

This is less about your overall design but I just wanted to point out that you can do the following to ensure that the classes are correct when defined rather than deferring that check to one of the call sites.

type Type<T> = {
  get : () -> T;
  set : T -> ();
  sanitise : () -> T;
  validate : () -> Bool;
};

class FloatType() : Type<Float> {
  var value : Float = 0;
  public func get() : Float { value };
  public func set(v : Float) {};
  public func sanitise() : Float { value };
  public func validate() : Bool { true };
};

class IntType() : Type<Int> {
  var value : Int = 0;
  public func get() : Int { value };
  public func set(v : Int) {};
  public func sanitise() : Int { value };
  public func validate() : Bool { true };
};

For example, if you accidentally omitted the validate method from the IntType definition you would get this error:

main.mo:16.1-22.2: type error, class body of type
  {get : () -> Int; sanitise : () -> Int; set : Int -> ()}
does not match expected type
  Type<Int> = {get : () -> Int; sanitise : () -> Int; set : Int -> (); validate : () -> Bool}
2 Likes

You can use this hack I came up with to have the type checker verify that a class is a supertype of multiple types.

type Flying = {
  fly : () -> ();
};

type Swimming = {
  swim : () -> ();
};

type Walking = {
  walk : () -> ();
};
class Bird() {
  public func fly() {};
  public func walk() {};
};

// Flying <: Bird
let _ : () -> Flying = Bird;

// Walking <: Bird
let _ : () -> Walking = Bird;
class Dog() {
  public func swim() {};
  public func walk() {};
};

// Swimming <: Dog
let _ : () -> Swimming = Dog;

// Walking <: Dog
let _ : () -> Walking = Dog;

For example, if you were to say that Walking <: Dog (using the let-bound technique I showed here) and then accidentally omitted the walk method from the definition of Dog you would get this error:

main.mo:70.25-70.28: type error, expression of type
  () -> Dog
cannot produce expected type
  () -> Walking

If your class constructor takes arguments then you need to pass those along too:

class Dog(name_ : Text) {
  var name = name_;
  public func swim() {};
  public func walk() {};
};

// Swimming <: Dog
let _ : Text -> Swimming = func(name : Text) { Dog(name) };

// Walking <: Dog
let _ : Text -> Walking = func(name : Text) { Dog(name) };
3 Likes

Hi @matthewhammer,

Just to follow up your statement:

For relational data, I would use nested Tries (the 2D and 3D versions, or deeper if necessary. The TrieMap module provides a wrapper that may get in the way for now.)

What do you mean by trie map provides a wrapper that may get in your way?

Also to piggyback on this thread with some questions:

Based on your CRUD example, I found mappings and abstractions quite straightforward using variants so thank you for that.

Now say I go with a huge trie and want to migrate a 200GB relational db into a 3D TrieMap.

  • Is there a limit cap I need to be aware of?
  • Does the size of trie affects the update query time? Right now it’s about 2 seconds.
  • Is this the best way to migrate a database?

Also the last question how does Motoko Trie compares with ethereum modified Merkle Patricia Trie? ( I’m trying to achieve same idea that each node is referenced by its hash, which is used for lookups in a leveldb database ).

Thanks,
Gabriel

3 Likes

Good question. Sorry, that was imprecise.

Consider the operation join in Trie, which is absent for TrieMap. That’s intentional, though unfortunate.

Binary operations like join and merge require pattern-matching both arguments to be implemented efficiently, so they require the more “exposed” representation (Trie, not TrieMap). Because of the nature of OO abstraction in Motoko, there is no good way to work over the internal structure of arguments whose type is an object (e.g., TrieMap).

By contrast, for (ordinary) algebraic types, like the Trie itself, we currently have the opposite problem: There is no way provided by Motoko to actually hide these types while still pattern-matching them with switch expressions.

My rule of thumb is to use TrieMap unless you need to do something like join over your data.

Now say I go with a huge trie and want to migrate a 200GB relational db into a 3D TrieMap.

Another great question. I think the answer for scaling to that extent (way beyond a single canister) is to switch from “very simple” single-canister data structures like Trie to something that can scale beyond the 1 canister limit without impacting speed. At some point, this scaling will be more transparent, but we have some way to go before that’s a reality.

For the short term, I’d recommend trying a combination of “BigMap” (to be open source very soon, I hope) and some additional structure stored within it that can reconstruct the extra dimensions that would otherwise be flattened into a single (big) mapping. The CanCan application also used this mapping to store video data. The idea is that it’s more general than any one application, and can scale indefinitely.

A lot of the full answer depends on the complexity your data model (from the source data base) — e.g., perhaps a 3D trie isn’t even capturing all of the relationships?

I have some other ideas cooking here that may be related; comments/questions welcome:

Also the last question how does Motoko Trie compares with ethereum modified Merkle Patricia Trie? ( I’m trying to achieve same idea that each node is referenced by its hash, which is used for lookups in a leveldb database ).

If I understand the question, the tries are not quite doing enough as is, but you could modify them to add the information that you are missing. That is, they do not (yet) have the Merkle data structure property that I think you want, where the root of the tree always has the hash of the entire content of the tree (recursively). However, a variant of the same structure could maintain that recursive hash information much like it already maintains the size of the entire structure at each branch node today.

4 Likes

Oops, I meant to respond to this question.

Yes, it does.

The Trie height should grow slowly though, and shouldn’t be the dominate cost if you aren’t also doing some join operations first.

Are you doing the query as a query call, or an update call? (are you sure that you are using the query keyword when you define the actor function for it?)

4 Likes

Hi @matthewhammer , thanks for getting back to me. Sorry for the radio silence I’ve been busy learning new stuff.

Are you doing the query as a query call, or an update call? (are you sure that you are using the query keyword when you define the actor function for it?)

Update call in my case. Using the old lingo for DB queries.

I can see you’re working on something similar for relational entities.

That’s really good, way more advanced than what I have so far.

Is this going to be part of base library? Is this a data structure data is gonna scale and expand over multiple canisters? I still haven’t grasped the concept on how to split a trie map over multiple canisters.

Also I haven’t seen anything about error handlers in inter-communication between canisters. Can a call fail? How can you catch that?

Thanks,
Gabriel

2 Likes

Sorry for the delay. (Was out on vacation last week.)

These questions are great. Let me try to address each with a quick reply, and we can expand as needed.

Is this going to be part of base library

I think every data structure that I create is a candidate for base, and people using it as open source (outside of base, while still experimental), gives me an argument that including it there would be helpful. I’m fine with either outcome (whatever seems to help people the most).

Is this a data structure data is gonna scale and expand over multiple canisters

Yes, that’s my (medium-term) intention. The short-term is what it is today: A single-canister data structure for sequences (or collections of related sequences) that change slowly over time.

I still haven’t grasped the concept on how to split a trie map over multiple canisters.

That’s expected and understandable: The Motoko language still doesn’t have the primitive features that we want to use there (to create dynamic actors on the fly, deploy them, and pass them initialization data). Once we can do that (very soon), the patterns for BigMap, BigSequence, BigSomeOtherStructure will all be similar, I expect. Waving my hands here, I’d say that the general idea is to replace pointers to sub-structures (all within one canister memory) with canister-level indirection, across distinct canisters.

For instance, you can imagine another case to the definition of Trie, where a tree is either #empty or a #leaf (small data size) or a #branch (both subtrees on same canister as branch node) or a #forward node to another canister, that contains the rest of the Trie’s subtree.

Also I haven’t seen anything about error handlers in inter-communication between canisters. Can a call fail? How can you catch that?

Yes, a call can fail, and that failure can be because of system-level as well as application-level errors. The language supports recovering from some of these errors (not others), and the syntax is like this:

  throw <exp>                                    raise error (only in async)
  try <exp> catch <pat> <exp>                    try (only in async)
* try <exp> catch { (case <pat> <exp>;) +} (<finally> <exp>)?  try-finally

To be perfectly honest though, I don’t use this feature myself (yet).

Like in languages like Rust, I would instead advise people to do application-level error handling with a Result type, and not with exception handling (or other error-catching mechanisms, like at the system level). Sometime, we will have a nicer syntax to make it really nice looking (perhaps similar to the syntax that Rust has, where one writes :? instead of ; to implicitly unpack the result and assert that it’s not an error)

4 Likes

Sorry for the delay. (Was out on vacation last week.)

No worries, hope you had a good time.

So I’ve been playing with your graph library (the code from PR). I’ve been actually thinking before I saw your library on how to ingrate something like a database directed graph like ne4j in motoko but not the case anymore. I’ll add a few questions/suggestions to your PR.

replace pointers to sub-structures (all within one canister memory) with canister-level indirection, across distinct canisters.

Great, but how is memory handled here? I know it’s not specific to my inquiry so feel free to skip this part, but I’m curious to know once you add more slaves to your master canister how the memory of a big graph is going to scale up and when? I’m talking about heap, thread stacks and cache.

From what I’ve read so far wasm is single-threaded The future: Deterministic parallelism within a canister? (i.e. multi-core and many-core canisters)

I initially thought that motoko and wasm behave like go routines and they have some sort of Go Runtime scheduler. I’m guessing that’s going to change in the future?

Like in languages like Rust, I would instead advise people to do application-level error handling with a Result type, and not with exception handling

Coming from Golang application error handling make sense but they both have they ups and downs like the infamous

if err != nil { }

Rust seems to behave the same.

Thank you for your time.

Is there any documantation available showing how to use sequence in a project?

Alongside the test in that repo he has used it within some of his other projects, if this is useful, eg:
https://github.com/matthewhammer/motoko-graph/blob/master/src/Persistent.mo

And the test:
https://github.com/matthewhammer/motoko-sequence/blob/master/test/Main.mo

These links are not useful for a beginner. Proper documentation is required.