Motoko Base Library Changes

One of the major challenges I face during development is managing stable data structures. Currently, I rely on Zhenya Usenko’s migration strategy, which works but takes significant effort even when I only want to add a simple property to a user object, and it often introduces a degree of uncertainty.

I can’t help but wonder, why isn’t this topic discussed more widely?

Bluntly…Other chains have immutable contracts so it isn’t an issue. We are the only ones who are going to talk about it. You raise a good point! Let’s talk about it!

The migration strategy is nice in that it allows upgrading from a much earlier version to the most recent, but man, is it also a burden. I’m not even sure it is the best way anymore: I think there are some new limitations with 64OP that may make variants less flexible than they used to be, and the migration pattern is wholly and completely based on being able to add new variants without things breaking. (We need to test this at some point.)
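
To make the dependence on variants concrete, here is a minimal sketch of the general idea. It is not Zhenya Usenko’s actual pattern and has not been tested under 64OP. The names UserV1, UserV2, State, upgradeUser, and migrate are hypothetical, and the example assumes the classic mo:base package.

```motoko
import Array "mo:base/Array";

actor {
  // Hypothetical user record before and after adding one optional field.
  type UserV1 = { name : Text };
  type UserV2 = { name : Text; email : ?Text };

  // Each schema version is one tag. A new version adds a tag and keeps the old ones,
  // which is the property the whole approach relies on.
  type State = {
    #v1 : [UserV1];
    #v2 : [UserV2];
  };

  stable var state : State = #v2([]);

  func upgradeUser(u : UserV1) : UserV2 {
    { name = u.name; email = null }
  };

  // Bring whatever version was persisted up to the latest one.
  func migrate(s : State) : State {
    switch (s) {
      case (#v1(users)) { #v2(Array.map<UserV1, UserV2>(users, upgradeUser)) };
      case (#v2(_)) { s };
    }
  };

  system func postupgrade() {
    state := migrate(state);
  };
};
```

Even in this toy version, adding a single email field means a new record type, a new tag, and a new migration branch, which is the burden described above.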

We have discussed a more direct migration pattern on the Motoko calls a couple of times, and I think @luc-blaeser had some thoughts on it. It might be interesting to look into this more, but as with database schema programming, sometimes you just need to write the damn thing and test it. (This is likely an area where we can draw some inspiration, because “data as code” from a few years back demanded that migration patterns be created.)

@icme's tweet this morning made me think a bit.

While we are renaming things we may want to consider being much more contextual. I propose an experiment.

Ask an LLM what Module.replace<K,V>(x: T, y: Hasher, k: K, v: V): ?V does and what Module.replaceAtKeyAndReturnValueIfReplaced<K,V>(x: T, y: Hasher, k: K, v: V): ?V does, and see how many times, at varying temperatures, it actually outputs the behavior ‘replaces the value at key if found and null otherwise’. (It may be more dramatic if you just ask “what does the function replace do?” vs. “what does the function replaceAtKeyAndReturnValueIfReplaced do?”)

If the second one shows any kind of significant improvement over the first (and I would anticipate that it will, especially at higher temperatures), then we may want to move toward more contextual function names, or at least have them available. The way LLMs work, this will get to the target vector much more optimally and concretely than just replace.

A couple of tests:

What does the function replace do? https://chatgpt.com/c/679511ac-5568-8007-86b6-5021cbb7d50a
What does the function replaceAtKeyAndReturnValueIfReplaced do? ChatGPT - Function Explanation Summary

What does let a = replace(x, thash, k, v); do, and what can I expect in a? ChatGPT - Replace function behavior analysis

What does let a = replaceAtKeyAndReturnValueIfReplaced(x, thash, k, v); do, and what can I expect in a? ChatGPT - Function Behavior Explanation

@rvanasa will this be accompanied by an update to the moc.js compiler used in the Motoko playground? I use this for various purposes.

I see what you’re saying and think this could be a useful experiment. The way we are currently approaching this is by using naming conventions from widely-used programming languages. For example, we’re splitting Iter.iterate() into two functions (Iter.enumerate() and Iter.forEach()) so that both LLMs and humans can transfer knowledge of how these functions work from Python, JavaScript, Rust, etc.
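
As a rough illustration of the split, here is how the two functions might be used. This is a sketch that assumes the preview is imported as mo:new-base and that Iter.forEach and Iter.enumerate have the shapes shown in the comments; the names array is made up, and the preview API may still change.

```motoko
import Iter "mo:new-base/Iter";
import Nat "mo:new-base/Nat";
import Debug "mo:new-base/Debug";

let names = ["alice", "bob", "carol"];

// Assumed shape: Iter.forEach<T>(iter, f) runs a side effect for each element,
// much like forEach in JavaScript.
Iter.forEach<Text>(names.vals(), func(name) { Debug.print(name) });

// Assumed shape: Iter.enumerate<T>(iter) pairs each element with its index,
// much like enumerate in Python or Rust.
for ((i, name) in Iter.enumerate<Text>(names.vals())) {
  Debug.print(Nat.toText(i) # ": " # name);
};
```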

Yes indeed! We will distribute all of these changes in the motoko npm package (which includes moc.js) with the ability to swap in previous base library versions as needed.

Awesome, thanks for letting me know!

We are in the process of addressing this in several ways. In the new version of the base library, all of the data structures can be stored in stable memory. The team is also exploring the possibility of migration-specific language syntax (PR).

That would be great! When do you think these new features will be ready for use?

The goal is for this to be at least partially available for early adopters by the end of February. Because the base library is a critical part of the Motoko ecosystem, this will be a gradual process so that we can address feedback before locking in the final design.

A preview of the improved base library is now available via the new-base Mops package! Stay tuned for the full announcement in the next few days.

New Motoko base library announcement

Here is the official announcement post for the preview release!

We also created an online starter project to simplify trying out the new base library:

You can browse the project’s GitHub repository to learn more. We intend to eventually merge this into the current base library repository.

Next up, we are working on a migration guide and updated documentation, which I will post here once they’re ready. Until then, this is your best opportunity to suggest changes which could make it into the base library before we lock in the new design.

The best way to report a bug or request a specific change is by opening a GitHub issue.

If you have any questions or ideas, please feel free to ask here, and I’d be happy to answer below.

Bit late to the party, but I’m getting around to using new-base, and my biggest thing (as mentioned by others) is still Array vs List vs Iter vs Buffer (RIP). It’s not just confusing to someone getting into it; the new List replacing Buffer also makes things more tedious. Most of the time I just want to add a few things together and then form an array or modify it in some way. Seeing

{
  var blocks : [var [var ?T]];
  var blockIndex : Nat;
  var elementIndex : Nat
};

is freaky, and it lacks consistency with Array and Buffer since you can’t do things like .vals(), and things like List.add(…) modify the list instead of returning a value.

What was the reason for killing off Buffer? If I had List in the same form as Buffer, that would feel right to me, and if there is a need for the

{
  var blocks : [var [var ?T]];
  var blockIndex : Nat;
  var elementIndex : Nat
};

it can have a different name or something

I get what you’re saying and think this is a valid point. The new List data structure is an adaptation of the vector Mops package, which is a more efficient and scalable version of the original Buffer.

However, we switched from using a class-based API, e.g. buffer.vals() to module functions such as List.values(list) so that it’s possible to store data structures directly in stable memory. The tradeoff is a more verbose usage pattern and more noise in type-checking errors.
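
For illustration, a small actor that keeps a List directly in a stable variable might look like the sketch below. It assumes the preview is imported as mo:new-base and that List.empty, List.add, List.size, List.values, and List.toArray exist with these shapes; the log variable and the method names are made up.

```motoko
import List "mo:new-base/List";

actor {
  // List is a plain record of mutable arrays rather than an object with methods,
  // so it can be declared stable without pre/post-upgrade conversion code.
  stable let log = List.empty<Text>();

  public func append(entry : Text) : async Nat {
    // Module function instead of a method: the old style would be buffer.add(entry).
    List.add(log, entry);
    List.size(log)
  };

  public query func totalLength() : async Nat {
    var total = 0;
    // Old style: for (entry in buffer.vals()) { ... }
    for (entry in List.values(log)) {
      total += entry.size();
    };
    total
  };

  public query func entries() : async [Text] {
    List.toArray(log)
  };
};
```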

We intend to add an improved version of the class-based API once we have the language capability to express this while storing the value in stable memory. It’s a bit of a chicken-and-egg problem because of the release schedule, but we will address this (most likely post-launch).

I will also make sure that we provide several ways to continue using the original data structures (Buffer, OrderedMap, OrderedSet, etc.) for a gradual transition to the new APIs.

If you deep dive the performance stats here: Mops • Motoko Package Manager

… you find that in one particular (common) instance Buffer is still superior to vector. If you know the size of your final array and use put to load the buffer only to dump it with toArray, then Buffer is more cycle- and memory-efficient than vector. This is a very common use case when you need to transform one collection into another, and it is used a lot in query calls that paginate through things or transform class-based backend data structures into shareable structures that you can return from actors.

Dig a little deeper and you see how much better Array is at this than Buffer. This scenario avoids the dreaded Array.append and its warning.
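
Roughly, the loop-and-dump pattern in question looks like the sketch below, written against the classic mo:base Buffer. The pageLabels function and its parameters are hypothetical. It uses add rather than put because, as far as I know, put only overwrites an index that already exists in the buffer.

```motoko
import Buffer "mo:base/Buffer";
import Nat "mo:base/Nat";

// Hypothetical query helper: turn one page of ids into a shareable [Text].
func pageLabels(ids : [Nat], offset : Nat, limit : Nat) : [Text] {
  // Clamp the page to the available range so the final size is known up front.
  let start = Nat.min(offset, ids.size());
  let stop = Nat.min(start + limit, ids.size());

  // Allocate the exact capacity once, so the buffer never has to grow.
  let buffer = Buffer.Buffer<Text>(stop - start);
  var i = start;
  while (i < stop) {
    buffer.add("id-" # Nat.toText(ids[i]));
    i += 1;
  };

  // Dump to an immutable array that can be returned from an actor.
  Buffer.toArray(buffer)
};
```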

It may be nice to have a TempArray class in base with a stripped-down API that does just that. It would be clearly marked temp so no one would use it for long-term data storage, and it would be the best way to do this kind of loop-and-dump logic. Or maybe even something more to the point, like TempLowMemoryFixedLengthAccumulator. (At least the AI would know what it was for 😂.)

Can confirm we use this a lot as well.

Would be nice if the array version/implementation was efficient enough that we didn’t have to spin up a buffer just to then convert it back to an array.

At this stage of the new base library, is it advisable to use it in a real project, or is it still too early?

It’s up to you. We are currently working on documentation for the new base library, and there will likely be some breaking changes as we continue to address feedback. We encourage trying out the new base library but still consider it a work in progress.

@skilesare @icme
Thank you for the feedback regarding Buffer and array construction performance patterns.

We’ve considered maintaining Buffer or providing an alternative for efficient known-size array construction. While Buffer shows better performance in some scenarios, we would need a strong reason to maintain both List and a Buffer-like structure in the base library. Our goal is to balance performance with maintaining a clean, approachable base library. There will always be special data structures for narrow use-cases, but I think we should value the ease of learning highly here. Ideally base gives you one way of doing a thing.

Nevertheless, the use case of building an array with a known size is important, so I wanted to discuss the alternatives to using List/Buffer.

  1. Building the output array directly, with Array.tabulate or in some other way, is the most efficient solution and should be preferred when possible.

  2. Creating a mutable array (with defaults/nulls, e.g. using VarArray.repeat) and converting it to an immutable array afterwards is the second-best option.

  3. While List may be slightly slower for this specific scenario, it provides a more consistent API across various use cases.
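
Side by side, the three options could look roughly like this. It is a sketch, not the benchmark code: it assumes the preview is imported as mo:new-base, that VarArray.repeat takes (item, size), and that Array.fromVarArray is the conversion function. The labels* function names are made up.

```motoko
import Array "mo:new-base/Array";
import VarArray "mo:new-base/VarArray";
import List "mo:new-base/List";
import Nat "mo:new-base/Nat";

// Option 1: build the immutable array directly from an index function.
func labelsTabulate(n : Nat) : [Text] {
  Array.tabulate<Text>(n, func(i) { "item-" # Nat.toText(i) })
};

// Option 2: fill a mutable array of known size, then convert it.
func labelsVarArray(n : Nat) : [Text] {
  // Assumed argument order: default item first, then size.
  let tmp = VarArray.repeat<Text>("", n);
  var i = 0;
  while (i < n) {
    tmp[i] := "item-" # Nat.toText(i);
    i += 1;
  };
  // Assumed name for the mutable-to-immutable conversion.
  Array.fromVarArray<Text>(tmp)
};

// Option 3: grow a List from empty, then dump it to an array.
func labelsList(n : Nat) : [Text] {
  let list = List.empty<Text>();
  var i = 0;
  while (i < n) {
    List.add(list, "item-" # Nat.toText(i));
    i += 1;
  };
  List.toArray(list)
};
```

Option 1 only works when each element can be computed from its index; option 2 covers loops that fill slots out of order or carry state between iterations.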

Please let us know your thoughts!
For more info: have a look at these simple benchmarks comparing List vs Buffer vs Arrays in this use-case. Results are here, look for benchmarks: ArrayBuilding and ListBufferNewArray.

Pardon my density, but can you explain the difference between List/pureList in this comparison? Still taking a look but wanted to know what I’m looking at here. Thank you for doing this analysis!!!

Sure! First of all:

  • List is the mutable, resizable vector that starts empty, so it takes some cycles to grow its internal memory (though it does better than the old Buffer when the initial capacity is 0).
  • pure/List is a purely functional, singly linked list (pure/List in Mops). This data structure is not the best fit for the array-building task, but it’s included for completeness. Why is it a poor fit? Because it behaves like a stack, elements have to be added in reverse order. Also notice that it takes much more memory than the other solutions.
  • Buffer always starts with the ideal capacity and thus avoids costly resizing (List starts with capacity 0 and needs to grow). But it is still less efficient than dealing with arrays directly.
  • Dealing with arrays directly is the most efficient solution in terms of both memory and instructions.
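
To show the reversed-order point for pure/List, here is a minimal sketch. It is not the benchmark code. It assumes the import path mo:new-base/pure/List, that the list type is still represented as ?(head, tail) as in the classic base List, and that reverse and toArray are available; labelsViaPureList is a made-up name.

```motoko
import PureList "mo:new-base/pure/List";
import Nat "mo:new-base/Nat";

func labelsViaPureList(n : Nat) : [Text] {
  // null is the empty list under the assumed ?(head, tail) representation.
  var acc : PureList.List<Text> = null;
  var i = 0;
  while (i < n) {
    // Prepending is cheap, but the newest element ends up at the front,
    // so the list is built in reverse order.
    acc := ?("item-" # Nat.toText(i), acc);
    i += 1;
  };
  // Undo the reversal before dumping to an array; this extra pass and the
  // per-element cells are part of why this variant costs more.
  PureList.toArray(PureList.reverse(acc))
};
```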

The details are best explained by directly looking at the code of these different scenarios. E.g. here is the pure/List case from the ArrayBuilding benchmark.
