I would like to implement a machine learning package for Motoko. This would be handy when creating canisters that interact with other canisters (e.g. a trading bot) or when quickly preprocessing data retrieved by oracles through future HTTP requests.
Most machine learning methods use linear algebra operations, for instance to multiply, decompose, or invert matrices. Linear algebra functions in high-level languages like Python usually call highly optimized, well-known packages such as LAPACK, which offer high-quality and very well-tested subroutines for linear algebra.
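To give a sense of what replacing even one LAPACK routine by hand involves, here is a minimal Rust sketch (no external crates) that solves a square system `a * x = b` by Gaussian elimination with partial pivoting. The function name `solve` and the `1e-12` singularity threshold are illustrative assumptions. LAPACK's `dgesv` solves the same problem via LU factorization with partial pivoting and is far more optimized and tested than this.

```rust
/// Solves the square system `a * x = b` by Gaussian elimination with partial
/// pivoting. `a` is an n x n matrix stored as a vector of rows.
/// Returns `None` if a pivot is (numerically) zero, i.e. `a` looks singular.
fn solve(mut a: Vec<Vec<f64>>, mut b: Vec<f64>) -> Option<Vec<f64>> {
    let n = b.len();
    assert!(a.len() == n && a.iter().all(|row| row.len() == n), "a must be n x n");

    // Forward elimination: reduce `a` to upper-triangular form.
    for col in 0..n {
        // Partial pivoting: use the row with the largest |entry| in this column.
        let pivot = (col..n).max_by(|&i, &j| a[i][col].abs().total_cmp(&a[j][col].abs()))?;
        if a[pivot][col].abs() < 1e-12 {
            return None; // crude singularity test (assumed threshold)
        }
        a.swap(col, pivot);
        b.swap(col, pivot);

        // Subtract multiples of the pivot row to zero out entries below it.
        for row in (col + 1)..n {
            let factor = a[row][col] / a[col][col];
            for k in col..n {
                let delta = factor * a[col][k];
                a[row][k] -= delta;
            }
            let delta_b = factor * b[col];
            b[row] -= delta_b;
        }
    }

    // Back substitution on the upper-triangular system.
    let mut x = vec![0.0; n];
    for row in (0..n).rev() {
        let sum: f64 = ((row + 1)..n).map(|k| a[row][k] * x[k]).sum();
        x[row] = (b[row] - sum) / a[row][row];
    }
    Some(x)
}
```

For example, `solve(vec![vec![2.0, 1.0], vec![1.0, 3.0]], vec![3.0, 5.0])` returns approximately `[0.8, 1.4]`. Decompositions such as SVD, which the thread mentions later, are considerably more involved than this.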
Reimplementing a linear algebra library in Motoko is out of the question due to the implementation and maintenance complexity.
In cases like the one described, where it is not feasible to implement a functionality in Motoko, is there a way to create a Motoko interface able to call an existing library?
@paulyoung, thank you very much for the reply. I think that exposing a full LAPACK library as a canister is not a good approach in my case, because the main motivation for creating a linear algebra library in Motoko is precisely to simplify the development of machine learning libraries in Motoko. Having to run an additional canister just to use a Motoko library seems to defeat that purpose.
I really think that, to make Motoko a mainstream language for implementing sophisticated contracts, a simple way to interface external libraries compiled in WASM will be necessary.
For the time being, I have no option other than moving my project to Rust and using the many ML/AI libraries available in Rust.
@paulyoung @claudio, can I use any Rust library if I develop my canister in Rust, or are there any restrictions?
I'm asking because @skilesare says that:
I don't understand exactly what this limitation is. Could you explain to what extent I can reuse Rust libraries?
Rust is a great language to develop on the IC, and many of our official canisters, like certified-assets and cycles-wallet, are written in Rust. The limitations are those of any other WebAssembly target: std is available, but most functions that are in std but not in alloc (threading primitives aside) will panic, return an error, or generally do nothing useful; std::fs functions are one example. This should allow you to compile against almost any crate, but any code path that actually attempts to, for example, read the file system will unconditionally error. This shouldn't be a problem for library crates that don't interact with I/O at all.
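A minimal Rust sketch of the practical consequence: keep a library's numeric core free of I/O so it behaves identically inside a canister, and treat any file-system access as fallible. `mean` and `load_values` are hypothetical names for illustration. Per the explanation above, on a WebAssembly target a call like `std::fs::read_to_string` is expected to return an error rather than read anything, while on a native target it works normally.

```rust
use std::fs;
use std::io;

/// Pure computation: no I/O, so it compiles and runs the same way on any
/// target, including a canister.
fn mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    Some(values.iter().sum::<f64>() / values.len() as f64)
}

/// Touches the file system. On a WebAssembly canister target this path is
/// expected to fail; the error is returned instead of being unwrapped.
fn load_values(path: &str) -> io::Result<Vec<f64>> {
    let text = fs::read_to_string(path)?;
    text.split_whitespace()
        .map(|s| {
            s.parse::<f64>()
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
        })
        .collect()
}
```

A crate whose public functions look like `mean` (data in, result out) is the kind the reply above says should work without problems; only code paths like `load_values` are affected.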
To clarify, there are currently some limitations, such as the amount of work that can be done in a single block, but these apply to the IC in general and aren't specific to using Rust.
I am sorry, this is really confusing for a beginner like me. Is this paragraph saying that I can basically use any Rust library as long as this library is not attempting to use the file system? Is there any other constraint?
It looks like this is one more reason why I need a way to call a library function instead of a canister method. It confirms the need for a "simple way to interface external libraries compiled in WASM" instead of calling a canister.
Concerning my previous question about @AdamS's comment: is he saying that "I can basically use any Rust library as long as this library is not attempting to use the file system"? Is there any other constraint?
The file system is one example. I don't know if there's a comprehensive list of what does and doesn't work, but there may be an opportunity to create a resource that provides one.
Below is another limitation, at least for now. I thought this might interest you as well.
You are still constrained by the inter-canister message size (I think 3MB), but using a canister you can append a couple of chunks and get up to that limit. The ledger canister is just over 2MB, so you have to use this method.
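The chunk-appending approach can be sketched as a buffer that receives a payload in pieces across several messages and refuses to grow past a cap. This is a hypothetical Rust sketch: `ChunkBuffer`, `append`, `take` and `MAX_TOTAL_BYTES` are illustrative names, not an IC API, and the 3 MB cap is only the poster's own estimate, not a confirmed limit. In a real canister, `append` would be an update method and the buffer would live in canister state.

```rust
/// Assumed cap on the assembled payload, taken from the "I think 3MB"
/// estimate above; not a confirmed IC limit.
const MAX_TOTAL_BYTES: usize = 3 * 1024 * 1024;

/// Accumulates chunks sent in separate messages until the full payload
/// (for example a large Wasm module) has arrived.
struct ChunkBuffer {
    bytes: Vec<u8>,
}

impl ChunkBuffer {
    fn new() -> Self {
        ChunkBuffer { bytes: Vec::new() }
    }

    /// Appends one chunk and returns the new total size, or an error if the
    /// total would exceed the cap (in which case nothing is appended).
    fn append(&mut self, chunk: &[u8]) -> Result<usize, String> {
        let new_len = self.bytes.len() + chunk.len();
        if new_len > MAX_TOTAL_BYTES {
            return Err(format!(
                "payload would be {} bytes, limit is {}",
                new_len, MAX_TOTAL_BYTES
            ));
        }
        self.bytes.extend_from_slice(chunk);
        Ok(new_len)
    }

    /// Hands over the assembled payload and leaves the buffer empty.
    fn take(&mut self) -> Vec<u8> {
        std::mem::take(&mut self.bytes)
    }
}
```

Each `append` call only has to fit within a single message, while the assembled bytes (for instance a module just over 2MB, like the ledger canister) can be larger than any one chunk.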
If, 10 years in the future, Motoko is a dominant programming paradigm in the blockchain space, do you still feel this way?
I'd argue that it is imperative for these libraries to exist in an async-bounded work cycle framework. Even if time slicing solves long-running processes, I don't think it solves blocking, and we will be right back at square one, needing chunkable computation for scalability. (I'd be thrilled to be wrong here; hopefully time slicing doesn't block.)
We have the funds and are growing the community to create these libraries. If you write up a spec for what you need and provide sample libraries that can be easily ported to Motoko, then we can write up an ICDevs bounty to try to get the work done.
Ultimately, we likely need a few Manhattan Project-style efforts (without the mass destruction) to brute-force some Motoko libraries. RegEx, math libraries, templating libraries, workflow libraries, and media libraries all come to mind.
I think you are right. Actually, JavaScript seems to have native linear algebra libraries, so their authors probably came to the same conclusion as you. For instance, "lalolib.js" and "numericjs" have implemented singular value decomposition and other solvers natively in JS. Maybe JS libraries could be taken as a reference.
I think a possible implementation strategy could be to gradually develop 3 packages: 1) core linear algebra and math tools, 2) a machine learning library implementing a set of simple methods, and 3) a few simple canister examples leveraging (2). Depending on the machine learning methods in (2) that we decide to implement, we would prioritize a few core functionalities in (1). So it is important to pick a useful yet simple ML method to start package (2) with. Good candidates for package (2) are filtering methods like recursive least squares, multi-armed bandits, and Kalman filters. Filtering methods are interesting because they are adaptive: they do not require a separate training phase and therefore need no offline training data. This could be handy for the many "always on" kinds of bots, oracles, and other data processing engines.
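To make the filtering idea concrete, here is a minimal scalar Kalman filter sketched in Rust. It assumes a random-walk state model with known process noise variance `q` and measurement noise variance `r` (both modelling assumptions; `ScalarKalman` is a hypothetical name). The scalar case needs only arithmetic, so it could be written before the matrix tools of package (1) exist; the multivariate version needs matrix multiplication and inversion.

```rust
/// Scalar Kalman filter for a value assumed to follow a random walk:
///   x_k = x_{k-1} + w,  w ~ N(0, q)   (process noise)
///   z_k = x_k + v,      v ~ N(0, r)   (measurement noise)
struct ScalarKalman {
    x: f64, // current state estimate
    p: f64, // variance of the current estimate
    q: f64, // process noise variance (assumed known)
    r: f64, // measurement noise variance (assumed known)
}

impl ScalarKalman {
    fn new(x0: f64, p0: f64, q: f64, r: f64) -> Self {
        ScalarKalman { x: x0, p: p0, q, r }
    }

    /// Folds in one measurement `z` and returns the updated estimate.
    fn update(&mut self, z: f64) -> f64 {
        // Predict: a random walk keeps x unchanged but adds uncertainty q.
        let p_pred = self.p + self.q;
        // Kalman gain: weight of the new measurement relative to the prediction.
        let k = p_pred / (p_pred + self.r);
        // Correct the estimate toward the measurement and shrink its variance.
        self.x += k * (z - self.x);
        self.p = (1.0 - k) * p_pred;
        self.x
    }
}
```

Each new observation (say, a value fetched by an oracle) is incorporated with one `update` call and nothing else is stored, which is the adaptive, no-training-data property described above.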
@skilesare I would like to investigate the complexities of developing such libraries further. I am not familiar with the implications of developing such a library in an "async-bounded work cycle framework". Could you point me to where I can learn more about this specific difficulty when developing a Motoko library?
We could have a lengthy discussion at some point, but the highlights are that you only have so many instructions you can use, and an error occurs once you run out. So if you are doing a long-running process (like updating an index on a collection), you may have to "chunk" the process and step through the data set over a number of consensus rounds. Time slicing may fix this, but then I think the canister will be blocked until it finishes.
So ideally, you want to find some number of operations that takes up less than half the "block" and execute your long-running calculation over a number of blocks. In the index example, if you have 10,000 blog entries and processing 2,500 entries takes about 1/4 of the block, then you'd call the process 4 times. Unfortunately, Motoko can't see the current balance of remaining cycles, or this would be much easier. I've had to just use trial and error in the past to find a good value.
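The batching pattern described above can be sketched as a job that keeps a cursor in state and processes at most `batch_size` items per call. This is a Rust sketch rather than Motoko, with hypothetical names (`ChunkedJob`, `step`); summing integers stands in for the real per-item work such as index updates, and `batch_size` is the value you would tune by trial and error. On the IC, each `step` would run in a separate message; how those calls are triggered is not shown.

```rust
/// A long-running job split into fixed-size batches. The cursor persists
/// between calls, so each call resumes where the previous one stopped.
struct ChunkedJob {
    items: Vec<u64>,
    cursor: usize,
    batch_size: usize,
    total: u64, // stand-in result: running sum of the processed items
}

impl ChunkedJob {
    fn new(items: Vec<u64>, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        ChunkedJob { items, cursor: 0, batch_size, total: 0 }
    }

    /// Processes at most one batch and returns true once every item is done.
    fn step(&mut self) -> bool {
        let end = (self.cursor + self.batch_size).min(self.items.len());
        // Per-item work goes here; a sum keeps the example self-contained.
        self.total += self.items[self.cursor..end].iter().sum::<u64>();
        self.cursor = end;
        self.cursor == self.items.len()
    }
}
```

With 10,000 items and a `batch_size` of 2,500, the job completes after exactly 4 calls to `step`, matching the index example above.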