I’m amazed at the results that DALL-E and other AI image generation models deliver, and so I’ve started thinking about AI and machine learning on the IC. One of the advantages of the IC is that (as far as I can tell) almost any application can now run directly on-chain (aka served from a blockchain). The only exceptions I’m seeing are compute-heavy applications like AI. Am I right to assume that even if someone ported all the necessary libraries commonly used in machine learning (PyTorch, etc.) to Rust (or Motoko?), it would be a waste of resources to run these compute-heavy applications on the IC because all the processes would end up being replicated on multiple machines? Are there other reasons that make AI on the IC an impossibility, or just a stupid idea?
I think that depends on what you’re referring to. Today you could train a neural network outside of the IC and upload the trained model to a canister built with Rust (I’m sure there are WebAssembly-compatible crates that can consume PyTorch or TensorFlow pre-trained models). If you then queried that canister through the raw.ic0.app endpoint, the query would not run on multiple nodes (therefore zero waste); however, raw.ic0.app may return incorrect results if the node randomly assigned to resolve your query is malicious. It’s therefore better to use ic0.app, but that has to go through consensus, so the query will have to run on multiple nodes.
Whether that’s feasible to do currently or not depends on the size of the trained neural network model. Even if you were able to fit the DALL-E model inside a canister, I think, given its complexity, making it run would be a challenge. However, I can totally imagine successfully running an MNIST classifier on the IC today. These two examples represent a) state of the art and b) hello world, and what’s currently possible (i.e. the limit) lies somewhere on the spectrum between those two.
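For the “hello world” end of that spectrum, here is a minimal sketch of what query-time MNIST inference could look like in a Rust canister. This is illustrative only: the weights are assumed to be trained off-chain and exported to a hypothetical `model.bin`, and the `ic_cdk` query-macro spelling varies between crate versions.

```rust
// model.bin is assumed to hold f32 weights for a 784 -> 32 -> 10 MLP,
// exported off-chain (e.g. from PyTorch) in row-major order.
static MODEL: &[u8] = include_bytes!("model.bin");

const IN: usize = 784;
const HID: usize = 32;
const OUT: usize = 10;

fn read_f32s(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Dense layer: y = W x + b, with optional ReLU.
fn dense(x: &[f32], w: &[f32], b: &[f32], relu: bool) -> Vec<f32> {
    let in_dim = x.len();
    (0..b.len())
        .map(|o| {
            let mut acc = b[o];
            for i in 0..in_dim {
                acc += w[o * in_dim + i] * x[i];
            }
            if relu { acc.max(0.0) } else { acc }
        })
        .collect()
}

#[ic_cdk::query]
fn classify(pixels: Vec<f32>) -> u8 {
    assert_eq!(pixels.len(), IN);
    // A real canister would parse the weights once and cache them.
    let params = read_f32s(MODEL);
    // Assumed layout: [w1 (HID*IN), b1 (HID), w2 (OUT*HID), b2 (OUT)].
    let (w1, rest) = params.split_at(HID * IN);
    let (b1, rest) = rest.split_at(HID);
    let (w2, b2) = rest.split_at(OUT * HID);
    let h = dense(&pixels, w1, b1, true);
    let logits = dense(&h, w2, b2, false);
    // Argmax over the 10 class logits.
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i as u8)
        .unwrap()
}
```

The whole model here is only ~100 KB of weights, which is exactly why MNIST is the easy end of the spectrum.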
If you’re referring to training a neural network, then your assumptions are correct: the training process would be (with the current status quo) a huge waste of resources. It’d also take a very long time, as currently the only processing units available on the IC are CPUs, and we have no way to attach dedicated hardware to the IC (like tensor processing units or GPUs).
I have the same question. Maybe deploying a model on the IC is possible, unless we explore a unique way to break out of the current backpropagation (BP) algorithm. Maybe reinforcement learning? But who knows?
I think you’d need a good use case for AI on a blockchain. What does blockchain offer AI that traditional computing cannot?
One idea that’s really exciting to me is a “universal classifier”. For example, imagine some massive neural network running on the blockchain. It’s owned and controlled by no one party. Why is that good? It means people could trust it with their training data. In exchange for tokens, people around the world can provide training data, which the classifier validates and, if valid, trains on in an online setup. The more and the better the training data you provide, the more tokens you get. The tokens would be used to pay for inference.
This incentivizes people who have data but maybe don’t have ML expertise or GPU resources to contribute their data, without fear that it would be used maliciously by Big Tech. The end result would be humanity working together to create the most powerful model in the world (for a given domain), instead of the status quo where multiple corporations train with their own siloed datasets.
I think there are 2 prerequisites to enable ML/DL practice on the IC:
- We need a package like motokoTorch/RustTorch to handle tensors. The back-propagation and autodiff must be carefully implemented by hand (see the toy sketch below). I would refer to Cornell professor Rush’s course on how to build Torch from scratch: https://minitorch.github.io/
- We need to install hardware in IC data centers that supports parallel computing.
(1) would require some human effort, but I believe it’s feasible. However, I don’t believe we can achieve (2) in a short period of time; it depends on whether DFINITY wants to add this use case to the roadmap. Meanwhile, Motoko seals compilation away at the application level, and we can’t program a canister to decide which subnet the application deploys on. With the current ICP architecture it’s impossible to control which chip unit a tensor’s memory is allocated on, the way CUDA does. I am not a programming-language expert, but I’m pretty sure unsealing Motoko’s compilation would be a huge upgrade.
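On (1), the autodiff part can at least be prototyped in plain Rust today. Below is a toy, minitorch-style reverse-mode autodiff over scalars (no tensors, no IC specifics); the `Value` type and its API are made up here purely to illustrate what “implementing backprop by hand” means.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A node in the computation graph: a value, its gradient, and a closure
// that propagates an incoming gradient to its parents.
struct Node {
    data: f64,
    grad: f64,
    backward: Option<Box<dyn Fn(f64)>>,
}

#[derive(Clone)]
struct Value(Rc<RefCell<Node>>);

impl Value {
    fn new(data: f64) -> Self {
        Value(Rc::new(RefCell::new(Node { data, grad: 0.0, backward: None })))
    }
    fn data(&self) -> f64 { self.0.borrow().data }
    fn grad(&self) -> f64 { self.0.borrow().grad }

    fn add(&self, other: &Value) -> Value {
        let out = Value::new(self.data() + other.data());
        let (a, b) = (self.clone(), other.clone());
        out.0.borrow_mut().backward = Some(Box::new(move |g| {
            a.accumulate(g); // d(a+b)/da = 1
            b.accumulate(g); // d(a+b)/db = 1
        }));
        out
    }

    fn mul(&self, other: &Value) -> Value {
        let out = Value::new(self.data() * other.data());
        let (a, b) = (self.clone(), other.clone());
        let (da, db) = (other.data(), self.data());
        out.0.borrow_mut().backward = Some(Box::new(move |g| {
            a.accumulate(g * da); // d(a*b)/da = b
            b.accumulate(g * db); // d(a*b)/db = a
        }));
        out
    }

    // Accumulate an incoming gradient and push it further down the graph.
    fn accumulate(&self, g: f64) {
        self.0.borrow_mut().grad += g;
        // Take the closure out so the RefCell is not borrowed while we
        // recurse into parent nodes, then put it back.
        let backward = self.0.borrow_mut().backward.take();
        if let Some(f) = &backward {
            f(g);
        }
        self.0.borrow_mut().backward = backward;
    }

    fn backward(&self) {
        self.accumulate(1.0); // seed d(out)/d(out) = 1
    }
}

fn main() {
    // f(x, y) = x * y + x  =>  df/dx = y + 1, df/dy = x
    let x = Value::new(3.0);
    let y = Value::new(4.0);
    let f = x.mul(&y).add(&x);
    f.backward();
    println!("f = {}, df/dx = {}, df/dy = {}", f.data(), x.grad(), y.grad());
}
```

A real library would do a topological sort and a single backward pass over tensors; the eager per-contribution propagation above is only fine for tiny graphs.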
Here are some more of my thoughts. It would be an interesting project to build an L2 consensus layer on ICP focused on the parallel-computing and AI-development use case. It would be very tough but very impactful. To avoid fancy hardware upgrades and complicated parallel computing, you can also train your model locally, deploy it for inference on the IC, and run everything on IC data-center CPUs, but that still requires someone to implement (1) and an automated tool to translate models from Python to Motoko/Rust.
Please correct me if I’m wrong.
I agree with you about handling tensors. If we want to implement deep learning on the IC, the first problem we must solve is how to implement large amounts of matrix computation. And you are right: we would need to install specific hardware for parallel computation and tensor computation.
Just like Badlands can/should be a separate subnet, dedicated hardware could be special subnets. I would expect exactly this to happen at some point.
Traditional cloud providers offer a wide array of node types, and to eventually bring over most adoption we should offer more capabilities than general compute. It comes down to priorities, and, IMO, we could probably get around special hardware for now.
Badlands could be the same subnet as the one for dedicated hardware, a sort of plug-n-play hardware wild wild west. A few classes of devices could be supported:
| device class | example hardware | use case |
| --- | --- | --- |
| microcontrollers | Arduino, ARM, RISC-V | ? |
| mini-PC | Raspberry Pi, smartphones | Badlands |
| GPU | GeForce, Radeon | 3D rendering, light-to-medium ML |
| TPU/IPU | NVIDIA A100, Graphcore | heavy ML |
| storage | NAS storage devices | data archive |
You plug the device into the network; the network then checks whether the device can support the computation required by the aforementioned device class (a standardized test). If it can’t, it gets kicked out; otherwise it stays.
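A very rough sketch of what such a standardized admission test might look like; every name and threshold below is hypothetical, and nothing like this exists in the protocol today:

```rust
// Hypothetical device-class admission check; the classes mirror the table
// above and the thresholds are made-up placeholders.
#[derive(Clone, Copy)]
enum DeviceClass {
    Microcontroller,
    MiniPc,
    Gpu,
    TpuIpu,
    Storage,
}

/// Benchmark results the joining device would have to report (or prove).
struct Benchmark {
    cpu_gflops: f64,
    accel_tflops: f64, // GPU/TPU throughput, 0.0 if none
    storage_tb: f64,
}

/// Returns true if the measured benchmark meets the (made-up) minimum
/// requirements of the requested device class; otherwise the device
/// would be rejected from that class.
fn admit(class: DeviceClass, b: &Benchmark) -> bool {
    match class {
        DeviceClass::Microcontroller => true, // anything goes
        DeviceClass::MiniPc => b.cpu_gflops >= 10.0,
        DeviceClass::Gpu => b.accel_tflops >= 10.0,
        DeviceClass::TpuIpu => b.accel_tflops >= 100.0,
        DeviceClass::Storage => b.storage_tb >= 10.0,
    }
}

fn main() {
    let pi = Benchmark { cpu_gflops: 24.0, accel_tflops: 0.0, storage_tb: 0.5 };
    assert!(admit(DeviceClass::MiniPc, &pi));
    assert!(!admit(DeviceClass::Gpu, &pi));
}
```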
Yeah, something like that. I like it. A playground for all types of experiments.
The incredible (even mindblowing) quality of ChatGPT responses makes me think supporting AI/ML on the IC would be a killer feature for a blockchain (if not for training, at least for inference). Compared to other L1s, the IC’s model of deterministic decentralization—and its resulting performance properties—seems like a great fit.
It would unlock very interesting use cases, I think.
The exported model of one of these large language models would be O(gigabytes), so it could theoretically fit in stable memory without a problem. The challenge is reducing inference time, which would require GPUs or TPUs (and maybe RAM?).
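To make “fits in stable memory” concrete: since a single ingress message is limited to roughly 2 MB, a multi-gigabyte model would have to be uploaded in chunks and appended to stable memory across many calls. A rough sketch, assuming the `ic-cdk` stable-memory API (function names differ slightly between ic-cdk versions) and omitting access control and upgrade persistence:

```rust
use std::cell::Cell;

use ic_cdk::api::stable;

const WASM_PAGE: u64 = 65_536; // stable memory grows in 64 KiB pages

thread_local! {
    // Next free byte offset in stable memory (a real canister would
    // persist this across upgrades as well).
    static CURSOR: Cell<u64> = Cell::new(0);
}

/// Append one chunk of model weights (<= ~2 MB per ingress message)
/// to stable memory. The client calls this repeatedly until the whole
/// multi-gigabyte model has been uploaded.
#[ic_cdk::update]
fn upload_chunk(chunk: Vec<u8>) -> u64 {
    CURSOR.with(|cursor| {
        let offset = cursor.get();
        let needed_pages = (offset + chunk.len() as u64 + WASM_PAGE - 1) / WASM_PAGE;
        let current_pages = stable::stable64_size();
        if needed_pages > current_pages {
            stable::stable64_grow(needed_pages - current_pages)
                .expect("out of stable memory");
        }
        stable::stable64_write(offset, &chunk);
        cursor.set(offset + chunk.len() as u64);
        cursor.get() // total bytes stored so far
    })
}
```

Storing the weights is the easy part, though; reading them back into a forward pass within the instruction limits is where it gets hard.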
I am as blown away as you are after playing around a bit with ChatGPT. The tech is absolutely amazing, and it’s only gonna get better.
I am having trouble, however, finding a good value proposition to having this on a blockchain. Training seems to be out of the question, but even for inference, what’s the gain here? Cost aside, would you really need a replicated, verified inference?
Speaking about cost, it would be really hard to come up with advantages of using 8x hardware to get 1x benefits at the end of the day. I’m in the middle of a large ML project at work, and finding cheap GPUs for both training and inference is challenging as it is. I’m not seeing the benefits of 8x-ing that cost at the moment. Perhaps for some extreme niche use-cases? Maybe, but I’m not convinced.
LLMs are already multiple gigabytes in size, and only getting larger. The open-source ones, GPT-J and GPT-NeoX, need 8/16-40 GB of VRAM just for inference, and probably multiples of that for training. Having that hardware further multiplied for consensus would make the cost prohibitive even compared to the “big 3” cloud providers.
Furthermore, there’s a challenge with time-to-market and the ever-evolving ecosystem. By the time you implement something (and it’s probably going to take a while to get to a production-level, replicated, consensus-aware solution), it will probably be outdated. Just looking at the Stable Diffusion space, the speed of development and upgrades in the 6 months since it was released is amazing. It went from ~10 seconds of inference with 16 GB of VRAM to ~4 seconds and 2-4 GB of VRAM, inside a couple of months. They’ve added Adam optimizers, fp16, and so on. It’s going to be really challenging to support anything like that speed of development in a blockchain space.
Cost aside, would you really need a replicated, verified inference?
I totally agree with you that naive replicated execution won’t cut it, even for inference (and especially for training).
I think that the IC may eventually need to consider more efficient ways to implement replicated state machines without actually replicating computation. Here is a fascinating paper I found. Here is another one. It seems like the key lies in cryptographic proofs (whether zero-knowledge or otherwise), and making proof verification as efficient as possible.
Another interesting thing to consider is that transformer neural networks are easily parallelizable (compared to their predecessors), which might lead to interesting performance properties when combined with verifiable state machines.
@JensGroth @Manu - curious if internally you guys have considered verifiable state machines as I described above, especially for expensive computations like ML?
The flavor of the papers is that instead of replicating computation, you outsource it to a single party who does the computation and also provides a proof the computation is correct. The most efficient proofs are much faster to verify than redoing the computation, so now only one party has to do the computation but everybody can trust the outcome because they can cheaply verify the proofs of correctness.
Unfortunately, the cost of proving the computation is correct is several orders of magnitude higher than the cost of doing the computation. For blockchains with huge replication this is a tradeoff that may be worth taking, but since the IC operates at modest replication (most subnets have 13 nodes) it does not pay off, it is cheaper to do replicated computation as we currently do.
Having said that, I won’t discard the approach for ML. Sometimes special-purpose computation is of a form that is amenable to more efficient construction of proofs. This is not something we’re working on inside DFINITY, but I don’t think it needs any platform changes anyway; afaict you just need to verify that the outside computation is correct, and since verification is cheap you can run a verifier in a canister (probably the more costly part, depending on the application, will be the bandwidth of an ingress message to update the state of the canister to reflect the outcome of the computation).
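To illustrate the shape of that idea: the heavy ML computation happens off-chain, and the canister only stores a result once an accompanying proof checks out. Everything below is hypothetical; `verify_proof` is a stand-in for whatever concrete proof system would actually be used:

```rust
use std::cell::RefCell;

use candid::{CandidType, Deserialize};

// Hypothetical types: in reality these would come from a concrete
// proof system (SNARK/STARK/etc.), not from this sketch.
#[derive(CandidType, Deserialize)]
struct InferenceResult {
    input_hash: Vec<u8>,
    output: Vec<u8>,
}

#[derive(CandidType, Deserialize)]
struct Proof(Vec<u8>);

thread_local! {
    static ACCEPTED: RefCell<Vec<InferenceResult>> = RefCell::new(Vec::new());
}

/// Placeholder for a real proof verifier; the point is that this is
/// cheap compared to redoing the off-chain ML computation.
fn verify_proof(result: &InferenceResult, proof: &Proof) -> bool {
    // ... pairing checks / FRI verification / etc. would go here ...
    !proof.0.is_empty() && !result.output.is_empty()
}

/// An untrusted off-chain worker submits (result, proof); the canister
/// keeps the result only if the proof verifies.
#[ic_cdk::update]
fn submit_result(result: InferenceResult, proof: Proof) -> bool {
    if verify_proof(&result, &proof) {
        ACCEPTED.with(|a| a.borrow_mut().push(result));
        true
    } else {
        false
    }
}
```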
I personally tried running a zk-paillier program in a canister recently. However, constructing the proofs requires a lot of instructions, which exceeded the limit a single message can handle. Even before that, we tried to migrate some deep learning algorithms to a canister, which were also limited by the instruction count.
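A common workaround for the per-message instruction limit is to split a long computation across many messages, checkpointing progress in between. A rough sketch of the pattern (the fake `one_step` work and the self-imposed budget are illustrative, not a drop-in fix for zk-paillier):

```rust
use std::cell::RefCell;

thread_local! {
    // Persisted progress of a long-running job (e.g. proof construction
    // or a training epoch), advanced a little on every call.
    static PROGRESS: RefCell<u64> = RefCell::new(0);
}

const TOTAL_STEPS: u64 = 10_000_000;
// Self-imposed budget, kept well below the per-message instruction limit.
const BUDGET: u64 = 2_000_000_000;

fn one_step(i: u64) {
    // ... a small slice of the real work would go here ...
    std::hint::black_box(i.wrapping_mul(2654435761));
}

/// Advance the job until this message's instruction budget is spent,
/// then return. The caller (an off-chain script or a canister timer)
/// keeps invoking this until it returns true.
#[ic_cdk::update]
fn resume_job() -> bool {
    PROGRESS.with(|p| {
        let mut i = *p.borrow();
        // instruction_counter() reports instructions used so far in the
        // current message execution.
        while i < TOTAL_STEPS && ic_cdk::api::instruction_counter() < BUDGET {
            one_step(i);
            i += 1;
        }
        *p.borrow_mut() = i;
        i >= TOTAL_STEPS // true once the whole job is done
    })
}
```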
Random idea building on this (and somewhat orthogonal to ML): could one imagine a data marketplace? You could sell encrypted data to it for tokens, and buyers of data could use the future threshold decryption feature to get the decrypted data. Running on the IC may give new opportunities, e.g., the canister smart contract may restrict whom the data can be sold to or how many times it can be sold.
Unfortunately, the cost of proving the computation is correct is several orders of magnitude higher than the cost of doing the computation.
I am by no means an expert in this field, but judging by the pace of progress in the last decade in making cryptographic proofs more efficient (in some cases, a trillion times more efficient), I wouldn’t count out the possibility of huge leaps in efficiency in this decade, whether from hardware or better algorithms!
afaict you just need to verify the outside computation is correct, and since verification is cheap you can run a verifier in a canister
This is a really interesting point and something that worries me—do we even need blockchain to do compute or is storing data on a blockchain enough? Ethereum seems to be taking the latter approach, by outsourcing transaction computation to L2 rollups. If an external party can do the expensive ML computation off-chain and simply provide a proof of it (without incurring the efficiency costs of consensus), why would we need blockchain for stateless ML inference? Or am I missing something?
Random idea building on this (and somewhat orthogonal to ML): could one imagine a data marketplace.
This is actually quite deep… Online marketplaces like Uber, Amazon, and Airbnb work fine on web2 because the good or service is usually physical, and there’s no fear of the marketplace itself stealing it. But a web2 data (or even ML model) marketplace wouldn’t work, because the marketplace could steal the unencrypted IP unbeknownst to the seller. (Although I’m not sure whether the Signal Protocol running on web2 servers could solve this problem as is…) So IIUC, the on-chain encryption (or threshold key derivation) feature could make this marketplace possible (using web3 tech).
Would an IC oracle be a solution for this kind of problem? It could utilize off-chain computing power to submit calculation results, with the database handling done on-chain. But that seems to jump outside the current ICP structure.
Before jumping into tensor multiplication, has anyone tried to build a linear regressor on the IC? I guess “ICML” could be built in the following order to eventually approach deep learning:
Linear regression → K-means → Decision tree → Autodifferentiation → Tensor objects and operations → CNN & Transformer → ResNet, BERT → GPT, DALL-E, etc.
There are many machine learning and deep learning libraries based on Rust, but I haven’t tested them on the IC.
Maybe the difficulty is not building tensor computation on the IC but running a pre-trained deep learning model.
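For what it’s worth, the first rung of that ladder is tiny. A plain-Rust linear regressor trained by batch gradient descent looks roughly like this; nothing IC-specific, but it compiles to Wasm and would fit comfortably in a canister:

```rust
/// Fit y ≈ w * x + b by batch gradient descent on mean squared error.
fn fit(xs: &[f64], ys: &[f64], lr: f64, epochs: usize) -> (f64, f64) {
    let n = xs.len() as f64;
    let (mut w, mut b) = (0.0, 0.0);
    for _ in 0..epochs {
        let (mut dw, mut db) = (0.0, 0.0);
        for (&x, &y) in xs.iter().zip(ys) {
            let err = (w * x + b) - y;   // prediction error
            dw += 2.0 * err * x / n;     // d(MSE)/dw
            db += 2.0 * err / n;         // d(MSE)/db
        }
        w -= lr * dw;
        b -= lr * db;
    }
    (w, b)
}

fn main() {
    // Noise-free data generated by y = 3x + 1; gradient descent should
    // recover w ≈ 3, b ≈ 1.
    let xs = [0.0, 1.0, 2.0, 3.0, 4.0];
    let ys: Vec<f64> = xs.iter().map(|x| 3.0 * x + 1.0).collect();
    let (w, b) = fit(&xs, &ys, 0.05, 5_000);
    println!("w = {w:.3}, b = {b:.3}");
}
```

The hard part, as discussed above, is not this kind of toy but the instruction and memory budget once the models stop being toys.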
Hi there.
Linear regression → K-means → Decision tree => machine learning.
CNN & Transformer → ResNet, BERT → GPT => deep learning models.
Right now I think it’s impossible to train a model on the IC; I’m researching whether a Rust lib can be deployed on the IC.