What Makes AI on Blockchain Hard? [Request for feedback on post]

diegop · July 3, 2024, 7:43pm

Author’s note: This is a draft of something I would like to post on DFINITY medium blog. I have been tweaking it for the last few weeks based on the confusing parts I have seen on the broader narratives of “AI x Crypto” in social media, wider press, and developers forums like this one. I am looking to this community looking for feedback, corrections, mistakes that I should correct.

Special thanks to @ulan @Manu @dsarlis @yvonneanne @Kyle_Langham @ianblenke for reading earlier drafts

What Makes AI on Blockchain Hard?

A lot of folks in the web3 space are excited (rightly in my opinion) about AI on blockchain. I see however, very little in terms of clarity of thought. To be perfectly frank, like all hype cycles, there is a lot of nonsense, so I set out to try to add some clarity.

So I set out to answer a key question:

Is AI on blockchain here? If not, what is missing?

TLDR: AI on blockchain is possible for smaller models, but current limitations in memory, compute power, and consensus mechanisms prevent effective deployment of large models like LLMs.

Understanding AI on Blockchain

To understand this post, there are few concepts worth quickly explaining

1. Training vs Inference

When people refer to AI, they often mean either “training” a model or “inference” (using a model, such as asking questions to ChatGPT). Training is orders of magnitude harder and more resource-intensive than inference. Therefore, my focus here is on inference, as it represents the first significant hurdle before tackling the more complex challenge of training.

2. CPU vs GPU

Being simplistic, GPUs are computers optimized for AI models. They can process models 1000x faster than traditional general computers (CPUs). This is relevant because most AI bottlenecks in the Web2 space are solved by just “throwing GPUs at it.” Blockchains run on CPUs so they (currently) do not have this solution available to them. This post explains why.

3. Memory of smart contracts

Memory of smart contracts is both the storage and heap memory. Both memory types are important for running AI models. Both are limiting factors nowadays.

Key Problems to Solve for AI on Blockchain

1. Memory

What AI models need

Memory requirements for AI inference can vary widely across AI models. For example, small machine learning (ML) models might only need a few megabytes (MB), while large language models (LLMs) can require gigabytes (GB) of memory.

Current state of the world

I want to give the reader a helpful overview of the space, but I will deliberately NOT present a table or chart comparing different blockchains. In my experience, these can lead to two things:

At best, honest mistakes such as “Hey Diego, you miscalculated! Our smart contract platform does 600 instructions per second not 550.”
At worst, it triggers blockchain tribalism so the rest of the piece is discounted.

So instead I will write about AI needs, Ethereum (which is a lingua Franca), and ICP (a blockchain I’m intimately familiar with). I encourage Readers to propose their own analysis for other chains!

Ethereum smart contracts

An Ethereum smart contract has 32 KB of memory for the stack. This means Ethereum could not host most AI models I am aware of. There may be some AI models measured in KB, but to be simple: Ethereum smart contracts cannot host what people refer to as models.

ICP smart contracts

An ICP smart contract has 400 GB of stable memory (e.g. storage), but what matters in this context is the heap memory which is 4 GB. This means that ICP smart contracts could host many, but not all AI models. More concretely:

Models ICP smart contracts can run:

ICP smart contracts can run AI models like the one in this demo. That ML model for image classification only requires around 10 MB of memory, so well within ICP’s heap memory resources.

Models ICP smart contracts cannot yet run:

ICP smart contracts cannot yet run Large Language Models like LLama. For example, to run the open source model Llama3 7B, the memory needs vary by how efficient the compression is, but typically at least 3.5 GB or more, perhaps in order of 10 GB, to load into memory and to run.

Currently, ICP smart contracts serve 4 GB of heap memory and will soon have more memory so this is very close to serving properly.

Coming Soon

It is worth noting that DFINITY R&D is actively working on a path to grow the heap memory to which would put it closer to the number above.

Rule of Thumb #1

Whenever someone says “X is AI on blockchain” you should ask:

“How much heap memory can a smart contract on X blockchain have?”

If the answer is…

Measured in Kb, then it cannot host any real AI model
Measured in MB, then it can host small models (and there are many small models), but cannot host LLMs
Measured in GB, it can host some of the smaller LLMs
Measured in tens of GB, host it can host more, but not the main LLMs
Measured in hundreds of GB, then it can host pretty much all LLMs

2. Compute

What AI Models Need

The computational power required for AI inference is measured in floating-point operations per second (FLOPS). The complexity and size of AI models can vary widely, impacting the compute power needed. However, In the context of blockchain, it makes more sense to use the more general operations per second , so we will use this term as in practice it tends to be within the same order of magnitude.

Smaller models may need only a few billion operations per second, while large language models (LLMs) and other advanced AI models may require much more. For example, a quantized (basically optimized for size) Llama3 7B model, it can take tens of billions of operations for inference (answering a prompt from a user).

Current State of the World

Ethereum Smart Contracts

Ethereum smart contracts primarily rely on the EVM, which is not optimized for high-performance compute tasks. A more accurate picture would say that the computation for an ETH smart contract is significantly lower than the gigaFLOPS required for most AI models. DFINITY estimates the maximum number of instructions per second from the block gas limit to be around 5 instructions per second. Consequently, Ethereum cannot provide the necessary computational power for running sophisticated AI models, particularly large language models (LLMs).

ICP Smart Contracts

ICP smart contracts have better computational resources, they can perform 2 billion operations per second. Worth noting that (unlike Ethereum which only handles integer arithmetic) ICP smart contracts can also handle floating point arithmetic, as well as integer arithmetic.

Models ICP Smart Contracts Can Run:

ICP can run AI models that require up to billions of operations per second and execute inference within the time that users expect (seconds or less). This includes many smaller models, such as the image classification model in this demo, which only needs a few billion operations per second to run efficiently.

Models ICP Smart Contracts Cannot yet run as fast as users expect:

A quantized Llama3 7B model, it can take tens of billions for inference (answering a prompt from a user). ICP smart contracts can support 2 billion operations per second so, in theory, it would take an ICP smart contract a tens of seconds to minutes to execute an inference request, basically answering a prompt.

Coming Soon

DFINITY R&D is exploring ways to increase the computational capabilities of ICP smart contracts. Potential advancements include integrating specialized hardware or optimizing the execution environment to handle higher operations per second requirements.

Rule of Thumb #2

Whenever someone says “X is AI on blockchain” you should ask: “How much computational power can a smart contract on X blockchain provide?”

If the answer is…
- Measured in millions of operations of seconds or less, then AI inference would take so long that users would consider it not working at all.
- Measured in hundreds millions of operations of seconds, then very small models can execute inference in minutes.
- Measured in billions, then smaller LLMs can execute inference in minutes or much slower than what users expect.
- Measured in tens of billions, then iLLM inference can be what modern users expect from LLMs.
- Measured in trillions of operations per second, it can host virtually all AI models, including the most advanced LLMs within a great user experience.

3. Blockchain-Specific Problems (hint: its determinism)

In the Web2 world, increasing computational resources for a model typically means using GPUs, which are much faster. This is why GPUs are in high demand globally.

Why Can’t Blockchain Just Use GPUs?

Technical Reason:

Since GPUs are inherently designed to be multithreaded, it is not guaranteed that all operations are deterministic, while blockchains require deterministic computation to achieve consensus. In practice, there are ways to make GPUs act deterministically, but it takes careful consideration and configuration. But I will explain the importance of being deterministic first.

Simpler Explanation:

Blockchains operate by having multiple computers perform the same computations and then using a consensus protocol to agree on the result. Blockchains have a security threshold, usually between 25-49%, which determines how many faulty or dishonest nodes they can tolerate while still achieving consensus. However, with GPUs, even honest nodes may return different answers for LLMs even when the nodes all use the same model, creating a problem for consensus protocols.

Illustrative Example:

Imagine a blockchain with three computers, each running an LLM smart contract. A user asks, “What is an LLM?”

Computer 1:
- “An LLM, or Large Language Model, is an advanced AI model designed to understand and generate human language, typically characterized by a large number of parameters and trained on extensive text data.”
Computer 2:
- “An LLM, or Large Language Model, is a powerful AI system trained on vast amounts of text to perform tasks like understanding, generating, and translating human language.”
Computer 3:
- “An LLM, or Large Language Model, is an AI model that excels in processing and generating human language by leveraging extensive training on large datasets.”

Despite all three computers being honest and using the same model, they return different answers. This non-determinism, which can arise for many reasons, is dangerous. The consensus protocol cannot determine which answer is correct. This contrasts with simpler, deterministic calculations like “1 + 1,” where all computers would agree on “2.”

Given the above, I should add a bit more detail. The non-determinism can come even if the model temperature is set to 0. The tricky thing is that the non-determinism comes from the GPUs, not the model itself. And the really tricky thing is that if temperature is 0, the GPUs will return the same answer most of the time, which gives people a false sense of security. But that determinism is not guaranteed. And if it is not guaranteed, then it can lead to situations where a blockchain cannot agree. To put imaginary but concrete numbers: if a GPU is deterministic 99.99% of the time, that means 1 in 10,000 prompts, it may return different answers. Imagine if 1 in 10,000 blocks, the blockchain could not agree… most blockchains would not be able to come to consensus. That is dangerous for consensus.

Key Points:

Blockchains rely on replicating computation and achieving agreement on the results.
GPUs introduce non-determinism, making it difficult for blockchains to reach consensus.
Therefore, current blockchains cannot leverage GPUs like Web2 systems can.

Possible Solutions

The challenge is new, but several potential solutions are being explored (none fully solved at the time of this writing):

Achieve Determinism with GPUs: Develop methods to make GPU computations deterministic. This seems possible.
Modify Consensus Protocols: Adapt consensus mechanisms to handle non-determinism.
Accept Non-Determinism and Use Zero-Knowledge Proofs: Run LLMs on a single machine without replication. This approach, however, comes with a significant downside: it allows the entity running the computation to choose which valid answer to provide. If a model can generate multiple valid responses to a query, the computer (or prover) can rerun the model until it finds a preferred answer, potentially leading to biased or manipulated results. Additionally, this method is much slower than using CPUs or GPU

The ecosystem (including DFINITY) is actively exploring and researching all three approaches to determine the best solution.

Rule of Thumb #3

If someone claims, “My blockchain runs on GPUs,” then one of the following is true:

They have found a way to run GPUs deterministically or apply approximate consensus mechanisms.
Their blockchain lacks a robust consensus protocol (and is insecure).
They are not being truthful.

Conclusion

AI on blockchain is not fully realized yet. While there are promising steps toward integrating AI inference, significant gaps in memory, compute power, and consensus mechanisms need to be addressed. These challenges are not insurmountable, but they require focused research, development, and innovation. By understanding and tackling these hurdles, the dream of combining the power of AI with the security and decentralization of blockchain can become a reality.

Hope this helps folks!

diegop · July 3, 2024, 7:52pm

please point out things that are:

pure nonsense. I want only things that real.
mistakes
things that are too vague to be helpful
things that are too specific to be helpful
explanations that fall flat or are confusing

rossberg · July 4, 2024, 8:52am

Many good points, but I want to point out one thing which I think isn’t quite correct:

Many people are confusing parallelism and concurrency. They are not the same thing: the former is about performing multiple computations simultaneously, the latter is performing multiple computations with (some) shared resources. Non-determinism is a consequence of concurrency, not parallelism. Concurrency can occur without parallelism (via interleaving), parallelism without concurrency (e.g. data-parallel computation). The primary purpose of GPUs is data-parallel computation, so in principle that can be deterministic just fine. Of course, you have to program it accordingly.

The reason why GPUs are hard to use on blockchains is that (a) they are generally hard to use and (b) extremely heterogeneous across hardware and hardware generations, which makes their homogeneous use across replicas very difficult, especially in any blockchain that does not prescribe specific hardware.

But from my perspective, the real reason why AI on blockchain is “hard” is that both technologies on their own are already extremely costly in terms of resources (blockchains because of replication and consensus, AI because things like LLMs inherently are massively brute force approaches). Designing hardware for running AI is all about driving down cost. But when combined with blockchain, hardware costs are driven up instead — they don’t just add up, they multiply. So in terms of resource use, it’s a total worst case scenario, more abuse than use.

jokerburger · July 4, 2024, 1:26pm

There is a minor mistake:

ildefons · July 4, 2024, 11:31pm

Context:
I’m AI researcher, full time motoko dev and I developed a motoko library to train random forest models on chain. I have observed a recent trend to equal AI with Large Language Models and actually the AI field is a lot wider than that. I say that because Dfinity seems to be doing that, focussing on LLMs and in particular in the possibility to run model inference on chain so to inherit security and verifiability features from the underlying protocol. However I believe there is at least 3 niches of opportunity beyond that:

Some Ideas:

small to mid-size machine learnign models fully trained on chain. My project Motokolearn is an example: GitHub - ildefons/motokolearn
distributed verifiable execution of simulations using query calls. Running 100s or 1000 of multi-agent based simulations is a typical exercise in economy, socialogy, finance and so on. the possibility of executing those in a verifiable an secure manner has huge value. Also mention that training any reinforcement learning algorithm requires excuting this kind of simulations 1000s of times an these can only be executed on CPU based machines
The third niche beyond inference of LLM models is distributed training of machine learning models using federated learning ideas. Federated learning is a distributed machine learning approach where multiple parties (devices or servers) collaboratively train a shared model while keeping their data decentralized and local. Federated learning is meant for scenarios where data privacy is key. So nobody send any data to a central node but instead send the already trained gradients. For instance, we could figure out a way to distribute light weight replica nodes to private GPU servers with access to respective data

diegop · July 5, 2024, 6:37am

I agree in the general case (the industry equating AI with LLMs). I don’t quite agree on the specific case: Dfinity has started with ML (non LLM models):

To be fair, the most recent AI demo we have is an ML model on-chain (not an LLM):

Does that make sense? Did I misunderstand your intent ?

ildefons · July 5, 2024, 9:38am

Yes, I think you missunderstood my intent. Dfinity has focused on inference of ML models. Right now, a demo like the one you point out is doing inference on-chain on a “simple” multi-label neural network classifier fully trained off-chain and in the future this model will be a more complex LLM or any other SoA model. All of them fall within 1 use case: inference on-chain of ML models. Instead, I propose to look beyond the ML inference use case and outline 3 additional niches of opportunity.

Hope this helps to clarify my intent to show that the space of opportunity of ML/AI within the IC is larger than the single use case of on-chain ML infertence.

peterparker · July 5, 2024, 10:03am

I’m not familiar with AI, but I met a friend yesterday who recently started a CAS (Certificate in Advanced Studies) in ML and AI. He told me that they strictly use only Python. Having collaborated with ETH Zürich as an advisor, I also remember that all students were also strictly using Python. So that made me wonder, what is the status of using Python to make AI on Blockchain / ICP? Is it fully compatible with any Python tool for AI, or are there some roadblocks? And if there are, maybe that’s something interesting to be listed in your analysis?

This might not be related to your original question, Diego, and again I’m out of the loop but I just thought about sharing what crossed my mind in case it’s interesting or not that a dumb question (who knows).

diegop · July 5, 2024, 10:51pm

Thank you everyone for the feedback. I am incorporating it!

Icdev2dev · July 5, 2024, 11:14pm

The biggest issue that ICP can solve is about being the plumbing for AI.

This is specially what i mean:

Currently there are several providers for calling Chatcompletions (openai, groq etc) with varying fees.

What if i want to call a model deployed somewhere that makes use of the Chatcompletion through provider X? How to pay the model?

What if i want to deploy a model making use of Chatcompletion through provider Y? How do i charge for that model? How do i register that model so that others can use?

IC with it’s global infrastructure is aptly suited for this, IMO.

hokosugi · July 6, 2024, 10:46pm

distributed verifiable execution of simulations using query calls. Running 100s or 1000 of multi-agent based simulations is a typical exercise in economy, socialogy, finance and so on. the possibility of executing those in a verifiable an secure manner has huge value. Also mention that training any reinforcement learning algorithm requires excuting this kind of simulations 1000s of times an these can only be executed on CPU based machines

What are the effects and benefits of running this simulation with ICP? The easy way to validate the output of a multi-node run is through consensus, but to do so requires deterministic reasoning. Taking into account the complexity of deterministic reasoning, it is questionable whether this simulation is effective or beneficial.
I am not familiar with it, but I am interested in how mass fitting of DeAI is achieved.

ildefons · July 6, 2024, 11:22pm

Training reinforcement learning algos or extracting simulation traces to compute sufficient statistics require running 1000s of time simulations which in general does not require and does not take advantage of GPU. Therefore IC nodes could be a priori a suitable platform. The execution of simulations in the IC would provide the added security and verifiability features inherited from the underlying protocol. Simulation samples can be generated as updates calls (require consensus) and queried multiple times using query calls (do no require consensus). the second part of your question is about the third nich of opportunity, what you call deAI and I call federated learning. This third use case, require a more elaborate response requiring the possibility of deploying private IC subnetworks and i am not yet prepared to give the details as it is part of a longer grant proposal still under development.

apotheosis · July 10, 2024, 11:29pm

Kinic DAO members are building a lot of tooling that feed web3 AI.

Local models (in-browser) and on-chain vector DB as a recent example. Many large organizations are contributing to open-source AI that are becoming more and more powerful at smaller sizes (for browsers, phones and other constrained devices). With wasm64 a lot of these would run well on the IC

Accept Non-Determinism and Use Zero-Knowledge Proofs: Run LLMs on a single machine without replication. This approach, however, comes with a significant downside: it allows the entity running the computation to choose which valid answer to provide. If a model can generate multiple valid responses to a query, the computer (or prover) can rerun the model until it finds a preferred answer, potentially leading to biased or manipulated results. Additionally, this method is much slower than using CPUs or GPU

Most of the serious AI for web3 startups are using mixed models like this… none are saying “we are running LLM on ETH”. For example TEE with some consensus mechanism that validates DCAP on L1 which shows that the program ran correctly within the TEE. Or ZKP as you mention.

What prevents them from re-running? Nothing. But it should not matter. As the inputs (user query) is committed to. With xyz user query, we ran program F, it outputed xyz result; i.e. it does not matter if they repeat program F, as long as it is a valid result that was constrained by the inputs and valid execution of the function. If non-determinism does not work in the general sense, the use-case should not be using LLM or AI.

Or another example, token sampling in an LLM. could force it always to be the same by having it deterministically seed generated.

Moreover, I think the ZK aspects have more interesting use-case.

Private inputs. You could have a provider run the model F with private inputs(xyz) and return a result. Or as trends suggest, the models themselves will be running locally with private inputs. Private biometrics and ‘the chain’ is only orchestrating and verifying the ZKP results at scale. The chain could also be providing tamperproof data (https://ai.kinic.io/) to the locally running models.
The models themselves could be private. One example here would be a trading bot that executes off-chain and the ZKP is demonstrating that a model was ran and the outputs were xyz. Anyone can verify that those results created xyz profits, while not exposing the potentially proprietary trading model. You could rent out and secure usage of this model while keeping it private.
Private weights. Companies spend tons of resources getting model weights. They might use ZKP for inference using model abc but keep their weights private. The ZKP would prove the correct execution of the model with private weights.

I would add more about ZKP to your article as the IC is uniquely able to verify a large vareity of ZKP where other blockchains cannot.

i.e. ETH needs to use specific curves and proof schemes such as Groth16. ICP does not have these constraints.

Agnostic · July 11, 2024, 8:17pm

If you’re writing to tech developers in the field, then your article is very good as is. If you’re writing to a larger audience (the average joe, investors, etc.), then I think more should be emphasized when it comes to the application or use case aspect.

For instance, I have no technical skills at computer programming/tech (other than some HTML/CSS), and what would matter most to me are results and use case. Is all of this technical code stuff already being used for a dapp, is it worth/beneficial using blockchain tech over web 2 tech, etc. In other words, for a blockchain project to say that they can run Ai would mean to me that they actually have some product to demonstrate that, like a dapp. This is the problem I ran into when reading a discussion comparing Arweave (AO) with ICP. Most members were so bogged down in the technical details, and surprisingly no one bothered to ask for a working dapp on AO or a demo until someone like me brings it up - a someone that just wants to try out a cool Ai dapp. To date, my request for a working AI dapp on AO was never answered. Decided to explore AO’s ecosystem myself just to find out that there weren’t any Ai dapps running (maybe not yet?..unless someone can show otherwise).

diegop · July 11, 2024, 11:27pm

Thank you for the increase round of feedback. I am
Incorporating it into the post. I’m actively considering breaking it up to account for different themes and audiences

AmSpeed · July 17, 2024, 5:04am

Making GPUs Deterministic solved?
Someone posted the following on X in response to @dominicwilliams AI demo:

Towards the end you mention that GPUs are non-deterministic so this will be technically challenging. Just to share the following which may help in this regard.

Deterministic training is possible with multi-GPUs. To illustrate, check out the docs for Nvidia’s Clara platform, AI solutions to improve healthcare delivery and accelerate drug discovery, link below.

Determinism can be achieved by setting the Python, Random, Numpy and TensorFlow seeds. Additionally, you can control for the two other sources of non-determinism, namely multiple workers in the TensorFlow pipeline (tf.data.Datasetpipeline) and the Horovod Tensor Fusion environment variable. By setting multiple workers to 1 for deterministic training and the HFT environment variable to 0 for determinism for multi-GPU training.

According to the Clara docs to eliminate all sources of randomness. It is recommended that the number of GPUs, the GPU architecture, driver versions, all framework versions, and the setup are all the same.

Link: https://docs.nvidia.com/clara/clara-train-archive/3.1/nvmidl/additional_features/determinism.html

diegop · July 17, 2024, 6:54am

I think I over stated how hard it is to make them deterministic in my original piece. My new version will make it clear that it’s important to do right so thought and care must be taken to do it right

Now note: on context of a blockchain…. Id a blockchain accepts GPUs and they are not standardized, the protocols must take into account this extra care potentially.

Kibosh · July 17, 2024, 6:02pm

Hey Diego,

I recently wrote a non-technical article on X on DeAI for ICP with a focus on LLMs, link below.

Feel free to reference this if you want to explain the topic for a wider audience.

One thing I’d highlight is my assumption that a different approach to consensus would be necessary for training and inference of large AI algorithms:

“My hypothesis was that to achieve decentralized AI would require GPU enabled nodes and AI subnets with the ability to circumvent regular consensus to enable AI models to be trained.

This seems to be the direction of travel according to the revised IC Roadmap, specifically Cyclotron, online AI inference, and Gyrotron, training of large models using AI specialized subnets with GPU enabled nodes. I haven’t seen any information yet regarding consensus or how this will be handled on the AI specialized subnets, to ensure billions of model parameters can be calculated in a performant way, but this is something I’ll certainly be looking into separately.”

In the latter sections I explore hyperparameters and parameter tuning for LLMs.

Hope this helps and looking forward to reading your article.

Cheers,
Kibosh

diegop · July 17, 2024, 7:51pm

Thank you. Will read this!

Lorimer · July 18, 2024, 10:04pm

I’m looking forward to reading the final article. I hate to do a drive-by post, but I have a half formed idea rattling around in my head (and I need to put it somewhere) about the apparent ‘abuse’ of compute by running AI models on chain. Ensemble models are not uncommon (running the same or similar models in parallel, but with some noise, and then taking a combination of the results, for example an average) to improve robustness. Obviously this would violate consensus - but the idea of a more relaxed form of consensus for running AI on chain seems like a potentially fruitful idea (rather than requiring identical results, requiring results that are within some distance measure of each other). This could turn the ‘abuse’ back into ‘use’ (but obviously comes with security implications attached).

Topic		Replies	Views
AI and machine learning on the IC? Developers	114	10083	June 20, 2024
AI on Chain Skepticism Programs & Applications	10	482	January 17, 2025
The Current State of the Art of AI and Blockchains General	7	176	November 14, 2024
Technical Working Group DeAI Developers	330	13476	July 17, 2025
Near Protocol vs Internet Computer on Ai app creator General	49	943	November 11, 2024