Author’s note: This is a draft of something I would like to post on the DFINITY Medium blog. I have been tweaking it for the last few weeks based on the confusing parts I have seen in the broader narratives of “AI x Crypto” on social media, in the wider press, and on developer forums like this one. I am looking to this community for feedback, corrections, and mistakes that I should fix.
Special thanks to @ulan @Manu @dsarlis @yvonneanne @Kyle_Langham @ianblenke for reading earlier drafts
What Makes AI on Blockchain Hard?
A lot of folks in the web3 space are excited (rightly, in my opinion) about AI on blockchain. However, I see very little in terms of clarity of thought. To be perfectly frank, as with all hype cycles, there is a lot of nonsense, so I set out to try to add some clarity.
In particular, I set out to answer a key question:
Is AI on blockchain here? If not, what is missing?
TLDR: AI on blockchain is possible for smaller models, but current limitations in memory, compute power, and consensus mechanisms prevent effective deployment of large models like LLMs.
Understanding AI on Blockchain
To understand this post, there are a few concepts worth quickly explaining:
1. Training vs Inference
When people refer to AI, they often mean either “training” a model or “inference” (using a model, such as asking questions to ChatGPT). Training is orders of magnitude harder and more resource-intensive than inference. Therefore, my focus here is on inference, as it represents the first significant hurdle before tackling the more complex challenge of training.
2. CPU vs GPU
Being simplistic, GPUs are computers optimized for AI models. They can process models roughly 1000x faster than traditional general-purpose computers (CPUs). This is relevant because most AI bottlenecks in the Web2 space are solved by just “throwing GPUs at it.” Blockchains run on CPUs, so they (currently) do not have this solution available to them. This post explains why.
3. Memory of smart contracts
The memory of a smart contract includes both its storage and its heap memory. Both memory types matter for running AI models, and both are limiting factors today.
Key Problems to Solve for AI on Blockchain
1. Memory
What AI models need
Memory requirements for AI inference can vary widely across AI models. For example, small machine learning (ML) models might only need a few megabytes (MB), while large language models (LLMs) can require gigabytes (GB) of memory.
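For intuition, here is a minimal back-of-the-envelope sketch of where these numbers come from. The parameter counts and precisions are illustrative assumptions, and it only accounts for holding the weights; real inference also needs memory for activations, caches, and the runtime, so treat it as a rough lower bound rather than a benchmark:

```python
# Back-of-the-envelope estimate of how much memory a model needs just to
# hold its weights. Illustrative only: real inference also needs memory
# for activations, KV caches, and the runtime itself.

def model_memory_gb(num_parameters: float, bytes_per_parameter: float) -> float:
    """Approximate GB needed to hold the model weights."""
    return num_parameters * bytes_per_parameter / 1e9

# A small image-classification model: assume ~2.5M parameters at 4 bytes each.
print(f"Small ML model:  ~{model_memory_gb(2.5e6, 4):.2f} GB")   # ~0.01 GB (about 10 MB)

# A 7B-parameter LLM at 16-bit (2 bytes) vs. 4-bit (0.5 bytes) precision.
print(f"7B LLM (fp16):   ~{model_memory_gb(7e9, 2):.1f} GB")     # ~14 GB
print(f"7B LLM (4-bit):  ~{model_memory_gb(7e9, 0.5):.1f} GB")   # ~3.5 GB
```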
Current state of the world
I want to give the reader a helpful overview of the space, but I will deliberately NOT present a table or chart comparing different blockchains. In my experience, these can lead to two things:
- At best, honest mistakes such as “Hey Diego, you miscalculated! Our smart contract platform does 600 instructions per second not 550.”
- At worst, it triggers blockchain tribalism, so the rest of the piece gets discounted.
So instead I will write about AI needs, Ethereum (which is a lingua franca), and ICP (a blockchain I’m intimately familiar with). I encourage readers to propose their own analysis for other chains!
Ethereum smart contracts
An Ethereum smart contract has 32 KB of memory for the stack. This means Ethereum cannot host most AI models I am aware of. There may be some AI models measured in KB, but to keep it simple: Ethereum smart contracts cannot host what people typically mean when they refer to AI models.
ICP smart contracts
An ICP smart contract has 400 GB of stable memory (i.e., storage), but what matters in this context is the heap memory, which is 4 GB. This means that ICP smart contracts can host many, but not all, AI models. More concretely:
Models ICP smart contracts can run:
ICP smart contracts can run AI models like the one in this demo. That ML model for image classification only requires around 10 MB of memory, which is well within ICP’s heap memory resources.
Models ICP smart contracts cannot yet run:
ICP smart contracts cannot yet run Large Language Models like Llama. For example, to run the open-source Llama3 7B model, the memory needed varies with how efficient the compression (quantization) is, but it is typically at least 3.5 GB, and perhaps on the order of 10 GB, to load the model and run it.
Currently, ICP smart contracts provide 4 GB of heap memory and will soon have more, so this is very close.
Coming Soon
It is worth noting that DFINITY R&D is actively working on a path to grow the heap memory, which would put it closer to the numbers above.
Rule of Thumb #1
Whenever someone says “X is AI on blockchain” you should ask:
“How much heap memory can a smart contract on X blockchain have?”
If the answer is…
- Measured in KB, then it cannot host any real AI model
- Measured in MB, then it can host small models (and there are many small models), but cannot host LLMs
- Measured in GB, then it can host some of the smaller LLMs
- Measured in tens of GB, then it can host more models, but not the main LLMs
- Measured in hundreds of GB, then it can host pretty much all LLMs
2. Compute
What AI Models Need
The computational power required for AI inference is measured in floating-point operations per second (FLOPS). The complexity and size of AI models vary widely, impacting the compute power needed. However, in the context of blockchain, it makes more sense to use the more general “operations per second,” so we will use this term, as in practice it tends to be within the same order of magnitude.
Smaller models may need only a few billion operations per second, while large language models (LLMs) and other advanced AI models may require much more. For example, a quantized (basically, optimized for size) Llama3 7B model can take tens of billions of operations for inference (answering a prompt from a user).
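For intuition, here is a rough sketch using the common rule of thumb that generating one output token costs on the order of two operations per model parameter. The numbers are illustrative assumptions, not a benchmark:

```python
# Rough sketch of LLM inference compute, using the common rule of thumb that
# generating one output token costs roughly 2 operations per parameter.
# Illustrative only: real costs depend on the model, quantization, and runtime.

params_7b = 7e9                    # parameters in a 7B-parameter model
ops_per_token = 2 * params_7b      # ~14 billion operations per generated token

print(f"~{ops_per_token:.1e} operations per token")   # ~1.4e+10, i.e. tens of billions
# A full answer multiplies this by however many tokens the model generates.
```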
Current State of the World
Ethereum Smart Contracts
Ethereum smart contracts primarily rely on the EVM, which is not optimized for high-performance compute tasks. A more accurate picture is that the computation available to an Ethereum smart contract is significantly lower than the gigaFLOPS required for most AI models. DFINITY estimates the maximum number of instructions per second, derived from the block gas limit, to be around 5 instructions per second. Consequently, Ethereum cannot provide the necessary computational power for running sophisticated AI models, particularly large language models (LLMs).
ICP Smart Contracts
ICP smart contracts have better computational resources: they can perform 2 billion operations per second. It is worth noting that, unlike Ethereum (which only handles integer arithmetic), ICP smart contracts can handle floating-point as well as integer arithmetic.
Models ICP Smart Contracts Can Run:
ICP can run AI models that require up to billions of operations per second and execute inference within the time that users expect (seconds or less). This includes many smaller models, such as the image classification model in this demo, which only needs a few billion operations per second to run efficiently.
Models ICP Smart Contracts Cannot Yet Run as Fast as Users Expect:
A quantized Llama3 7B model can take tens of billions of operations for inference (answering a prompt from a user). ICP smart contracts can perform 2 billion operations per second, so, in theory, it would take an ICP smart contract tens of seconds to minutes to execute an inference request, i.e., to answer a prompt.
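Here is the same back-of-the-envelope arithmetic spelled out, again with illustrative numbers:

```python
# Back-of-the-envelope latency estimate (illustrative numbers only).
ops_per_prompt = 30e9        # "tens of billions" of operations for one inference
icp_ops_per_second = 2e9     # ~2 billion operations per second per smart contract

latency_seconds = ops_per_prompt / icp_ops_per_second
print(f"~{latency_seconds:.0f} seconds per prompt")   # ~15 seconds
# Longer answers (more generated tokens) push this toward minutes.
```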
Coming Soon
DFINITY R&D is exploring ways to increase the computational capabilities of ICP smart contracts. Potential advancements include integrating specialized hardware or optimizing the execution environment to handle higher operations per second requirements.
Rule of Thumb #2
Whenever someone says “X is AI on blockchain” you should ask: “How much computational power can a smart contract on X blockchain provide?”
If the answer is…
- Measured in millions of operations per second or less, then AI inference would take so long that users would consider it not working at all.
- Measured in hundreds of millions of operations per second, then very small models can execute inference in minutes.
- Measured in billions, then smaller LLMs can execute inference, but in minutes, i.e., much slower than what users expect.
- Measured in tens of billions, then LLM inference can be what modern users expect from LLMs.
- Measured in trillions of operations per second, then it can host virtually all AI models, including the most advanced LLMs, with a great user experience.
3. Blockchain-Specific Problems (hint: it’s determinism)
In the Web2 world, increasing computational resources for a model typically means using GPUs, which are much faster. This is why GPUs are in high demand globally.
Why Can’t Blockchain Just Use GPUs?
Technical Reason:
Since GPUs are inherently designed to be multithreaded, it is not guaranteed that all operations are deterministic, while blockchains require deterministic computation to achieve consensus. In practice, there are ways to make GPUs behave deterministically, but it takes careful consideration and configuration. First, though, let me explain why determinism matters.
Simpler Explanation:
Blockchains operate by having multiple computers perform the same computations and then using a consensus protocol to agree on the result. Blockchains have a security threshold, usually between 25% and 49%, which determines how many faulty or dishonest nodes they can tolerate while still achieving consensus. However, with GPUs, even honest nodes may return different answers from the same LLM, even when all nodes use the same model, creating a problem for consensus protocols.
Illustrative Example:
Imagine a blockchain with three computers, each running an LLM smart contract. A user asks, “What is an LLM?”
- Computer 1: “An LLM, or Large Language Model, is an advanced AI model designed to understand and generate human language, typically characterized by a large number of parameters and trained on extensive text data.”
- Computer 2: “An LLM, or Large Language Model, is a powerful AI system trained on vast amounts of text to perform tasks like understanding, generating, and translating human language.”
- Computer 3: “An LLM, or Large Language Model, is an AI model that excels in processing and generating human language by leveraging extensive training on large datasets.”
Despite all three computers being honest and using the same model, they return different answers. This non-determinism, which can arise for many reasons, is dangerous. The consensus protocol cannot determine which answer is correct. This contrasts with simpler, deterministic calculations like “1 + 1,” where all computers would agree on “2.”
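To make that concrete, here is a tiny, purely illustrative sketch of a majority-agreement check over replica outputs. It is not any real blockchain’s consensus protocol, just the core idea that replicas must produce identical results:

```python
# Tiny illustration of why identical outputs matter for replicated execution.
# This is NOT a real consensus protocol; it just checks for a 2/3 majority.
from collections import Counter

def try_to_agree(replica_outputs):
    """Return an output backed by at least 2/3 of replicas, or None if there is no agreement."""
    output, votes = Counter(replica_outputs).most_common(1)[0]
    return output if votes * 3 >= 2 * len(replica_outputs) else None

# Deterministic computation ("1 + 1"): all honest replicas return the same result.
print(try_to_agree(["2", "2", "2"]))    # -> "2", consensus reached

# Non-deterministic LLM output: all replicas are honest, yet the answers differ.
answers = [
    "An LLM, or Large Language Model, is an advanced AI model...",
    "An LLM, or Large Language Model, is a powerful AI system...",
    "An LLM, or Large Language Model, is an AI model that excels...",
]
print(try_to_agree(answers))            # -> None, no agreement possible
```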
Given the above, I should add a bit more detail. The non-determinism can occur even if the model temperature is set to 0. The tricky thing is that the non-determinism comes from the GPUs, not the model itself. And the really tricky part is that if the temperature is 0, the GPUs will return the same answer most of the time, which gives people a false sense of security. But that determinism is not guaranteed, and if it is not guaranteed, it can lead to situations where a blockchain cannot agree. To put imaginary but concrete numbers on it: if a GPU is deterministic 99.99% of the time, that means for 1 in 10,000 prompts it may return different answers. Imagine if, for 1 in 10,000 blocks, the blockchain could not agree; most blockchains would not be able to come to consensus. That is dangerous for consensus.
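One common source of this non-determinism (though not the only one) is that floating-point addition is not associative, so the order in which a GPU’s parallel threads accumulate partial sums can change the low-order bits of a result, and tiny differences can eventually flip which token gets picked. Here is a minimal CPU-only sketch of the underlying numeric effect; it does not reproduce real GPU scheduling:

```python
# Minimal illustration of one source of GPU non-determinism: floating-point
# addition is not associative, so the order in which partial sums are
# accumulated changes the result slightly. This does NOT reproduce actual
# GPU scheduling; it only shows the underlying numeric effect.
import random

values = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

sum_forward = sum(values)              # one accumulation order
sum_reversed = sum(reversed(values))   # another accumulation order

print(sum_forward == sum_reversed)     # frequently False
print(abs(sum_forward - sum_reversed)) # tiny, but often nonzero
# Inside an LLM, such tiny differences can flip which token is chosen next,
# so two honest replicas can end up producing different answers.
```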
Key Points:
- Blockchains rely on replicating computation and achieving agreement on the results.
- GPUs introduce non-determinism, making it difficult for blockchains to reach consensus.
- Therefore, current blockchains cannot leverage GPUs like Web2 systems can.
Possible Solutions
The challenge is new, but several potential solutions are being explored (none fully solved at the time of this writing):
- Achieve Determinism with GPUs: Develop methods to make GPU computations deterministic. This seems possible.
- Modify Consensus Protocols: Adapt consensus mechanisms to handle non-determinism.
- Accept Non-Determinism and Use Zero-Knowledge Proofs: Run LLMs on a single machine without replication and prove the computation with zero-knowledge proofs. This approach, however, comes with a significant downside: it allows the entity running the computation to choose which valid answer to provide. If a model can generate multiple valid responses to a query, the computer (or prover) can rerun the model until it finds a preferred answer, potentially leading to biased or manipulated results. Additionally, this method is much slower than simply running the model on CPUs or GPUs.
The ecosystem (including DFINITY) is actively exploring and researching all three approaches to determine the best solution.
Rule of Thumb #3
If someone claims, “My blockchain runs on GPUs,” then one of the following is true:
- They have found a way to run GPUs deterministically or apply approximate consensus mechanisms.
- Their blockchain lacks a robust consensus protocol (and is insecure).
- They are not being truthful.
Conclusion
AI on blockchain is not fully realized yet. While there are promising steps toward integrating AI inference, significant gaps in memory, compute power, and consensus mechanisms need to be addressed. These challenges are not insurmountable, but they require focused research, development, and innovation. By understanding and tackling these hurdles, the dream of combining the power of AI with the security and decentralization of blockchain can become a reality.
Hope this helps folks!