I’ve seen a lot of skepticism lately around the idea of running AI on-chain, with claims that it’s simply not possible. Our community does a great job explaining the use cases side of things—shoutout to Bobby, who’s been especially helpful—but we’d really appreciate more direct, technical insights from DFINITY engineers and other ICP developers.
It would be fantastic if the team could occasionally jump into the X discussions or put together a technical article detailing how ICP handles AI. Right now, the ICP website mentions AI development but doesn’t fully explain how it works under the hood. An in-depth resource showing the steps, architecture, and any best practices would help community members point skeptics toward credible information.
X is too toxic for any meaningful discussion. More broadly, this problem extends across the entire crypto community: it’s difficult to keep the conversation rational when token holders have a financial stake in the outcome.
ICP smart contracts can run small LLMs and inference engines, since a canister has up to 4 GB of heap memory. ICP smart contracts can also call the APIs of larger LLMs using HTTPS Outcalls.
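For the outcall path, here is a minimal sketch of what calling an external LLM API from a canister can look like, assuming ic-cdk 0.13-style management-canister bindings; the endpoint URL and JSON body are placeholders, and a real integration would also need a transform function (and often an idempotency key) so that all replicas agree on identical response bytes:

```rust
use ic_cdk::api::management_canister::http_request::{
    http_request, CanisterHttpRequestArgument, HttpHeader, HttpMethod,
};

// Hypothetical endpoint and payload; substitute the LLM API you target.
#[ic_cdk::update]
async fn ask_llm(prompt: String) -> String {
    let request = CanisterHttpRequestArgument {
        url: "https://api.example-llm.com/v1/completions".to_string(), // placeholder
        method: HttpMethod::POST,
        headers: vec![HttpHeader {
            name: "Content-Type".to_string(),
            value: "application/json".to_string(),
        }],
        body: Some(format!(r#"{{"prompt": {:?}}}"#, prompt).into_bytes()),
        max_response_bytes: Some(2_000_000),
        // A transform function would strip non-deterministic headers so the
        // replicas can reach consensus on the response; omitted in this sketch.
        transform: None,
    };
    // Attached cycles must cover the outcall cost, which depends on subnet size.
    let (response,) = http_request(request, 25_000_000_000)
        .await
        .expect("HTTPS outcall failed");
    String::from_utf8(response.body).unwrap_or_default()
}
```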
What would be missing here? What further details would you need to understand it completely?
When does DFINITY plan to increase memory limits, given that the WASM64 upgrade has already been completed?
Can you provide more details about the new inference engine being built on ICP, as mentioned in the last R&D video, and the expected benefits it will bring?
I have been following some of the X/Twitter discourse between the ICP community and others about AI-on-chain and what that really means (or should mean).
First, hats off to Bobby, @Ajki, yourself and others who have gone to significant efforts to explain how ICP makes AI-on-chain work as LLM model inference replicated across an IC subnet. It appears very easy for some people (on both sides of these arguments) to resort to unproductive, tribal “my-chain-is-better-than-your-chain” back-and-forth; giving longer explanatory answers takes real effort and persistence, so I hear your collective frustration when it comes to getting the info across to others who (maybe) don’t really want to accept or understand ICP and its AI-on-chain implementation.
Second, I fully support your call for an “in-depth resource showing the steps, architecture, and any best practices” to help community members point skeptics toward credible information, and we should add this to the agenda of the next DeAI TWG meeting to discuss and work on.
Third, on “we’d really appreciate more direct, technical insights from DFINITY engineers and other ICP developers”: there are others with deeper developer expertise who will read this and might put up their hand to share their experience programming LLMs and running them in ICP canisters; my experience is more at the IC platform infrastructure level, as a node provider of hardware and a keen student of the IC runtime platform.
So if it were me, I would try the following approach:
exclude talk of AI model training on-chain for now: it is technically possible, as is AI inference on-chain, but the hardware and software stack requirements are more specific and demanding for training (including “fine-tuning” a pretrained model);
focus on the meaning of “AI on-chain” by reducing it to what happens in a single “AI task”, which is: an execution of data inputs applied to an AI model, resulting in data outputs;
assume that by “AI on-chain” we mean: the ability to run LLM inference tasks under consensus (blockchain or otherwise), thereby safely and deterministically generating the AI inference task results;
view each LLM inference task as a black-box function which is run repeatedly, generating the next iteration of the result (the next word or image leading to the total output of the “AI task”);
this black-box function is composed of the following kinds of information (as data); a minimal type sketch follows the list:
A) input data: a list (vector) of tokens of the same kind as the datasets used to train the model (typically language tokens representing prompt and RAG content, for an LLM)
B) output data: a list of tokens generated by a single inference run (for an LLM this is a set of language tokens matched with a probability of being the next token in the output text sequence)
C) model code: the implementation of a particular deep-learning neural network model, typically a Transformer model, in software code;
D) model data: the pretrained weights (parameters) over which the model code operates when you run the model.
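To make that decomposition concrete, here is a minimal, hypothetical type sketch in Rust; names like `TokenId` and `InferenceStep` are illustrative, not any particular engine’s API:

```rust
/// A) input data: token IDs of the same kind the model was trained on.
type TokenId = u32;

/// D) model data: the pretrained weights the model code operates over.
struct ModelWeights {
    parameters: Vec<f32>, // in practice quantized and memory-mapped
}

/// B) output data: candidate next tokens, each paired with the probability
/// of being the next token in the output sequence.
struct NextTokenDistribution {
    probabilities: Vec<(TokenId, f32)>,
}

/// C) model code: one inference step, run repeatedly; each call extends the
/// context by one token until the total output of the "AI task" is produced.
trait InferenceStep {
    fn step(&self, weights: &ModelWeights, context: &[TokenId]) -> NextTokenDistribution;
}
```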
for a specific web3 project: ask if its platform can verifiably do the following (a minimal canister sketch illustrating (a) and (b) follows this list):
a) can the input data be submitted to the platform and verified under consensus for either immediate or later use in an LLM inference task run?
b) can the output data of a single inference run be stored and verified under consensus, for either immediate or later use as the results of an “AI task” run?
c) can the model code be deployed to the platform, stored and verified under consensus, for use in an LLM inference task run at a later time?
d) can the model data be deployed to the platform, stored and verified under consensus, for use in an LLM inference task run at a later time?
e) does the runtime platform have the computational capacity (“speed”) to execute a “useful” AI task in a “reasonable” amount of time? (as determined by the consumer of the task)
f) does the runtime platform have the capability to run smart contract code which AI model code can actually be compiled to? (e.g. YES for a full WASM runtime, NO for any EVM runtime)
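For (a) and (b) specifically, on ICP these properties fall out of the execution model: every update call goes through consensus, so its arguments and any state it writes are replicated across the subnet. A minimal, hypothetical canister sketch in Rust (assuming ic-cdk; the method names and `RESULTS` store are illustrative):

```rust
use std::cell::RefCell;

thread_local! {
    // Canister state is replicated across the subnet; writes only take
    // effect once the update call has been agreed on under consensus.
    static RESULTS: RefCell<Vec<(Vec<u32>, Vec<u32>)>> = RefCell::new(Vec::new());
}

/// (a) Input tokens arrive as arguments to an update call, which the
/// subnet executes and verifies under consensus.
#[ic_cdk::update]
fn submit_inference_input(input_tokens: Vec<u32>) -> u64 {
    // A real canister would run (or schedule) the inference step here;
    // this sketch just records the input with an empty output placeholder.
    RESULTS.with(|r| {
        let mut r = r.borrow_mut();
        r.push((input_tokens, Vec::new()));
        (r.len() - 1) as u64
    })
}

/// (b) Stored outputs can be read back; a reader needing consensus-level
/// assurance can use an update call or certified variables instead of a
/// plain query.
#[ic_cdk::query]
fn get_result(id: u64) -> Option<Vec<u32>> {
    RESULTS.with(|r| r.borrow().get(id as usize).map(|(_, out)| out.clone()))
}
```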
I am not sure that all of the items (a)-(f) above are required to consider a platform capable of “AI on-chain”; it might reasonably be argued that only (a) & (b) are necessary and the rest are “nice to have”.
But I am sure that ICP satisfies all of these requirements with its canister smart-contract platform, whose WASM runtime ensures all computational inputs and outputs are made under blockchain consensus. Also, I am sure a large quantity of kudos is due to the DFINITY research and engineering teams who built the ICP platform so that “AI on-chain” was an inevitable outcome of the way it is designed and built, rather than an integration with an AI API service added on later.
That’s my draft effort at describing an approach to determining whether a given web3 platform can be considered to run “AI on-chain”, and it could do with some critique, correction and improvement from anyone willing to contribute.
I tried to jump in on X as well to support Bobby with explanations. He is indeed doing an admirable job.
Explanations only go so far. For detractors it is too easy to just say you cannot do anything useful with the smaller LLMs we can currently run on ICP. I don’t agree with that, but we need more actual examples and more developers & projects building with on-chain AI.
At onicai, we have started to more aggressively promote our llama_cpp_canister repo. Anyone can have their own LLM running on-chain by following the instructions in the README. It is not difficult for a developer to do this: about an hour and you have it up & running.
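Once deployed, calling the canister from off-chain code is straightforward. Here is a sketch using agent-rs; the canister ID and the `run_update` method name are placeholders, so check the repo’s README and .did file for the actual interface:

```rust
use candid::{Decode, Encode, Principal};
use ic_agent::Agent;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let agent = Agent::builder()
        .with_url("https://icp0.io") // mainnet boundary node
        .build()?;

    // Placeholder canister ID; use the one you deployed via the README.
    let llm = Principal::from_text("aaaaa-aa")?;

    // "run_update" and its argument are illustrative; the real method name
    // and signature are defined in the canister's .did file.
    let response = agent
        .update(&llm, "run_update")
        .with_arg(Encode!(&"Explain AI on-chain in one sentence.")?)
        .call_and_wait()
        .await?;

    println!("{}", Decode!(&response, String)?);
    Ok(())
}
```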
In addition, we are working on a C++ Bootcamp, which will get participants from zero to DeAI hero, ready to build on-chain AI into their dApps.
Many others are working on projects too, and we’re collaborating and collecting data during our Technical Working Group DeAI sessions.
The main challenge is identifying applications where running an LLM on-chain justifies its costs. A key step is making these costs transparent. When people say “it’s impossible,” they likely mean “it’s economically unfeasible.” I think we’re still at a point where even storing a picture on-chain is economically unfeasible; and in the case of tokens, most of their utility lies in speculation.
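On making costs transparent, even a back-of-the-envelope figure helps frame the discussion. A sketch assuming the commonly cited rate of roughly 127k cycles per GiB-second of storage on a 13-node subnet and ~$1.33 per trillion cycles (both figures are assumptions; check the current cycles cost table):

```rust
fn main() {
    // Assumed rates; verify against the current ICP cycles cost table.
    let cycles_per_gib_second = 127_000.0_f64; // storage, 13-node subnet
    let usd_per_trillion_cycles = 1.33;        // ~1 XDR per trillion cycles

    let seconds_per_year = 60.0 * 60.0 * 24.0 * 365.0;
    let cycles_per_gib_year = cycles_per_gib_second * seconds_per_year;
    let usd_per_gib_year = cycles_per_gib_year / 1e12 * usd_per_trillion_cycles;

    // Roughly 4.0T cycles, i.e. about $5.30 per GiB-year: negligible for
    // prompts and inference results, but real money for bulk media.
    println!(
        "~{:.1}T cycles = ~${:.2} per GiB-year",
        cycles_per_gib_year / 1e12,
        usd_per_gib_year
    );
}
```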
Fully on-chain AI inference for autoregressive language models like the Llamas is definitely a technical possibility today on the IC, and very soon on other blockchains (those that have hardware acceleration). Running consensus on these inference workloads is a nice property for some applications but increases the costs.
IMO there are a few additional headwinds when it comes to AI on the IC (or other blockchains) that are not related to model size or compute:
It is still unclear today what the ‘AI stack’ will look like. The stack is not yet standardized and seems to change weekly; once it is, it will be easier to understand how the IC can be integrated.
Data is not private on the IC. A lot of applications require full privacy. OpenAI or Anthropic offer full privacy and we need to understand how we can structure the AI offering on the IC.
The frontier for the IC and AI would be the ability to leverage zero-knowledge proofs built for matrix arithmetic. I hope I will have time to make some progress there this year.
To be practical: we could probably deliver a nice, reusable implementation of a Llama model to address the skepticism. Inference will be slow, but we could build cool apps. I think the main blocker is money: for running the model (for a workload I wanted to run, I realized the cost would be above $1k on the IC) and for building the POC. And of course, it is a risk: it could turn out that nobody uses it because it is too early. My 2 cents.