Hey @dawnnguyen, sorry for the late reply, I was on PTO. You’ve precisely described the approach and how it’s implemented on the DFINITY side.
The main limitations around input/output limits, context length, and model diversity stem from cost, maintenance, and latency considerations. If cost and maintenance aren’t concerns for you, you’re free to choose whatever serves your purpose best.
One key limitation is that data streaming isn’t supported. Unlike the experience you get when querying ChatGPT, where the response appears character by character as it’s generated, you’ll need to wait for the complete output text. This can result in noticeable delays depending on your model size, complexity, and input length.
Without going into too much detail, additional considerations might be:
Reliability: The off-chain worker becomes a single point of failure. If reliability matters to you, add monitoring, error handling, and redundancy.
Scalability: If usage grows, you might need to redesign the off-chain worker to scale accordingly.
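To make the reliability point above concrete, here is a minimal sketch of a retry-with-backoff loop an off-chain worker might use so that transient failures don't take the whole pipeline down. Everything here is illustrative: `poll_with_backoff` and `poll_once` are hypothetical names, not part of any DFINITY API, and a real worker would pair this with alerting rather than silently giving up.

```rust
use std::time::Duration;

// Hypothetical retry loop: `poll_once` stands in for whatever call the
// worker makes against the LLM canister. On repeated failure we back off
// exponentially and eventually return None so the caller can raise an alert.
fn poll_with_backoff<F>(mut poll_once: F, max_attempts: u32) -> Option<Vec<u64>>
where
    F: FnMut() -> Result<Vec<u64>, String>,
{
    let mut delay = Duration::from_millis(100);
    for _ in 0..max_attempts {
        match poll_once() {
            Ok(ids) => return Some(ids),
            Err(_) => {
                // Transient failure: wait, then double the delay.
                std::thread::sleep(delay);
                delay = delay.saturating_mul(2);
            }
        }
    }
    None // exhausted retries; monitoring should flag this
}

fn main() {
    // Simulate two transient failures followed by a success.
    let mut calls = 0;
    let result = poll_with_backoff(
        || {
            calls += 1;
            if calls < 3 {
                Err("transient error".to_string())
            } else {
                Ok(vec![42])
            }
        },
        5,
    );
    assert_eq!(result, Some(vec![42]));
    println!("recovered after {calls} attempts");
}
```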
@aespieux The issue with dfx deps pull failing on versions 0.26.1 and higher has been resolved in dfx 0.28.0. The problem was related to the switch from replica to PocketIC in versions above 0.25.0, which caused topology differences that prevented dependency pulling.
You should now be able to upgrade to dfx 0.28.0 and successfully pull the LLM canister dependency. Please let me know if you encounter any issues after upgrading.
Hi @ddave, after implementing a POC I ran into some issues; could you please give me some advice?
I have a state type that marks a message as queued, processing, or responded. At the step where the off-chain worker polls queued messages from the LLM canister, I update the state to processing, so the poll has to be an update call rather than a query call. If the off-chain service polls continuously, that could drive up costs because of the update calls. Did you run into the same problem? I'm considering keeping the poll as a query call and having the off-chain worker filter out processing messages by their ids, but I'm not sure that will work. Can you share how you handle this?
How can the LLM canister call the chatbot canister to update the responded message? Currently my off-chain service can call the LLM canister to poll and update messages, but I can't update the main canister from the LLM canister.
You can have a query call that does the polling. When it hits a message it wants to process, you can then do an update call to update the state.
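The poll-then-claim pattern described above can be sketched as follows. This is a local simulation, not canister code: `LlmCanister`, `poll_queued`, and `claim` are hypothetical names standing in for the real query and update endpoints; the point is that the read-only poll stays a cheap query, and an update call is only issued once there is a message to claim.

```rust
// Illustrative message states, matching the queued/processing/responded
// states described in the thread.
#[derive(Clone, Copy, PartialEq, Debug)]
enum MsgState {
    Queued,
    Processing,
    Responded,
}

struct Message {
    id: u64,
    state: MsgState,
}

// Stand-in for the LLM canister's state; names are hypothetical.
struct LlmCanister {
    messages: Vec<Message>,
}

impl LlmCanister {
    // Query call: read-only, returns ids of queued messages. Cheap to
    // call in a polling loop.
    fn poll_queued(&self) -> Vec<u64> {
        self.messages
            .iter()
            .filter(|m| m.state == MsgState::Queued)
            .map(|m| m.id)
            .collect()
    }

    // Update call: only issued when the worker actually claims a message,
    // flipping it from Queued to Processing.
    fn claim(&mut self, id: u64) -> bool {
        match self
            .messages
            .iter_mut()
            .find(|m| m.id == id && m.state == MsgState::Queued)
        {
            Some(m) => {
                m.state = MsgState::Processing;
                true
            }
            None => false, // already claimed or responded
        }
    }
}

fn main() {
    let mut canister = LlmCanister {
        messages: vec![
            Message { id: 1, state: MsgState::Queued },
            Message { id: 2, state: MsgState::Responded },
        ],
    };
    // Poll via query; claim via update only when there is work.
    for id in canister.poll_queued() {
        assert!(canister.claim(id));
    }
    assert!(canister.poll_queued().is_empty());
    println!("all queued messages claimed");
}
```

Because the query can return stale data, the update-call `claim` re-checks the state before flipping it, so two workers racing on the same id cannot both claim it.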
The example we provide requires a blocking call from your dapp → LLM canister that waits until the off-chain worker picks up the message (query/update) and responds with an update call to put the result back into the LLM canister. That's why there is a timeout on processing the prompt sent to the LLM canister. If that flow doesn't work for you, you can add an endpoint to your dapp that accepts responses (which the LLM canister can call via an inter-canister call).
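The blocking flow above behaves much like waiting on a channel with a timeout, which the following local sketch illustrates. This is an analogy, not canister code: the spawned thread plays the role of the off-chain worker, and the timeout mirrors the one the LLM canister applies while waiting for the worker's update call.

```rust
use std::sync::mpsc;
use std::time::Duration;

fn main() {
    let (tx, rx) = mpsc::channel::<String>();

    // "Off-chain worker": picks up the prompt and eventually sends the
    // response back (analogous to its update call into the LLM canister).
    std::thread::spawn(move || {
        std::thread::sleep(Duration::from_millis(50));
        tx.send("model output".to_string()).unwrap();
    });

    // "Dapp side": the blocking call waits until the response arrives or
    // the timeout expires, mirroring the timeout in the example flow.
    match rx.recv_timeout(Duration::from_secs(2)) {
        Ok(resp) => println!("got response: {resp}"),
        Err(_) => println!("timed out waiting for worker"),
    }
}
```

The alternative callback flow removes this wait entirely: instead of the dapp blocking, the LLM canister calls an endpoint on the dapp once the worker has responded.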
@ddave thanks for your answer. The second option is very interesting; I will try both solutions (blocking until the response arrives, and the inter-canister call) and will post the results later. Personally, I prefer the inter-canister call over the blocking method, but I'll measure both to choose the approach that saves the most cycles.
The experimental non-replicated outcalls feature will make the off-chain worker example redundant. Now you can directly use your preferred model or LLM API in Web2 to make model calls, giving you the freedom to choose any model you want. (Note: You’re still operating off-chain, so there’s no consensus or verification that your results were not tampered with.)
Hey,
I’m trying to set this up locally but I’m quite confused:
The LLM canister expects initialization arguments, but they aren’t documented anywhere in the docs. I only have the Candid interface available, which makes it difficult to understand what the values correspond to.
When I try to use it anyway, the canister fails to connect to Ollama and times out:
Call was rejected: Request ID: 91c0922cf6f63314223dcd4e470458707741503f7351daf3080a79a03be6566c
Reject code: 4
Reject text: IC0503: Error from Canister w36hm-eqaaa-aaaal-qr76a-cai: Canister called `ic0.trap` with message: 'Panicked at 'called `Result::unwrap()` on an `Err` value: (SysFatal, "Timeout expired")', src/ollama.rs:31:6'
How do I fix this? How is the LLM canister supposed to connect to a local Ollama instance? Does it expect Ollama to be running on a specific port or endpoint?
Quick question: I believe the LLMs are computed outside of ICP, and not in the canister itself, right? If so, ICP takes a strong stance on privacy and private computing, yet dapp teams can’t directly verify whether third-party providers retain prompts. @ielashi do you have any insights on this?
As far as I know, they have said that they don’t retain exact prompts or any data that can be tied to a specific user, though they do retain general usage metrics (number of prompts/users). Otherwise, you’re totally right, and there is not much we can do at this stage except trust them.
Okay, if we’re relying on trust alone: do we at least know who the providers are? The IC has a very strict system for onboarding node providers, with KYC audited by a third party and verified by independent reviewers. Shouldn’t the same apply to AI providers, with the identity and location of the service provider or service endpoint recorded on-chain?