Introducing the LLM Canister: Deploy AI agents with a few lines of code

How large could the token output be in the future? 200 tokens now is very limited.

I agree it is very limited. We plan to increase this to 500 within days, and we’ll work towards increasing it further as we gain more confidence in the stability of the system.

How large is the token input now, and any idea of the future?

The input is currently bounded at 10KiB. In principle we can raise it to 2MiB, which is the maximum request size; going beyond 2MiB would require some chunking. Are you already hitting the 10KiB limit?
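If you do approach the bound, splitting on the client side is straightforward. A minimal sketch in Python (the helper is hypothetical; only the 10KiB figure comes from the limit above):

```python
LIMIT_BYTES = 10 * 1024  # current per-request input bound (10 KiB)

def split_prompt(text: str, limit: int = LIMIT_BYTES) -> list[str]:
    """Split text into chunks that each fit within `limit` UTF-8 bytes,
    never cutting a multi-byte character in half."""
    data = text.encode("utf-8")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + limit, len(data))
        # back off while the split point lands inside a multi-byte character
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end].decode("utf-8"))
        start = end
    return chunks
```

Each chunk can then be sent as its own request (e.g. with a running summary carried between calls) until larger inputs are supported natively.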

Do you have plans to incorporate Llama 4? Any news or plans about it?

Great question. We’re doing some research to see how well we can support it. More on that soon :slight_smile:

Quick update: the limit on the number of tokens that can be output per request has been increased from 200 to 1000. The former limit was, as already suggested, quite constraining, and now that the system has proven stable, we don’t see a problem with handling the additional load.

Thanks for the answers.

This reality confirms for me that we must use HTTPS calls to external open source LLMs.

I do hope that OpenAI, Anthropic, and Google support IPv6 addresses. Given what I’ve seen from CaffeineAI, that seems to be the case.

The use case for this LLM Canister seems quite limited for now. I’m trying to think of what a useful application could be, but I’m not sure so far. Still, it’s great to see the research effort, and I hope we can eventually run full small-size LLM models.

Thanks for the feedback, @josephgranata. Regarding closed-source LLMs, would you prefer to use them because of their quality or is there another reason? We’re looking into adding support for Llama 4, so hopefully in that case the LLM canister can prove to be more useful.

Mr. El-Ashi, I do in fact prefer open-source models, the best we can possibly use, like DeepSeek and Llama 4. However, from an ease-of-building perspective, the APIs from Claude, Google, and OpenAI are worth considering, and the HTTPS route could be a good way to integrate that power into innovative IC AI applications.

That is why I asked whether we can use all of them via IPv6 or not. For open-source models it does not matter, because we can host them wherever we want and make sure they run on an IPv6-compatible server.

Thanks for the input @josephgranata.

I don’t know personally, but perhaps you can consult their documentation? There’s another challenge: making these calls return a deterministic output. Some of these providers offer a seed parameter for determinism, but that determinism isn’t guaranteed.

@ielashi: Any chance of adding a model for doing embedding? Maybe one of these: A Guide to Open-Source Embedding Models

With that, we could create RAG solutions on-chain (using a vector DB in Rust). Even with a 10KiB input limit, that would considerably expand the range of applications that could be built with the LLM Canister.

@ielashi The LLM Canister is really good!

You can see it generating captions for memes using our OC bot at the Open Chat Botathon

A ‘scary’ example:

That’s awesome, thanks for sharing! :slight_smile:

Any chance of adding a model for doing embedding? Maybe one of these: A Guide to Open-Source Embedding Models

Thanks for the feedback. Which embedding models would you like to use? We can look into it.

Hey Islam,

When I try to run dfx deps pull for the llm canister:

"llm": {
  "type": "pull",
  "id": "w36hm-eqaaa-aaaal-qr76a-cai"
},

On dfx version 0.26.1 and higher, it shows the following error message:

Pulling canister w36hm-eqaaa-aaaal-qr76a-cai…
Error: Failed to create and install canister w36hm-eqaaa-aaaal-qr76a-cai
Caused by: Failed to create canister w36hm-eqaaa-aaaal-qr76a-cai
Caused by: Failed to read controllers of canister w36hm-eqaaa-aaaal-qr76a-cai.
Caused by: The replica returned an HTTP Error: Http Error: status 400 Bad Request, content type “text/plain”, content: Canister w36hm-eqaaa-aaaal-qr76a-cai does not belong to any subnet.

But the same command works fine when using dfx version 0.25.0

Do you know why? And could you also make the canister’s deps pull work with the latest dfx version, 0.27.0?

Hey aespieux, this seems to be an issue with the switch from the replica to PocketIC (which applies to dfx versions above 0.25.0). The topology of PocketIC differs from the replica’s. This is a bug and we’re working on a fix.
So for now, it’s not possible to pull dependencies with dfx versions above 0.25.0.

LLMs on the Internet Computer - Major Update

Hey everyone, I’m Dave and I recently joined the AI team at DFINITY. I’ve been working alongside Islam on some updates to the LLM canister:

tl;dr
We’ve added tool calling support and two new models (Qwen3 32B and Llama4 Scout) to make building AI agents on the IC even more powerful and accessible.

What’s New

Tool Calling Support

LLMs can now execute functions and interact with external systems! This enables building sophisticated agents that can:

  • Process real-time data
  • Interact with databases
  • Call external APIs
  • And much more!

New Models

We’ve expanded our model selection. Here is the currently supported model list:

  • Qwen3 32B (qwen3:32b) - Powerful multilingual model with excellent reasoning capabilities
  • Llama4 Scout (llama4-scout) - Latest from Meta with improved performance
  • Llama 3.1 8B (llama3.1:8b) - Still available and reliable

Tool Usage Examples And More Details

How Tool Calling Works, an Overview

  1. Define your tools - Specify what functions the LLM can call and their parameters
  2. Send request - Include tools in your chat request using the v1_chat endpoint
  3. LLM decides - The model determines if/when to use tools based on the conversation
  4. Execute function - Your application executes the requested function
  5. Return result - Send the tool result back to continue the conversation

The LLM response includes tool_calls when it wants to execute a function:

{
  "message": {
    "content": null,
    "tool_calls": [{
      "id": "call_123", 
      "function": {
        "name": "get_current_weather",
        "arguments": [
          {"name": "location", "value": "Paris"},
          {"name": "format", "value": "celsius"}
        ]
      }
    }]
  }
}
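Handling such a response (steps 4 and 5 above) amounts to looking up the named function, calling it with the supplied arguments, and returning the result as a tool message. A minimal dispatch sketch, where the tool registry, the weather function, and the "role": "tool" result shape (borrowed from common chat APIs) are all hypothetical:

```python
def get_current_weather(location: str, format: str) -> str:
    # Hypothetical tool implementation; a real agent would fetch live data.
    return f"18 degrees {format} in {location}"

TOOLS = {"get_current_weather": get_current_weather}

def execute_tool_calls(message: dict) -> list[dict]:
    """Run each requested tool and build the tool-result messages
    to send back in the follow-up chat request (step 5)."""
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        # arguments arrive as a list of {"name": ..., "value": ...} pairs
        kwargs = {arg["name"]: arg["value"] for arg in fn["arguments"]}
        output = TOOLS[fn["name"]](**kwargs)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": output,
        })
    return results
```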

We’d love to hear your feedback on this LLM integration approach so we can continue refining it. And we’re always curious to learn about the projects you’re building with LLMs, feel free to share what you’re working on!

Tested it in chat but got “Sorry, we failed to exucte the bot command”.

This is a great upgrade, congrats!!

And right in time for WCHL :wink:

I actually have some important feedback from the Bootcamps / Hackathons. Many teams “stumble” the moment they also want to use OpenAI or any other API and face:

  • the extra calls required for HTTPS outcalls;
  • the difficulty of handling the different replies from each API model.

I tell them to use the same AI Worker pattern, but for all of them it’s a “hassle” they can’t overcome within the duration of a Hackathon. So they are always limited to the models and tools you provide.

But for the future, and to solve this once and for all, I thought it would be great if there were an easy way to “generalize” / “open up” the LLM Canister and the AI Worker that fulfills its requests.

The LLM Canister is open sourced, but not the AI Worker, right? Could a version be developed where it’s distributed as a Docker image, so teams only need to add an OpenAI API key and deploy it to some off-chain cloud (like AWS, DigitalOcean, Vercel, Supabase, etc.)?

It would only need to expose the same methods/tools that the DFINITY models already support. If a team wants other tooling, they can fork the source code, extend it, and build a new Docker image.

Could this be considered internally? Having the models is already great and maybe serves 50% of the cases. But for competitive startups that want to offer the best AI model out there, an API-friendly version would be great :+1: That way, I think we would satisfy 99% of the needs.

Just some food for thought :folded_hands:

PS.: tagging @ielashi and @aespieux for visibility.

Hey @ielashi, @ddave and team,

Thanks a lot for the recent updates to the LLM canister, it’s really exciting to see this taking shape!

One suggestion: it would be super useful to support an embedding model for RAG workflows. Something like all-MiniLM-L6-v2 from Sentence Transformers would be a great fit. It’s lightweight, fast, and works really well for semantic search and retrieval.

Would love to see this added, happy to test or help if needed!

Thanks for the feedback and suggestions, @tiago89 and @aespieux!

Open Source and Deployment Options

“The LLM Canister is open sourced, but not the AI Worker, right? Could a version be developed where it’s distributed as a Docker image, so teams only need to add an OpenAI API key and deploy it to some off-chain cloud (like AWS, DigitalOcean, Vercel, Supabase, etc.)?”

To clarify: neither the canister nor the worker is currently open source. However, you can download the canister binary to run it locally, which connects directly to OpenRouter or a local Ollama instance (see our documentation for setup details).

We’re not opposed to open sourcing both components. The main consideration is that integrating with specific LLM APIs (like OpenAI’s) would still require users to do more than just add an API key; they’d need to implement the integration themselves.

@tiago89, would it be helpful if we created an example project as a starting template? This would show how to recreate the same architecture described in the original post. Keep in mind that anyone wanting to use their own setup would still need to deploy the off-chain worker outside of the IC.

Embedding Models for RAG

“One suggestion: it would be super useful to support an embedding model for RAG workflows. Something like all-MiniLM-L6-v2 from Sentence Transformers would be a great fit. It’s lightweight, fast, and works really well for semantic search and retrieval.”

We’d be happy to add more models! However, we’re currently limited to OpenRouter’s available models.

@aespieux, do you know if OpenRouter offers any embedding models that would work for this use case? If not, we could explore other integration approaches for embedding support.

Hi @ddave, thanks for the update and openness to expanding support!

Currently, OpenRouter only supports LLMs; there’s no embedding model available via its API, as discussed on this forum. To enable embedding/RAG workflows, we could download a model like all‑MiniLM‑L6‑v2 from Hugging Face and run it on an AI runner machine adjacent to the node. That runner could expose a simple HTTP service so the node can request embeddings without external API calls.

Proposed enhancement:

Add a default embedding model (e.g. all‑MiniLM‑L6‑v2) to the canister, similar to the chat/completion models, with a method like:
embed(text: string) → vector

This would simplify RAG pipelines, keep data within the IC ecosystem, and remove the friction of HTTPS outcalls.
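Given such an embed method, the retrieval side of a RAG pipeline reduces to a nearest-neighbour search over the stored vectors. A pure-Python cosine-similarity sketch (no vector DB; the vectors are assumed to have been produced by the proposed embed method):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float],
          index: list[tuple[str, list[float]]],
          k: int = 3):
    """Return the k most similar (doc_id, score) pairs from the index,
    where each index entry is (doc_id, embed(doc_text))."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]
```

The retrieved documents would then be concatenated into the chat prompt, which is where the current input limit becomes the main constraint.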

@aespieux I understand your use case, but unfortunately we cannot support more use cases at this time. We’ve decided to maintain the OpenRouter inference service as-is rather than expanding to additional use cases, since the possibilities are endless and we’d end up trying to support too many different scenarios.

However, I think we can help address the concerns raised by both @aespieux and @tiago89 with an easily reproducible approach. I’ve created a reference implementation of an off-chain worker that can be extended with any kind of model, API, or LLM environment with minimal effort. You would still need to host the canister as well as the off-chain worker (and potentially the model inference) as DFINITY does for the LLM service above. This provides an easier path to what you’re looking for.

Here’s the reference implementation for your use: GitHub - davencyw/ic-off-chain-worker: Internet Computer Off-Chain Worker Example

If you encounter any issues, have questions, or want to propose improvements, please feel free to contribute to the repo or open an issue.

I realize this doesn’t fully solve your specific problems, but I hope it helps to some degree while allowing us to maintain focus on our core offering.

Yes!!

This example is already a lot of what I was looking for :folded_hands:

Antoine and I will experiment with it and maybe create a guide on it for the WCHL, but looking at the README, it looks pretty complete.

It even has analytics / a dashboard showing the current state of the workers. Very interesting indeed, thanks!

Have a great week :folded_hands:

So technically, I can deploy a self-hosted off-chain worker that runs my own LLM. Then I deploy two canisters: the on-chain LLM canister and my chatbot canister. Then I can follow the same flow you do:

  1. The chatbot canister calls the LLM canister.
  2. The LLM canister receives the message via ingress and queues it.
  3. The off-chain worker polls the queued message from the LLM canister.
  4. The worker processes it with a local LLM (or any other backend).
  5. The off-chain worker calls the LLM canister to mark the message as processed and to insert the newly generated response.
  6. The LLM canister returns the response to the chatbot canister.

Am I right? If so, what should I watch out for? I saw that you’ve been working on improvements such as extending the input/output limits. Can you share some of the difficulties and obstacles you’ve encountered?