Introducing the LLM Canister: Deploy AI agents with a few lines of code
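For context, a minimal sketch of what “a few lines of code” might look like when calling the LLM canister from a Rust canister via the `ic-llm` crate. The exact names here (`ic_llm::prompt`, `Model::Llama3_1_8B`) are assumptions based on the announcement; check the crate documentation for the current API.

```rust
// Minimal sketch: forward a prompt to the LLM canister from a Rust canister.
// Assumes the `ic-llm` and `ic-cdk` crates; exact names may differ in current releases.
use ic_llm::Model;

#[ic_cdk::update]
async fn ask(question: String) -> String {
    // Send the prompt to the LLM canister and return the model's reply.
    ic_llm::prompt(Model::Llama3_1_8B, question).await
}
```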

How large could the token output be in the future? 200 tokens now is very limited.

I agree it is very limited. We plan to increase this to 500 within days, and we’ll work towards raising it further as we gain more confidence in the stability of the system.

How large is the token input now, and any idea of the future?

The input is currently bound to 10 KiB. In principle we can raise it to 2 MiB, which is the maximum request size; going beyond 2 MiB would require some chunking. Are you already hitting the 10 KiB limit?
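To illustrate the kind of chunking that larger inputs would need, here is a minimal sketch of splitting a prompt into byte-bounded, UTF-8-safe pieces. The 10 KiB figure is the current limit quoted above, not a crate constant, and real chunking would likely split on sentence or paragraph boundaries instead.

```rust
/// Split `text` into chunks of at most `max_bytes` bytes without breaking
/// UTF-8 character boundaries. A sketch only.
fn chunk_utf8(text: &str, max_bytes: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        if current.len() + ch.len_utf8() > max_bytes {
            chunks.push(std::mem::take(&mut current));
        }
        current.push(ch);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

// Example: stay within the current 10 KiB input bound.
// let pieces = chunk_utf8(&long_prompt, 10 * 1024);
```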

Do you have plans to incorporate Llama 4? Any news or plans about it?

Great question. We’re doing some research to see how well we can support it. More on that soon 🙂


Quick update: the limit on the number of tokens that can be output per request has been increased from 200 to 1000. The old limit was, as noted above, quite constraining, and now that the system has proven to be stable, we don’t see a problem with handling the additional load.


Thanks for the answers.

This reality confirms for me that we must use HTTPS outcalls to external open-source LLMs.

I do hope that OpenAI, Anthropic and Google support IPv6 addresses. Given what I’ve seen from CaffeineAI, that seems to be the case.

The use case for this LLM Canister seems quite limited for now; I am trying to think of a useful application, but I’m not sure so far. Still, it’s great to see the research effort, and I hope we can eventually run full small-size LLM models.
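For reference, a minimal sketch of the HTTPS-outcall route mentioned above, using ic-cdk’s management-canister API to call an external LLM endpoint from a Rust canister. The URL, model name, response size and cycles amount are placeholders, `serde_json` is assumed as a dependency, module paths vary across ic-cdk versions, and the endpoint would need to be reachable over IPv6 from the replicas.

```rust
// Sketch of an HTTPS outcall to an external LLM API from a Rust canister.
use ic_cdk::api::management_canister::http_request::{
    http_request, CanisterHttpRequestArgument, HttpHeader, HttpMethod,
};

async fn call_external_llm(prompt: String) -> Result<Vec<u8>, String> {
    // Build a chat-style JSON request body (placeholder model name).
    let body = serde_json::json!({
        "model": "example-model",
        "messages": [{ "role": "user", "content": prompt }],
    });

    let request = CanisterHttpRequestArgument {
        url: "https://llm.example.com/v1/chat/completions".to_string(), // placeholder URL
        method: HttpMethod::POST,
        headers: vec![HttpHeader {
            name: "Content-Type".to_string(),
            value: "application/json".to_string(),
        }],
        body: Some(body.to_string().into_bytes()),
        max_response_bytes: Some(200_000),
        transform: None, // in practice a transform function is needed so replicas reach consensus
    };

    // The attached cycles amount is illustrative; it depends on request/response size.
    let (response,) = http_request(request, 30_000_000_000)
        .await
        .map_err(|(_, msg)| msg)?;
    Ok(response.body)
}
```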

Thanks for the feedback, @josephgranata. Regarding closed-source LLMs, would you prefer to use them because of their quality or is there another reason? We’re looking into adding support for Llama 4, so hopefully in that case the LLM canister can prove to be more useful.

Mr. El-Ashi, in fact I do prefer open-source models, the best we can possibly use, like DeepSeek and Llama 4. However, from an ease-of-building perspective, the APIs from Claude, Google and OpenAI are worth considering, and the HTTPS route could be a good way to integrate that power into innovative IC AI applications.

That is why I asked whether we can reach all of them via IPv6. For open-source models it does not matter, because we can host them wherever we want and make sure they run on an IPv6-compatible server.

Thanks for the input @josephgranata.

I don’t know personally, but perhaps you can consult their documentation? There’s another challenge, which is making these calls return a deterministic output. Some of these providers offer a seed parameter for determinism, but that determinism isn’t guaranteed.
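To make the determinism point concrete, here is a sketch of the kind of request body one would send to a provider that exposes a seed parameter. The field names ("temperature", "seed") follow a common chat-completions convention and are assumptions; even with a fixed seed and zero temperature, providers generally do not guarantee bit-identical outputs, which is what replicated execution would need.

```rust
// Sketch: build a chat-completions style request body pinned for best-effort
// determinism. Assumes `serde_json`; not every provider supports or honors these fields.
fn build_pinned_request(prompt: &str) -> String {
    serde_json::json!({
        "model": "example-model",                             // placeholder model name
        "messages": [{ "role": "user", "content": prompt }],
        "temperature": 0,                                     // disable sampling randomness
        "seed": 42                                            // best-effort reproducibility
    })
    .to_string()
}
```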


@ielashi: Any chance of adding a model for generating embeddings? Maybe one of these: A Guide to Open-Source Embedding Models

With that we could create RAG solutions on-chain (using a vector DB in Rust). Even with a 10 KiB input limit, that would significantly expand the range of applications that could be built with the LLM Canister.
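To sketch what the on-chain RAG idea could look like, here is a minimal in-memory vector index in Rust using cosine similarity. How the embeddings are produced (e.g. by a future embedding model in the LLM canister) and their dimensionality are assumptions; a real on-chain vector DB would use stable memory and a smarter index.

```rust
/// Minimal in-memory vector index: store (id, embedding) pairs and return the
/// `k` most similar entries by cosine similarity. A sketch only.
struct VectorIndex {
    entries: Vec<(String, Vec<f32>)>,
}

impl VectorIndex {
    fn new() -> Self {
        Self { entries: Vec::new() }
    }

    fn insert(&mut self, id: String, embedding: Vec<f32>) {
        self.entries.push((id, embedding));
    }

    fn top_k(&self, query: &[f32], k: usize) -> Vec<(String, f32)> {
        let mut scored: Vec<(String, f32)> = self
            .entries
            .iter()
            .map(|(id, emb)| (id.clone(), cosine_similarity(query, emb)))
            .collect();
        // Sort by similarity, highest first, then keep the top k.
        scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
        scored.truncate(k);
        scored
    }
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}
```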
