It’s been a journey, but a pre-release of ICGPT with a llama.cpp backend is now live on the IC.
- The deployed model is Qwen 2.5 0.5B Instruct with Q8_0 quantization, running on llama.cpp
- You can watch a short video here
- You can try it out at https://icgpt.icpp.world/
- A 0.5B model with q8_0 quantization fits fine in a 32-bit canister.
- However, the canister's instruction limit means a single response must be split across multiple update calls, so it takes about 2 minutes to get the answer to the question shown below
- We did not do any load testing, so it will be interesting to see how it holds up when multiple users try it out at the same time.
- The UI is still primitive; it is the same one that was developed for the tiny story teller LLM. Improving it is on the to-do list.
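To illustrate why the instruction limit leads to the ~2 minute latency: generation cannot finish in one update call, so the frontend has to keep calling back into the canister, which retains the partial output between calls. The sketch below is a hypothetical Python simulation of that loop, not the actual ICGPT API; the names (`run_update_call`, `generate`) and the per-call token budget are assumptions for illustration only.

```python
# Hypothetical simulation of chunked generation across update calls.
# TOKENS_PER_CALL stands in for however many tokens fit within one
# update call's instruction budget (the real number is model-dependent).
TOKENS_PER_CALL = 10  # assumed budget per update call

def run_update_call(state):
    """Simulate one bounded update call: append up to TOKENS_PER_CALL tokens.

    The canister keeps `state` (the partial output) between calls, so each
    call resumes where the previous one stopped.
    """
    produced = 0
    while produced < TOKENS_PER_CALL and len(state["output"]) < state["target_len"]:
        state["output"].append(f"tok{len(state['output'])}")
        produced += 1
    state["done"] = len(state["output"]) >= state["target_len"]
    return state

def generate(target_len):
    """Simulate the frontend loop: call the canister until generation is done."""
    state = {"output": [], "target_len": target_len, "done": False}
    calls = 0
    while not state["done"]:
        state = run_update_call(state)
        calls += 1
    return state["output"], calls

tokens, calls = generate(42)
print(calls)  # 42 tokens at 10 per call -> 5 update calls
```

Each update call on the IC takes on the order of seconds to finalize, so a multi-sentence answer spread over many such calls adds up to the roughly two-minute response time mentioned above.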