Llama.cpp on the Internet Computer

It’s been a journey, but a pre-release of ICGPT with a llama.cpp backend is now live on the IC.

  • The deployed llama.cpp model is Qwen2.5-0.5B-Instruct with Q8_0 quantization
  • You can watch a short demo video here
  • You can try it out at https://icgpt.icpp.world/
  • A 0.5B model with Q8_0 quantization fits fine in a 32-bit canister.
  • However, because of the IC's per-message instruction limit, generating a response requires multiple update calls, so it takes about 2 minutes to produce a full answer.
  • We did not do any load testing, so it will be interesting to see how it holds up when multiple users try it out at the same time.
  • The UI is still primitive; it is the same one that was developed for the tiny story teller LLM. Improving it is on the to-do list.
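
Because each IC message can only burn a bounded number of instructions, a single prompt has to be driven by a sequence of update calls, each generating a few tokens from the canister's saved state. Here is a minimal client-side sketch of that loop; the names (`run_inference`, the `canister_update` callable, and the fake canister used to demo it) are hypothetical illustrations, not the actual llama_cpp_canister or agent API:

```python
# Hedged sketch: the IC caps instructions per message, so one prompt is
# served by repeated update calls, each returning a chunk of new tokens
# plus a "done" flag. `canister_update` is a hypothetical stand-in for
# whatever update method the real canister exposes.

def run_inference(canister_update, prompt: str, max_rounds: int = 100) -> str:
    """Drive token generation across multiple update calls until the
    canister reports completion (or max_rounds is reached)."""
    answer = ""
    for _ in range(max_rounds):
        # Each call resumes generation from the canister's saved state.
        chunk, done = canister_update(prompt)
        answer += chunk
        if done:
            break
    return answer

# Toy stand-in for the canister: emits the answer a few tokens per call,
# mimicking how the instruction limit bounds work per message.
def make_fake_canister(full_answer: str, tokens_per_call: int = 3):
    words = full_answer.split()
    state = {"i": 0}
    def update(_prompt):
        i = state["i"]
        chunk_words = words[i : i + tokens_per_call]
        state["i"] = i + tokens_per_call
        done = state["i"] >= len(words)
        return (" ".join(chunk_words) + ("" if done else " ")), done
    return update

canister = make_fake_canister("Paris is the capital of France")
print(run_inference(canister, "What is the capital of France?"))
# → Paris is the capital of France
```

The two-minute latency mentioned above is essentially this loop: many round trips, each consuming one update call's worth of instructions.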