It’s been a journey, but a pre-release of ICGPT with a llama.cpp backend is now live on the IC.
- The deployed model is Qwen 2.5 0.5B Instruct with Q8_0 quantization, running on llama.cpp
- You can watch a short video here
- You can try it out at https://icgpt.icpp.world/
- A 0.5B model with q8_0 quantization fits fine in a 32-bit canister.
- However, the canister's instruction limit means a single response must be split across multiple update calls, so it takes about 2 minutes to get the answer to the question shown below
- We did not do any load testing, so it will be interesting to see how it holds up when multiple users try it out at the same time.
- The UI is still primitive; it is the same one that was developed for the tiny story teller LLM. Improving it is on the to-do list.
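To illustrate why the instruction limit leads to the ~2 minute latency: generation cannot finish in one update call, so the frontend has to keep calling back into the canister, which retains the partial output between calls. The sketch below is a hypothetical Python simulation of that loop, not the actual ICGPT API; the names (`run_update_call`, `generate`) and the per-call token budget are assumptions for illustration only.

```python
# Hypothetical simulation of chunked generation across update calls.
# TOKENS_PER_CALL stands in for however many tokens fit within one
# update call's instruction budget (the real number is model-dependent).
TOKENS_PER_CALL = 10  # assumed budget per update call

def run_update_call(state):
    """Simulate one bounded update call: append up to TOKENS_PER_CALL tokens.

    The canister keeps `state` (the partial output) between calls, so each
    call resumes where the previous one stopped.
    """
    produced = 0
    while produced < TOKENS_PER_CALL and len(state["output"]) < state["target_len"]:
        state["output"].append(f"tok{len(state['output'])}")
        produced += 1
    state["done"] = len(state["output"]) >= state["target_len"]
    return state

def generate(target_len):
    """Simulate the frontend loop: call the canister until generation is done."""
    state = {"output": [], "target_len": target_len, "done": False}
    calls = 0
    while not state["done"]:
        state = run_update_call(state)
        calls += 1
    return state["output"], calls

tokens, calls = generate(42)
print(calls)  # 42 tokens at 10 per call -> 5 update calls
```

Each update call on the IC takes on the order of seconds to finalize, so a multi-sentence answer spread over many such calls adds up to the roughly two-minute response time mentioned above.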