We completed the update to the latest llama.cpp version (sha 615212).
Please start fresh by following the instructions at [onicai/llama_cpp_canister: llama.cpp for the Internet Computer](https://github.com/onicai/llama_cpp_canister).
This update allows you to run many new LLM architectures, including the 1.5-billion-parameter DeepSeek model that attracted a lot of attention with this X post.
The main limiting factor in running larger LLMs is the per-call instructions limit. If a model can generate at least one token within that limit, you can use it, because we generate tokens across multiple update calls. (See the README in the repo for details.)
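As a rough illustration, a client-side generation loop can look like the Python sketch below, which shells out to `dfx`. The method names (`new_chat`, `run_update`), the argument record, and the end-of-generation marker are assumptions for illustration only; the README documents the actual interface.

```python
import subprocess

CANISTER = "llama_cpp"  # assumption: the name of your deployed canister

def call(method: str, candid_args: str) -> str:
    """Invoke a canister method via dfx and return its textual reply."""
    result = subprocess.run(
        ["dfx", "canister", "call", CANISTER, method, candid_args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Start a fresh chat, then keep issuing update calls; each call generates
# a bounded number of tokens, so long outputs are produced across many calls.
call("new_chat", "(record { args = vec {} })")
while True:
    out = call(
        "run_update",
        '(record { args = vec {"-p"; "Explain the Internet Computer in one sentence."; "-n"; "8"} })',
    )
    print(out)
    if "generated_eog=1" in out:  # hypothetical end-of-generation marker
        break
```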
Latency is of course high, which will hopefully improve with further ICP protocol and hardware updates, but we believe it is already possible to build useful, targeted AI agents with their LLM running on-chain. This requires some smart prompt engineering, which is an area where we are focusing our efforts.
To assist with prompt engineering, a Python notebook, prompt-design.ipynb, is included in the repository; it lets you iterate on prompts against the original llama.cpp compiled for your native system.
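If you prefer a plain script over the notebook, a minimal sketch along these lines works for quick local prompt iteration. It assumes you have built llama.cpp natively so the `llama-cli` binary is on your PATH, and the model path is a hypothetical placeholder; flags may differ slightly depending on your llama.cpp build.

```python
import subprocess

MODEL = "models/DeepSeek-R1-Distill-Qwen-1.5B.gguf"  # hypothetical local path

def run_prompt(prompt: str, n_tokens: int = 64) -> str:
    """Run a prompt against natively compiled llama.cpp and return stdout."""
    result = subprocess.run(
        ["llama-cli", "-m", MODEL, "-p", prompt, "-n", str(n_tokens),
         "--temp", "0.7", "-no-cnv"],  # -no-cnv: single pass, no interactive chat
        capture_output=True, text=True, check=True,
    )
    return result.stdout

print(run_prompt("You are a targeted on-chain agent. In one line, what is a canister?"))
```

Because the same GGUF model and sampling parameters are used natively and on-chain, a prompt tuned this way should behave the same once deployed to the canister.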