Persistence of Llama2.c LLM weights in a canister

Hey, this is an absolutely amazing project by @icpp! It is surely pushing the boundaries of what can be done on the IC, and on blockchain in general. I saw the post by the author here:
https://forum.dfinity.org/t/llama2-c-llm-running-in-a-canister/21991/56 and read this comment:

When installing an LLM in a canister, there are two steps:

1. Deploy the wasm
2. Upload the trained model (the weights)
Step 1 is always the same, and the wasm fits in a canister without any problem.

Step 2, however, is a different story. Uploading the model weights is done by running a Python script provided in the icpp_llm repo. The script must be pointed at a model.bin file on your disk; it reads the file in chunks and sends them to an endpoint of the canister where the LLM is running. When the canister endpoint receives those bytes, they are appended to an std::vector, which is orthogonally persisted. That vector grows dynamically, and that is where the question of fit comes into play.
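A minimal sketch of what that canister-side pattern looks like (the endpoint name and signature here are hypothetical, for illustration only, and not the actual icpp_llm API):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical upload endpoint -- illustrative, not the real icpp_llm API.
// The model bytes live in a global std::vector that is orthogonally
// persisted across calls; each call from the Python script appends one chunk.
static std::vector<uint8_t> model_bytes;

void upload_model_bytes_chunk(const std::vector<uint8_t> &chunk) {
  // The vector grows dynamically until it holds the full model.bin,
  // or the canister runs out of memory -- that is the "fit" question.
  model_bytes.insert(model_bytes.end(), chunk.begin(), chunk.end());
}
```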

Note that this upload is done only once. If the upload succeeds, then the model fits!

Small models with, e.g., millions of parameters fit just fine, but models with several billion parameters will hit the canister limit.
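To make that concrete, here is a back-of-the-envelope sizing sketch. It assumes fp32 weights (4 bytes per parameter) and the roughly 4 GiB address space of a wasm32 canister; the parameter counts are illustrative, not specific checkpoints:

```cpp
#include <cstdio>

int main() {
  const double bytes_per_param = 4.0;          // fp32 weights
  const double limit_gib = 4.0;                // ~wasm32 heap ceiling
  const double params[] = {15e6, 110e6, 7e9};  // example model sizes
  for (double p : params) {
    double gib = p * bytes_per_param / (1024.0 * 1024.0 * 1024.0);
    std::printf("%6.0fM params -> %6.2f GiB  (%s)\n", p / 1e6, gib,
                gib < limit_gib ? "fits" : "exceeds the limit");
  }
  return 0;
}
```

By this rough arithmetic a 15M-parameter model needs only ~0.06 GiB, while a 7B-parameter model needs ~26 GiB, far beyond what a single canister's wasm memory can hold.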

If you want to dig deeper and see where the memory goes once a model is fully loaded, [this](https://github.com/icppWorld/icpp_llm/blob/c29712f4d65e8c9ffea62e37b45a7c0f1cd23492/icpp_llama2/README_icpp_llama2_resource_requirements.md#canister-resource-requirements-for-icpp_llama2) is a study I did. I find it very interesting and believe there is still a lot of room for improvement.

My question: since the model's .bin file is stored in an orthogonally persisted vector, does that mean the whole model weights file is not loaded into wasm memory (canister memory), and the data is only fetched lazily, as and when required?