Llama2-7B is definitely a milestone goal for canister LLMs. But in terms of “usable” LLMs for applications (e.g. NPCs), there are more compact LLMs that are useful enough for generative text/conversational tasks.
The TinyLlama 1.1B project follows Llama2’s architecture and tokenizer format (which makes fine-tuning/Q-LoRA easier) and is being trained on 3 trillion tokens, with all the speed optimizations and performance hacks for inferencing. Using the llama.cpp framework, a Mac M2 with 16GB RAM generates 71.8 tokens/second. That’s really fast for non-GPU inferencing.
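For anyone who wants to try that locally, the llama.cpp run is a single command. A minimal sketch, assuming you already have a GGUF export of TinyLlama on disk (the model file name/path below is just a placeholder):

```bash
# Sketch: run TinyLlama through llama.cpp's CLI example.
# -m : path to the GGUF model file (placeholder name below)
# -p : starting prompt
# -n : number of tokens to generate
# -t : CPU threads to use
./main -m ./models/tinyllama-1.1b.gguf \
       -p "Once upon a time" \
       -n 128 \
       -t 8
```

(Newer llama.cpp builds may ship the CLI under a different binary name, so adjust accordingly.)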
Given its performance relative to its parameter size, I’d imagine this will be the model to test canister LLMs with.
An LLM of this size generates a very coherent story and doesn’t suffer from some of the deficiencies of the 15M model:
--------------------------------------------------
Generate a new story using llama2_110M, 10 tokens at a time, starting with an empty prompt.
(variant { ok = 200 : nat16 })
(variant { ok = "Once upon a time, there was a little girl" })
(variant { ok = " named Lily. She loved to play outside in the" })
(variant { ok = " sunshine. One day, she saw a big" })
(variant { ok = ", red apple on a tree. She wanted to eat" })
(variant { ok = " it, but it was too high up.\nL" })
(variant { ok = "ily asked her friend, a little bird, \"Can" })
(variant { ok = " you help me get the apple?\"\nThe bird said" })
(variant { ok = ", \"Sure, I can fly up and get" })
(variant { ok = " it for you.\"\nThe bird flew up to" })
(variant { ok = " the apple and pecked it off the tree." })
(variant { ok = " Lily was so happy and took a big bite" })
(variant { ok = ". But then, she saw a bug on the apple" })
(variant { ok = ". She didn\'t like bugs, so she threw" })
(variant { ok = " the apple away.\nLater that day, L" })
(variant { ok = "ily\'s mom asked her to help with the la" })
(variant { ok = "undry. Lily saw a shirt that was" })
(variant { ok = " too big for her. She asked her mom, \"" })
(variant { ok = "Can you make it fit me?\"\nHer mom said" })
(variant { ok = ", \"Yes, I can make it fit you.\"" })
(variant { ok = "\nLily was happy that her shirt would fit" })
(variant { ok = " her. She learned that sometimes things don\'t fit" })
(variant { ok = ", but there is always a way to make them fit" })
(variant { ok = "." })
(variant { ok = "" })
--------------------------------------------------
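For reference, output like the above is just the result of repeatedly calling the canister’s update method with a small step count until it returns an empty string. A rough sketch with dfx; the method names and record fields below are assumptions for illustration, not the canister’s actual interface:

```bash
# Start a fresh story (method name assumed).
dfx canister call llama2_110M new_chat '()'

# Ask for 10 tokens at a time with an empty prompt; repeat this call until the
# canister returns an empty string, which means the story is finished.
dfx canister call llama2_110M inference \
  '(record { prompt = ""; steps = 10 : nat64; temperature = 0.1 : float32; topp = 0.9 : float32; rng_seed = 0 : nat64 })'
```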
Will expose it in the front end shortly, after more testing. Something is not working yet when you use a non-empty prompt.
The 42M model is a lot better than the 15M model, and it determines by itself when it has completed a story. For example, the prompt “Bobby wanted to catch a big fish” results in a generated story that ends on its own.
Response time of this model is almost as good as the 15M model’s, and the words stream onto the screen very naturally.
However, even though the stories are a lot better than the 15M model’s, this 42M model still throws some nonsensical stuff in there. For example, the prompt “Bobby was playing a card game with his friend” results in a pretty good story, but some of the comprehension is not right and there is a repeat in the middle. This should go away with the 110M model.