Is it possible to deploy a llama-13B (or 7B) LLM on a canister?

I'm interested in this.
I've heard that the IC is about to build some GPU nodes for canisters.
Really looking forward to it!
AI is quite powerful and useful with highly efficient GPUs.


I tried running ollama with llama2-7B on my local machine. It works OK.

In CPU-only mode it is very slow.

I also tried GPU mode; it's very fast, but the output quality is only so-so compared to gpt-3.5-turbo.
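For anyone who wants to reproduce the local test, this is roughly what that looks like (a sketch assuming ollama is already installed; exact model tags and download sizes may differ by version):

```shell
# Pull the 7B llama2 weights (a few GB, quantized)
ollama pull llama2:7b

# One-shot chat; ollama runs on CPU by default and uses the GPU
# automatically when a supported one is detected
ollama run llama2:7b "Introduce yourself in one sentence."
```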


llama3-8B has been released, with performance roughly equal to llama2-70B. What a great thing to happen!

Hi @q2333gh,
Could you provide a more detailed example of running an LLM on the IC?
:wink:


This is the result of running chat inference for a 0.5B-parameter, 1.2 GB AI model in a canister on a development cluster:

dfx canister call bwwuq-byaaa-aaaan-qmk4q-cai chat '(record {prompt="Nice to chat with you. I am Yan, the founder of ICPanda DAO. Please introduce yourself."})'
(
  variant {
    Ok = record {
      instructions = 1_753_422_969_950 : nat64;
      tokens = 69 : nat32;
      message = "\nHello Yan! I\'m Yan, the founder of ICPanda DAO. I\'m a giant panda who has been around for over 10 years and have seen many changes in the world. I\'m here to help you with any questions or concerns you may have about the project. Let\'s get started!\n";
    }
  },
)

video: x.com

As you can see, one inference consumed about 1,753B instructions, while the current ICP mainnet limit per update call is only 40B instructions.
Another issue is memory consumption when loading the model: this model consumed 1.2 GB of heap memory during loading, whereas a mainnet canister's heap memory limit is only 4 GB.
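To put the instruction gap in perspective, here is a quick back-of-the-envelope calculation using the numbers from the experiment above (a sketch; the splitting strategy via timers or self-calls is hypothetical):

```python
# Numbers from the canister experiment above
inference_instructions = 1_753_422_969_950  # one 69-token chat reply
update_limit = 40_000_000_000               # current mainnet limit per update call

# To fit under the limit, one inference would have to be split across
# this many update calls (ceiling division), e.g. via timers or self-calls:
rounds = -(-inference_instructions // update_limit)
print(rounds)  # 44

# Instructions per generated token, for scale (~25.4B per token):
per_token = inference_instructions // 69
print(per_token)  # 25411927100
```

So even a 0.5B-parameter model is roughly 44x over the per-message budget, which is why single-call inference on mainnet is not feasible today.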