I'm interested in this too.
I heard the IC is about to add GPU nodes for canisters.
Really looking forward to it!
AI is much more powerful and useful with efficient GPUs.
I tried running ollama with llama2-7B on my local machine, and it works OK.
In CPU mode it is very slow.
In GPU mode it is very fast, but the output quality is only so-so compared to gpt-3.5-turbo.
Hi @q2333gh,
Could you provide a more detailed example of running an LLM on the IC?
This is the result of running chat inference in a canister on a development cluster, using a 0.5B-parameter AI model (1.2 GB of weights):
dfx canister call bwwuq-byaaa-aaaan-qmk4q-cai chat '(record {prompt="Nice to chat with you. I am Yan, the founder of ICPanda DAO. Please introduce yourself."})'
(
  variant {
    Ok = record {
      instructions = 1_753_422_969_950 : nat64;
      tokens = 69 : nat32;
      message = "\nHello Yan! I\'m Yan, the founder of ICPanda DAO. I\'m a giant panda who has been around for over 10 years and have seen many changes in the world. I\'m here to help you with any questions or concerns you may have about the project. Let\'s get started!\n";
    }
  },
)
video: x.com
As you can see, it consumed 1753B instructions, while the current ICP mainnet limit for update instructions is only 40B.
Another issue is the memory consumption for loading the model. This model consumed 1.2G of heap memory during loading, whereas the mainnet canister’s heap memory limit is only 4G.
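To make the gap concrete, here is a quick back-of-the-envelope sketch using the numbers reported above (the 40B update-call limit is as stated in this thread; this is rough arithmetic, not a precise benchmark):

```python
# Numbers taken from the canister call result above.
instructions_used = 1_753_422_969_950  # instructions for one 69-token reply
tokens_generated = 69
update_call_limit = 40_000_000_000     # current mainnet update-call instruction limit

# Ceiling division: how many 40B-instruction update calls this one reply would need.
calls_needed = -(-instructions_used // update_call_limit)
print(calls_needed)  # 44

# Roughly how many instructions each generated token cost.
per_token = instructions_used // tokens_generated
print(per_token > update_call_limit // 2)  # True
```

So this single 69-token reply would have to be split across dozens of update calls, and at roughly 25B instructions per token, barely one token fits in a single call's budget.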
The replica returned a rejection error: reject code CanisterError, reject message IC0503: Error from Canister bwwuq-byaaa-aaaan-qmk4q-cai: Canister called
ic0.trap
with message: failed to run AI, “HeaderTooLarge”, error code Some(“IC0503”)
I am getting the above error when I try to run this.