Those of you who are into conversational AI are probably aware of this model that has taken the open-source world by storm: karpathy/llama2.c
Using icpp-pro, I am able to run this LLM in a canister and do inference:
dfx canister call llama2 inference '(record {"prompt" = "" : text; "steps" = 20 : nat64; "temperature" = 0.9 : float32;})'
(
  variant {
    ok = "Once upon a time, there was a little boat named Bob. Bob loved to float on the water"
  },
)
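If you pass a non-empty prompt, the model should continue your text instead of starting a story from scratch. Here is a sketch of such a call, using the same inference interface as above (the prompt string is just an illustrative example):

dfx canister call llama2 inference '(record {"prompt" = "Once upon a time, there was a dragon" : text; "steps" = 20 : nat64; "temperature" = 0.9 : float32;})'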
I created this video that shows the full build/deploy/upload/test process.
You can find the code in icppWorld/icpp-llm.
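For reference, the overall flow looks roughly like this. This is a minimal sketch, assuming icpp-pro's build-wasm command and the standard dfx workflow; the upload script name is hypothetical, so check the repo for the exact steps:

# Build the wasm from the C++ code with icpp-pro
icpp build-wasm

# Deploy to the local replica (add --network ic for mainnet)
dfx deploy

# Upload the model & tokenizer files into the canister
python scripts/upload.py    # hypothetical script name; see the repo for the actual step

# Test with the dfx canister call shown above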
I believe this is a foundational step in bringing conversational AI to the IC. Once the infrastructure scales and some current limitations are removed, we will be able to run larger and larger AI models directly on the IC, without offloading inference to another cloud.