Try it
Deployment to mainnet went smoothly.
icpp_llama2 with the stories15M.bin model is now running on-chain in canister 4c4bn-daaaa-aaaag-abvcq-cai.
You can call its inference endpoint with:
dfx canister call --network ic 4c4bn-daaaa-aaaag-abvcq-cai inference '(record {prompt = "" : text; steps = 20 : nat64; temperature = 0.8 : float32; topp = 1.0 : float32;})'
(
  variant {
    ok = "Once upon a time, there was a little boat named Bob. Bob loved to float on the water"
  },
)
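To try your own prompt, you pass it in the prompt field of the same record. For example (the prompt text below is just illustrative; the other parameter values match the call above):

dfx canister call --network ic 4c4bn-daaaa-aaaag-abvcq-cai inference '(record {prompt = "Once upon a time, there was a dragon" : text; steps = 20 : nat64; temperature = 0.8 : float32; topp = 1.0 : float32;})'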
If you play with the parameters, you will quickly run into the instruction limit. That is an area of investigation right now.