Llama2.c LLM running in a canister!

The costs right now are pretty high. A story that generates about 100 tokens consumes 0.076 TCycles ≈ $0.10.
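For a rough sense of scale, here is the arithmetic behind that figure as a small sketch, assuming the usual peg of 1 TCycles ≈ 1 XDR, taken here as about $1.33 (the exchange rate and the per-story numbers are assumptions from the post above, not measured API output):

```rust
// Back-of-the-envelope cost estimate for the figures above.
// Assumption: 1 TCycles (1e12 cycles) ≈ 1 XDR ≈ $1.33.
fn main() {
    let cycles_per_story: f64 = 0.076e12; // 0.076 TCycles for a ~100-token story
    let tokens_per_story: f64 = 100.0;
    let usd_per_tcycles: f64 = 1.33;      // assumed XDR/USD rate

    let usd_per_story = cycles_per_story / 1e12 * usd_per_tcycles;
    let usd_per_token = usd_per_story / tokens_per_story;

    // Prints roughly: ~$0.101 per story, ~$0.0010 per token
    println!("~${:.3} per story, ~${:.4} per token", usd_per_story, usd_per_token);
}
```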

That will come down drastically, though, once we can run the inference as a query call and once the compute/AI improvements on the DFINITY Roadmap become available.
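For context on why a query call changes the economics: update calls go through consensus and are charged per executed instruction, while query calls run on a single replica and are much cheaper and faster (though currently more limited in how many instructions they can execute). Below is a minimal sketch of the two kinds of entry points using the Rust ic-cdk; `generate_story_*` and `run_inference` are hypothetical names for illustration, not the actual canister's API:

```rust
use ic_cdk::{query, update};

// Hypothetical stand-in for the real llama2.c inference loop.
fn run_inference(prompt: &str) -> String {
    format!("story for: {prompt}")
}

// Today: inference as an update call, which goes through consensus
// and is charged for every executed instruction.
#[update]
fn generate_story_update(prompt: String) -> String {
    run_inference(&prompt)
}

// Goal: the same inference as a query call, executed on a single replica,
// which is far cheaper and lower latency.
#[query]
fn generate_story_query(prompt: String) -> String {
    run_inference(&prompt)
}
```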

I expect to deploy a real chat LLM within a few months, but latency will be poor. I'm implementing it now, though, so we are ready when the IC scales up.


When will the DFINITY Roadmap improvements for this become available?

It’s here: Roadmap | Internet Computer
