Llama2.c LLM running in a canister!

I'd like to use this thread to share the high-level roadmap I have in mind for icpp-llm, and to keep you posted on progress each time I reach a milestone or hit a blocker.

Milestone 1: Remove the limitations of the current tinystories canister, so it can generate stories longer than 20 words.
This means finding a way to work around the IC's limit on instructions per message.
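One way to stay under a per-message instruction limit is to generate the story in bounded chunks: each update call produces at most a fixed number of tokens and returns, and the client keeps calling until the story is complete. Here is a minimal sketch of that idea; the names (`Generator`, `generate_chunk`, `next_token`) are hypothetical, and a real canister would persist the generator state between calls instead of a plain struct.

```cpp
#include <vector>
#include <cstddef>

// Hypothetical state persisted between update calls.
struct Generator {
  std::vector<int> tokens;  // tokens generated so far
  int target_len = 0;       // total tokens requested
  bool done() const { return static_cast<int>(tokens.size()) >= target_len; }
};

// Stand-in for one forward pass of the model.
int next_token(const Generator &g) { return static_cast<int>(g.tokens.size()); }

// One update call: emit at most `chunk` tokens, then return so the
// message stays within the instruction limit.
void generate_chunk(Generator &g, int chunk) {
  for (int i = 0; i < chunk && !g.done(); ++i)
    g.tokens.push_back(next_token(g));
}
```

The client-side loop is then just "call until done": for a 50-token story with a 20-token chunk size, three calls complete the generation.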

Milestone 2: run inference with memory and matrix calculations distributed across multiple canisters.
For this, I plan to use an HPC-type approach, treating the IC as a massively parallel compute cluster.
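The core operation to distribute is the matrix-vector product. A simple sketch, assuming a row-partitioned scheme: each "worker" (in the real design, a canister reached via inter-canister calls) owns a band of rows and computes its slice of the output, which the orchestrator concatenates. All names here are illustrative, not the actual icpp-llm API.

```cpp
#include <vector>
#include <cstddef>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;

// Each worker computes the output rows in [row_begin, row_end).
Vec worker_matvec(const Mat &W, const Vec &x,
                  std::size_t row_begin, std::size_t row_end) {
  Vec out(row_end - row_begin, 0.0f);
  for (std::size_t r = row_begin; r < row_end; ++r)
    for (std::size_t c = 0; c < x.size(); ++c)
      out[r - row_begin] += W[r][c] * x[c];
  return out;
}

// The orchestrator fans rows out to n_workers and stitches the
// partial results back together in row order.
Vec distributed_matvec(const Mat &W, const Vec &x, std::size_t n_workers) {
  Vec y;
  const std::size_t rows = W.size();
  for (std::size_t w = 0; w < n_workers; ++w) {
    std::size_t begin = rows * w / n_workers;
    std::size_t end = rows * (w + 1) / n_workers;
    Vec part = worker_matvec(W, x, begin, end);
    y.insert(y.end(), part.begin(), part.end());
  }
  return y;
}
```

Because the partition is by rows, the concatenated result is identical to the single-machine product regardless of how many workers are used.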

Milestone 3: Run inference with the llama2_7b_chat model. Not worrying about speed, just the ability to load it and talk to the LLM.

Milestone 4: Optimize and scale.

This is going to be a fun challenge.