How can I take an open source pretrained LLM model, deploy it to ICP, and use it as a private ChatGPT just for me?

It’s impractical… but is it even possible? I was digging around for resources for you, and judging by the roadmap, this is just a server and the demo doesn’t actually do inference onchain (correct me if I’m wrong):


Basically edge computing/inference, which is what the industry is looking towards. Think consoles vs. arcades, or PC vs. cloud PC.

Ideally, at least what I envisioned was cloud inference: decentralized compute, like Render Network… which is still on the roadmap:

Here’s sample code you can use to create a canister that interacts with your LLM server:
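Something along these lines, as a minimal sketch. I’m assuming DFINITY’s ic-llm crate here; its API and the available Model variants change between versions, so treat the exact names as assumptions:

```rust
// Cargo.toml needs ic-cdk and ic-llm as dependencies.
use ic_llm::Model;

// Relays a prompt to the LLM canister, which forwards it to an
// Ollama-backed worker off-chain and returns the completion.
#[ic_cdk::update]
async fn ask(question: String) -> String {
    // Llama3_1_8B is one of the models the LLM canister exposes;
    // check the crate docs for the current list.
    ic_llm::prompt(Model::Llama3_1_8B, question).await
}
```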

This interacts with a deployed LLM canister that uses Ollama, but I couldn’t find the source code of that canister… so it really just talks to DFINITY’s Ollama server.

Before that, you should probably learn how to run your own models to get an idea:

Run it on your computer. Then create a canister that uses HTTPS outcalls to interact with your LLM server.

Use ChatGPT: ChatGPT - ICP Canister Ollama Integration

  • Set up the model and a server to communicate with it.
  • Learn how to communicate with that server via canister HTTPS outcalls (see the sketch below).
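A minimal sketch of that second step, assuming a self-hosted Ollama server reachable at a public HTTPS URL; the URL, model name, and cycle amount are placeholders, and the exact module path of ic-cdk’s outcall API differs between versions:

```rust
use ic_cdk::api::management_canister::http_request::{
    http_request, CanisterHttpRequestArgument, HttpHeader, HttpMethod,
};

#[ic_cdk::update]
async fn ask(prompt: String) -> String {
    // JSON body for Ollama's /api/generate endpoint; "stream": false
    // returns a single JSON object instead of a token stream.
    // (serde_json is used only to escape the prompt string.)
    let body = format!(
        r#"{{"model":"llama3","prompt":{},"stream":false}}"#,
        serde_json::to_string(&prompt).unwrap()
    );

    let request = CanisterHttpRequestArgument {
        // Placeholder: your server must be publicly reachable over HTTPS
        // with a valid certificate for the outcall to succeed.
        url: "https://your-ollama-host.example/api/generate".to_string(),
        method: HttpMethod::POST,
        headers: vec![HttpHeader {
            name: "Content-Type".to_string(),
            value: "application/json".to_string(),
        }],
        body: Some(body.into_bytes()),
        max_response_bytes: Some(2_000_000),
        // In production you'd set a transform function here so that all
        // replicas normalize the response before consensus.
        transform: None,
    };

    // Attached cycles must cover the outcall cost; this is a rough guess.
    match http_request(request, 30_000_000_000).await {
        Ok((response,)) => String::from_utf8(response.body).unwrap_or_default(),
        Err((code, msg)) => format!("HTTPS outcall failed: {:?}: {}", code, msg),
    }
}
```

One caveat: every replica on the subnet performs the outcall and their responses must match for consensus, so a non-deterministic LLM reply will make the call fail unless your server caches the answer per request (or you otherwise guarantee identical bytes to every replica).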

That’s the gist of it. To make it run onchain, we’d basically turn those “GPUs” into “ICP nodes” or “workers” and call it “onchain”.

I remember someone deploying a DeepSeek model with 9B parameters to ICP earlier.
In fact, there are very few open-source LLM models; for GPT, the most recent open-source release I’ve found is GPT-2.
DeepSeek is much better in terms of open source.

Running it on ICP might be too expensive, but you can use a model directly from your laptop via https://jan.ai/. It was developed by some former blockchain colleagues, so it’s quite trustworthy.

Yes, Panda did it in February, I think.

You won’t be able to do that until DFINITY adds GPU-powered subnets; at least, not anything reasonably fast and comparable to ChatGPT.