How can I take an open-source pretrained LLM, deploy it to ICP, and use it as a private ChatGPT just for me?


It’s impractical… but is it even possible? I was digging around to find some resources for you, and from the roadmap I gather this is just a server and the demo doesn’t actually do inference onchain (correct me if I’m wrong).


Basically edge computing/inference, which is what the industry is looking towards. Think consoles vs. arcades, or PC vs. cloud PC.

Ideally, at least what I envisioned was cloud inference: decentralized compute, like Render Network… which is still on the roadmap.

Here’s sample code you can use to create a canister that interacts with your LLM server:

This interacts with a deployed LLM canister that uses Ollama, but I couldn’t find the source code of that canister… so it just interacts with DFINITY’s Ollama server.
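For a rough idea of the shape, here’s a minimal Rust sketch of a canister forwarding a prompt to an LLM canister over an inter-canister call. The canister ID and the `prompt` method name are placeholders, since I couldn’t find the real interface:

```rust
use candid::Principal;
use ic_cdk::update;

// Placeholder ID; replace with the actual LLM canister principal from
// the docs. The `prompt` method name is also a guess at the interface,
// since the canister's source code doesn't seem to be published.
const LLM_CANISTER: &str = "aaaaa-aa";

#[update]
async fn ask(question: String) -> String {
    let llm = Principal::from_text(LLM_CANISTER).expect("bad principal");
    let (answer,): (String,) = ic_cdk::call(llm, "prompt", (question,))
        .await
        .expect("call to LLM canister failed");
    answer
}
```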

Before that, you should probably learn how to run your own models to get an idea:

Run it on your computer. Then create a canister that uses HTTPS outcalls to interact with your LLM server.
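Once Ollama is serving a model locally (it listens on port 11434 by default), you can sanity-check it from any HTTP client before a canister is involved at all. A small Rust sketch, assuming reqwest (blocking and json features) and serde_json as dependencies, and that a model named llama3 has been pulled:

```rust
use serde_json::{json, Value};

// Quick local sanity check against Ollama's /api/generate endpoint.
// Assumes `ollama serve` is running and `ollama pull llama3` was done.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    let resp: Value = client
        .post("http://localhost:11434/api/generate")
        .json(&json!({
            "model": "llama3",
            "prompt": "Why is the sky blue?",
            "stream": false
        }))
        .send()?
        .json()?;
    // With "stream": false, the full generation lands in "response".
    println!("{}", resp["response"]);
    Ok(())
}
```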

Use ChatGPT: ChatGPT - ICP Canister Ollama Integration

  • Set up the model and a server to communicate with it.
  • Learn how to communicate with it via canister HTTPS outcalls (sketched below).
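A rough sketch of the canister side of that second bullet, using ic-cdk’s management-canister HTTP API. The URL is a placeholder (outcalls can only reach publicly routable HTTPS endpoints, so localhost won’t work), and the cycles figure is a ballpark, not a quote:

```rust
use ic_cdk::api::management_canister::http_request::{
    http_request, CanisterHttpRequestArgument, HttpHeader, HttpMethod,
};
use ic_cdk::update;

// Forwards a prompt to an Ollama-compatible server via an HTTPS outcall.
#[update]
async fn ask(prompt: String) -> String {
    let body = serde_json::json!({
        "model": "llama3",
        "prompt": prompt,
        "stream": false
    });
    let request = CanisterHttpRequestArgument {
        url: "https://your-llm-server.example.com/api/generate".to_string(),
        method: HttpMethod::POST,
        headers: vec![HttpHeader {
            name: "Content-Type".to_string(),
            value: "application/json".to_string(),
        }],
        body: Some(body.to_string().into_bytes()),
        max_response_bytes: Some(2_000_000),
        // A real deployment needs a transform so all replicas agree on
        // the response bytes (more on that further down the thread).
        transform: None,
    };
    match http_request(request, 50_000_000_000).await {
        Ok((response,)) => String::from_utf8(response.body).unwrap_or_default(),
        Err((code, msg)) => format!("outcall failed: {:?} {}", code, msg),
    }
}
```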

The gist of it. To make it run onchain, we basically turn those “GPUs” into “ICP nodes” or “workers” and call it “onchain”.


I remember someone deploying DeepSeek with 9B parameters to ICP earlier.
In fact, there are very few open-source LLM models; the furthest I’ve found for GPT is that it’s open source only up to GPT-2.
DeepSeek is much better in terms of open source.

It might be too expensive, but you can run models directly on your laptop via https://jan.ai/. It was developed by some former blockchain colleagues, so it’s quite trustworthy.

Yes, Panda did it in February, I think.

You won’t be able to until DFINITY adds GPU-powered subnets, at least not anything reasonably fast and comparable to ChatGPT.

I don’t understand how this ICP Ninja canister works.
Does it have the full 3 GB model uploaded to ICP? Does it use an online Ollama? Does it interact with an external API?

The LLM canister is like a gateway or API… the actual LLM inference is done offchain, pretty sure. So you interact with the canister (send it prompts), the canister queries an LLM server offchain, the LLM server returns the results to the canister, and the canister returns the results to the user.
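One wrinkle if you go the HTTPS-outcall route instead of the gateway canister: every replica in the subnet makes the request independently, and the responses must agree before they pass consensus. Ollama’s JSON includes per-request timing fields that differ on every call, so you’d register a transform query that keeps only the generated text. A sketch; beyond the documented "response" field, treat the details as assumptions:

```rust
use ic_cdk::api::management_canister::http_request::{HttpResponse, TransformArgs};
use ic_cdk::query;

// Strips everything that varies between replicas (timing fields such
// as total_duration/eval_duration, plus headers) so the outcall
// responses can match byte-for-byte and reach consensus.
#[query]
fn transform(args: TransformArgs) -> HttpResponse {
    let text = serde_json::from_slice::<serde_json::Value>(&args.response.body)
        .ok()
        .and_then(|v| v["response"].as_str().map(String::from))
        .unwrap_or_default();
    HttpResponse {
        status: args.response.status.clone(),
        headers: vec![], // response headers also differ across replicas
        body: text.into_bytes(),
    }
}
```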

So is AI on-chain a reality yet with ICP? Because what I am reading here suggests it’s not. Just because you talk to a canister while the inference is done off-chain is not what ‘AI on-chain’ means.

It is a reality; it just depends on the model you’re using and the performance you’re looking for.

I just think that at present it’s kind of impractical and there are way easier ways to run inference… I dunno, maybe DeepSeek runs pretty decently, haven’t tested. If someone can share a benchmark, that would be great.

And “reality” kinda depends on perspective and context, e.g. BTC vs. traditional tech performance (super inefficient, but it has a use case).

For performance and large models you need good hardware… but some smaller, more optimized models can run on weaker devices (edge computing). ICP has some technical limitations (memory and compute) due to the nature of blockchains, but performance would improve if ICP gets GPU nodes/workers (think Render Network or Vast.ai, but for ICP), and I believe some upgrades were recently made to increase performance.

So the LLM model is not decentralized?

Inference is on the ICP chain: you can deploy LLM models as ICP canisters, and the canisters do the inferencing.
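To make “onchain” concrete: it just means the forward pass runs as ordinary Wasm compute inside an update call. A toy sketch with a single hard-coded linear layer; real projects embed quantized weights (often in stable memory) and a proper inference runtime, within the instruction and memory limits discussed in this thread:

```rust
use ic_cdk::update;

// Toy "model": one linear layer with baked-in weights, to show that
// onchain inference is nothing more exotic than Wasm arithmetic
// executed by the replicas during an update call.
const W: [[f32; 3]; 2] = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]];
const B: [f32; 2] = [0.1, -0.05];

#[update]
fn infer(input: Vec<f32>) -> Vec<f32> {
    assert_eq!(input.len(), 3, "expected 3 input features");
    W.iter()
        .zip(B.iter())
        .map(|(row, b)| row.iter().zip(&input).map(|(w, x)| w * x).sum::<f32>() + b)
        .collect()
}
```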

If you’re asking about “Introducing the LLM Canister: Deploy AI agents with a few lines of code”, do not believe it is decentralized… nobody wants to make a decentralized DAO of GPU workers, a Render Network/Akash/Vast.ai clone but for ICP… everyone wants to do RWAs.

Wasm AI inference can work for some things, but I think it’s impractical for performance and for larger models.

Hence the GPU node roadmap.

But anyone can run models. DFINITY has an offchain LLM. You can run weaker models onchain. You can run your own LLM server for offchain inference for max performance. Maybe even clone Render Network/Vast.ai/Salad to get people to run models, submit the results back to an ICP canister, and pay them a :poop: coin, as opposed to PoW miners wasting GPUs mining on spec: https://whattomine.com/
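A back-of-napkin sketch of the canister side of that worker pattern. Every name here is hypothetical, and result verification (redundant execution, staking, slashing) and the actual token payout are deliberately left out:

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use ic_cdk::{query, update};

// Hypothetical job board: users post prompts, offchain GPU workers
// fetch them, run inference on their own hardware, and submit output.
thread_local! {
    static JOBS: RefCell<BTreeMap<u64, String>> = RefCell::new(BTreeMap::new());
    static RESULTS: RefCell<BTreeMap<u64, String>> = RefCell::new(BTreeMap::new());
    static NEXT_ID: RefCell<u64> = RefCell::new(0);
}

#[update]
fn post_job(prompt: String) -> u64 {
    let id = NEXT_ID.with(|n| {
        let id = *n.borrow();
        *n.borrow_mut() += 1;
        id
    });
    JOBS.with(|j| j.borrow_mut().insert(id, prompt));
    id
}

#[query]
fn open_jobs() -> Vec<(u64, String)> {
    JOBS.with(|j| j.borrow().iter().map(|(k, v)| (*k, v.clone())).collect())
}

#[update]
fn submit_result(job_id: u64, output: String) {
    // A real system would verify the output and pay the worker here.
    if JOBS.with(|j| j.borrow_mut().remove(&job_id)).is_some() {
        RESULTS.with(|r| r.borrow_mut().insert(job_id, output));
    }
}
```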

Good PMF if you ask me, as opposed to an Uber clone, which is probably a harder feat (even for Uber, even for some promising crypto Uber clones).

And some: RWAs.

How do you convince people to tokenize their house when you can’t convince them to tokenize simpler stuff like GPU compute?

BTC was “decentralized” with just Satoshi and frenz.

Just take your frenz, some code, some models, some GPUs… some jippity jip jip… some Kool-Aid… some tokenomics… and decentralize it :laughing:

But alas… we wait for DFINITY to do it with the GPU nodes roadmap.
