AI and machine learning on the IC?

Hi all, I have a question about raw_rand in ic_cdk::api::management_canister::main (Rust).
Is raw_rand a Verifiable Random Function (VRF)?

1 Like

Have a look at this thread: Is raw_rand a VRF?
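
For quick reference, calling it from a Rust canister looks roughly like this (a minimal sketch; raw_rand is an async inter-canister call, so it can only be used from update methods, not queries):

```rust
use ic_cdk::api::management_canister::main::raw_rand;

#[ic_cdk::update]
async fn random_bytes() -> Vec<u8> {
    // Ask the management canister for 32 bytes of subnet randomness.
    let (bytes,): (Vec<u8>,) = raw_rand().await.expect("call to raw_rand failed");
    bytes
}
```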

3 Likes

Training a leading-edge, commercial large language model is very expensive (fine-tuning a LLaMA 65B variant is much easier, but it cannot be used commercially), and it is done on clusters of high-end TPUs and GPUs designed to spread the load rather than duplicate it. The Gen-2 IC nodes and blockchain consensus aren’t designed for this.

The latest language models are increasingly capable of providing instructions to weaponize dangerous viruses and perform other dire acts so distributing the weights for inference is becoming a sensitive issue. The rapid increases in parameter count and training have resulted in sudden jumps in the ability of these models to reason. The benchmarks used to measure them are becoming significantly harder and the models are likely to surpass average human scores within the next 1-2 years. Dfinity and the 2nd gen clouds might well have to satisfy governmental scrutiny before they will be allowed to host inference weights for the newer models. I don’t really use Siri or Alexa that much, but some believe that the new models will give a significant boost to the assistants and transform user interfaces. Regardless, the backends are very likely to make heavy use of the new models. If so, then persuading one of the (commercially-licensed) models to jump to the IC, at least for inference, might be important.

2 Likes

Perhaps an optional inference co-processor or specialized inference software for the Gen-2 nodes would help.

1 Like

For now we’re trying to keep nodes pretty homogeneous because it keeps node management and node provider rewards substantially simpler. Mandating specialised HW for that seems a bit overkill for now. But further in the future, having specialised nodes sounds really useful.

3 Likes

It does seem likely that the move to 3 nm will allow CPUs to dedicate more space to neural co-processors… as the mobile/desktop ARM designs are doing.

1 Like

If ChatGPT is not hosted on the IC, then it seems that a mechanism like HTTPS outcalls would be used. If so, my understanding from the Internet Computer documentation is that multiple requests would be sent to ChatGPT to satisfy a request from a canister, and the results must pass a consensus check before they affect the state of the canister. ChatGPT intentionally randomizes its output to increase the appeal of the text it generates. Would this require a custom consensus check? Perhaps there are alternatives… just curious at this point.
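
To illustrate, here is a rough sketch of a transform function with ic_cdk. Each replica runs it on the response it received, and consensus is reached on the transformed result, so it strips the headers that differ per replica; the randomized body itself would still have to be made deterministic on the request side (e.g. by asking for temperature 0), which is exactly the part I’m unsure about:

```rust
use ic_cdk::api::management_canister::http_request::{HttpResponse, TransformArgs};

#[ic_cdk::query]
fn transform(args: TransformArgs) -> HttpResponse {
    HttpResponse {
        status: args.response.status,
        headers: vec![], // drop per-replica headers (dates, request ids, ...)
        body: args.response.body,
    }
}
```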

3 Likes

Vicuna (a 13B-parameter, LLaMA-based model) was fine-tuned at low cost. It should be possible to fine-tune or LoRA a model on ICP and Motoko documentation, then quantize it to 4 bits to shrink it down to a canister-friendly size.

As for inference, CMU used WebGPU and Wasm to run 4-bit Vicuna in the browser. A GPU with 10 GB+ of VRAM can run 13B-Q4 Vicuna locally, so I’m excited to see whether WebLLM can do the same. GitHub - mlc-ai/web-llm: Bringing large-language models and chat to web browsers. Everything runs inside the browser with no server support.

2 Likes

If you want to run inference on the IC, the challenge is how to generate the Candid file.
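
To make that concrete, a minimal interface might look something like this (the method name and types are hypothetical, just to show the shape; newer ic_cdk versions can emit the matching .did with export_candid!):

```rust
use candid::{CandidType, Deserialize};

#[derive(CandidType, Deserialize)]
struct GenerateRequest {
    prompt: String,
    max_tokens: u32,
}

#[ic_cdk::update]
async fn generate(req: GenerateRequest) -> String {
    // The quantized model would run here; this is just a placeholder.
    format!("[max {} tokens] echo: {}", req.max_tokens, req.prompt)
}

// Roughly: service : { generate : (record { prompt : text; max_tokens : nat32 }) -> (text) }
ic_cdk::export_candid!();
```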

2 Likes

lol… ask permission? Scrutiny from whom, 80-year-olds who have zero clue? More like do it and ask for forgiveness later. It’s first come, first served in nascent markets… we want this stuff running on the IC and controlled by the NNS (has a nice safe-guardrails spin to it, lol).

Who talks about mandating? Why isn’t it possible to have various hardware subnets competing for rewards, instead of just going with some plain-vanilla stuff? Competition is always good, especially when there are subsidies… otherwise some other networks will eat our lunch, and we are inflating the ICP supply for less value than we extract out of these nodes (bad business, imho).

1 Like

This is rather unfortunate… we should want these billions of fiat dollars flowing into this very expensive activity. Instead of subsidizing a bunch of nodes burning up electricity for nada, they could start earning money for actually providing computation.


1 Like

There’s a concept called idempotent requests, where multiple calls to the same API will produce the same result. You’d have to have some mechanism to use the same randomness for multiple calls; otherwise there’s no way to achieve consensus between the nodes.
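
A rough sketch of what a deterministic request could look like (the JSON fields, model name, and the Idempotency-Key header are assumptions about the upstream API, not something the IC defines; the argument would then be passed to http_request with enough cycles attached and a transform like the one sketched earlier in the thread):

```rust
use ic_cdk::api::management_canister::http_request::{
    CanisterHttpRequestArgument, HttpHeader, HttpMethod,
};

fn build_llm_request(prompt: &str, key: &str) -> CanisterHttpRequestArgument {
    // temperature 0 to suppress sampling randomness; the prompt should be JSON-escaped in real code
    let body = format!(
        r#"{{"model":"some-model","temperature":0,"messages":[{{"role":"user","content":"{prompt}"}}]}}"#
    );
    CanisterHttpRequestArgument {
        url: "https://api.example.com/v1/chat/completions".to_string(), // placeholder URL
        method: HttpMethod::POST,
        headers: vec![HttpHeader {
            name: "Idempotency-Key".to_string(),
            value: key.to_string(), // same key on every replica, e.g. derived from the prompt
        }],
        body: Some(body.into_bytes()),
        max_response_bytes: Some(2_000_000),
        transform: None, // attach the consensus-friendly transform here
    }
}
```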

The IC’s consensus does not involve competition between nodes like PoW or PoS. And subnets are independent from each other in the sense that they don’t work on the same data/requests, so there’s nothing to compete over either.

2 Likes

Does it help that LLaMA inference has been ported to Rust and can run on the CPU? GitHub - rustformers/llama-rs: Run LLaMA inference on CPU, with Rust 🦀🚀🦙

Dolly 2.0 by Databricks is fully open, so unlike LLaMA you’re not restricted to research-only use when running that LLM.

2 Likes

Hi Gamris, I have read the code in that GitHub repo.
Unfortunately, it cannot be compiled and deployed on the IC at the moment, because it depends on the rand crate and on C libraries.
Maybe we can rewrite it in pure Rust and find a way to deploy it on the IC. :rocket:
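
For the rand part specifically, one possible workaround is to enable getrandom’s custom backend and seed a pure-Rust RNG explicitly from raw_rand instead of the (unavailable) OS RNG. A rough sketch; the crate versions and feature flags are assumptions, so check the current docs:

```rust
use rand::{Rng, SeedableRng};
use rand_chacha::ChaCha20Rng;

// Cargo.toml (assumed): getrandom = { version = "0.2", features = ["custom"] }
getrandom::register_custom_getrandom!(no_os_randomness);
fn no_os_randomness(_buf: &mut [u8]) -> Result<(), getrandom::Error> {
    Err(getrandom::Error::UNSUPPORTED) // force explicit seeding instead of an OS call
}

#[ic_cdk::update]
async fn sample() -> u64 {
    // Seed from the subnet's randomness beacon.
    let (seed,): (Vec<u8>,) = ic_cdk::api::management_canister::main::raw_rand()
        .await
        .expect("raw_rand failed");
    let mut rng = ChaCha20Rng::from_seed(seed[..32].try_into().unwrap());
    rng.gen()
}
```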

4 Likes

Just found https://edgematrix.pro/
Hope it turns out to be helpful to you.

3 Likes

That should be changed, then. The IC shouldn’t become some sort of node welfare state… distributing ICP to nodes that ain’t doing nuthin’, or that provide much less marginal value than potential new nodes/subnets would.

we are burning money and getting what for it?

I urge you, Dom, to move us away from a socialistic welfare-state funding model towards as much of a free-market model as possible.

Just found a port of llama.cpp in … Rust. Has anyone tried to put that in a canister (if that’s even possible)?

1 Like

Yep, I tried and it failed :(

1 Like

I found a GitHub repo with Rust bindings for TensorFlow and another with PyTorch Rust bindings. Maybe there’s hope of running a small Hugging Face model now?