Yes I can potentially talk about some progress on supporting LLMs on the IC
Sounds great, looking forward to it
Hi everyone, thank you for today’s call (2025.02.20). Special thanks to @ielashi for sharing the LLM Canister update. This is the generated summary (short version, please find the long version here): The ICP LLM Canister was introduced as a new LLM service on the Internet Computer, currently supporting Llama 3.1 (8B parameters) with Rust and Motoko libraries. While the AI worker is centralized for now, decentralization is a key future goal. The system is stateless, using IC’s random beacon for variability in responses, and scalability improvements are being explored. Future plans include expanding language support, integrating Anthropic’s MCP for tool calling, and enhancing security and privacy. At ETH Denver, ICP teams will showcase live AI demos, engage in hackathons, and explore cross-chain AI collaborations.
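To make the statelessness point concrete, here’s a minimal Python sketch of how a stateless service can still vary its responses by deriving a per-request seed from a shared random beacon. The helper names are illustrative; in an actual canister the beacon bytes would come from the IC’s `raw_rand` system API, and the "model" here is just a toy token distribution.

```python
import hashlib
import random

def beacon_seed(beacon: bytes, request_id: int) -> int:
    """Derive a per-request seed from a shared random beacon.

    `beacon` stands in for the IC's random beacon output; in a real
    canister this would come from the system API (raw_rand).
    """
    h = hashlib.sha256(beacon + request_id.to_bytes(8, "big"))
    return int.from_bytes(h.digest(), "big")

def sample_token(logits: dict, seed: int) -> str:
    """Sample one token from a toy distribution, seeded by the beacon."""
    rng = random.Random(seed)
    tokens = sorted(logits)
    weights = [logits[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Two requests against the same (stateless) model see different seeds,
# so responses can vary even though no conversation state is stored.
logits = {"yes": 2.0, "no": 1.0, "maybe": 1.0}
beacon = b"round-42-beacon"
t1 = sample_token(logits, beacon_seed(beacon, request_id=1))
t2 = sample_token(logits, beacon_seed(beacon, request_id=2))
```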
Links shared during the call:
- New ICP LLM Canister: Introducing the LLM Canister: Deploy AI agents with a few lines of code
- MCP standard: Introduction - Model Context Protocol
- vLLM is the current industry standard for serving batch requests: Welcome to vLLM — vLLM
- ETH Denver ICP Events: ICP Events · Events Calendar
- Agents Day at ETH Denver: Agents Day - AI x Web3 | 🇺🇸 Denver 2025 · Luma
- Another Agents event: Agents Unleashed: Builders Night @ETHDenver · Luma
- ICP Telegram group for ETH Denver: Telegram: Join Group Chat
- ICP Projects at ETH Denver: ICP @ ETH Denver 2025 - Projects View - Google Sheets
- Outlier Ventures event at ETH Denver on the Post Web: Outlier Ventures' Open House presents The Post Web | AI x Web3 · Luma
- Environment for creating Fetch.AI agents: GitHub - JupiterM/Fetch-Ai-Vagrant: Code Repository for Vagrant box which install development environment for creating Fetch.AI agents.
We’ll be presenting it during the upcoming DeAI Working Group session on 27 Feb, '25 at 17:00 UTC.
Awesome, looking forward to it!
Our agenda this Thursday will be your presentation on HyperLaunch and the first Proof-of-AI-Work demo. See you all then
Hi everyone, thank you for today’s call (2025.02.27). Special thanks to the ELNA team for demoing! This is the generated summary (short version, please find the long version here): We had two demos during today’s DeAI call: HyperLaunch and Proof-of-AI-Work. HyperLaunch enables the tokenization of AI agents, providing a revenue model through governance and decentralized ownership. In the live demo, the ELNA team showcased the tokenization of the Dom3PO AI agent, with future plans for an autonomous podcast agent. onicai introduced Proof-of-AI-Work (PoAIW), a decentralized method inspired by blockchain to verify AI contributions, where AI miners compete and a judge model ranks responses. A fully on-chain AI execution on ICP was demonstrated. The discussion covered governance challenges, ensuring fairness in AI competitions, and balancing model performance with decentralized decision-making.
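The compete-and-judge loop described above can be sketched in a few lines of Python. Everything here is a simplification for illustration: the "judge" is a toy word-overlap scorer standing in for the judge LLM, and the names are made up rather than taken from onicai’s implementation.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    miner: str
    response: str

def judge_score(prompt: str, response: str) -> float:
    """Toy stand-in for the judge model: score by word overlap with
    the prompt. A real judge would be an LLM rating answer quality."""
    prompt_words = set(prompt.lower().split())
    resp_words = response.lower().split()
    if not resp_words:
        return 0.0
    overlap = sum(1 for w in resp_words if w in prompt_words)
    return overlap / len(resp_words)

def rank_submissions(prompt: str, subs: list) -> list:
    """Rank miners by judge score, highest first; in PoAIW the
    top-ranked miner would earn the reward for this round."""
    scored = [(s.miner, judge_score(prompt, s.response)) for s in subs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

prompt = "explain proof of ai work"
subs = [
    Submission("miner-a", "proof of ai work lets miners compete on ai tasks"),
    Submission("miner-b", "unrelated answer"),
]
ranking = rank_submissions(prompt, subs)
```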
Links shared during the call:
- https://ethdenver2025.devfolio.co/prizes?partner=Internet+Computer
- onicai - Artificial Intelligence as-a-Service
- [2012.07805] Extracting Training Data from Large Language Models
- https://hyperlaunch.fun/
- https://dapp.elna.ai/
- https://app.sonic.ooo/
- On-chain model ELNA is using for the embedding: GitHub - elna-ai/ic-embedding: Vector Embedding inference engine in IC
- ELNA · GitHub
- https://hyperlaunch.fun/pool/ixrsh-yiaaa-aaaak-qubcq-cai
- Created agent during the HyperLaunch demo: https://x.com/Dom3PO
Hi everyone, this week we’ll have a review of ETH Denver, so it’d be great if everyone who was there could bring their impressions and observations to the call and share them with the group.
Are there any other items you’d like to see on this week’s agenda or is there anything you’d like to share with/show to the group?
Hi everyone, thank you for today’s call (2025.03.06). Special thanks to @jennifertran , @icpp and @apotheosis for sharing their ETH Denver experiences! This is the generated summary (short version, please find the long version here): The discussion covered a review of ETH Denver. The ICP team engaged in AI and Bitcoin-related networking, highlighting strong developer interest in ZKPs, TEEs, and on-chain AI verification. AI agent discussions were mostly future-oriented, with Eliza as a key framework, while the Bitcoin community showed increasing interest in AI integrations. ICP submissions at the hackathon doubled, with the “Only Possible on ICP” track seeing the most engagement. Looking ahead, an ICP AI hackathon is being considered to better showcase its unique capabilities in the Web3 AI space.
Links shared during the call:
- Vitalik Buterin on AI: AI as the engine, humans as the steering wheel
- https://x.com/dfinity/status/1895505980864987389
- GitHub - ICME-Lab/zkEngine_dev: A cutting-edge zkWASM implementation leveraging Nova-NIVC-based folding techniques.
- GitHub - dmlc/xgboost: Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
- Kinic PoC with: https://github.com/jinlow/forust/tree/main
- ONNX to ZKP, but using the Halo2 proving scheme, which is dated: GitHub - zkonduit/ezkl: ezkl is an engine for doing inference for deep learning models and other computational graphs in a zk-snark (ZKML). Use it from Python, Javascript, or the command line.
- ICP booth at ETH Denver: https://x.com/Real0xJason/status/1896037515771199535
- OpenChat Botathon: OpenChat
Hi everyone, this week @icarus will lead our next AI4AI session where we’ll focus on hardware and infrastructure to support AI on ICP. Looking forward to it and see you all there
The focus of this week’s AI4AI DeAI meeting will be on accelerator card hardware options (GPU/APU/NPU/TPU) and an evaluation of their respective benefits for the future of AI-enabled subnets.
This is great, and would be a good segue to discuss some of the decentralization options we have here. I’ll share more about those in the meeting on Thursday.
Also, if I may, I’d like to bring this poll to the attention of this group, as I’d love to get your input on that topic.
Hi everyone, thank you for today’s call (2025.03.13). Special thanks to @icarus for leading the call! This is the generated summary (short version, please find the long version here): In today’s DeAI Working Group call for the Internet Computer, ETH students introduced an inference engine project using a 1B-parameter Llama 3 model, exploring optimizations via the Mistral RS library and considering alternatives like Candle and Llama.cpp. The group also discussed upgrading hardware to Gen-3 AMD EPYC Zen 5 CPUs and integrating GPUs, highlighting NVIDIA’s H100/H200, AMD Instinct, and emerging accelerators such as Tenstorrent to meet ICP’s future AI workload requirements.
Links shared during the call:
- GitHub - EricLBuehler/mistral.rs: Blazingly fast LLM inference.
- original llama.cpp: GitHub - ggml-org/llama.cpp: LLM inference in C/C++
- llama.cpp running in a canister of the IC: GitHub - onicai/llama_cpp_canister: llama.cpp for the Internet Computer
- summary of the max tokens per update call investigation: GitHub - onicai/llama_cpp_canister: llama.cpp for the Internet Computer
- summary of the last hardware-focused call as a reference: DeAIWorkingGroupInternetComputer/WorkingGroupMeetings/2025.02.13 at main · DeAIWorkingGroupInternetComputer/DeAIWorkingGroupInternetComputer · GitHub
- NVIDIA Data Center GPU Resource Center
- example server: MiTAC TN85B8261 B8261T85E8HR-2T-N Overview
- https://www.gigabyte.com/Enterprise/Rack-Server/XV23-ZX0-AAJ1-rev-3x#Overview
- NVIDIA H100 Tensor Core GPU
- https://www.amd.com/en/products/accelerators/instinct/mi300/mi300x.html
- Wormhole™
- https://www.modular.com/
- vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention | vLLM Blog
- Decreasing HTTP Outcall Latency and Cost - #7 by lastmjs
Hi everyone, I’m excited that the DeAI Chat team (DeAI.chat – Decentralized AI chat on the Internet Computer) will present their work during this week’s call. Thank you @jennifertran for organizing! In addition, we’ll discuss the decentralization options for the AI Worker and the recent poll on this by @ielashi.
See you all at 6pm CET on Thursday, I’m looking forward to it!
Hi everyone, thank you for today’s call (2025.03.20). Special thanks to @pu0238 for demoing, @jennifertran for organizing and @ielashi for leading the second part of the call! This is the generated summary (short version, please find the long version here): Today’s DeAI Working Group call for the Internet Computer discussed recent advancements in efficiently running Large Language Models (LLMs) on-chain. The DeAI Chat team introduced a novel “chunking” strategy that sequentially processes small vector chunks, significantly reducing memory usage and enabling models with up to 1.5 billion parameters to run effectively. They demonstrated both an off-chain client method, optimizing speed through query calls, and an on-chain client method emphasizing decentralization via update calls, albeit with slower responses. Additionally, strategic paths for decentralizing AI worker nodes were explored: the “Classic” model, which aligns with current security but faces cost and scalability limitations, and the “Badlands” approach, which supports open participation and scalability but requires addressing security and decentralization concerns.
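For intuition on the chunking idea, here’s a minimal Python sketch: instead of materializing a whole weight matrix, stream it in small row chunks and accumulate the output, so peak memory is bounded by the chunk rather than the model. The shapes, chunk size, and loader are illustrative and not the DeAI Chat team’s actual implementation.

```python
def matvec_chunked(load_chunk, n_rows, chunk_rows, x):
    """Compute W @ x where rows of W arrive via load_chunk(start, end).

    Only `chunk_rows` rows of W are resident at any time, so peak
    memory scales with the chunk, not with the full matrix.
    """
    y = []
    for start in range(0, n_rows, chunk_rows):
        end = min(start + chunk_rows, n_rows)
        chunk = load_chunk(start, end)  # e.g. read from stable memory
        for row in chunk:
            y.append(sum(w * xi for w, xi in zip(row, x)))
    return y

# Toy 4x3 weight matrix served two rows at a time.
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
x = [2, 3, 4]
y = matvec_chunked(lambda s, e: W[s:e], n_rows=4, chunk_rows=2, x=x)
```

The same trick applies layer by layer during inference, which is what makes 1B+ parameter models fit within a canister’s memory budget.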
Links shared during the call:
Hi everyone, this Thursday we’ll continue our discussion on decentralization options for the AI worker. Are there additional topics and items you’d like to add to the agenda? Would anyone like to demo/share their current work? Have a great week!
Hi @patnorris, Greg Seale here. I’d also like to be part of this group.
I have some experience with AI and am interested in how we can connect to and train / coach the Llama 3.1 LLM that is attached to the IC.
Hi @Anypoint , thanks for your message and interest, that sounds great
The calls take place each Thursday at 6pm CET in the ICP Discord. This is tomorrow’s event, you can see the time in your local timezone and the link to the voice channel where we’ll have the call: ICP
See you then
Hi everyone, thank you for today’s call (2025.03.27). This is the generated summary (short version, please find the long version here): GPU integration remains challenging due to WASM compatibility, determinism, and VRAM limitations, making large-model GPU deployments inefficient. The current AI Worker is managed off-chain by DFINITY, communicating via polling, with no immediate scalability issues. To decentralize AI workers, two main strategies—standardized hardware setups (“Classic”) and flexible hardware models with reward incentives (“Badlands”)—are under discussion. A hybrid approach, gradually opening governance to additional contributors, is proposed as a practical compromise. Interest is growing around affordable, powerful hardware solutions like NVIDIA DGX Spark (~$4000). Next steps involve exploring these hybrid models, organizing open forums with ICP node providers in Q2 2025, and closely monitoring hardware advancements.
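The off-chain-worker-plus-polling setup mentioned in the summary can be sketched roughly like this. The class and method names are made up for illustration; the real worker talks to the LLM canister over the IC’s interfaces rather than an in-process queue.

```python
import queue

class FakeCanister:
    """Stand-in for the LLM canister's request queue. In reality the
    worker would poll the canister over the network; here a local
    queue models the same pull-based protocol."""
    def __init__(self):
        self.pending = queue.Queue()
        self.answers = {}

    def take_pending(self):
        """Return one (request_id, prompt) pair, or None if idle."""
        try:
            return self.pending.get_nowait()
        except queue.Empty:
            return None

    def submit(self, request_id, answer):
        self.answers[request_id] = answer

def worker_poll_once(canister, run_inference):
    """One polling round: fetch a pending prompt, answer it if present.

    Returns True if a job was processed, False if the queue was empty,
    which is when a real worker would sleep before polling again.
    """
    job = canister.take_pending()
    if job is None:
        return False
    request_id, prompt = job
    canister.submit(request_id, run_inference(prompt))
    return True

canister = FakeCanister()
canister.pending.put((1, "hello"))
worker_poll_once(canister, run_inference=lambda p: p.upper())
```

Because the worker pulls rather than being pushed to, adding more workers (the decentralization question above) mostly becomes a matter of coordinating who takes which pending request.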
Links shared during the call:
Hey,
I was planning to join the call but something personal came up. Still, I’d love to be part of this moving forward.
Just to share—my team and I recently developed a custom algorithm that allows us to run large models like Llama 70B on more accessible hardware. We’re using an L40S 40GB x8 cluster. This isn’t a plug-and-play setup—there’s a lot of work behind optimizing it—and while I can’t share the full details due to production use (we’re serving clients globally and I’m also an NP for ICP), here’s a rough idea of what we’ve achieved with a 70-billion-parameter model in bfloat16:
| Setup | Token Throughput (tk/s) | Avg Completion Time per Request |
|---|---|---|
| Standard setup | ~30 tk/s | ~10 seconds |
| With extensive tuning | ~50–60 tk/s | ~5 seconds |
| Custom algo developed by us | ~100–120 tk/s | ~2 seconds |
This setup is actively running in production, and it proves that large models like 70B can be served efficiently—without jumping to H200-class hardware.
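As a quick sanity check on those figures—assuming the throughput and completion time describe a single stream, which the post doesn’t confirm—tokens per completion is roughly throughput times completion time:

```python
# Back-of-the-envelope check on the table above. This assumes a single
# stream; batched serving would change the arithmetic.
def implied_tokens(tk_per_s: float, seconds: float) -> float:
    """Approximate tokens generated per completed request."""
    return tk_per_s * seconds

standard = implied_tokens(30, 10)   # ~300 tokens per response
tuned = implied_tokens(55, 5)       # ~275 tokens per response
custom = implied_tokens(110, 2)     # ~220 tokens per response
```

The three rows imply responses of a broadly similar length (~220-300 tokens), so the speedup reflects faster generation rather than shorter outputs.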
I’m very interested in supporting the ICP AI initiative as a provider. I think there’s a solid opportunity here for a hybrid model: on-chain capabilities supported by providers like us who can deliver reliable inference with real performance.
Happy to discuss more on how we can plug into this effort.