Yes I can potentially talk about some progress on supporting LLMs on the IC
Sounds great, looking forward to it
Hi everyone, thank you for today’s call (2025.02.20). Special thanks to @ielashi for sharing the LLM Canister update. This is the generated summary (short version, please find the long version here): The ICP LLM Canister was introduced as a new LLM service on the Internet Computer, currently supporting Llama 3.1 (8B parameters) with Rust and Motoko libraries. While the AI worker is centralized for now, decentralization is a key future goal. The system is stateless, using IC’s random beacon for variability in responses, and scalability improvements are being explored. Future plans include expanding language support, integrating Anthropic’s MCP for tool calling, and enhancing security and privacy. At ETH Denver, ICP teams will showcase live AI demos, engage in hackathons, and explore cross-chain AI collaborations.
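To make the statelessness point concrete, here’s a minimal Python sketch of how a stateless service can still vary its responses by deriving a per-request seed from a shared random beacon. The helper names are illustrative; in an actual canister the beacon bytes would come from the IC’s `raw_rand` system API, and the "model" here is just a toy token distribution.

```python
import hashlib
import random

def beacon_seed(beacon: bytes, request_id: int) -> int:
    """Derive a per-request seed from a shared random beacon.

    `beacon` stands in for the IC's random beacon output; in a real
    canister this would come from the system API (raw_rand).
    """
    h = hashlib.sha256(beacon + request_id.to_bytes(8, "big"))
    return int.from_bytes(h.digest(), "big")

def sample_token(logits: dict, seed: int) -> str:
    """Sample one token from a toy distribution, seeded by the beacon."""
    rng = random.Random(seed)
    tokens = sorted(logits)
    weights = [logits[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Two requests against the same (stateless) model see different seeds,
# so responses can vary even though no conversation state is stored.
logits = {"yes": 2.0, "no": 1.0, "maybe": 1.0}
beacon = b"round-42-beacon"
t1 = sample_token(logits, beacon_seed(beacon, request_id=1))
t2 = sample_token(logits, beacon_seed(beacon, request_id=2))
```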
Links shared during the call:
- New ICP LLM Canister: Introducing the LLM Canister: Deploy AI agents with a few lines of code
- MCP standard: Introduction - Model Context Protocol
- vLLM is the current industry standard for serving batch requests: Welcome to vLLM — vLLM
- ETH Denver ICP Events: ICP Events · Events Calendar
- Agents Day at ETH Denver: Agents Day - AI x Web3 | 🇺🇸 Denver 2025 · Luma
- Another Agents event: Agents Unleashed: Builders Night @ETHDenver · Luma
- ICP Telegram group for ETH Denver: Telegram: Join Group Chat
- ICP Projects at ETH Denver: ICP @ ETH Denver 2025 - Projects View - Google Sheets
- Outlier Ventures event at ETH Denver on the Post Web: Outlier Ventures' Open House presents The Post Web | AI x Web3 · Luma
- Environment for creating Fetch.AI agents: GitHub - JupiterM/Fetch-Ai-Vagrant: Code Repository for Vagrant box which install development environment for creating Fetch.AI agents.
We’ll be presenting it during the upcoming DeAI Working Group session on 27 Feb, '25 at 17:00 UTC.
Awesome, looking forward to it!
Our agenda this Thursday will be your presentation on HyperLaunch and the first Proof-of-AI-Work demo. See you all then
Hi everyone, thank you for today’s call (2025.02.27). Special thanks to the ELNA team for demoing! This is the generated summary (short version, please find the long version here): We had two demos during today’s DeAI call: HyperLaunch and Proof-of-AI-Work. HyperLaunch enables the tokenization of AI agents, providing a revenue model through governance and decentralized ownership. In the live demo, the ELNA team showcased the tokenization of the Dom3PO AI agent, with future plans for an autonomous podcast agent. onicai introduced Proof-of-AI-Work (PoAIW), a decentralized method inspired by blockchain to verify AI contributions, where AI miners compete and a judge model ranks responses. A fully on-chain AI execution on ICP was demonstrated. The discussion covered governance challenges, ensuring fairness in AI competitions, and balancing model performance with decentralized decision-making.
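The compete-and-judge loop described above can be sketched in a few lines of Python. Everything here is a simplification for illustration: the "judge" is a toy word-overlap scorer standing in for the judge LLM, and the names are made up rather than taken from onicai’s implementation.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    miner: str
    response: str

def judge_score(prompt: str, response: str) -> float:
    """Toy stand-in for the judge model: score by word overlap with
    the prompt. A real judge would be an LLM rating answer quality."""
    prompt_words = set(prompt.lower().split())
    resp_words = response.lower().split()
    if not resp_words:
        return 0.0
    overlap = sum(1 for w in resp_words if w in prompt_words)
    return overlap / len(resp_words)

def rank_submissions(prompt: str, subs: list) -> list:
    """Rank miners by judge score, highest first; in PoAIW the
    top-ranked miner would earn the reward for this round."""
    scored = [(s.miner, judge_score(prompt, s.response)) for s in subs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

prompt = "explain proof of ai work"
subs = [
    Submission("miner-a", "proof of ai work lets miners compete on ai tasks"),
    Submission("miner-b", "unrelated answer"),
]
ranking = rank_submissions(prompt, subs)
```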
Links shared during the call:
- https://ethdenver2025.devfolio.co/prizes?partner=Internet+Computer
- onicai - Artificial Intelligence as-a-Service
- [2012.07805] Extracting Training Data from Large Language Models
- https://hyperlaunch.fun/
- https://dapp.elna.ai/
- https://app.sonic.ooo/
- On-chain model ELNA is using for the embedding: GitHub - elna-ai/ic-embedding: Vector Embedding inference engine in IC
- ELNA · GitHub
- https://hyperlaunch.fun/pool/ixrsh-yiaaa-aaaak-qubcq-cai
- Created agent during the HyperLaunch demo: https://x.com/Dom3PO
Hi everyone, this week we’ll have a review of ETH Denver, so it’d be great if everyone who was there could bring their impressions and observations to the call and share them with the group.
Are there any other items you’d like to see on this week’s agenda or is there anything you’d like to share with/show to the group?
Hi everyone, thank you for today’s call (2025.03.06). Special thanks to @jennifertran , @icpp and @apotheosis for sharing their ETH Denver experiences! This is the generated summary (short version, please find the long version here): The discussion covered a review of ETH Denver. The ICP team engaged in AI and Bitcoin-related networking, highlighting strong developer interest in ZKPs, TEEs, and on-chain AI verification. AI agent discussions were mostly future-oriented, with Eliza as a key framework, while the Bitcoin community showed increasing interest in AI integrations. ICP submissions at the hackathon doubled, with the “Only Possible on ICP” track seeing the most engagement. Looking ahead, an ICP AI hackathon is being considered to better showcase its unique capabilities in the Web3 AI space.
Links shared during the call:
- Vitalik Buterin on AI: AI as the engine, humans as the steering wheel
- https://x.com/dfinity/status/1895505980864987389
- GitHub - ICME-Lab/zkEngine_dev: A cutting-edge zkWASM implementation leveraging Nova-NIVC-based folding techniques.
- GitHub - dmlc/xgboost: Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
- Kinic PoC with: https://github.com/jinlow/forust/tree/main
- ONNX to ZKP, but using the Halo2 proving scheme, which is dated: GitHub - zkonduit/ezkl: ezkl is an engine for doing inference for deep learning models and other computational graphs in a zk-snark (ZKML). Use it from Python, Javascript, or the command line.
- ICP booth at ETH Denver: https://x.com/Real0xJason/status/1896037515771199535
- OpenChat Botathon: OpenChat
Hi everyone, this week @icarus will lead our next AI4AI session where we’ll focus on hardware and infrastructure to support AI on ICP. Looking forward to it and see you all there
The focus of this week’s AI4AI DeAI meeting will be on accelerator card hardware options (GPU/APU/NPU/TPU) and an evaluation of their respective benefits for the future of AI-enabled subnets.
This is great, and would be a good segue to discuss some of the decentralization options we have here. I’ll share more about those in the meeting on Thursday.
Also, if I may, I’d like to bring this poll to the attention of this group, as I’d love to get your input on that topic.
Hi everyone, thank you for today’s call (2025.03.13). Special thanks to @icarus for leading the call! This is the generated summary (short version, please find the long version here): In today’s DeAI Working Group call for the Internet Computer, ETH students introduced an inference engine project using a 1B-parameter Llama 3 model, exploring optimizations via the Mistral RS library and considering alternatives like Candle and Llama.cpp. The group also discussed upgrading hardware to Gen-3 AMD EPYC Zen 5 CPUs and integrating GPUs, highlighting NVIDIA’s H100/H200, AMD Instinct, and emerging accelerators such as Tenstorrent to meet ICP’s future AI workload requirements.
Links shared during the call:
- GitHub - EricLBuehler/mistral.rs: Blazingly fast LLM inference.
- original llama.cpp: GitHub - ggml-org/llama.cpp: LLM inference in C/C++
- llama.cpp running in a canister of the IC: GitHub - onicai/llama_cpp_canister: llama.cpp for the Internet Computer
- summary of the max tokens per update call investigation: GitHub - onicai/llama_cpp_canister: llama.cpp for the Internet Computer
- summary of the last hardware-focused call as a reference: DeAIWorkingGroupInternetComputer/WorkingGroupMeetings/2025.02.13 at main · DeAIWorkingGroupInternetComputer/DeAIWorkingGroupInternetComputer · GitHub
- NVIDIA Data Center GPU Resource Center
- example server: MiTAC TN85B8261 B8261T85E8HR-2T-N Overview
- https://www.gigabyte.com/Enterprise/Rack-Server/XV23-ZX0-AAJ1-rev-3x#Overview
- NVIDIA H100 Tensor Core GPU
- https://www.amd.com/en/products/accelerators/instinct/mi300/mi300x.html
- Wormhole™
- https://www.modular.com/
- vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention | vLLM Blog
- Decreasing HTTP Outcall Latency and Cost - #7 by lastmjs
Hi everyone, I’m excited that the DeAI Chat team (DeAI.chat – Decentralized AI chat on the Internet Computer) will present their work during this week’s call. Thank you @jennifertran for organizing! In addition, we’ll discuss the decentralization options for the AI Worker and the recent poll on this by @ielashi.
See you all at 6pm CET on Thursday, I’m looking forward to it!
Hi everyone, thank you for today’s call (2025.03.20). Special thanks to @pu0238 for demoing, @jennifertran for organizing and @ielashi for leading the second part of the call! This is the generated summary (short version, please find the long version here): Today’s DeAI Working Group call for the Internet Computer discussed recent advancements in efficiently running Large Language Models (LLMs) on-chain. The DeAI Chat team introduced a novel “chunking” strategy that sequentially processes small vector chunks, significantly reducing memory usage and enabling models with up to 1.5 billion parameters to run effectively. They demonstrated both an off-chain client method, optimizing speed through query calls, and an on-chain client method emphasizing decentralization via update calls, albeit with slower responses. Additionally, strategic paths for decentralizing AI worker nodes were explored: the “Classic” model, which aligns with current security but faces cost and scalability limitations, and the “Badlands” approach, which supports open participation and scalability but requires addressing security and decentralization concerns.
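For intuition on the chunking idea, here’s a minimal Python sketch: instead of materializing a whole weight matrix, stream it in small row chunks and accumulate the output, so peak memory is bounded by the chunk rather than the model. The shapes, chunk size, and loader are illustrative and not the DeAI Chat team’s actual implementation.

```python
def matvec_chunked(load_chunk, n_rows, chunk_rows, x):
    """Compute W @ x where rows of W arrive via load_chunk(start, end).

    Only `chunk_rows` rows of W are resident at any time, so peak
    memory scales with the chunk, not with the full matrix.
    """
    y = []
    for start in range(0, n_rows, chunk_rows):
        end = min(start + chunk_rows, n_rows)
        chunk = load_chunk(start, end)  # e.g. read from stable memory
        for row in chunk:
            y.append(sum(w * xi for w, xi in zip(row, x)))
    return y

# Toy 4x3 weight matrix served two rows at a time.
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
x = [2, 3, 4]
y = matvec_chunked(lambda s, e: W[s:e], n_rows=4, chunk_rows=2, x=x)
```

The same trick applies layer by layer during inference, which is what makes 1B+ parameter models fit within a canister’s memory budget.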
Links shared during the call:
Hi everyone, this Thursday we’ll continue our discussion on decentralization options for the AI worker. Are there additional topics and items you’d like to add to the agenda? Would anyone like to demo/share their current work? Have a great week!
Hi @patnorris, Greg Seale here. I’d also like to be part of this group.
I have some experience with AI and am interested in how we can connect to and train / coach the Llama 3.1 LLM that is attached to the IC.
Hi @Anypoint , thanks for your message and interest, that sounds great
The calls take place each Thursday at 6pm CET in the ICP Discord. This is tomorrow’s event, you can see the time in your local timezone and the link to the voice channel where we’ll have the call: ICP
See you then
Hi everyone, thank you for today’s call (2025.03.27). This is the generated summary (short version, please find the long version here): GPU integration remains challenging due to WASM compatibility, determinism, and VRAM limitations, making large-model GPU deployments inefficient. The current AI Worker is managed off-chain by DFINITY, communicating via polling, with no immediate scalability issues. To decentralize AI workers, two main strategies—standardized hardware setups (“Classic”) and flexible hardware models with reward incentives (“Badlands”)—are under discussion. A hybrid approach, gradually opening governance to additional contributors, is proposed as a practical compromise. Interest is growing around affordable, powerful hardware solutions like NVIDIA DGX Spark (~$4000). Next steps involve exploring these hybrid models, organizing open forums with ICP node providers in Q2 2025, and closely monitoring hardware advancements.
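The off-chain-worker-plus-polling setup mentioned in the summary can be sketched roughly like this. The class and method names are made up for illustration; the real worker talks to the LLM canister over the IC’s interfaces rather than an in-process queue.

```python
import queue

class FakeCanister:
    """Stand-in for the LLM canister's request queue. In reality the
    worker would poll the canister over the network; here a local
    queue models the same pull-based protocol."""
    def __init__(self):
        self.pending = queue.Queue()
        self.answers = {}

    def take_pending(self):
        """Return one (request_id, prompt) pair, or None if idle."""
        try:
            return self.pending.get_nowait()
        except queue.Empty:
            return None

    def submit(self, request_id, answer):
        self.answers[request_id] = answer

def worker_poll_once(canister, run_inference):
    """One polling round: fetch a pending prompt, answer it if present.

    Returns True if a job was processed, False if the queue was empty,
    which is when a real worker would sleep before polling again.
    """
    job = canister.take_pending()
    if job is None:
        return False
    request_id, prompt = job
    canister.submit(request_id, run_inference(prompt))
    return True

canister = FakeCanister()
canister.pending.put((1, "hello"))
worker_poll_once(canister, run_inference=lambda p: p.upper())
```

Because the worker pulls rather than being pushed to, adding more workers (the decentralization question above) mostly becomes a matter of coordinating who takes which pending request.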
Links shared during the call:
Hey,
I was planning to join the call but something personal came up. Still, I’d love to be part of this moving forward.
Just to share—my team and I recently developed a custom algorithm that allows us to run large models like Llama 70B on more accessible hardware. We’re using an L40S 40GB x8 cluster. This isn’t a plug-and-play setup—there’s a lot of work behind optimizing it—and while I can’t share the full details due to production use (we’re serving clients globally and I’m also an NP for ICP), here’s a rough idea of what we’ve achieved with a 70-billion-parameter model in bfloat16:
| Setup | Token Throughput (tk/s) | Avg Completion Time per Request |
|---|---|---|
| Standard setup | ~30 tk/s | ~10 seconds |
| With extensive tuning | ~50–60 tk/s | ~5 seconds |
| Custom algo developed by us | ~100–120 tk/s | ~2 seconds |
This setup is actively running in production, and it proves that large models like 70B can be served efficiently—without jumping to H200-class hardware.
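As a quick sanity check on those figures—assuming the throughput and completion time describe a single stream, which the post doesn’t confirm—tokens per completion is roughly throughput times completion time:

```python
# Back-of-the-envelope check on the table above. This assumes a single
# stream; batched serving would change the arithmetic.
def implied_tokens(tk_per_s: float, seconds: float) -> float:
    """Approximate tokens generated per completed request."""
    return tk_per_s * seconds

standard = implied_tokens(30, 10)   # ~300 tokens per response
tuned = implied_tokens(55, 5)       # ~275 tokens per response
custom = implied_tokens(110, 2)     # ~220 tokens per response
```

The three rows imply responses of a broadly similar length (~220-300 tokens), so the speedup reflects faster generation rather than shorter outputs.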
I’m very interested in supporting the ICP AI initiative as a provider. I think there’s a solid opportunity here for a hybrid model: on-chain capabilities supported by providers like us who can deliver reliable inference with real performance.
Happy to discuss more on how we can plug into this effort.