Technical Working Group DeAI

Yes I can potentially talk about some progress on supporting LLMs on the IC :slight_smile:

Sounds great, looking forward to it :+1:

Hi everyone, thank you for today’s call (2025.02.20). Special thanks to @ielashi for sharing the LLM Canister update. This is the generated summary (short version, please find the long version here): The ICP LLM Canister was introduced as a new LLM service on the Internet Computer, currently supporting Llama 3.1 (8B parameters) with Rust and Motoko libraries. While the AI worker is centralized for now, decentralization is a key future goal. The system is stateless, using IC’s random beacon for variability in responses, and scalability improvements are being explored. Future plans include expanding language support, integrating Anthropic’s MCP for tool calling, and enhancing security and privacy. At ETH Denver, ICP teams will showcase live AI demos, engage in hackathons, and explore cross-chain AI collaborations.
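The stateless-plus-random-beacon design mentioned above can be sketched roughly as follows. This is an illustrative Python stand-in, not the canister's actual code: the candidate list and beacon values are hypothetical, and the point is only that seeding the sampler from a per-call random value keeps each call deterministic while letting different calls vary.

```python
# Illustrative sketch: a stateless service that still varies its responses.
# Each call is seeded from a per-call random value (on the IC, the random
# beacon), so execution is deterministic given the seed, but different
# calls can produce different outputs.

import random

def sample_reply(prompt: str, beacon: bytes) -> str:
    """Pick one of several candidate continuations, seeded by the beacon."""
    rng = random.Random(beacon)  # deterministic given the same beacon value
    candidates = [f"{prompt} -> variant {i}" for i in range(4)]
    return rng.choice(candidates)

# Same beacon -> same reply; a different beacon may pick a different variant.
a = sample_reply("hi", b"\x01")
b = sample_reply("hi", b"\x01")
c = sample_reply("hi", b"\x02")
print(a == b)  # True
```

The key property is reproducibility per call: replicas replaying the same call with the same beacon reach the same answer, which is what replicated execution requires.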

Links shared during the call:

We’ll be presenting it during the upcoming DeAI Working Group session on 27 Feb '25 at 17:00 UTC.

Awesome, looking forward to it!

Our agenda this Thursday will be your presentation on HyperLaunch and the first Proof-of-AI-Work demo. See you all then :+1:

Hi everyone, thank you for today’s call (2025.02.27). Special thanks to the ELNA team for demoing! This is the generated summary (short version, please find the long version here): We had two demos during today’s DeAI call: HyperLaunch and Proof-of-AI-Work. HyperLaunch enables the tokenization of AI agents, providing a revenue model through governance and decentralized ownership. In the live demo, the ELNA team showcased the tokenization of the Dom3PO AI agent, with future plans for an autonomous podcast agent. onicai introduced Proof-of-AI-Work (PoAIW), a decentralized method inspired by blockchain to verify AI contributions, where AI miners compete and a judge model ranks responses. A fully on-chain AI execution on ICP was demonstrated. The discussion covered governance challenges, ensuring fairness in AI competitions, and balancing model performance with decentralized decision-making.
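The PoAIW round described above (miners compete, a judge model ranks responses) can be sketched as follows. The miners and the length-based judge here are trivial stand-ins for real models, not onicai's actual implementation:

```python
# Illustrative sketch of a Proof-of-AI-Work round: several "miners" each
# produce a response to the same challenge, and a "judge" scores and ranks
# the responses. Miners and judge are toy stand-ins for real LLMs.

def miner_a(challenge: str) -> str:
    return f"Answer to '{challenge}' with a short explanation."

def miner_b(challenge: str) -> str:
    return (f"Detailed answer to '{challenge}' covering background, "
            f"reasoning, and a conclusion.")

def judge_score(challenge: str, response: str) -> float:
    # Stand-in judge: reward relevance (echoing the challenge) and detail.
    relevance = 1.0 if challenge in response else 0.0
    return relevance + len(response) / 1000.0

def run_round(challenge: str, miners: dict) -> list[tuple[str, float]]:
    """Collect responses, score them with the judge, return a ranking."""
    scored = [(name, judge_score(challenge, fn(challenge)))
              for name, fn in miners.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = run_round("What is a canister?",
                    {"miner_a": miner_a, "miner_b": miner_b})
print(ranking[0][0])  # the winning miner
```

In a real deployment the judge itself is a model, which is where the fairness and governance questions raised in the call come in: whoever controls the judge effectively controls the reward distribution.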

Links shared during the call:

Hi everyone, this week we’ll have a review of ETH Denver, so it’d be great if all of us who were there could bring our impressions and observations to the call to share them with the group :+1:
Are there any other items you’d like to see on this week’s agenda or is there anything you’d like to share with/show to the group?

Hi everyone, thank you for today’s call (2025.03.06). Special thanks to @jennifertran, @icpp and @apotheosis for sharing their ETH Denver experiences! This is the generated summary (short version, please find the long version here): The discussion covered a review of ETH Denver. The ICP team engaged in AI and Bitcoin-related networking, highlighting strong developer interest in ZKPs, TEEs, and on-chain AI verification. AI agent discussions were mostly future-oriented, with Eliza as a key framework, while the Bitcoin community showed increasing interest in AI integrations. ICP submissions at the hackathon doubled, with the “Only Possible on ICP” track seeing the most engagement. Looking ahead, an ICP AI hackathon is being considered to better showcase its unique capabilities in the Web3 AI space.

Links shared during the call:

Hi everyone, this week @icarus will lead our next AI4AI session where we’ll focus on hardware and infrastructure to support AI on ICP. Looking forward to it and see you all there :+1:

The focus of this week’s AI4AI DeAI meeting will be accelerator-card hardware options (GPU/APU/NPU/TPU) and an evaluation of their respective benefits for the future of AI-enabled subnets.

This is great, and it would be a good segue into discussing some of the decentralization options we have here. I’ll share more about those in the meeting on Thursday.

Also, if I may, I’d like to bring this poll to the attention of this group, as I’d love to get your input on that topic.

Hi everyone, thank you for today’s call (2025.03.13). Special thanks to @icarus for leading the call! This is the generated summary (short version, please find the long version here): In today’s DeAI Working Group call for the Internet Computer, ETH students introduced an inference engine project using a 1B-parameter Llama 3 model, exploring optimizations via the Mistral RS library and considering alternatives like Candle and Llama.cpp. The group then discussed upgrading hardware to Gen-3 AMD EPYC Zen 5 CPUs and integrating GPUs, highlighting NVIDIA’s H100/H200, AMD Instinct, and emerging accelerators such as Tenstorrent to meet ICP’s future AI workload requirements.

Links shared during the call:

Hi everyone, I’m excited that the DeAI Chat team (DeAI.chat – Decentralized AI chat on the Internet Computer) will present their work during this week’s call :muscle: thank you @jennifertran for organizing! In addition, we’ll discuss the decentralization options for the AI Worker and the recent poll on this by @ielashi :thumbsup: See you all at 6pm CET on Thursday, I’m looking forward to it!

Hi everyone, thank you for today’s call (2025.03.20). Special thanks to @pu0238 for demoing, @jennifertran for organizing and @ielashi for leading the second part of the call! This is the generated summary (short version, please find the long version here): Today’s DeAI Working Group call for the Internet Computer discussed recent advancements in efficiently running Large Language Models (LLMs) on-chain. The DeAI Chat team introduced a novel “chunking” strategy that sequentially processes small vector chunks, significantly reducing memory usage and enabling models with up to 1.5 billion parameters to run effectively. They demonstrated both an off-chain client method, optimizing speed through query calls, and an on-chain client method emphasizing decentralization via update calls, albeit with slower responses. Additionally, strategic paths for decentralizing AI worker nodes were explored: the “Classic” model, which aligns with current security but faces cost and scalability limitations, and the “Badlands” approach, which supports open participation and scalability but requires addressing security and decentralization concerns.
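The "chunking" idea above can be sketched in miniature: compute a matrix-vector product one row-chunk at a time, so peak memory is bounded by the chunk size rather than the full weight matrix. This pure-Python toy is an assumption about the general technique, not the DeAI Chat team's actual implementation:

```python
# Illustrative sketch of chunked inference: y = W @ x computed one
# row-chunk at a time. Only one chunk of W is resident in memory at once,
# which is what lets a large model fit within a canister's memory limits.

def matvec_chunked(load_chunk, n_rows: int, chunk_rows: int,
                   x: list[float]) -> list[float]:
    """load_chunk(start, end) returns rows [start, end) of W on demand."""
    y = []
    for start in range(0, n_rows, chunk_rows):
        chunk = load_chunk(start, min(start + chunk_rows, n_rows))
        for row in chunk:
            y.append(sum(w * v for w, v in zip(row, x)))
    return y

# Toy 4x3 weight matrix, served chunk by chunk (2 rows at a time).
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
x = [2.0, 3.0, 4.0]
y = matvec_chunked(lambda s, e: W[s:e], n_rows=4, chunk_rows=2, x=x)
print(y)  # [2.0, 3.0, 4.0, 9.0]
```

The trade-off matches what was reported in the call: memory drops, but each chunk load adds latency, which is why the on-chain update-call path is slower than the off-chain query path.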

Links shared during the call:

Hi everyone, this Thursday we’ll continue our discussion on decentralization options for the AI worker :thumbsup: Are there additional topics or items you’d like to add to the agenda? Would anyone like to demo or share their current work? Have a great week!

Hi @patnorris, Greg Seale here. I’d also like to be part of this group.
I have some experience with AI and am interested in how we can connect to and train/coach the Llama 3.1 LLM that is attached to the IC.

Hi @Anypoint , thanks for your message and interest, that sounds great :+1:
The calls take place each Thursday at 6pm CET in the ICP Discord. This is tomorrow’s event; you can see the time in your local timezone and the link to the voice channel where we’ll have the call: ICP

See you then

Hi everyone, thank you for today’s call (2025.03.27). This is the generated summary (short version, please find the long version here): GPU integration remains challenging due to WASM compatibility, determinism, and VRAM limitations, making large-model GPU deployments inefficient. The current AI Worker is managed off-chain by DFINITY, communicating via polling, with no immediate scalability issues. To decentralize AI workers, two main strategies—standardized hardware setups (“Classic”) and flexible hardware models with reward incentives (“Badlands”)—are under discussion. A hybrid approach, gradually opening governance to additional contributors, is proposed as a practical compromise. Interest is growing around affordable, powerful hardware solutions like NVIDIA DGX Spark (~$4000). Next steps involve exploring these hybrid models, organizing open forums with ICP node providers in Q2 2025, and closely monitoring hardware advancements.
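The polling pattern mentioned above (an off-chain worker pulling jobs from a canister and pushing results back) can be sketched like this. The in-memory queue and function names are stand-ins; a real worker would call the canister's endpoints over the IC's HTTP interface:

```python
# Illustrative sketch of an off-chain AI worker that polls for pending
# jobs, runs inference, and reports results. The "canister" here is an
# in-memory stub; the real worker talks to a canister on the IC.

import queue

def make_stub_canister(jobs):
    """Stub for the canister side: a pending-job queue and a results map."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)
    results = {}
    return pending, results

def poll_once(pending, results, infer) -> bool:
    """One polling iteration: fetch a job if any, process it, report back."""
    try:
        job_id, prompt = pending.get_nowait()
    except queue.Empty:
        return False  # nothing to do; a real worker would sleep and retry
    results[job_id] = infer(prompt)
    return True

pending, results = make_stub_canister([(1, "hello"), (2, "world")])
while poll_once(pending, results, infer=lambda p: p.upper()):
    pass
print(results)  # {1: 'HELLO', 2: 'WORLD'}
```

Polling keeps the worker behind a one-way connection (the worker initiates all traffic), which is convenient while the worker is centrally operated; decentralizing it mainly changes who runs this loop, not its shape.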

Links shared during the call:

Hey,
I was planning to join the call but something personal came up. Still, I’d love to be part of this moving forward.

Just to share: my team and I recently developed a custom algorithm that allows us to run large models like Llama 70B on more accessible hardware. We’re using an 8x L40S (40 GB) cluster. This isn’t a plug-and-play setup; there’s a lot of work behind optimizing it. While I can’t share the full details due to production use (we’re serving clients globally, and I’m also a node provider for ICP), here’s a rough idea of what we’ve achieved with a 70-billion-parameter model at bfloat16:

| Setup | Token throughput (tk/s) | Avg. completion per request |
|---|---|---|
| Standard setup | ~30 | ~10 seconds |
| With extensive tuning | ~50–60 | ~5 seconds |
| Custom algorithm developed by us | ~100–120 | ~2 seconds |
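As a rough sanity check, the throughput and latency columns are mutually consistent if completions average around 300 tokens; that token count is an assumption for illustration, not a figure from the post:

```python
# Sanity check: completion time implied by each reported throughput,
# assuming ~300-token completions (an assumed figure, not from the post).

COMPLETION_TOKENS = 300  # assumed average completion length

setups = {
    "standard": 30,      # tk/s
    "tuned": 55,         # midpoint of the 50-60 tk/s range
    "custom_algo": 110,  # midpoint of the 100-120 tk/s range
}

for name, tokens_per_second in setups.items():
    seconds = COMPLETION_TOKENS / tokens_per_second
    print(f"{name}: ~{seconds:.1f} s per request")
```

This lands close to the ~10 s / ~5 s / ~2 s figures in the table, so the numbers hang together.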

This setup is actively running in production, and it proves that large models like 70B can be served efficiently—without jumping to H200-class hardware.

I’m very interested in supporting the ICP AI initiative as a provider. I think there’s a solid opportunity here for a hybrid model: on-chain capabilities supported by providers like us who can deliver reliable inference with real performance.

Happy to discuss more on how we can plug into this effort.
