Hi everyone, thank you for today’s call (2025.03.13). Special thanks to @icarus for leading the call! This is the generated summary (short version; please find the long version here):

In today’s DeAI Working Group call for the Internet Computer, ETH students introduced an inference engine project built around a 1B-parameter Llama 3 model, exploring optimizations via the mistral.rs library and considering alternatives such as Candle and llama.cpp. The group then discussed upgrading node hardware to Gen-3 machines with AMD EPYC Zen 5 CPUs and integrating GPUs, highlighting NVIDIA’s H100/H200, AMD Instinct, and emerging accelerators such as Tenstorrent to meet ICP’s future AI workload requirements.
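As a rough illustration of the students’ setup, here is a minimal sketch of driving a small Llama 3 model through the mistral.rs Rust API, loosely following the examples in that project’s README. The model id (meta-llama/Llama-3.2-1B-Instruct) and the exact builder methods are assumptions on my part and may differ from what the team actually uses or from newer crate versions:

```rust
use anyhow::Result;
use mistralrs::{TextMessageRole, TextMessages, TextModelBuilder};

#[tokio::main]
async fn main() -> Result<()> {
    // Assumed 1B-class Llama 3 checkpoint; swap in whichever model the project targets.
    let model = TextModelBuilder::new("meta-llama/Llama-3.2-1B-Instruct")
        .with_logging()
        .build()
        .await?;

    // Build a simple single-turn chat request.
    let messages = TextMessages::new().add_message(
        TextMessageRole::User,
        "Explain the Internet Computer in one sentence.",
    );

    let response = model.send_chat_request(messages).await?;
    println!("{}", response.choices[0].message.content.as_ref().unwrap());
    Ok(())
}
```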
Links shared during the call:
- GitHub - EricLBuehler/mistral.rs: Blazingly fast LLM inference.
- original llama.cpp: GitHub - ggml-org/llama.cpp: LLM inference in C/C++
- llama.cpp running in a canister of the IC: GitHub - onicai/llama_cpp_canister: llama.cpp for the Internet Computer
- summary of the max tokens per update call investigation: GitHub - onicai/llama_cpp_canister: llama.cpp for the Internet Computer (see the sketch after this list)
- summary of the last hardware-focused call as a reference: DeAIWorkingGroupInternetComputer/WorkingGroupMeetings/2025.02.13 at main · DeAIWorkingGroupInternetComputer/DeAIWorkingGroupInternetComputer · GitHub
- NVIDIA Data Center GPU Resource Center
- example server: MiTAC TN85B8261 B8261T85E8HR-2T-N Overview
- another example server: https://www.gigabyte.com/Enterprise/Rack-Server/XV23-ZX0-AAJ1-rev-3x#Overview
- NVIDIA H100 Tensor Core GPU
- AMD Instinct MI300X accelerator: https://www.amd.com/en/products/accelerators/instinct/mi300/mi300x.html
- Tenstorrent Wormhole™
- https://www.modular.com/
- vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention | vLLM Blog
- Decreasing HTTP Outcall Latency and Cost - #7 by lastmjs
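Related to the max-tokens-per-update-call investigation linked above, here is a minimal sketch (assuming the ic-cdk and candid Rust crates as dependencies) of how an on-chain inference endpoint might cap the work done in a single update call by watching the instruction counter and letting the caller resume generation in a follow-up call. The instruction budget and the next_token helper are hypothetical placeholders, not llama_cpp_canister’s actual API:

```rust
use std::cell::RefCell;

// Hypothetical per-call budget, kept well below the subnet's update-call instruction limit.
const INSTRUCTION_BUDGET: u64 = 15_000_000_000;

thread_local! {
    // Output accumulated so far for a single illustrative session.
    static OUTPUT: RefCell<String> = RefCell::new(String::new());
}

/// Generate up to `max_tokens` tokens for `prompt`, stopping early if this
/// update call is close to its instruction budget; the caller re-invokes the
/// method until generation is complete.
#[ic_cdk::update]
fn generate(prompt: String, max_tokens: u64) -> String {
    let mut produced: u64 = 0;
    while produced < max_tokens {
        if ic_cdk::api::instruction_counter() > INSTRUCTION_BUDGET {
            break; // leave the remaining tokens for the next update call
        }
        // Hypothetical stand-in for one llama.cpp decoding step.
        let token = next_token(&prompt);
        OUTPUT.with(|out| out.borrow_mut().push_str(&token));
        produced += 1;
    }
    OUTPUT.with(|out| out.borrow().clone())
}

// Placeholder token generator so the sketch is self-contained.
fn next_token(_prompt: &str) -> String {
    " token".to_string()
}
```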