Parallel execution for canisters

I did explore optimizing the execution of AI inference, and one optimization that would really make a difference is support for multi-core matrix multiplication, as that’s where ~99% of all inference computation lies. Having said that, even with that optimization, don’t expect to be able to run models bigger than ~1B parameters efficiently; for those we’d really need beefier hardware than the current nodes have.
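
Just to illustrate what “multi-core matrix multiplication” means in practice, here’s a minimal sketch in Rust using the rayon crate to parallelize the output rows across cores. The crate choice, row-major layout, and function name are my own assumptions for the example, not part of any existing canister API:

```rust
use rayon::prelude::*;

/// Multiply an (m x k) matrix `a` by a (k x n) matrix `b`, both stored
/// row-major as flat slices, returning the (m x n) result.
/// Each output row is computed independently, so rows are distributed
/// across all available cores.
fn matmul_parallel(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; m * n];
    out.par_chunks_mut(n) // one chunk per output row
        .enumerate()
        .for_each(|(i, row)| {
            for p in 0..k {
                let a_ip = a[i * k + p];
                for j in 0..n {
                    row[j] += a_ip * b[p * n + j];
                }
            }
        });
    out
}
```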

Computation aside, there’s also the I/O problem: even with beefier hardware that allows us to run bigger models, swapping these models in and out of VRAM could be very expensive (a 70B-parameter model would need anywhere between 10 and 20 seconds just to be loaded into VRAM). That’s a challenge we’d still face even with beefier hardware.
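
As a rough back-of-the-envelope for where that 10–20 seconds comes from (the byte width and bandwidth figures are assumptions for illustration, not measurements of our nodes):

```rust
// Rough load-time estimate for a 70B-parameter model.
fn main() {
    let params: f64 = 70e9;              // 70B parameters
    let bytes_per_param: f64 = 2.0;      // fp16 weights (assumed)
    let model_bytes = params * bytes_per_param;  // ~140 GB
    let bandwidth: f64 = 10e9;           // ~10 GB/s effective host-to-VRAM (assumed)
    let seconds = model_bytes / bandwidth;
    println!("~{:.0} s just to move the weights into VRAM", seconds); // ~14 s
}
```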

That’s why for now we’re exploring the approach of AI workers as outlined here. With this approach, we’d support a handful of foundational LLMs rather than letting anyone run any model. It’s a limitation, but it allows the workers to keep these LLMs consistently loaded in RAM for faster inference.
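
Purely as an illustration of the “load once, keep resident” idea (all names and types here are hypothetical, not the actual worker design):

```rust
use std::collections::HashMap;

/// Stand-in for model weights that stay resident in RAM for the worker's
/// lifetime, so no per-request load cost is paid.
struct LoadedModel {
    name: String,
    weights: Vec<u8>,
}

/// A worker that serves only a fixed set of foundation models, loaded once at startup.
struct AiWorker {
    models: HashMap<String, LoadedModel>,
}

impl AiWorker {
    fn new(supported: &[&str]) -> Self {
        let models = supported
            .iter()
            .map(|name| {
                // In reality this would read the weight files from disk once here.
                (name.to_string(), LoadedModel { name: name.to_string(), weights: Vec::new() })
            })
            .collect();
        AiWorker { models }
    }

    /// Inference requests only ever hit models that are already in memory.
    fn infer(&self, model: &str, _prompt: &str) -> Option<String> {
        self.models.get(model).map(|m| format!("response from {}", m.name))
    }
}
```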

Are you interested specifically in training?
