Technical Working Group DeAI

icarus · April 18, 2024, 9:00am

That is very cool and fully in the wheelhouse of the DeAI working group… I know several people who will be excited to try this and see it working. @patnorris
For context to all following this WG thread, here is a link to the docs in the tract GitHub repo:
Intro to Tract
and the TLDR on it is

tract is a neural network inference library. It takes trained networks from higher-level frameworks (Tensorflow, PyTorch, etc.), converts them to an intermediate representation and runs them on the end-user data. It is designed to be very portable and embedding friendly. We believe in running Neural Network Inference on the Edge, on a browser or a small embeddable CPU.

tract-onnx is a Rust library that can load and run an ONNX network. About 85% of ONNX operators are supported.

tract-tensorflow is a Rust library that can load and run a TensorFlow 1 network. Because of the huge size of TensorFlow, a smaller portion of the operator set is supported.

tract-nnef is a Rust library that can load and run NNEF networks. Most of NNEF is supported (missing deconv, ROI operations and quantization).

tract is the main command line interface (can be installed with “cargo install”). It can load networks in any of the previously listed formats, dump them in a user-friendly form, and bench and profile a network. Additionally, the tract command line can be used to convert a network to NNEF (with some extensions). tract-nnef is significantly smaller and lighter to start than tract-onnx or tract-tensorflow, so this conversion is useful for embedded situations.

patnorris · April 18, 2024, 10:13am

Thank you, this work looks exciting!

And thanks for the docs, @icarus. As you said, the group would surely be interested in seeing this in action and learning more about it.
If you’re interested and available, @ulan, it’d be great to have you on one of our group calls to show your work.

ulan · April 18, 2024, 10:20am

Thanks for the invite @patnorris! I will join the session that’s scheduled today.

patnorris · April 18, 2024, 10:56am

Awesome, looking forward to it

jeshli · April 18, 2024, 1:39pm

I’m fond of your approach. I am happy to have it as a resource.

q2333gh · April 18, 2024, 3:28pm

I use GPT for about an hour in total every day.

After taking the course Generative AI for Everyone (DeepLearning.AI),

I find that AI is useful in many aspects of life.

I also like the idea of combining blockchain with AI. That will be even cooler!

patnorris · April 18, 2024, 3:57pm

Agreed, it’ll be amazing to see all the ways that AI can be useful and important to us!

If you’d like to join our conversations, please join us in our IC Discord channel: ICP Developer Community

On Thursdays, we also have calls in the IC Discord voice channel if you’d like to join there too.

patnorris · April 18, 2024, 7:41pm

Hi everyone, thank you for today’s call (2024.04.18). This is the generated summary (short version, please find the long version here):
In today’s DeAI Working Group meeting for the Internet Computer, the discussions primarily revolved around optimizing WebAssembly and AI deployment on the platform. Ulan shared insights on using the Tract framework and the community project wasi-to-ic, as detailed on the forum, to adapt code for Internet Computer’s environment. The group delved into technical strategies for overcoming challenges with non-deterministic behaviors in floating-point operations and explored enhancements through SIMD instructions for better performance, potentially achieving up to a tenfold improvement.

A significant part of the conversation addressed the need for higher instruction limits in query operations to efficiently run large language models (LLMs) and discussed architectural approaches for scaling AI deployments via multiple canisters managed by a control canister. This architecture was part of a hackathon project.

The session highlighted ongoing efforts to refine benchmarks and encouraged further discussion on leveraging new optimizations to boost the performance and scalability of AI models on the platform. The participants expressed interest in continuing to explore these areas, particularly in how they could enhance the deployment and usability of LLMs in different scenarios, including on devices through browsers, which presents unique challenges such as initial download times and computational limits based on device capabilities.

patnorris · April 18, 2024, 7:58pm

Hi @ulan , thank you for joining today’s call, it was great hearing about your work.

As discussed, this is the call summary from two weeks ago where @icarus shared a lot of insights from his research on WebAssembly SIMD.

This is the link to @icpp 's repo of the on-chain LLM that we also use as part of the hackathon project I demoed. If you have any questions around this @icpp and I are happy to help.

And for the mentioned benchmarking standard, this is a first simple draft for the template. Happy to hear everyone’s feedback and ideas

q2333gh · April 19, 2024, 1:30am

Llama3-8B has been released, with performance on par with Llama2-70B. What a great thing to see!

But I guess it is still not practical to run it in an IC canister.

icpp · April 19, 2024, 2:07am

I think running an 8B model is not too far away in the future

ulan · April 19, 2024, 6:45am

Thanks, @patnorris!

And for the mentioned benchmarking standard, this is a first simple draft for the template. Happy to hear everyone’s feedback and ideas

I left a comment about using the same execution/testing environment (e.g. PocketIC) and providing a script to run the experiment.

For the Wasm benchmarking suite I had the following in mind:

it should be language agnostic: Rust, Motoko, Azle, Kybra, C++
it should have a script to build from the source code and produce a single canister.
the canister should have a single public endpoint run() that executes the actual experiment.
the canister can use canister_init() to initialize its data structures for the experiment.
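
The canister shape described in the list above can be sketched roughly as follows. This is a minimal std-only sketch, not an agreed template: the `Experiment` struct and the sum-of-squares workload are hypothetical placeholders, and in a real Rust canister the two functions would be exported through the CDK (for example as init and update/query methods) rather than being plain functions.

```rust
use std::cell::RefCell;

// Hypothetical experiment state. A real benchmark would hold whatever
// the experiment needs (model weights, input tensors, ...).
struct Experiment {
    data: Vec<u64>,
}

thread_local! {
    // Canister code runs single-threaded, so thread-local state is a
    // common way to keep data between calls.
    static STATE: RefCell<Option<Experiment>> = RefCell::new(None);
}

// Builds the data structures once, before any measurement happens.
// Assumption: this would be wired up as the canister's init hook.
fn canister_init() {
    let data = (1..=1_000u64).collect();
    STATE.with(|s| *s.borrow_mut() = Some(Experiment { data }));
}

// The single public endpoint: executes only the experiment itself and
// returns a checksum so the result can be compared across languages.
fn run() -> u64 {
    STATE.with(|s| {
        let state = s.borrow();
        let exp = state.as_ref().expect("canister_init() must run first");
        exp.data.iter().map(|x| x * x).sum()
    })
}
```

Keeping setup in `canister_init()` and the measured work in `run()` means the instruction count of `run()` reflects only the experiment, which is what makes results comparable between Rust, Motoko, Azle, Kybra and C++ builds.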

ulan · April 19, 2024, 9:35am

@jeshli: in case you are planning to use wasi2ic in production, please take a look at this change in the example code: fix: Make WASI polyfill more robust in image classification by ulan · Pull Request #850 · dfinity/examples · GitHub

By default the WASI polyfill library is going to use the canister’s stable memory to store the file system. If the canister already uses its stable memory for other purposes this may lead to data corruption. The change I linked passes a virtual stable memory to WASI.
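
To illustrate why a virtual stable memory avoids the corruption, here is a minimal std-only sketch of the idea. It is not the WASI polyfill API or the code from the linked pull request: `StableMemory`, `VirtualMemory` and the fixed-window layout are hypothetical simplifications, used only to show that each user of stable memory gets its own bounds-checked region.

```rust
// Simplified stand-in for a canister's stable memory: one flat byte array.
struct StableMemory {
    bytes: Vec<u8>,
}

// A fixed window into the stable memory. This is an illustrative
// simplification; a real memory manager can allocate space more flexibly.
#[derive(Clone, Copy)]
struct VirtualMemory {
    base: usize,
    size: usize,
}

impl VirtualMemory {
    // Writes are checked against the window, so one user of stable
    // memory cannot overwrite another user's region.
    fn write(&self, mem: &mut StableMemory, offset: usize, data: &[u8]) -> Result<(), String> {
        let end = offset.checked_add(data.len()).ok_or("offset overflow")?;
        if end > self.size {
            return Err(format!(
                "write of {} bytes at offset {} exceeds window of {} bytes",
                data.len(),
                offset,
                self.size
            ));
        }
        let start = self.base + offset;
        mem.bytes[start..start + data.len()].copy_from_slice(data);
        Ok(())
    }

    // Reads are also confined to the window.
    fn read(&self, mem: &StableMemory, offset: usize, len: usize) -> Option<Vec<u8>> {
        let end = offset.checked_add(len)?;
        if end > self.size {
            return None;
        }
        Some(mem.bytes[self.base + offset..self.base + end].to_vec())
    }
}
```

With one window for the application's own data and another for the WASI file system, both can start writing at their own offset 0 without touching each other's bytes, whereas two users writing to raw offset 0 of the same stable memory would overwrite one another.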

@sgaflv: wdyt about applying the same change to demo1 example of wasi2ic such that people use the library safely?

sgaflv · April 19, 2024, 8:58pm

I think the demo1 example, which is the “Hello World” example, needs to be as simple as possible. The custom memory example should be shown in a separate, non-trivial demonstration.

ulan · April 22, 2024, 8:49am

I think the demo1 example, which is the “Hello World” example, needs to be as simple as possible.

Generally, I agree that examples should be simple, but demo1 right now has a dangerous code pattern.

People usually copy/paste code snippets from examples (like I did for the image classification demo). If they put the code from demo1 into their existing canister, then they are going to lose all their data in stable memory. So demo1 at least needs a comment to explain this risk or should use a safe pattern.

icme · April 28, 2024, 7:21pm

Saw this paper that recently came out on using six-bit quantization (FP6) to reduce the size of large language models and increase inference throughput on A100 GPUs.

Some of it is a bit over my head, but figured it could be worth looking into.

TLDR Tweet

Research Paper

icarus · April 30, 2024, 10:12am

Hi @ulan, during last week’s DeAI Working Group meeting I mentioned a WASM proposal for extending SIMD support to vectors longer than 128 bits. It is called “flexible-vectors”. It adds instruction bytecodes that don’t specify a vector bit-width; instead, the instructions are specific to the SIMD lane width and count, leaving the vector width as a runtime (startup) configuration to be read from the CPU architecture (which would be 512 bits wide for AVX-512 SIMD extensions on the AMD EPYC CPU).

I searched for an implementation in WASM runtimes but couldn’t find any evidence of one. The proposal is fully specified, originally by an Intel employee, so possibly there is an unpublicised test implementation somewhere.
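
As a rough illustration of what width-agnostic code looks like, here is a std-only sketch in plain Rust. It does not use any real flexible-vectors instructions: `sum_with_lanes` and `lanes_for` are hypothetical helpers that emulate a vector register whose lane count is only known at startup.

```rust
// Sums `data` using `lanes` independent accumulators, mimicking a SIMD
// register whose width is discovered at startup instead of being fixed
// in the bytecode.
fn sum_with_lanes(data: &[f32], lanes: usize) -> f32 {
    assert!(lanes > 0, "lane count must be positive");
    let mut acc = vec![0.0f32; lanes];
    let mut chunks = data.chunks_exact(lanes);
    // Each full chunk corresponds to one "vector add" across all lanes.
    for chunk in &mut chunks {
        for (a, x) in acc.iter_mut().zip(chunk) {
            *a += *x;
        }
    }
    // Elements that do not fill a whole vector are handled one by one.
    let tail: f32 = chunks.remainder().iter().sum();
    acc.iter().sum::<f32>() + tail
}

// Number of f32 lanes that fit in a vector register of the given width.
fn lanes_for(vector_bits: usize) -> usize {
    vector_bits / 32
}
```

The same function body works for `lanes_for(128)` (4 lanes, today's fixed-width WASM SIMD) and `lanes_for(512)` (16 lanes, as on AVX-512), which is the property the proposal aims for at the instruction level.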

Linked at the end of our DeAI WG Discord channel meeting discussion thread here (@patnorris had a question for you there too) : Discord

ulan · April 30, 2024, 10:28am

Thanks @icarus. There is more info in the issue tracker of that repository: Flexible vectors: Tracking issue for feedback after CG presentation · Issue #60 · WebAssembly/flexible-vectors · GitHub

Looks like it was presented last year, but the poll to move to phase 2 “was inconclusive”. I’ll take a look.

BTW, I will miss the upcoming session this Thursday because I will be on holidays.

sgaflv · May 6, 2024, 2:02pm

Alright, I’ve opened the issue in the demo1 repo.

Samer · May 21, 2024, 2:09pm

Just want to throw this in here. Not an expert, but from what I gather KANs may perform better than Multi-Layer Perceptrons.

It goes without saying that this working group should anticipate the Neural Networks of tomorrow and invest resources in anticipation of latest trends.
