Technical Working Group DeAI

Hi all, thank you for the call today. This is the generated summary:

  • Amit from the ELNA team introduces his team members and discusses their work on building an AI platform for agents.

  • Various members, including Alex from the ELNA team, introduce themselves and their work, focusing on areas like Natural Language Processing and large language models.

  • Technical discussions touch on the development of an on-chain vector DB and the use of retrieval-augmented generation for their platform.

  • The group reflects on previous meetings and ongoing project updates, discussing technical aspects like increasing stable memory and potential GPU canister usage.

  • Technical challenges related to AI agents and the implementation of large language models are discussed, with emphasis on memory limitations and the need for increased instruction limits.

  • Participants seek feedback on technical feasibility and anticipate future developments, including increased memory capacities. It is likely that the heap memory can be increased to 16GiB this year (from 4GiB currently).

  • The conversation shifts to the broader ICP ecosystem, discussing the development and use of AI and machine learning models on-chain.

  • There’s a debate on charging for query calls, considering the implications for users and potential abuse.

  • The need for educational resources for both developers and the general public is highlighted, emphasizing the importance of tutorials and learning materials.

  • The importance of showcasing projects within the ICP ecosystem is discussed, including the idea of a dedicated website for this purpose.

  • SEO concerns for sites hosted on ICP are mentioned, balancing technical showcases with practical deployment considerations.

  • The conversation ends with a mention of developing an image recognition algorithm as an application on the ICP.

Next steps:

Thanks again @jeshli and @icarus for getting these threads going!

Please let us know if you have any ideas, feedback or questions. Looking forward to continuing the conversation!

4 Likes

I have been thinking about why it should be DeAI and not just AI.
No matter how meaningful and significant DeAI is, it cannot take market share from existing centralized servers if the UX is poor. So besides clearing its technical hurdles, it must also overcome the friction and extra hassle that being web3 adds to a smooth UX.

DeAI will solve the challenges of centralized server-based AI:

  • Lack of digital sovereignty
  • Weak privacy protection
  • Security risks introduced by intermediaries
  • Non-transparent data processing and auditing
  • Forced trust in server administrators
  • Unilateral control by server administrators

These are much the same issues web3 is trying to solve for web2, but here they are more serious and must be addressed with the impact of AI on society at large in mind.

What DeAI will bring:

  • User-managed and user-driven AI
  • AI that leverages users' private data
  • Highly secure serverless AI services
  • Transparent data processing and auditing
  • Distributed server management without concentrated power

If there are any points that we have missed or should add, please feel free to point them out.

6 Likes

Is this likely to be achieved? My understanding is that to get heap memory above 4GB you need to change from 32-bit wasm to 64-bit wasm. The DFINITY roadmap says “Potential future explorations”, so I don’t think it is in the development stage yet! Or is there another way?

2 Likes

Thank you for sharing the link :slight_smile: True, it’d be good to know whether the heap increase via 64-bit wasm has made it to the development or planning stage yet, and whether we could motivate the development further with our use cases. I’ll put this on the agenda for our call tomorrow :+1:

2 Likes

According to the Memory64 proposal overview on GitHub, the implementation status for 64-bit WASM in various platforms is as follows:

  • Spec interpreter: Done
  • V8/Chrome: Done
  • Firefox: Done
  • Safari: Status unknown
  • WABT: Done
  • Binaryen: Done
  • Emscripten: Done

The technical foundation for 64-bit WASM is largely in place, with major browsers and tools already supporting it. Given the current trajectory of Memory64’s development, it seems likely to become part of the WebAssembly standard soon. As for DFINITY’s ability to incorporate this change, my understanding is that it would not be overly burdensome, and that work could begin as soon as 64-bit WASM is officially standardized.
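
To make the 4 GiB ceiling concrete: under wasm32 a canister’s pointers (usize in Rust) are 32 bits wide, so at most 2^32 bytes of heap are addressable, and Memory64 widens addresses to 64 bits. A minimal Rust sketch of the difference (illustrative only, not IC-specific code):

```rust
// Illustrative only: heap size is capped by pointer width.
// On wasm32 (today's canisters) usize is 32 bits -> 4 GiB ceiling.
// On a wasm64/Memory64 target usize is 64 bits.

fn main() {
    let bits = usize::BITS; // 32 on wasm32, 64 on wasm64
    let addressable_gib = 2u128.pow(bits) / (1024 * 1024 * 1024);
    println!("pointer width: {bits} bits -> {addressable_gib} GiB addressable");

    #[cfg(target_pointer_width = "64")]
    {
        // Only expressible with 64-bit pointers: a byte offset past 4 GiB.
        let offset_past_4gib: usize = 5 * 1024 * 1024 * 1024;
        println!("can address byte offset {offset_past_4gib}");
    }
}
```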

2 Likes

Thank you both very much.
I was not aware that the 64-bit Wasm standard had not yet been adopted, and since increasing heap memory is a lifeline for DeAI, I hope it will be a priority on DFINITY’s development roadmap.
I would also like to know what DFINITY thinks about the huge potential demand for AI coupled with sovereign cloud and digital sovereignty, which should be a marketing advantage and a differentiator from other chains.

4 Likes

Hi everyone, thank you for today’s discussion. This is the generated summary (call on 2024/01/18):

  • Meeting Structure and Topics: The group will aim for a structured approach for future meetings, focusing on specific topics each week, starting from basic to more advanced discussions. This structure aims to build a comprehensive knowledge base over time.
  • Summarizing Meetings: Meeting summaries will be created for community contribution and to aid newcomers. This approach is to document discussions for future reference.
  • Utilizing DFINITY Wiki and Other Platforms: While considering the DFINITY Wiki for formal documents, the group acknowledged the need for a more dynamic workspace. Discussion on various platforms for content development and hosting took place.
  • Twitter Spaces for Outreach and Education: The group plans to leverage Twitter spaces for broader outreach and education. Regular Twitter space sessions are to be held for topic discussions, resource sharing, and audience engagement. ELNA will have a Twitter space tomorrow (Jan 19): https://twitter.com/elna_live/status/1747156897273364555
  • Simplifying Technical Jargon: Emphasis was placed on making complex AI and blockchain concepts accessible to the public. Using analogies, infographics, and videos to simplify terminology was suggested.
  • Content Creation for Educational Platforms: There was a proposal to write introductory articles on AI and machine learning on the ICP, to serve as starting points for new developers. These articles would be published across various platforms.
  • GPU Integration in Infrastructure: The group discussed different models for GPU integration in ICP infrastructure. This included full integration into the consensus algorithm and using GPUs for query processing. The potential for off-chain GPU compute services was also explored.
  • 64-bit WebAssembly (WASM) Discussion: The discussion touched upon the benefits and implications of moving to 64-bit WASM, specifically regarding heap memory and algorithmic limits.
  • Economic Incentives for GPU Hardware: The idea of charging for queries and providing economic incentives for investing in GPU hardware was discussed. This is linked with the faster processing capabilities of GPUs.
  • Potential NNS Proposal for Query Charging: The group considered submitting a proposal to the Network Nervous System (NNS) regarding query charging, but decided to wait for further information from the DFINITY team.
  • Feedback Mechanism and Tech Stack Insights: The group showed interest in understanding different projects’ tech stacks and the kind of workloads they intend to run, especially concerning training versus inference.
  • CUDA Dependencies and Framework Abstraction: A technical discussion covered CUDA dependencies and the abstraction provided by frameworks like PyTorch 2.0 and TensorFlow. The focus was on the ability of these frameworks to run computations on hardware beyond the standard CUDA library, such as AMD GPUs; see the conceptual sketch after this summary.

Overall, the meeting emphasized structuring future discussions for comprehensive knowledge-building, exploring GPU integration in various models, simplifying AI and blockchain concepts for a broader audience, and probing into the potential of 64-bit WASM and its implications. The group also stressed the importance of creating educational content and exploring economic models for GPU investment.
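
To illustrate the framework-abstraction point above: frameworks like PyTorch and TensorFlow decouple model code from hardware behind a backend/dispatch layer, which is what lets the same model run on CUDA, ROCm (AMD), or plain CPU kernels. A conceptual Rust sketch of the idea (not actual framework code; all names are made up for illustration):

```rust
// Conceptual sketch of an ML backend abstraction: model code is written
// against a trait, and each hardware target supplies its own kernels.

trait Backend {
    fn name(&self) -> &'static str;
    /// Multiply an m x k matrix by a k x n matrix (row-major slices).
    fn matmul(&self, a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32>;
}

struct CpuBackend; // a GpuBackend would implement the same trait with CUDA/ROCm kernels

impl Backend for CpuBackend {
    fn name(&self) -> &'static str { "cpu" }
    fn matmul(&self, a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
        let mut out = vec![0.0; m * n];
        for i in 0..m {
            for j in 0..n {
                for p in 0..k {
                    out[i * n + j] += a[i * k + p] * b[p * n + j];
                }
            }
        }
        out
    }
}

// Model code never mentions the hardware:
fn linear_layer(be: &dyn Backend, x: &[f32], w: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    be.matmul(x, w, m, k, n)
}

fn main() {
    let be = CpuBackend;
    let y = linear_layer(&be, &[1.0, 2.0], &[3.0, 4.0], 1, 2, 1);
    println!("{} -> {:?}", be.name(), y); // cpu -> [11.0]
}
```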

5 Likes

Have you missed the 64-bit Wasm heap? Super important feature.

3 Likes

Agreed, I think this might even be one of the main initiatives we as a group can tackle: helping with the 64-bit Wasm heap (if only by getting this feature prioritized, maybe more). It would be a really exciting enhancement, most likely also beyond our projects :slight_smile:

3 Likes

@evanmcfarland ,
You mentioned that you like qdrant for your vector database in the RAG.

Have you also tried FAISS?

I am having a very good experience with it, and am considering trying to port it to the IC, since it is a pure C++ library.

@branbuilder ,
What vector database are you porting? Or are you building a new one from scratch?

3 Likes

@icpp I’m sure FAISS would be an excellent starting point for many people’s needs, but it does not fit my use case personally.

Since it’s a ‘vector library’ rather than a ‘database’, it isn’t by itself meant to work with changing datasets. In my app, users upload and delete books, which adds and removes data from the DB in real time, so I have to stick with Qdrant.

The good news for me is that my outbound HTTPS requests to vector DBs are proving affordable and fast enough for decent usability in the interim.
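
For reference, the interim setup is essentially an HTTPS outcall from the canister to the vector DB’s search endpoint. A minimal Rust sketch (the URL, collection name, and cycles amount are placeholders; the call is ic-cdk’s management-canister HTTP request):

```rust
use ic_cdk::api::management_canister::http_request::{
    http_request, CanisterHttpRequestArgument, HttpHeader, HttpMethod,
};
use ic_cdk::update;

/// Forward a JSON search query to an external vector DB over an HTTPS outcall.
/// Endpoint and payload shape are hypothetical placeholders.
#[update]
async fn search_external_vector_db(query_json: String) -> String {
    let request = CanisterHttpRequestArgument {
        url: "https://example-qdrant-host/collections/books/points/search".to_string(),
        method: HttpMethod::POST,
        headers: vec![HttpHeader {
            name: "Content-Type".to_string(),
            value: "application/json".to_string(),
        }],
        body: Some(query_json.into_bytes()),
        max_response_bytes: Some(64 * 1024), // cap what we pay for
        transform: None, // a transform function helps keep replies deterministic
    };

    // Cycles attached to pay for the outcall; the right amount depends on
    // subnet size and the response-size cap.
    let (response,) = http_request(request, 30_000_000_000)
        .await
        .expect("HTTPS outcall failed");

    String::from_utf8(response.body).unwrap_or_default()
}
```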

1 Like

Sorry, my previous post didn’t go through due to a network issue.

Yes, we are trying to port Qdrant; meanwhile, we are also building a minimal vector DB of our own.

@evanmcfarland
FAISS looks promising, we will give it a try.

It would be great if you can port it to the IC.

On the ELNA platform we are planning to support multiple solutions, and we will surely try to accommodate it on ELNA once you have ported it. Please have a look at our video explanation of ELNA.
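
For anyone curious, a minimal canister vector DB along these lines could look like the sketch below (hypothetical: brute-force cosine similarity over heap memory; a real implementation would add stable-memory persistence and an index):

```rust
use ic_cdk::{query, update};
use std::cell::RefCell;

thread_local! {
    // id -> embedding; heap-only for brevity.
    static VECTORS: RefCell<Vec<(u64, Vec<f32>)>> = RefCell::new(Vec::new());
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

#[update]
fn insert(id: u64, embedding: Vec<f32>) {
    VECTORS.with(|v| v.borrow_mut().push((id, embedding)));
}

#[update]
fn delete(id: u64) {
    VECTORS.with(|v| v.borrow_mut().retain(|(i, _)| *i != id));
}

#[query]
fn search(query_vec: Vec<f32>, k: u64) -> Vec<(u64, f32)> {
    VECTORS.with(|v| {
        let mut scored: Vec<(u64, f32)> = v
            .borrow()
            .iter()
            .map(|(id, e)| (*id, cosine(&query_vec, e)))
            .collect();
        // Highest similarity first, keep top k.
        scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
        scored.truncate(k as usize);
        scored
    })
}
```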

3 Likes

@mnl, this is the fork of the Tract crate that I altered in order to run LLMs within a canister in a local environment. This repo is the IC application of that crate, which I have used for testing locally and am actively improving.
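
For context, the basic Tract inference flow looks roughly like this (standard tract-onnx usage rather than the fork’s exact code; model path and input shape are placeholders):

```rust
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    // Load an ONNX model, pin its input shape, optimize, make runnable.
    let model = tract_onnx::onnx()
        .model_for_path("model.onnx")? // placeholder path
        .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 128)))?
        .into_optimized()?
        .into_runnable()?;

    // Dummy input matching the declared 1 x 128 shape.
    let input: Tensor = tract_ndarray::Array2::<f32>::zeros((1, 128)).into();

    let outputs = model.run(tvec!(input.into()))?;
    println!("output shape: {:?}", outputs[0].shape());
    Ok(())
}
```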

Honestly, this framework would likely be the best: crates.io: Rust Package Registry, as it is the most advanced ONNX runtime environment and comes with working examples that can easily download tokenizers and models from Hugging Face.

[Edit: repos have been moved to modclub repo umbrella]

5 Likes

nice!

Regarding ort, they don’t list wasm-linux as compatible.

I wonder how much work it would be to get it to work…

I love that they have the feature to run stuff from Hugging Face directly.

1 Like

After I finish the first draft of my demo application, I will start examining which aspects of their “run stuff from Hugging Face directly” feature I can imitate and integrate into my tract fork. If nothing else, it is promising for achieving that aim with ONNX in either Burn-rs or Tract, since the capability already exists for the ONNX Runtime Environment.

1 Like

I don’t necessarily think it has to be part of the ML library; the HF model could be converted into ONNX format during the build step in dfx.json by a separate tool, e.g. Export to ONNX.
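
For example, dfx.json’s custom canister type can run an arbitrary build command, so the conversion could happen there. A hypothetical sketch (script and file names are placeholders):

```json
{
  "canisters": {
    "model_canister": {
      "type": "custom",
      "build": "python export_to_onnx.py --model gpt2 --out model.onnx && cargo build --target wasm32-unknown-unknown --release",
      "wasm": "target/wasm32-unknown-unknown/release/model_canister.wasm",
      "candid": "model_canister.did"
    }
  }
}
```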

1 Like

Thank you all for today’s discussion! This is the generated summary (2024.01.25):

  • Introduction to tract-ic-ai by Jeshli (GitHub - jeshli/tract-ic-ai: Tiny, no-nonsense, self-contained, Tensorflow and ONNX inference): A pruned-down version of ‘tract’ that supports 85% of ONNX functionality, plus a generic machine-learning model and parameter-storage template. This part focuses on the compatibility of AI models with the ONNX format and the challenges of implementing certain functionalities.
  • Optimizing ‘tract’ for AI Applications: The talk goes into the details of optimizing the ‘tract’ library for core neural network applications, focusing on pruning unnecessary functionalities like Random Forest and Fast Fourier Transform to make it more efficient for specific tasks.
  • Large Language Models and ONNX Functionality: There’s a detailed discussion about the difficulties of implementing large language models like GPT-2 in the ‘burn’ library due to the absence of necessary node functions, leading to a preference for pruning down ‘tract’ instead.
  • Rust Connect py AI to IC (GitHub - jeshli/rust-connect-py-ai-to-ic: A streamlined open-source tool for deploying Python AI models on the Internet Computer (IC). Enables easy uploading and inference.): Introduction of a library demonstrating how to download GPT-2 from Hugging Face, segment it, and run it efficiently. This part of the conversation is about making AI models more accessible and deployable on decentralized networks; see the chunked-upload sketch after this summary.
  • Front-End Development and Tokenizer Implementation: The discussion touches on front-end development, particularly implementing a tokenizer and the feasibility of running Rust code in the browser for AI applications.
  • Community Collaboration and Sharing of Work: The call includes moments where participants discuss sharing their work on forums and collaborating on AI projects, reflecting the collaborative nature of the community.
  • Interest in Impact of Decentralized AI: There is a keen interest in understanding and exploring the impact of decentralized AI, even from those without a deep background in machine learning or AI.
  • Potential for GPU Environment Linked to Decentralized Networks: There’s a discussion on the potential of connecting a GPU cluster to a decentralized network like the Internet Computer (IC) to leverage the advantages of smart contracts and enhance performance.
  • Privacy Concerns with Decentralized AI and VetKeys: The conversation touches on the challenges of ensuring privacy in decentralized AI. Concerns are raised about the security of VetKeys and the possibility of memory snapshots extracting keys from malicious nodes.
  • GPU Infrastructure and Confidential Computing: The discussion also delves into GPU infrastructure and the concept of confidential computing, which includes secure computation enclaves and encrypted computation. The potential of new features in AMD CPUs and NVIDIA GPUs for secure data processing is discussed.
  • Challenges of DDoS Attacks on Decentralized Networks: The group mentions ongoing issues like DDoS attacks on decentralized networks, indicating that there are still challenges to be addressed in ensuring the security and reliability of such systems.
  • Data Sharing in Multiplayer Mode: The conversation highlights the multiplayer mode of decentralized AI, where diverse players can benefit from shared data and native integration with blockchain networks. This leads to a discussion about how manufacturers and producers could use shared data for optimizing products and sales.
  • Single Player Mode Advantages: The single player mode, focusing on individual users or companies, is discussed in relation to privacy and the reduction of costs by eliminating middlemen. The group talks about the potential for privacy improvements with upcoming features on the IC.
  • Concerns About Encryption and Data Security: The participants discuss the inherent risks in data sharing, acknowledging that encryption is secure until it’s broken. The conversation highlights that no platform, including the IC, is immune to future advances in decryption technologies.
  • Personal AI Solutions on the IC: The IC is recognized for enabling personal AI solutions. Individuals can have their own AI models running in canisters they fully control, using virtualized hardware rented from the IC. This setup is seen as more feasible than setting up personal hardware for most people.
  • Privacy Continuum and Control Over AI: The concept of a privacy continuum is introduced, suggesting that different solutions offer varying degrees of privacy. Having full control over one’s AI provides a higher level of security and ensures that the results serve the user’s best interests, free from bias.
  • Anonymous Deployment of AI Models: A notable feature of the IC is the potential for anonymous deployment of AI models. Users can deploy canisters without attaching their identity or credit card information, which adds a layer of privacy since even if data is exposed, the identity of the individual remains unknown.
  • Ongoing Discussion and Follow-Up: The participants express interest in continuing the discussion on the advantages of AI deployment on the IC. There’s a plan for a follow-up post to keep the conversation going and gather more ideas on how to leverage the IC for AI advancements.
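
On the segmenting/uploading point above: canister ingress messages are limited to roughly 2 MiB, so model weights are typically uploaded in many small chunks and reassembled on-chain. A minimal sketch of that pattern (hypothetical method names, not necessarily the repo’s exact code):

```rust
use ic_cdk::update;
use std::cell::RefCell;

thread_local! {
    // Accumulates the serialized model across many small upload calls.
    static MODEL_BYTES: RefCell<Vec<u8>> = RefCell::new(Vec::new());
}

#[update]
fn clear_model() {
    MODEL_BYTES.with(|m| m.borrow_mut().clear());
}

#[update]
fn upload_model_chunk(chunk: Vec<u8>) -> u64 {
    MODEL_BYTES.with(|m| {
        let mut bytes = m.borrow_mut();
        bytes.extend_from_slice(&chunk);
        bytes.len() as u64 // total bytes received so far
    })
}

// Once all chunks have arrived, the bytes can be deserialized into a
// runnable model (e.g., with a tract-style loader) and cached for queries.
```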
7 Likes

Thank you all for today’s call (2024.02.01). We covered a lot of ground; this is the (comprehensive) summary:

  • Decentralized AI and Agent Training: There was a strong focus on the potential for training AI models on the IC to cater to specific niches and preferences, including image generation with varying datasets to achieve targeted outcomes like emotions in faces. This underscored the IC’s flexibility in AI model training.
  • Decentralized Network Benefits: The benefits of a decentralized network in fostering a collaborative environment were highlighted. Such a structure supports the sharing and recording of contributions, enabling participants to be rewarded for their input and fostering a community-driven approach to AI development.
  • Leveraging ICP Functionality: A significant portion of the discussion revolved around maximizing the use of Internet Computer Protocol (ICP) functionalities for project enhancement. There was a keen interest in exploring the platform’s capabilities further.
  • Decentralized Approach to Training Data: The importance of a decentralized method in gathering and utilizing training data was acknowledged, emphasizing the need for diverse and robust datasets for effective AI model development.
  • Addressing Technical Challenges: Technical hurdles, especially those related to understanding and utilizing ICP functionalities, were discussed. The conversation extended to the need for resources, guidance, and a supportive community for developers on the IC.
  • Future Directions and Support: There was a call for the development of resources, educational materials, and structures to better support AI project development on the IC. This includes detailed documentation, examples of AI applications, and discussions on hardware requirements and costs.
  • Participants expressed the need for a collective repository or space to document and share experimental application limits and GPU pipeline setups, including specific machine types and configurations. They discussed the desire to scale up resources as more become available on the Internet Computer (IC), emphasizing experimental applications still in research mode that could benefit from expanded capabilities.
  • The conversation shifted to the technical aspects of running AI-enabled codebases on traditional setups versus the potential of the IC, including the use of virtual machines with GPUs and local setups with advanced graphics cards. The group highlighted the importance of collecting data on resource consumption and performance metrics to better understand how current development and testing practices could translate to operations on the IC, focusing on security, resilience benefits, and the utilization of GPU subnets.
  • Discussions also touched on the need for more detailed metrics regarding GPU usage, software stacks, model sizes, and data processing quantities. The idea of summarizing these metrics in a structured way, possibly through surveys or forums, was proposed to foster a broader community contribution.
  • Furthermore, the group talked about the potential of using larger memory limits, like WASM 64, to significantly increase canister capabilities, particularly for AI model testing and deployment. There was a consensus on the importance of sharing resources, including benchmarks and code examples, to support the transition to more efficient architectures and the exploration of larger memory models.
  • The call highlighted a collective interest in enhancing the IC’s infrastructure to better support AI applications, including the need for more resources, better documentation of experimental limits, and the exploration of new technologies like WASM 64 for increased memory. Participants were encouraged to contribute data and insights to help shape the future development of AI capabilities on the IC.
  • Community Engagement and Collaboration: The critical role of community engagement in driving AI innovation on the IC was underscored. Participants were encouraged to actively share their projects, questions, and insights, emphasizing a collective effort in exploring decentralized AI’s possibilities.
  • Exploration of Services and Databases: Interest was expressed in identifying and using existing IC services and databases to enhance AI applications, showing a proactive approach to leveraging existing resources.
  • Vector DB Development and Marketplace for AI Tools: Initiatives like the development of a basic vector DB model and the creation of a marketplace for AI tools on the IC were discussed. These efforts aim to foster a rich ecosystem for AI development, enabling easier integration and monetization of AI tools and models.
  • Composability, Interoperability, and Scalable Architectures: The discussions included the significance of composability and interoperability in AI applications on the IC, and future talks were planned around scalable architectures for deploying multiple canisters as per demand.
  • Recording and Knowledge Sharing: The suggestion to record sessions or provide voice recordings to make discussions more accessible was made, highlighting the importance of archiving and sharing knowledge within the community.
5 Likes

Thank you for this week’s session (2024.02.08)! Big thanks to @apotheosis, who shared his expertise on zkML and opML with us, and @branbuilder, who talked about ELNA’s vector DB implementation :slightly_smiling_face: This is the generated summary (short version; link to a very long one below):
The group’s discussion on Zero-Knowledge Proofs (ZKPs) and optimistic AI/ML on the Internet Computer (IC) highlighted the technology’s potential for privacy and scalability in decentralized AI systems. ZKPs, particularly beneficial for privacy-preserving applications, allow for verifying data or computations without revealing underlying information. The IC’s architecture is well-suited for ZKPs, offering advantages over platforms like Ethereum by facilitating on-chain storage and verification of proofs.

The conversation also explored the challenges of deploying AI and ML models on the IC, such as the need for specialized GPU support and the limitations of WebAssembly for large AI models. Practical applications of ZKPs for AI on the IC were discussed, including privacy-preserving AI models and verifiable computation. The group acknowledged the technical challenges in implementing ZKPs but noted recent advancements that have improved efficiency.

The discussion then transitioned to vector databases on the IC, focusing on development efforts and the potential for scalable architectures to support AI applications. Enhancements in GPU support and increased instruction limits were identified as crucial for the performance of vector DBs on the IC. Participants expressed interest in sharing developments and collaborating on vector DB implementations.

Please find the detailed summary here: 2024.02.08_ICDeAITWG_CallSummary - Google Docs

8 Likes

The newly introduced ArcMind AI project is highly relevant to readers of this forum thread.
Go read about it over here: ArcMind AI - Autonomous AI Agent and Vector DB

5 Likes