Hello, community! I’m thrilled to share the results of our three-month endeavor. We have successfully deployed the world’s largest on-chain vector database on the ICP network and stress-tested it.
Last weekend we reached a significant milestone by deploying this massive vector database to the ICP network. We invite everyone to test it by querying the canister and checking the response time and the recall@5 score.
For our tests we used 100 million data points from the deep1b dataset, each a 96-dimensional vector. The total size of the graph data, including edges and vectors (but excluding source data such as text and images), is over 70 GiB. Despite this large data size, we achieved an average query time of around 350 ms. This performance demonstrates the ICP network’s viability as practical DeAI cloud infrastructure.
This project, now open-sourced, is an invaluable tool for all DeAI developers working on Retrieval Augmented Generation (RAG) on the ICP.
# Clone the repository and run the search tool against the deployed canister
git clone https://github.com/ClankPan/ic-vectune
cd ic-vectune
cargo run --release --bin tool -- search --ic $(dfx canister --ic id instance_100m)
...
query_index 98
[73774751, 36414811, 79944017, 74001073, 95461516]
[73774751, 36414811, 79944017, 74001073, 95461516]
hit: 5/5
query_index 99
[60315619, 76855953, 85086713, 39134802, 4005157]
[60315619, 76855953, 85086713, 39134802, 4005157]
hit: 5/5
average query-time: 386.26 ms
average recall-rate: 93.4 %
How We Achieved This:
- Use of SIMD Instructions: We leveraged SIMD instructions that operate on multiple data points at once (f32x4), significantly reducing the number of instructions needed for Euclidean distance calculations [1]. This helped us stay within the query execution instruction limit. SIMD support was added to ICP just last month, and our project is one of the first to make heavy use of it. (A minimal distance-function sketch appears after this list.)
- Stable Memory Utilization: ICP’s stable memory offers incredibly fast storage, tens of times faster than EC2’s EBS and comparable to or better than an instance store. This capability was crucial for achieving high performance in a WASM environment without a file cache. (See the storage sketch after this list.)
- Adoption of Vamana Graph: We implemented the Vamana graph algorithm from scratch in Rust and named it Vectune. Developed by Microsoft, Vamana is the core algorithm of DiskANN, which is used in Bing’s image search, and is based on the Navigable Small World graph model. It requires fewer disk accesses and maintains a more compact graph than HNSW. Unlike HNSW, Vamana supports incremental indexing, allowing continuous data insertion [2]. (A simplified search sketch appears after this list.)
- Robust Update Calls: We uploaded 70 GiB of data to the canister in about 47,000 update calls over 12 hours, split into 1.5 MiB chunks, with only around 100 errors. This demonstrates the ICP network’s resilience, and we deeply appreciate the support from ICP’s engineers. (An upload-loop sketch appears after this list.)
- Cost-effective Query Calls: Query calls are not only fast but also inexpensive. While hosting on EC2 can lead to unpredictable costs as traffic from more users grows, the main costs on ICP are data uploads and maintenance. The data upload for this project cost 200 trillion cycles, with an estimated annual maintenance cost of 250 trillion cycles, which makes cost forecasting more straightforward for developers. (A back-of-the-envelope conversion appears after this list.)
- Use of Rust: Choosing Rust as our development language made testing and debugging easier. In particular, the interface for stable memory is nearly identical to using mmap, so we could move seamlessly between debugging against an SSD locally and testing on ICP without any restrictions. (See the storage sketch after this list.)
- Support from SNS DAO: The success of this project would not have been possible without the support from KinicDAO. We are deeply grateful to the community, the advice from the DfinityDevTeam, and everyone who used the KinicVectorDatabase and provided feedback.
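
To make the SIMD point concrete, here is a minimal sketch of a squared Euclidean distance over 96-dimensional vectors using 4-wide lanes. It assumes nightly Rust’s portable SIMD (std::simd); Vectune’s actual distance code may differ in the details.

#![feature(portable_simd)]
use std::simd::prelude::*;

// Squared L2 distance over a 96-dimensional vector, four lanes at a time.
// Sketch only: Vectune's real implementation may use a different SIMD path.
fn squared_l2(a: &[f32; 96], b: &[f32; 96]) -> f32 {
    let mut acc = f32x4::splat(0.0);
    for i in (0..96).step_by(4) {
        let va = f32x4::from_slice(&a[i..i + 4]);
        let vb = f32x4::from_slice(&b[i..i + 4]);
        let d = va - vb;
        acc += d * d; // four partial squared differences per operation
    }
    acc.reduce_sum() // 24 SIMD iterations instead of 96 scalar ones
}
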
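The next sketch illustrates the mmap-like storage abstraction mentioned above: the same graph code reads from a memory-mapped file on an SSD during local debugging and from stable memory inside the canister. The trait, struct names, and the memmap2 dependency are our own illustration, and the ic_cdk stable-memory call is indicative only (exact names vary between cdk versions); this is not Vectune’s actual interface.

// Hypothetical storage trait: the same graph code runs against an mmap'd file
// locally and against stable memory inside the canister.
trait GraphStorage {
    fn read(&self, offset: u64, buf: &mut [u8]);
}

// Local debugging backend: a memory-mapped file on SSD (via the memmap2 crate).
struct MmapStorage { map: memmap2::Mmap }
impl GraphStorage for MmapStorage {
    fn read(&self, offset: u64, buf: &mut [u8]) {
        let start = offset as usize;
        buf.copy_from_slice(&self.map[start..start + buf.len()]);
    }
}

// Canister backend: ICP stable memory, read through the same interface.
struct StableStorage;
impl GraphStorage for StableStorage {
    fn read(&self, offset: u64, buf: &mut [u8]) {
        // Signature assumed from ic-cdk; exact name/offset type depends on the cdk version.
        ic_cdk::api::stable::stable_read(offset, buf);
    }
}
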
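Below is a simplified greedy (beam) search of the kind used over Vamana/DiskANN-style graphs, to give a feel for why the number of node visits stays small. The function signature, callbacks, and beam handling are illustrative assumptions, not Vectune’s actual code.

use std::collections::HashSet;

// Simplified greedy (beam) search over a Vamana-style graph. Illustrative only:
// Vectune's real code handles the on-disk layout, visited list and beam differently.
// `dist` returns the distance from the query to a node; `neighbors` returns its out-edges.
fn greedy_search(
    start: u32,
    k: usize,
    beam: usize,
    dist: &dyn Fn(u32) -> f32,
    neighbors: &dyn Fn(u32) -> Vec<u32>,
) -> Vec<u32> {
    let mut pool: Vec<(f32, u32)> = vec![(dist(start), start)];
    let mut visited: HashSet<u32> = HashSet::new();

    loop {
        // Keep the candidate pool sorted by distance and bounded by the beam width.
        pool.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
        pool.truncate(beam);

        // Expand the closest candidate that has not been visited yet.
        let Some(&(_, node)) = pool.iter().find(|(_, n)| !visited.contains(n)) else {
            break;
        };
        visited.insert(node);

        // One graph-node read ("disk access") per expansion; score its neighbors.
        for n in neighbors(node) {
            if !visited.contains(&n) && !pool.iter().any(|&(_, p)| p == n) {
                pool.push((dist(n), n));
            }
        }
    }
    pool.into_iter().take(k).map(|(_, n)| n).collect()
}
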
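The upload loop is conceptually simple: slice the index into ~1.5 MiB chunks and retry any update call that fails. The sketch below is a simplification; upload_chunk is a hypothetical stand-in for the real agent call (e.g. via ic-agent), not part of ic-vectune’s public API.

const CHUNK_SIZE: usize = 3 * 512 * 1024; // 1.5 MiB per update call (~47,000 calls for 70 GiB)

// Hypothetical stand-in for the real update call made through an agent.
async fn upload_chunk(_index: u64, _bytes: &[u8]) -> Result<(), String> {
    Ok(())
}

async fn upload_all(data: &[u8]) -> Result<(), String> {
    for (i, chunk) in data.chunks(CHUNK_SIZE).enumerate() {
        let mut attempts = 0;
        // Retry failed calls; in our run only ~100 of ~47,000 calls returned an error.
        while let Err(e) = upload_chunk(i as u64, chunk).await {
            attempts += 1;
            if attempts > 5 {
                return Err(format!("chunk {i} failed after retries: {e}"));
            }
        }
    }
    Ok(())
}
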
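For a rough sense of scale of those cycle figures: on ICP, 1 trillion cycles corresponds to 1 XDR; the USD conversion below is an approximation we are adding, not a figure from the project.

// Back-of-the-envelope conversion of the cycle costs quoted above.
// Assumption: 1 trillion cycles = 1 XDR; the XDR/USD rate (~1.3) fluctuates.
fn main() {
    let upload_t_cycles = 200.0_f64;  // one-time upload cost, in trillions of cycles
    let yearly_t_cycles = 250.0_f64;  // estimated yearly maintenance, in trillions of cycles
    let usd_per_xdr = 1.3;            // approximate, assumed rate
    println!("upload: ~{} XDR (~${:.0})", upload_t_cycles, upload_t_cycles * usd_per_xdr);
    println!("yearly: ~{} XDR (~${:.0})", yearly_t_cycles, yearly_t_cycles * usd_per_xdr);
}
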
Future Developments:
We have ideas for reducing the 350 ms response time further, toward 100 ms [3], and we will focus on this in the coming weeks. After that, we will apply the database to various use cases, such as semantic search, RAG, LLMs, and much more.
Tips:
[1]: Regardless of the graph size, the number of disk accesses during a search remains nearly constant: each query visits about 135 nodes (roughly one disk access per node) and computes distances to about 90 neighbors at each. This is akin to the principle that everyone in the world can be connected through just six degrees of separation. In total, each query involves 12,150 L2 distance computations, with SIMD instructions used about 3,000 times per query.
[2]: While data insertion is feasible with HNSW, it depends on the implementation. Small-world graphs are not scale-free; continually inserting data can increase the number of short-distance nodes, losing the small-world property. HNSW runs into limits when the lowest layer accumulates too many nodes, increasing the number of nodes that must be searched. Vamana, however, maintains a balance through Robust Pruning, always incorporating short-, medium-, and long-range edges, which makes the graph incrementally adjustable. Such adjustments are crucial because the starting medoid for searches can shift as the data changes.
[3]: Vamana’s graph is adjusted through Robust Pruning so that each node always includes short-, medium-, and long-range edges. In practice, only about 10% of these edges are used in neighbor searches, while the remaining 90% lead to unnecessary disk accesses. By avoiding accesses to this 90%, we can speed up queries further. By leveraging the optimal stopping problem, we can reduce disk accesses for up to 10 million data points without losing accuracy. At the 100-million scale, disk accesses can be reduced further by performing additional approximate nearest neighbor searches within these edges.
Thank you all for your continued support and feedback!