Blueband - vector database

Blueband

Blueband is a vector database on ICP, built on the core of Vectra and its local-db principles.

Blueband persists data like a traditional db and saves its embeddings in ICP’s stable memory. This setup suits use cases involving small, mostly static datasets where quick comparison is needed.

  • Loading Data into Memory: The index, which contains metadata and vectors, is loaded from persistent storage (a collection’s canister) into the system’s memory.

  • Querying: Once in memory (initialized), the index can be queried to calculate and rank the similarity between saved vectors and external prompts.
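The two steps above can be sketched as follows. This is a minimal illustration, not Blueband's actual code: the `IndexItem` shape, the `queryIndex` helper, and the toy 3-dimensional vectors are all hypothetical stand-ins for an index that has already been loaded from a canister.

```typescript
// Hypothetical shape of an index entry; Blueband's real index format may differ.
interface IndexItem {
  id: string;
  title: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every item in the in-memory index against the query vector, best match first.
function queryIndex(items: IndexItem[], queryVector: number[], topK: number) {
  return items
    .map((item) => ({
      id: item.id,
      title: item.title,
      score: cosineSimilarity(item.vector, queryVector),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy vectors stand in for real embeddings.
const loadedIndex: IndexItem[] = [
  { id: "a", title: "Pizza", vector: [1, 0, 0] },
  { id: "b", title: "Soccer", vector: [0, 1, 0] },
  { id: "c", title: "JavaScript", vector: [0, 0.6, 0.8] },
];

const ranked = queryIndex(loadedIndex, [0, 1, 0], 2);
console.log(ranked.map((r) => `${r.title}: ${r.score.toFixed(2)}`));
```

Because every query scans the whole index, this approach fits the small, mostly static datasets described above.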

Getting Started

Prerequisites
To use Blueband, deploy a blueband_db_provider canister by adding the prebuilt canister to your dfx.json:

{
  "blueband_db_provider": {
    "type": "custom",
    "candid": "https://github.com/acgodson/blueband-db/releases/download/v0.0.9/blueband-db-backend.did",
    "wasm": "https://github.com/acgodson/blueband-db/releases/download/v0.0.9/blueband-db-backend.wasm.gz"
  }
}

You can point your backend canister to the blueband_db_provider’s canister to make storage calls from your backend.

ic-use-blueband-db is a simple React library for interacting with your db on the frontend. It exports functions to load indexes into the system’s memory, save new items, and compare similarities between saved documents and external prompts using in-memory operations.

Usage

1. Initializing

Connect actor and initialize index:

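import { actor } from "./provider_actor_path";
import { useBlueband } from "ic-use-blueband";

const ReactComponent = () => {
  const { initializeIndex } = useBlueband();

  const collectionId = "unique collection_id";
  const config = {
    collection: collectionId,
    api_key: OPENAI_KEY, // your OpenAI API key, defined elsewhere
    /* chunk options */
  };

  // `await` needs an async function; call this from an effect or event handler.
  const connect = async () => {
    await initializeIndex(actor, config);
  };

  // ...
};

2. Add Items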

Add documents to the index:

const { AddItem, Query } = useBlueband();

const title = "Document Title or Url";
const content = "Document content...";

await AddItem(title, content);

3. Query Items

Query the index to find documents similar to a given prompt:

const { Query } = useBlueband();

const results = await Query("query text");

//Results are ranked by similarity scores:

// [
//   {
//     "title": "Document Title",
//     "id": "document_id",
//     "score": 0.951178544877223,
//     "chunks": 1,
//     "sections": [
//       /*...*/
//     ],
//     "tokens": 156
//   },
//   {
//     "title": "Document Title",
//     "id": "document_id",
//     "score": 0.726565512777365,
//     "chunks": 4,
//     "sections": [
//       /*...*/
//     ],
//     "tokens": 500
//   }
// ]
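
Since results come back already ranked, a caller will often want to keep only the strong matches. A minimal sketch, assuming the result shape shown in the sample output above; the `QueryResult` type and the `selectMatches` helper are hypothetical and not part of the library, and the element type of `sections` is not shown in the sample, so it is left as `unknown[]`:

```typescript
// Shape inferred from the sample output; not an official type from the library.
interface QueryResult {
  title: string;
  id: string;
  score: number;
  chunks: number;
  sections: unknown[];
  tokens: number;
}

// Keep matches at or above a minimum score, then cap how many are returned.
// Assumes `results` is already sorted by descending score.
function selectMatches(results: QueryResult[], minScore: number, limit: number): QueryResult[] {
  return results.filter((r) => r.score >= minScore).slice(0, limit);
}

const sampleResults: QueryResult[] = [
  { title: "Document Title", id: "document_id", score: 0.951178544877223, chunks: 1, sections: [], tokens: 156 },
  { title: "Document Title", id: "document_id", score: 0.726565512777365, chunks: 4, sections: [], tokens: 500 },
];

const strongMatches = selectMatches(sampleResults, 0.8, 5);
console.log(strongMatches.length);
```

The 0.8 threshold is arbitrary; a sensible cutoff depends on the embedding model and the dataset.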

Links

Demo

This looks like a really clean interface! Looking forward to checking it out

Thanks @kpeacock

made a quick demo here https://6fsnc-oaaaa-aaaag-aliwa-cai.icp0.io

It is slower than ideal at the moment because the OpenAI embedding proxy is running on my local machine

@laughtt look for this case

Great work! Please consider making a PR to dfinity/awesome-internet-computer on GitHub, a curated list of awesome projects and resources relating to the Internet Computer Protocol

:cyclone: Blueband Update – Rust Canister

Blueband has moved from a hybrid in-browser model to a fully on-chain vector db. The goal remains the same: enable semantic document search via vector similarity. The difference is that the heavy lifting is now done inside the Rust canister.


Key Improvements

1. On-Chain Cosine Similarity

The previous architecture offloaded similarity search to the frontend/browser. Documents were embedded and stored persistently on-chain, but at query time the index had to be loaded in-browser for ranking.

Now, we’ve migrated vector distance computations fully into the backend canister. The cosine similarity between query and document vectors is calculated on-chain, and results are ranked and returned as scored matches.
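
For reference, this is the standard definition of cosine similarity between a query vector q and a document vector d of dimension n. The symbols are generic; this is not taken from the canister source:

```latex
\cos(\mathbf{q}, \mathbf{d})
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^{2}} \; \sqrt{\sum_{i=1}^{n} d_i^{2}}}
```

The score ranges from -1 to 1, with values closer to 1 meaning the two vectors point in more similar directions.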

Example:

// Actor call to canister's demo_vector_similarity
const result = await actor.demo_vector_similarity(docs, query, EmbeddingProxyUrl, [1], []);
// Canister returns top-scoring [1] document with similarity score

2. Canister Logic

Blueband’s full-cycle document storage, embedding and computation logic is powered by a single backend canister. The logic to:

  • create collection references and store documents,
  • embed documents,
  • store chunked vectors,
  • compute similarities, and
  • return ranked results

…is implemented in Rust and exposed via Candid.
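
To make the "store chunked vectors" step concrete, here is a rough TypeScript illustration of splitting a document into chunks before each chunk is embedded. It is not the canister's Rust code: the `chunkByWords` helper and the fixed word-count strategy are assumptions, and the real implementation may chunk by tokens or by other rules.

```typescript
// Split text into chunks of at most `maxWords` words.
// Illustrative strategy only; not Blueband's actual chunking logic.
function chunkByWords(text: string, maxWords: number): string[] {
  if (maxWords <= 0) {
    throw new Error("maxWords must be positive");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const pieces: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    pieces.push(words.slice(i, i + maxWords).join(" "));
  }
  return pieces;
}

const docChunks = chunkByWords("Soccer is the most popular sport in the world", 4);
console.log(docChunks);
```

Each chunk would then get its own embedding vector, which is why a single document can report several chunks in the query results.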

You can see this in action in our SimpleTest class, where a query like “Which sport is most popular?” returns a cosine match to “Soccer is the most popular sport…”, directly from our on-chain vector engine.


const docs = [
  "Pizza is a delicious Italian food with cheese and tomatoes",
  "Soccer is the most popular sport in the world", 
  "JavaScript is a programming language for web development",
];

const query = "Which sport is most popular?";

const results = await actor.demo_vector_similarity(
  docs,
  query,
  "<openai-embedding-proxy-url>",
  [1], // Only return top result
  []
);

console.log(results);
// Returns: [{ score: 0.91, text: "Soccer is the most popular sport in the world", ... }]
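
The `[1]` and `[]` arguments in the call above look like Candid optional values: in JavaScript, a Candid `opt T` is written as `[value]` when present and `[]` when absent. A small sketch of helpers for that convention follows; the `Opt`, `toOpt`, and `fromOpt` names are hypothetical, and what the fifth argument controls is not stated in the post.

```typescript
// JavaScript representation of a Candid `opt T`: [] when absent, [value] when present.
type Opt<T> = [] | [T];

// Wrap a possibly-undefined value in the opt encoding.
function toOpt<T>(value: T | undefined): Opt<T> {
  return value === undefined ? [] : [value];
}

// Unwrap the opt encoding back to a plain value or undefined.
function fromOpt<T>(opt: Opt<T>): T | undefined {
  return opt.length === 0 ? undefined : opt[0];
}

const topResultsOnly = toOpt(1); // [1], as in the call above
const unsetOption = toOpt<number>(undefined); // []
console.log(topResultsOnly, unsetOption, fromOpt(topResultsOnly));
```

If the underlying Candid type is a 64-bit integer, the wrapped value would need to be a `bigint` rather than a `number`; check the canister's .did file for the exact types.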

3. 500 GiB Stable Memory Expansion

Earlier implementations were constrained by Motoko limits. The db can now access up to 500 GiB of stable memory for storage.
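
As a rough sense of scale, the arithmetic below estimates how many raw vectors would fit in 500 GiB. The assumptions are mine, not from the post: 1536-dimensional embeddings (the size produced by some OpenAI embedding models) stored as 4-byte floats, with document text, metadata, and index overhead ignored.

```typescript
// Assumptions (not from the post): 1536-dimensional embeddings, 4 bytes per float.
const GIB = 1024 ** 3;
const stableMemoryBytes = 500 * GIB;
const dimensions = 1536;
const bytesPerFloat = 4;
const bytesPerVector = dimensions * bytesPerFloat; // 6144 bytes per vector

// Upper bound on raw vectors that fit, ignoring text, metadata, and overhead.
const maxVectors = Math.floor(stableMemoryBytes / bytesPerVector);
console.log(bytesPerVector, maxVectors); // 6144 87381333
```

The usable figure would be lower in practice, since documents and metadata share the same storage.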


Why This Matters

This evolution enables projects built on Blueband to:

  • Run 100% on-chain inference for semantic search
  • Scale collections and keep admin control over them
  • Use simpler yet performant client-side SDKs (just query + result)

Blueband is one of the simplest vector dbs on ICP for static document storage and queries.

| :white_check_mark: | Blueband | KinicDAO | ArcMind | Elna | Vectune |
| --- | --- | --- | --- | --- | --- |
| Algorithm | Hybrid (cosine + K-means) | Vamana | k-d tree | HNSW | FreshVamana |
| Best For | Static document search | | | | |

See the GitHub and Documentation pages. We are excited to continue pushing this forward and welcome feedback and contributions.

I think this is worth sharing with the DeAI WG in case you haven’t planned already :slight_smile:
