How to handle filtering/pagination at scale

Hello,

I’m trying to understand how to handle filtering and pagination efficiently and maintainably using the actor model.

For example, suppose I wanted to build an Amazon clone on ICP, with an index canister that manages product creation and assigns each new product to a bucket. Eventually I may have millions of buckets containing millions of products.

Now I want to introduce a filtering system:

  • I want to retrieve products ordered by price

  • I want to retrieve products ordered by price and buyer notes

On a traditional database, I would simply use something like: ORDER BY price ASC LIMIT x OFFSET y

But on the IC, there’s no centralized store. The only approach I can think of is to create an ordered index canister by price, which would itself manage many buckets storing an ordered list of all products by price. This means that whenever a product’s price changes, I might need to rearrange data across buckets to avoid exceeding the stable memory limit, especially if many products share the same price range.

When multiple filters are involved, the problem becomes even worse. It feels like an unmaintainable mess for every combination of filters I might want to support, and an enormous amount of work to replicate something trivial in Web2.

So my main question is: What’s the best way to handle this on the IC? I feel like I’m missing something fundamental.

Additional questions:

  • Can CanDB help manage this in a performant way?

  • Could the upcoming roadmap item, “Immutable Blob Storage Service for ICP,” help? (e.g., storing an ordered map by price in a blob and using it as a central store, possibly even with SQLite, to at least eliminate bucket management complexity)

CanDB certainly appears to have indices for sorting, although I haven’t played with them myself. I’ll have to put that on my TODO list. :slight_smile: I mostly write low-level Rust code and fine-tune my database indices directly. I can tell you how to do that, but you would probably prefer a turnkey solution such as CanDB.
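To give a rough idea of what a hand-rolled index could look like, here is a minimal sketch of a price-ordered index with cursor-based pagination. Everything in it is an assumption made for illustration: the `PriceIndex` type, prices stored as integer cents, numeric product ids, and a plain in-memory `BTreeSet` living in a single canister. A stable-memory B-tree (for example the one in the `ic-stable-structures` crate) would presumably fill the same role in a real canister. The cursor replaces `OFFSET`: instead of skipping y rows, each page starts right after the last key of the previous page.

```rust
use std::collections::{BTreeSet, HashMap};
use std::ops::Bound;

// Hypothetical types: prices in cents, numeric product ids.
type Price = u64;
type ProductId = u64;
type Entry = (Price, ProductId);

#[derive(Default)]
struct PriceIndex {
    // Ordered by (price, id); the id breaks ties between equal prices.
    by_price: BTreeSet<Entry>,
    // Current price per product, needed to locate the old entry on update.
    current: HashMap<ProductId, Price>,
}

impl PriceIndex {
    // Insert a product, or move it to its new position if the price changed.
    fn upsert(&mut self, id: ProductId, price: Price) {
        if let Some(old) = self.current.insert(id, price) {
            self.by_price.remove(&(old, id));
        }
        self.by_price.insert((price, id));
    }

    // Up to `limit` entries strictly after `cursor`, in ascending price order.
    // Passing `None` returns the first page.
    fn page(&self, cursor: Option<Entry>, limit: usize) -> Vec<Entry> {
        let lower: Bound<Entry> = match cursor {
            Some(c) => Bound::Excluded(c),
            None => Bound::Unbounded,
        };
        let upper: Bound<Entry> = Bound::Unbounded;
        self.by_price
            .range((lower, upper))
            .take(limit)
            .copied()
            .collect()
    }
}
```

To page through results, call `page(None, n)` first, then pass the last entry of each page as the cursor for the next call. A price change is one remove and one insert in the tree, so nothing else has to be rearranged within this index.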

As mentioned, I wouldn’t worry too much about scale until you can demonstrate that you can attract users, lots of them. Iteration velocity is typically more important than scalability for early-stage startups. That’s speaking as a long-time start-upper who has worked on many projects and used to evaluate startups going into Founders Forum. I’ve seen many startups worry about scale when they have 5 users. Scale is a “nice to have” problem. I don’t want to downplay scalability, but early on, an obsessive focus on customers and what brings them value is far more important.

The blob store is a good place for photos, videos and other big chunks of data. It is indexed by a hash of the blob and any HTTP headers it would be served with, so it’s not sorted. The blob store is new and really focused on Caffeine apps at the moment, but there has been plenty of interest in using it for non-Caffeine apps. Technically that would be easy. It would need a decision, and that is the hard bit. :smiley:

Thank you for your response. I know I’m not building the next unicorn :grinning_face_with_smiling_eyes: , I’m just doing it for fun.

I’m interested in the how-to. I’m far from being a Rust expert, but I know enough to understand the code. If you have examples to share that address this kind of problem (handling composite indexes in different orders, for example price/creation date or creation date/price, split across multiple buckets), I’d gladly take them.
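For what it’s worth, one possible shape for this is sketched below. It is not taken from CanDB or any existing project, and all names (`Product`, `Bucket`, `SortBy`, `query`) are hypothetical. The assumption is that products are assigned to buckets by something unrelated to price (here, whatever the caller decides), that each bucket keeps one ordered set per sort order for its own products, and that a query asks every bucket for its next `limit` keys after a cursor and merges the results. Buckets are plain structs here; on the IC each one would be a canister and `query` would make inter-canister calls.

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

// Hypothetical record; field names and types are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Product {
    id: u64,
    price: u64,      // cents
    created_at: u64, // timestamp
}

#[derive(Clone, Copy)]
enum SortBy {
    PriceThenCreated,
    CreatedThenPrice,
}

// Sort key: (first field, second field, id). The id makes every key unique.
type Key = (u64, u64, u64);

fn key_of(p: &Product, order: SortBy) -> Key {
    match order {
        SortBy::PriceThenCreated => (p.price, p.created_at, p.id),
        SortBy::CreatedThenPrice => (p.created_at, p.price, p.id),
    }
}

// One bucket: holds a subset of products and one index per sort order.
#[derive(Default)]
struct Bucket {
    by_price_created: BTreeSet<Key>,
    by_created_price: BTreeSet<Key>,
}

impl Bucket {
    fn insert(&mut self, p: &Product) {
        self.by_price_created.insert(key_of(p, SortBy::PriceThenCreated));
        self.by_created_price.insert(key_of(p, SortBy::CreatedThenPrice));
    }

    // Up to `limit` keys strictly after `cursor` in the requested order.
    fn page(&self, order: SortBy, cursor: Option<Key>, limit: usize) -> Vec<Key> {
        let index = match order {
            SortBy::PriceThenCreated => &self.by_price_created,
            SortBy::CreatedThenPrice => &self.by_created_price,
        };
        let lower: Bound<Key> = cursor.map_or(Bound::Unbounded, Bound::Excluded);
        let upper: Bound<Key> = Bound::Unbounded;
        index.range((lower, upper)).take(limit).copied().collect()
    }
}

// Scatter-gather: collect each bucket's next `limit` keys, merge, keep `limit`.
fn query(buckets: &[Bucket], order: SortBy, cursor: Option<Key>, limit: usize) -> Vec<Key> {
    let mut merged: Vec<Key> = buckets
        .iter()
        .flat_map(|b| b.page(order, cursor, limit))
        .collect();
    merged.sort_unstable();
    merged.truncate(limit);
    merged
}
```

The trade-off in this sketch is that a price change only touches the bucket that owns the product, so no data moves between buckets, but every query fans out to all buckets. That is probably fine for a handful of buckets and clearly not for millions, where some form of range partitioning or a separate summary index would be needed.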