Help wanted: Auto-scaling by spawning canisters: concurrency issues

Hey everyone!

Here is my problem:
I would like to store some data on bucket canisters managed by an Orchestrator canister.
All the requests would pass through this Orchestrator.

Upon receiving a request, the Orchestrator is supposed to:

  1. Check if there is a bucket canister available. If there is none, spawn one.
  2. Redirect requests to the proper bucket.

My problem is with step 1.
If I send one request at a time, everything is fine, but if I send my Orchestrator two concurrent requests, it creates two canisters at the same time.
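
Roughly, the pattern looks like this (a minimal Rust sketch assuming ic-cdk; `BUCKET`, `spawn_bucket`, and the bucket's `store` method are placeholder names, not my actual code). The `await` on canister creation is exactly where the second request interleaves and also sees no bucket:

```rust
use candid::Principal;
use std::cell::RefCell;

thread_local! {
    // The bucket canister currently in use, if any.
    static BUCKET: RefCell<Option<Principal>> = RefCell::new(None);
}

#[ic_cdk::update]
async fn handle_request(payload: Vec<u8>) {
    let bucket = match BUCKET.with(|b| b.borrow().clone()) {
        Some(id) => id,
        None => {
            // Step 1: no bucket yet, so spawn one. The await suspends this
            // message; a second concurrent request can run here, also observe
            // `None`, and spawn a second canister. That is the race.
            let id = spawn_bucket().await;
            BUCKET.with(|b| *b.borrow_mut() = Some(id.clone()));
            id
        }
    };
    // Step 2: redirect the request to the bucket (placeholder method name).
    let _: ic_cdk::api::call::CallResult<()> =
        ic_cdk::call(bucket, "store", (payload,)).await;
}

// Placeholder for the call to the IC management canister's `create_canister`
// (plus installing the bucket's Wasm); details elided here.
async fn spawn_bucket() -> Principal {
    unimplemented!()
}
```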

Is there any way to prevent this behavior?

@dsarlis @ulan pinging you as I don’t know who else could answer this :thinking:

Thanks in advance :pray:

That’s a cool problem!

Let me rephrase it to make sure I understood correctly:

  1. We have two requests A and B sent to Orchestrator.
  2. Request A arrives first, observes that there is no canister and starts creating it asynchronously by calling the IC management canister.
  3. Request B arrives second, observes that there is no canister and starts creating it asynchronously by calling the IC management canister.
  4. When both calls complete we have two canisters instead of one.

The simplest solution I would recommend is to make Orchestrator robust against this race condition:

  • Introduce a global pool of unassigned canisters.
  • All extra canisters created due to the race condition are added to that pool.
  • When a request needs a new canister it first tries to get a canister from the pool and actually creates a new canister only if the pool is empty.
  • Requests try to lazily refill the pool when it drops below some threshold, by doing two calls in parallel: redirect the request to the bucket canister and, at the same time, call the IC management canister to create a new unassigned canister. Two requests can be prevented from refilling the pool at the same time by keeping a global variable that tracks the number of pending refill calls (see the sketch right after this list).
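
To make this concrete, here is a minimal Rust sketch of the pool plus the refill guard (assuming ic-cdk; the Candid types for `create_canister` are written out by hand, and `POOL`, `PENDING_REFILLS`, the threshold, and the cycle amount are placeholders, so treat it as a sketch rather than a drop-in implementation):

```rust
use candid::{CandidType, Deserialize, Principal};
use std::cell::{Cell, RefCell};

thread_local! {
    // Pool of created-but-unassigned bucket canisters.
    static POOL: RefCell<Vec<Principal>> = RefCell::new(Vec::new());
    // Number of refill calls currently in flight, so that concurrent
    // requests don't all start refilling at the same time.
    static PENDING_REFILLS: Cell<usize> = Cell::new(0);
}

const POOL_THRESHOLD: usize = 2; // refill when the pool drops below this
const CREATION_CYCLES: u128 = 1_000_000_000_000; // cycles attached per new canister

// Hand-written Candid types for the management canister's `create_canister`.
#[derive(CandidType)]
struct CreateCanisterArg {
    settings: Option<CanisterSettings>,
}

#[derive(CandidType)]
struct CanisterSettings {
    controllers: Option<Vec<Principal>>,
}

#[derive(CandidType, Deserialize)]
struct CreateCanisterResult {
    canister_id: Principal,
}

// Creates a new, empty canister controlled by the Orchestrator.
// (Installing the bucket Wasm with `install_code` is elided here.)
async fn create_unassigned_canister() -> Result<Principal, String> {
    let arg = CreateCanisterArg {
        settings: Some(CanisterSettings {
            controllers: Some(vec![ic_cdk::id()]),
        }),
    };
    let (res,): (CreateCanisterResult,) = ic_cdk::api::call::call_with_payment128(
        Principal::management_canister(),
        "create_canister",
        (arg,),
        CREATION_CYCLES,
    )
    .await
    .map_err(|(code, msg)| format!("create_canister failed: {:?} {}", code, msg))?;
    Ok(res.canister_id)
}

// Take a bucket from the pool; create one only if the pool is empty.
async fn get_or_create_bucket() -> Result<Principal, String> {
    if let Some(id) = POOL.with(|p| p.borrow_mut().pop()) {
        return Ok(id);
    }
    create_unassigned_canister().await
}

// Lazily refill the pool. The check and the increment of PENDING_REFILLS
// happen before any await, so concurrent requests won't over-refill.
async fn maybe_refill_pool() {
    let in_flight = PENDING_REFILLS.with(|c| c.get());
    if POOL.with(|p| p.borrow().len()) + in_flight >= POOL_THRESHOLD {
        return;
    }
    PENDING_REFILLS.with(|c| c.set(c.get() + 1));
    let result = create_unassigned_canister().await;
    PENDING_REFILLS.with(|c| c.set(c.get() - 1));
    if let Ok(id) = result {
        POOL.with(|p| p.borrow_mut().push(id));
    }
}
```

In the request handler you would then run the redirect and `maybe_refill_pool()` in parallel (for example with `futures::join!`), so the refill does not add latency to the request.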

Another benefit of this approach is that it improves the latency of a request because most of the time the request doesn’t have to wait for canister creation and can simply get the canister from the pool.

If having a global pool of unassigned canisters is not desirable, there is another approach: make request B busy-wait by doing no-op calls to the IC management canister until request A completes creating the canister. But that approach is more complex and inefficient.
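
For completeness, that would look roughly like this (again only a sketch; `CREATION_IN_PROGRESS` and `spawn_bucket` are placeholder names, and the management canister's `raw_rand` is used here merely as a cheap call whose only purpose is to yield until the creation started by request A finishes):

```rust
use candid::Principal;
use std::cell::{Cell, RefCell};

thread_local! {
    static BUCKET: RefCell<Option<Principal>> = RefCell::new(None);
    // Set by the request that is currently creating the bucket.
    static CREATION_IN_PROGRESS: Cell<bool> = Cell::new(false);
}

async fn get_bucket() -> Principal {
    loop {
        if let Some(id) = BUCKET.with(|b| b.borrow().clone()) {
            return id;
        }
        if !CREATION_IN_PROGRESS.with(|c| c.get()) {
            // This request plays the role of A and creates the canister.
            CREATION_IN_PROGRESS.with(|c| c.set(true));
            let id = spawn_bucket().await; // creation call elided
            BUCKET.with(|b| *b.borrow_mut() = Some(id.clone()));
            CREATION_IN_PROGRESS.with(|c| c.set(false));
            return id;
        }
        // This request plays the role of B and busy-waits: a no-op call to
        // the management canister yields control so A's creation can finish.
        let _: ic_cdk::api::call::CallResult<(Vec<u8>,)> =
            ic_cdk::call(Principal::management_canister(), "raw_rand", ()).await;
    }
}

// Placeholder for the actual canister creation, as in the pool sketch above.
async fn spawn_bucket() -> Principal {
    unimplemented!()
}
```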

Thank you so much @ulan for this comprehensive answer!
I think I have what I need to go further in my endeavor :rocket:

Cheers !

Hey @dymayday

Since I’ve already written CanDB in Motoko, I’d be happy to help work on a Rust version of it with you.

CanDB is really a framework, so there’s nothing specific that says it has to be written in its current language (Motoko). I think the community would benefit a lot from bringing CanDB to Rust.
