We recently had to fix a real race condition in Canic: multiple concurrent inter-canister calls competing for the same limited resource. The first attempt was the obvious one—serialize execution behind a single shared key. That did eliminate races, but it also collapsed concurrency and introduced new failure modes: one slow or stuck call could block unrelated work. In other words, correctness via a global critical section.
The solution we landed on moves coordination into the call semantics themselves using intents. An intent is a declarative reservation attached to a call: “this call requires X units of resource Y.” The system reserves capacity before executing the call, then commits or releases it based on the outcome. Importantly, application code never performs reservation logic directly.
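To make the lifecycle concrete, here is a minimal sketch of the reserve-then-commit-or-release pattern described above. The types and names (`Capacity`, `Reservations`) are illustrative assumptions, not Canic's actual internals:

```rust
use std::collections::HashMap;

// Hypothetical sketch of the reserve -> commit/release lifecycle.
// Not Canic's real types; capacity tracking is reduced to two counters.
struct Capacity {
    limit: u64,
    reserved: u64,
}

struct Reservations {
    resources: HashMap<String, Capacity>,
}

impl Reservations {
    // Reserve `units` of `resource` before the call executes;
    // fail fast if the remaining capacity is insufficient.
    fn reserve(&mut self, resource: &str, units: u64) -> Result<(), String> {
        let cap = self
            .resources
            .get_mut(resource)
            .ok_or_else(|| format!("unknown resource: {resource}"))?;
        if cap.reserved + units > cap.limit {
            return Err("insufficient capacity".into());
        }
        cap.reserved += units;
        Ok(())
    }

    // Release the reservation if the call fails or is aborted.
    fn release(&mut self, resource: &str, units: u64) {
        if let Some(cap) = self.resources.get_mut(resource) {
            cap.reserved = cap.reserved.saturating_sub(units);
        }
    }
}
```

The key property is that the check and the increment happen together, before the call runs, so two competing calls can never both see free capacity.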
What the public API looks like
From an application or canister author’s perspective, this is all you see:
use canic::api::ic::call::{Call, IntentKey, IntentReservation};

let intent = IntentReservation::new(
    IntentKey::try_new("capacity")?,
    1,
)
.with_ttl_secs(30);

let result = Call::unbounded_wait(target_canister, "perform")
    .with_intent(intent)
    .with_arg(())
    .execute()
    .await?;
There is no locking API. No “reserve” or “commit” calls. No shared mutex. You simply declare what the call needs, and the framework enforces it.
That’s intentional. The API layer defines contract and shape, not behavior.
Why there is “logic” in the API layer
The API layer is responsible for failing fast and deterministically. For example:
pub struct IntentKey(String);

impl IntentKey {
    pub fn try_new(value: impl Into<String>) -> Result<Self, Error> {
        let bounded = BoundedString64::try_new(value.into())
            .map_err(Error::invalid)?;
        Ok(Self(bounded.0))
    }
}
This is not orchestration logic. It’s input validation and contract enforcement. By the time a call reaches the workflow layer, intent keys are already well-formed and bounded. That allows the deeper layers to stay focused on semantics instead of defensive checks.
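As a self-contained illustration of that validation style, here is a stand-in version using only std types. `BoundedString64` and Canic's `Error` type are replaced with a plain length check and a `String` error, so the bound itself is an assumption for the sketch:

```rust
// Illustrative stand-in for the bounded-key validation above.
// The real code delegates to BoundedString64; here the 64-byte
// bound is checked inline so the example is self-contained.
#[derive(Debug, PartialEq)]
struct IntentKey(String);

impl IntentKey {
    const MAX_LEN: usize = 64;

    fn try_new(value: impl Into<String>) -> Result<Self, String> {
        let value = value.into();
        if value.is_empty() || value.len() > Self::MAX_LEN {
            return Err(format!(
                "intent key must be 1..={} bytes",
                Self::MAX_LEN
            ));
        }
        Ok(Self(value))
    }
}
```

Because construction is the only way to obtain an `IntentKey`, every key that reaches deeper layers is well-formed by construction.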
Where the real work happens
The actual coordination lives in the workflow layer, specifically in the call execution path:
api::CallBuilder
    ↓
workflow::ic::call::execute
    → allocate intent id
    → reserve capacity
    → perform IC call
    → commit on success | abort on failure
    ↓
ops / infra (single-step, intent-agnostic)
Conceptually, the workflow does something like:
reserve(intent)?;

let call_result = execute_call().await;

match call_result {
    Ok(_) => commit(intent)?,
    Err(_) => abort(intent)?,
}
This is the only layer allowed to coordinate multi-step behavior. Ops and infra remain unchanged: they still perform exactly one thing and know nothing about intents.
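The shape of that coordination can be sketched end to end with an in-memory store. The `IntentStore` trait, the helper, and the store are hypothetical names for this example, not Canic's API; the real call is async, but a closure stands in for it here:

```rust
use std::collections::HashMap;

// Hypothetical shape of the workflow-layer coordination.
trait IntentStore {
    /// Reserve capacity and return a freshly allocated intent id.
    fn reserve(&mut self, key: &str, units: u64) -> Result<u64, String>;
    fn commit(&mut self, id: u64) -> Result<(), String>;
    fn abort(&mut self, id: u64) -> Result<(), String>;
}

/// Reserve, run the call, then commit on success or abort on failure.
/// This is the only place multi-step coordination happens.
fn execute_with_intent<T, E: std::fmt::Display>(
    store: &mut dyn IntentStore,
    key: &str,
    units: u64,
    call: impl FnOnce() -> Result<T, E>,
) -> Result<T, String> {
    let id = store.reserve(key, units)?;
    match call() {
        Ok(value) => {
            store.commit(id)?;
            Ok(value)
        }
        Err(e) => {
            store.abort(id)?;
            Err(format!("call failed: {e}"))
        }
    }
}

// Minimal single-resource store so the flow can be exercised.
#[derive(Default)]
struct InMemoryStore {
    next_id: u64,
    limit: u64,
    reserved: u64, // includes committed units
    pending: HashMap<u64, u64>, // intent id -> units
    committed: u64,
}

impl IntentStore for InMemoryStore {
    fn reserve(&mut self, _key: &str, units: u64) -> Result<u64, String> {
        if self.reserved + units > self.limit {
            return Err("insufficient capacity".into());
        }
        self.reserved += units;
        let id = self.next_id;
        self.next_id += 1;
        self.pending.insert(id, units);
        Ok(id)
    }

    fn commit(&mut self, id: u64) -> Result<(), String> {
        let units = self.pending.remove(&id).ok_or("unknown intent")?;
        self.committed += units;
        Ok(())
    }

    fn abort(&mut self, id: u64) -> Result<(), String> {
        let units = self.pending.remove(&id).ok_or("unknown intent")?;
        self.reserved -= units; // return capacity to the pool
        Ok(())
    }
}
```

Note that an aborted intent returns its units to the pool, while a committed one keeps them counted against the limit; that asymmetry is what makes the abort path safe to retry.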
Why this scales better than blocking
With the original “single key” approach, every contended operation was serialized, even if the contention was unrelated. With intents, coordination is localized to the resource being reserved. Calls that don’t contend proceed concurrently. Calls that do contend fail fast and deterministically instead of blocking the system.
This also means failure handling is cleaner. If a call fails, the reservation is explicitly aborted. If something traps mid-execution, TTL acts as a backstop. There’s no global lock to get wedged.
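One plausible shape for that TTL backstop is a sweep over pending reservations: anything past its deadline is dropped and its capacity returned. The field names and sweep strategy below are assumptions for illustration, not Canic's actual implementation:

```rust
// Sketch of a TTL backstop: each pending reservation carries an
// expiry, and a periodic sweep releases any that outlived it
// (e.g. after a trap mid-execution left it unresolved).
struct PendingIntent {
    units: u64,
    expires_at_secs: u64,
}

/// Drop expired pending intents and return their capacity to the
/// pool. Returns how many intents were swept.
fn sweep_expired(
    pending: &mut Vec<PendingIntent>,
    reserved_total: &mut u64,
    now_secs: u64,
) -> usize {
    let before = pending.len();
    pending.retain(|p| {
        if p.expires_at_secs <= now_secs {
            // Expired: release its units back to the pool.
            *reserved_total = reserved_total.saturating_sub(p.units);
            false
        } else {
            true // still within its TTL
        }
    });
    before - pending.len()
}
```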
How we test it (and why that matters)
End-to-end tests don’t reach into intent storage or counters. Test canisters use the same public API as real applications:
Call::unbounded_wait(authority, "buy")
    .with_intent(IntentReservation::new(
        IntentKey::try_new("capacity")?,
        1,
    ))
    .execute()
    .await
We validate behavior under concurrency—one call succeeds, another fails, retries work—rather than inspecting internal state. That’s deliberate: the contract is behavioral, not structural.
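The flavor of that behavioral testing can be shown with std threads standing in for concurrent calls (the real tests run against canisters, so everything here is a simplified analogy): spawn competitors for a fixed capacity and assert only on outcomes, never on internal state.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// One "call" attempting to reserve a single unit of capacity.
fn try_reserve_one(capacity: &Mutex<u64>) -> bool {
    let mut cap = capacity.lock().unwrap();
    if *cap >= 1 {
        *cap -= 1;
        true
    } else {
        false
    }
}

// Race `callers` concurrent attempts against `initial` units and
// return how many of them won a reservation.
fn race_for_capacity(initial: u64, callers: usize) -> usize {
    let capacity = Arc::new(Mutex::new(initial));
    let handles: Vec<_> = (0..callers)
        .map(|_| {
            let cap = Arc::clone(&capacity);
            thread::spawn(move || try_reserve_one(&cap))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&won| won)
        .count()
}
```

The assertion is purely behavioral: with one unit of capacity and two callers, exactly one wins, regardless of scheduling order.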
The design principle
The guiding rule here is simple:
- API layer: “Is this request valid?”
- Workflow layer: “What should happen?”
- Ops / infra: “How do we do it?”
Intents follow that rule exactly. They look simple at the surface because the complexity has been absorbed by the framework. That’s not accidental—it’s the point.
One detail worth calling out, because it often gets glossed over in discussions of “intent systems,” is how the state is actually laid out in stable memory. Intents are not ephemeral locks; they are explicit, durable state with clearly defined ownership and lifecycle. In Canic, intent state is isolated into a dedicated stable-memory region, with IDs reserved exclusively for intent and reservation tracking to avoid cross-contamination with unrelated subsystems.
Concretely, the intent subsystem uses the following memory slots:
- INTENT_META_ID (18): stores global intent metadata and monotonic counters, including intent ID allocation and versioning. This is the anchor that makes intent IDs unique and replay-safe across upgrades.
- INTENT_RECORDS_ID (19): holds the per-intent records themselves: resource key, quantity, timestamps, and lifecycle state. Each entry represents a single declared reservation and its current status.
- INTENT_TOTALS_ID (20): maintains aggregated per-resource totals (reserved, committed, in-flight). This is what enables fast capacity checks without scanning individual intent records.
- INTENT_PENDING_ID (21): tracks pending (unresolved) intents for efficient cleanup, TTL enforcement, and recovery after traps or partial failures.
Slots 22–25 are intentionally reserved for future intent-related extensions (e.g. secondary indexes, sweeping metadata, or sharded totals), which allows the subsystem to evolve without reallocating or breaking stable layouts.
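Written out as constants, the layout above might look like this. In real code these would be `MemoryId` values handed to a stable-structures memory manager; plain `u8` values are used here to keep the sketch dependency-free:

```rust
// Intent subsystem stable-memory slot layout (values from the list above).
const INTENT_META_ID: u8 = 18;    // global metadata + monotonic counters
const INTENT_RECORDS_ID: u8 = 19; // per-intent records
const INTENT_TOTALS_ID: u8 = 20;  // aggregated per-resource totals
const INTENT_PENDING_ID: u8 = 21; // pending intents for TTL and cleanup

// Slots 22..=25 are reserved so the subsystem can grow
// without reallocating or breaking the stable layout.
const INTENT_RESERVED: std::ops::RangeInclusive<u8> = 22..=25;
```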
This explicit separation is not incidental. By giving intents their own stable-memory domain with well-defined responsibilities, we can reason about correctness, recovery, and growth independently of the rest of the system. Combined with call-level orchestration, it ensures that intent enforcement remains deterministic, upgrade-safe, and observable under real concurrency rather than being an in-memory locking trick that disappears at the first canister restart.
Would appreciate any insights into this sort of system because I’ve never written one before. It was mutex locking that got me in touch with Dom back in 2012 though.
PS. mission70 is a pile of ass
