How does canister state change when processing multiple messages that await inter-canister calls?

I recently got into a discussion with several other developers about canister intermediate state, and wanted to clarify how message processing affects the internal state of canisters when the messages being processed await inter-canister calls or perform other asynchronous behavior (e.g. HTTP requests).

See the example below and my questions.

The scenario concerns intermediate state and update APIs that make inter-canister (async) calls.

Canister A has two update APIs, lockFunds() and withdrawFunds(), both of which modify the canister state variable funds. withdrawFunds() awaits the result of at least one inter-canister call before finishing.

For example, if withdrawFunds() looks something like this:

public func withdrawFunds(withdrawAmount: Nat): async Nat {
  funds -= withdrawAmount;
  await transferFunds(); // makes ICP transfer via the Ledger canister 
  funds;
}

Scenario 1

Let’s say I make 10 calls to withdrawFunds() at the exact same time, so that Canister A’s message queue has 10 calls to withdrawFunds().

Scenario 2

Now let’s say I make 5 calls to lockFunds() and 5 calls to withdrawFunds() at the exact same time, so that Canister A’s message queue holds a randomly interleaved ordering of the 5 calls to lockFunds() and the 5 calls to withdrawFunds().

Scenario 3

Take the same situation as Scenario 2, except now lockFunds() does not mutate funds, it just references the funds variable.

Questions for Scenarios 1-3:

  1. As the update calls are processed (in batch?), is there any leakage of the funds variable between update calls due to interleaved message processing within the batch? Does it matter whether the funds decrement happens before or after the await state commit point in the withdrawFunds() function code?

  2. Does the answer to question 1 change if funds is not a primitive but an object, say Map<UserPrincipal, Funds>, and each incoming call operates on a different user principal’s funds?

Scenario 4

Now let’s say Canister A also has an update API appendText(), which takes in a text and appends it to a string.

If I make 5 calls to withdrawFunds() and 5 calls to appendText(), will the execution of these calls be “smartly” interleaved, knowing that these functions touch separate state, or will each message wait for the previous message to be processed before executing?


After talking about this with @jorgenbuilder, he came up with a great Motoko example showing how this works when you have multiple await “commit points” in an API function body.

To understand it, let’s look at the main.mo actor.

This example consists of an actor with a counter variable i and a simple test() API that awaits an async self-call to foo(), a function that increments i. Within the test() function, the local variables i0, i1, and i2 keep track of intermediate values of i within a single API call. At the end of this function, the values of these local variables are logged as output.

Jorgen has a test setup where 100 asynchronous calls are made at the same time to the test() API endpoint.

I’ve modified the example to make just 30 asynchronous calls at the exact same time; you can find the output below.

[
      'update: i0 = 0, i1 = 25, i2 = 55',
      'update: i0 = 0, i1 = 15, i2 = 45',
      'update: i0 = 0, i1 = 23, i2 = 53',
      'update: i0 = 0, i1 = 2, i2 = 32',
      'update: i0 = 0, i1 = 12, i2 = 42',
      'update: i0 = 0, i1 = 6, i2 = 36',
      'update: i0 = 0, i1 = 29, i2 = 59',
      'update: i0 = 0, i1 = 5, i2 = 35',
      'update: i0 = 0, i1 = 16, i2 = 46',
      'update: i0 = 0, i1 = 8, i2 = 38',
      'update: i0 = 0, i1 = 30, i2 = 60',
      'update: i0 = 0, i1 = 4, i2 = 34',
      'update: i0 = 0, i1 = 27, i2 = 57',
      'update: i0 = 0, i1 = 3, i2 = 33',
      'update: i0 = 0, i1 = 17, i2 = 47',
      'update: i0 = 0, i1 = 18, i2 = 48',
      'update: i0 = 0, i1 = 1, i2 = 31',
      'update: i0 = 0, i1 = 28, i2 = 58',
      'update: i0 = 0, i1 = 24, i2 = 54',
      'update: i0 = 0, i1 = 11, i2 = 41',
      'update: i0 = 0, i1 = 9, i2 = 39',
      'update: i0 = 0, i1 = 20, i2 = 50',
      'update: i0 = 0, i1 = 22, i2 = 52',
      'update: i0 = 0, i1 = 21, i2 = 51',
      'update: i0 = 0, i1 = 13, i2 = 43',
      'update: i0 = 0, i1 = 19, i2 = 49',
      'update: i0 = 0, i1 = 26, i2 = 56',
      'update: i0 = 0, i1 = 14, i2 = 44',
      'update: i0 = 0, i1 = 7, i2 = 37',
      'update: i0 = 0, i1 = 10, i2 = 40'
    ]

** (Keep in mind when looking at the output that there’s a good reason the numbers are scrambled: the messages were sent from a local CLI script in parallel, and it is essentially random which one reaches the API first. However, once a message reaches the canister ingress queue, its ordering is locked in place.)


What this shows is that await “commit points” within a single API or function act as a “HALT”.

In the case where the test() function is called 30 times, each time we hit an

await foo();

the entire message queue catches up to the point of the await and halts, not proceeding until the result of the preceding asynchronous call to foo() is returned.

This is why, in the example output, we see i0=0 for all messages at the beginning, then i1=1-30, and then i2=31-60, with the message processing order preserved (i.e. if m5 was the 5th message in the queue, it will produce a result of i0=0, i1=5, i2=35).

It makes sense to me if I think of it like this - all update messages push as fast as they can (but in an orderly fashion) through the canister until they pile into the next await “commit point”, which they are held up at until the message directly ahead of them proceeds.
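
The "pile up at each await" behaviour described above can be reproduced with a small std-only Rust simulation. This is only a sketch under simplifying assumptions, not the actual Motoko example or the replica scheduler: the `Call` struct and `simulate` function are hypothetical names, the effect of foo() is folded into the step that resumes after each await, and replies are assumed to come back in the order the calls were made.

```rust
use std::collections::VecDeque;

/// One in-flight call to a hypothetical `test()` endpoint, split at its awaits.
struct Call {
    step: u8, // which segment of the function body runs next (0, 1 or 2)
    i0: u32,
    i1: u32,
    i2: u32,
}

/// Runs `n` calls against one shared counter. Each call executes until its
/// next await and is then parked behind the other calls (assumption: replies
/// arrive in the same order in which the calls were made).
fn simulate(n: usize) -> Vec<(u32, u32, u32)> {
    let mut i: u32 = 0; // shared canister state
    let mut queue: VecDeque<Call> = (0..n)
        .map(|_| Call { step: 0, i0: 0, i1: 0, i2: 0 })
        .collect();
    let mut done = Vec::new();

    while let Some(mut call) = queue.pop_front() {
        match call.step {
            0 => {
                call.i0 = i; // read before the first await
                call.step = 1;
                queue.push_back(call); // first `await foo()`: yield
            }
            1 => {
                i += 1; // effect of the first foo()
                call.i1 = i;
                call.step = 2;
                queue.push_back(call); // second `await foo()`: yield
            }
            _ => {
                i += 1; // effect of the second foo()
                call.i2 = i;
                done.push((call.i0, call.i1, call.i2));
            }
        }
    }
    done
}

fn main() {
    for (k, (i0, i1, i2)) in simulate(30).iter().enumerate() {
        println!("call {}: i0 = {}, i1 = {}, i2 = {}", k + 1, i0, i1, i2);
    }
}
```

Under these assumptions, every call reads i0 = 0, the k-th call sees i1 = k, and then i2 = 30 + k, which is the same pattern as the logged output.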

Takeaways for developers:

This means that within the context of a single round of consensus, multiple update calls can each mutate the state, and a later call can affect the final result of a message that was ahead of it in the queue! Another way of saying this:

If update messages 0 through n (0 comes first, n last) are processed in the same round of consensus - message n can impact the resulting computation of message 0 if there are multiple intermediate await “commit points” in your API function.

Essentially, if your IC app has multiple await calls in a single API - you as a developer need to be prepared and account for the underlying state of your canister actor to “shift underneath its feet” during execution of the API.



Also, note that these intermediate state change mutations do not apply to queries (i.e. mutating state in a query call does not affect the intermediate or finalized state processed via an update call).


Hmmm… I tried a similar experiment in Rust.

use std::cell::RefCell;

thread_local! {
    // Canister-wide counter shared by all calls.
    static I: RefCell<u32> = RefCell::default();
}

#[ic_cdk_macros::update]
async fn greet(name: String) -> String {
    // Read the counter before the await.
    let i0: u32 = I.with(|i| *i.borrow());

    foo().await;

    // Increment the counter after the await.
    I.with(|i| *i.borrow_mut() += 1);

    // Read the counter again.
    let i1: u32 = I.with(|i| *i.borrow());

    format!("Hello, {}   {}   {}!", name, i0, i1)
}

// A local async function with an empty body; it makes no inter-canister call.
async fn foo() {}

I pass in the date in nanoseconds from my execution driver:

for i in $(seq 1 5)
do
  echo $i
  DATE=$(date '+%s%N')
  echo $DATE
  dfx canister call multi_async_in_api_backend greet $DATE &
done

and this is what I get

("Hello, 1671434946574816424 120 121!")
("Hello, 1671434946568431482 121 122!")
("Hello, 1671434946571679830 122 123!")
("Hello, 1671434946569939695 123 124!")
("Hello, 1671434946573268039 124 125!")


Interesting - seems like with this example the state does not shift underneath.

Curious if @claudio has any input as to the differences in behavior.

It’s probably easiest if I first explain how my mental model works and then use that to answer the questions.

My mental model:

  • Canisters are single-process, but multi-threaded programs
  • Every await is a call to thread::yield
  • Yields can only happen where an await is placed
  • Any state change gets committed when await is called and is visible to every other thread from that point on

Answers to questions 1 and 2:

  1. Yes, state changes are visible to the other messages and permanent unless undone explicitly. The order matters a LOT. I’ll show below with an example.
  2. Data type does not matter (unless the data type requires you to switch to async operations)

AFAIU consensus queues a number of update calls in a fixed order to be executed in a round. Those updates are then run one after another up to the point where an await is called (or the function returns). Then this update call’s execution gets paused, the canister call is put in the right message queue, and the next update call starts executing. Under some special circumstances the awaited call can run in the same block, but you should never rely on that to happen.

So are the calls ‘smartly interleaved’? Maybe if they contain awaits. No state analysis is done AFAIK.

Now for an example of how you should handle this. I’ll go with pseudocode to make things easier to write. Take the following code as an example of how you should do things:

[1] func withdraw_funds(amount) {
[2]   if(balance >= amount) {
[3]     balance := balance - amount
[4]     result := transfer(amount).await
[5]     if(not result.is_ok) {
[6]       balance := balance + amount
[7]     }
[8]   }
[9] }

What happens if you…

  • Remove condition on line [2]
    • User can withdraw any amount they want, even if no funds should be available
  • Remove lines [5] through [7]
    • If the transfer fails, no funds are transferred, but they are not in the user’s account anymore.
  • Swap order of [3] and [4]
    • If user calls the withdraw function twice with e.g. amount == balance, the first invocation will start the transfer, yield execution, and the second invocation will again send out the funds even though there should be none left to send out. User can withdraw any amount, but needs to time the calls appropriately
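
To make the last bullet concrete, here is a rough std-only Rust sketch of two overlapping withdrawals. It is a simulation under stated assumptions rather than canister code: `withdraw_twice` is a hypothetical helper, both transfers are assumed to succeed, and the second call is assumed to run its first segment while the first call is parked at the await. The `deduct_first` flag selects the order of lines [3] and [4].

```rust
/// Simulates two overlapping calls to the pseudocode `withdraw_funds(amount)`.
/// Returns (total paid out, final balance).
fn withdraw_twice(start_balance: u64, amount: u64, deduct_first: bool) -> (u64, u64) {
    let mut balance = start_balance;
    let mut paid_out = 0;
    let mut in_flight = Vec::new(); // calls parked at the await

    // Segment before the await, executed for both calls back to back.
    for call in 0..2 {
        if balance >= amount {
            // line [2]
            if deduct_first {
                balance -= amount; // line [3] placed before the await
            }
            in_flight.push(call); // line [4]: transfer sent, call yields
        }
    }

    // Segment after the await: each transfer completes (assumed successful).
    for _call in in_flight {
        paid_out += amount;
        if !deduct_first {
            // line [3] placed after the await; saturating to avoid underflow
            balance = balance.saturating_sub(amount);
        }
    }
    (paid_out, balance)
}

fn main() {
    println!("deduct before await: {:?}", withdraw_twice(100, 100, true));
    println!("deduct after await:  {:?}", withdraw_twice(100, 100, false));
}
```

With a balance of 100 and two withdrawals of 100, deducting before the await pays out 100 in total, while deducting after the await lets both calls pass the check on line [2] and pays out 200.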

Another way to mess up, inserting line [2a] and changing line [6]:

[1] func withdraw_funds(amount) {
[2]   if(balance >= amount) {
[2a]    original_balance := balance
[3]     balance := balance - amount
[4]     result := transfer(amount).await
[5]     if(not result.is_ok) {
[6]       balance := original_balance
[7]     }
[8]   }
[9] }

This introduces a race condition where one invocation may accidentally undo legitimate/important changes made by another invocation.
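
A small std-only Rust sketch of that race, with one fixed interleaving written out by hand. The `Rollback` enum and `final_balance` function are hypothetical names, and the amounts (balance 100, withdrawals of 30 and 50) are chosen only for illustration; the assumption is that call B runs entirely while call A is parked at its await, that B's transfer succeeds, and that A's transfer fails.

```rust
/// How a failed transfer is rolled back after the await.
enum Rollback {
    AddBack,         // first listing, line [6]: balance := balance + amount
    RestoreSnapshot, // second listing, line [6]: balance := original_balance
}

/// Plays one interleaving of two withdrawals and returns the final balance.
fn final_balance(rollback: Rollback) -> u64 {
    let mut balance: u64 = 100;

    // Call A: withdraw 30, runs up to its await (check on line [2] passes).
    let a_amount = 30;
    let a_snapshot = balance; // line [2a]
    balance -= a_amount; // line [3]

    // Call B: withdraw 50 while A is parked (check passes: 70 >= 50).
    let b_amount = 50;
    balance -= b_amount; // line [3]; B's transfer is assumed to succeed

    // Call A resumes: its transfer failed, so it rolls back.
    match rollback {
        Rollback::AddBack => balance += a_amount,
        Rollback::RestoreSnapshot => balance = a_snapshot,
    }
    balance
}

fn main() {
    println!("add back:         {}", final_balance(Rollback::AddBack));
    println!("restore snapshot: {}", final_balance(Rollback::RestoreSnapshot));
}
```

Adding the amount back leaves a balance of 50, which matches the 50 that B actually paid out. Restoring the snapshot resets the balance to 100 and silently erases B's deduction.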

Excellent article that references this exact issue and covers these types of reentrancy bugs. Looking forward to seeing this project be continued and the DFINITY RFP bounty being pursued by someone!
