You make three separate calls, each of which triggers a separate message execution. That alone would lead me to expect different timestamps, especially locally. I suspect there is some misunderstanding somewhere. Why do you think the timestamp changing between calls shows that stable memory access is async? (Also, if your function is not async, then there is no way, assuming you don't do something extremely weird, for the function to do async work.)
We may not be using the same words for the same things. Let me try to explain in my words what you are doing.
1. Check if the record exists in the StableBtree
   - This uses synchronous stable memory to fetch the (in this case nonexistent) record
2. The record does not exist, so you start the process to create it:
   - Start with an async call to the ledger. This is an inter-canister call as described in your docs link. Other calls may be processed now.
   - Once the inter-canister call completes, you have the info to create the record
   - Use synchronous stable memory access to create the record
If you run multiple attempts in parallel, the system may (as in your frontend-triggered scenario) or may not (as in your PicJS test) interleave the multiple calls. If they are interleaved, then you have a race condition where you create records multiple times (as you observed in the logs). The proper way to deal with this is to introduce some application-level locking so you don't create the same record multiple times. You could e.g. create the record in an 'under construction' state before doing the ledger call. Then other requests will see that the record is already getting created.
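For illustration only, here is a minimal sketch of that locking idea. Everything in it (RecordState, RECORDS, call_ledger, get_or_create_record) is hypothetical and a plain heap BTreeMap stands in for the stable structure; the point is reserving the slot before the await:

use std::cell::RefCell;
use std::collections::BTreeMap;

// Hypothetical types for illustration; in the real canister this would be
// the StableBtree-backed map, not a heap BTreeMap.
#[derive(Clone)]
enum RecordState {
    UnderConstruction,
    Ready(u64),
}

thread_local! {
    static RECORDS: RefCell<BTreeMap<u64, RecordState>> = RefCell::new(BTreeMap::new());
}

// Assumed helper that performs the inter-canister ledger call.
async fn call_ledger(_id: u64) -> Result<u64, String> {
    unimplemented!("stand-in for the real ledger call")
}

async fn get_or_create_record(id: u64) -> Result<RecordState, String> {
    // Synchronous read: if the record exists (even as a placeholder), stop here.
    if let Some(existing) = RECORDS.with(|r| r.borrow().get(&id).cloned()) {
        return Ok(existing);
    }
    // Reserve the slot *before* awaiting, so interleaved messages see that
    // creation is already in progress and don't start it a second time.
    RECORDS.with(|r| r.borrow_mut().insert(id, RecordState::UnderConstruction));

    // The inter-canister call suspends this message; other messages may run now.
    let amount = call_ledger(id).await?;

    // Synchronous write of the finished record.
    let record = RecordState::Ready(amount);
    RECORDS.with(|r| r.borrow_mut().insert(id, record.clone()));
    Ok(record)
}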
The whole process is not synchronous (since you make an inter-canister call), but that doesn’t mean that stable memory access is not synchronous. Does my explanation/rewording make sense to you?
Hi @Vivienne, I might need your help for this situation
The backend does this in one method:
1. set the transaction's state to 'created'
2. set the transaction's state to 'in progress'
3. make the ledger calls related to the transaction
4. set the transaction's state to 'success'
If the frontend polls the status of the transaction in between these steps, I only get the 'created' state.
Only when the whole call finishes is the 'success' state returned.
When can the updated state be queried from the client (HTTP agent)?
Queries can only read committed state changes. State changes are committed (assuming no panics) when a message finishes processing, or whenever you await something [EDIT: something external]. An extra difficulty is that (as in any distributed system) not all replicas may be fully up to date.
Also, does this happen on mainnet or locally? I could imagine that locally PocketIC is too fast to expose the intermediate state. On mainnet you need to make cross-subnet calls to make the ledger transactions. In that case I would say that a bug is more likely than repeatedly hitting stale state.
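To make the commit points concrete, here is a hedged sketch (the method name, set_status, and call_ledger are invented stand-ins, not taken from your code) of which states a polling query could observe:

// Sketch of an update method, annotated with the commit points described above.
// `set_status` is assumed to write the status somewhere a query can read it;
// `call_ledger` is assumed to wrap a real inter-canister call to the ledger.
fn set_status(_s: &str) { /* write status to canister state */ }
async fn call_ledger() -> u64 { unimplemented!("stand-in for the real ledger call") }

async fn process() {
    set_status("created");     // not observable yet: nothing committed so far
    set_status("in progress"); // still the same message execution, no commit

    // Making the inter-canister call ends this execution slice and commits the
    // state written so far. If the reply only arrives in a later round (e.g. a
    // cross-subnet ledger), a polling query can observe "in progress" meanwhile.
    let _balance = call_ledger().await;

    set_status("success");     // observable once this message finishes processing
}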
It happens on mainnet. This process loops n times (based on config), all inside one method call.
I tried with n = 3; each loop iteration takes 6-8 seconds to finish the ledger call (cross-subnet calls).
But it seems the state is not committed during the loop.
Do you think my syntax is causing the uncommitted state?
The code I'm asking about is written in tx_manager_service.rs, but it is awaited through several layers from api.rs.
api.rs
#[update(guard = "is_not_anonymous")]
pub async fn update_action(input: UpdateActionInput) -> Result<ActionDto, CanisterError> {
    let api: LinkApi<RealIcEnvironment> = LinkApi::get_instance();
    let res = api.update_action(input).await;
    let end = ic_cdk::api::time();
    res
}
// LinkApi impl
pub async fn update_action(
    &self,
    input: UpdateActionInput,
) -> Result<ActionDto, CanisterError> {
    let caller = ic_cdk::api::caller();
    let is_creator = self
        .validate_service
        .is_action_creator(caller.to_text(), input.action_id.clone())
        .map_err(|e| {
            CanisterError::ValidationErrors(format!("Failed to validate action: {}", e))
        })?;
    if !is_creator {
        return Err(CanisterError::ValidationErrors(
            "User is not the creator of the action".to_string(),
        ));
    }
    let args = UpdateActionArgs {
        action_id: input.action_id.clone(),
        link_id: input.link_id.clone(),
        execute_wallet_tx: true,
    };
    let update_action_res = self
        .tx_manager_service
        .update_action::<fn(ActionState, ActionState, String, ActionType, String)>(args, None)
        .await
        .map_err(|e| {
            CanisterError::HandleLogicError(format!("Failed to update action: {}", e))
        });
    update_action_res
}
tx_manager_service.rs
pub async fn update_action<F>(/* parameters elided */) {
    // do something
    for mut tx in eligible_canister_txs {
        // where I set the tx to `in progress`
        self.execute_tx(&mut tx).map_err(|e| {
            CanisterError::HandleLogicError(format!("Error executing tx: {}", e))
        })?;
        // ledger call
        self.execute_canister_tx(&mut tx).await?;
    }
    // do something
}
A correction from before: any await of a call to some other canister is a commit point, not every await in general. I should have thought of that.
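To illustrate the distinction with a small hedged sketch (call_ledger is an invented stand-in, not a CDK API):

// Sketch only. `call_ledger` is assumed to wrap a real inter-canister call
// (e.g. via ic_cdk::call); with this stub body nothing would actually yield,
// the comments just mark where the commit point would be.
async fn call_ledger(x: u64) -> u64 { x }

async fn example() {
    // Not a commit point: this local future is immediately ready, so the await
    // resolves without leaving the current message execution.
    let x = async { 40 + 2 }.await;

    // Commit point: awaiting an actual call to another canister suspends the
    // message and commits the state written so far.
    let _balance = call_ledger(x).await;
}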
Your code looks fine to me. I even checked the implementation of self.execute_canister_tx in the repo you linked above and didn’t find anything suspicious. Let me ask some other people if I’m missing something obvious…
dfx start --artificial-delay 10000 would introduce a delay of 10s in between rounds, so it should be possible to inspect the state at the end of every round.
This is also crucial for being able to observe the “processing” state. If the ledger canister is on the same subnet as your backend canister, then all calls could be executed within the same round and thus you’d never see the intermediate state.
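If it helps, here is a rough polling sketch against such a local replica using ic-agent (the canister id and the get_status query method are placeholders, not from your project):

use candid::{Decode, Encode, Principal};
use ic_agent::Agent;
use std::time::Duration;

// Polls a placeholder `get_status` query every 2 seconds and prints what a
// client would observe after each round of a replica started with
// `dfx start --artificial-delay 10000`.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let agent = Agent::builder().with_url("http://127.0.0.1:4943").build()?;
    agent.fetch_root_key().await?; // local replica only, never against mainnet

    let canister_id = Principal::from_text("aaaaa-aa")?; // placeholder id

    loop {
        let raw = agent
            .query(&canister_id, "get_status")
            .with_arg(Encode!(&())?)
            .call()
            .await?;
        println!("status = {}", Decode!(&raw, String)?);
        tokio::time::sleep(Duration::from_secs(2)).await;
    }
}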