This confirms my original statement that response delivery is guaranteed by the system (provided there are no bugs in the replica and the node machines are functioning).
Now, about canister code taking into account possible replica bugs:
- Thank you for mentioning the value of keeping logs. I made a proposal here for logging each canister message along with its cause, caller, arguments, and result. This can be a great help for checking the correctness of the canister state, and it can also help to repair the canister state after a trap rollback.
- How far should a canister coder go to ensure replica bugs don’t harm the canister’s state? What happens if the replica incorrectly checks the caller’s signature? Should the canister also ask for the caller’s signature on the request in case there is a bug in the replica? Should a canister take into account that an outgoing message can reach the wrong canister because of a bug in the replica? Should a canister take into account that a wasm instruction like 2+2 can come out to equal 5 because of a bug in the replica? Is it dangerous to rely on a 2+2 operation being equal to 4? Is it dangerous to rely on the protocol delivering a call to the correct intended callee? There are many more examples like these. How far does it go?
- From a quick check, the ckBTC canisters appear to rely on this response delivery guarantee for their correctness. This line here in the ckBTC-minter canister mints the ckBTC for the user based on the received BTC UTXOs and marks the UTXOs as minted once a successful response comes back. If the ckBTC ledger canister mints the ckBTC but the minter canister then receives an error response (or worse, receives no response at all), the minter canister will not mark those UTXOs as minted and will mint them again the next time the update_balance method is called. The minter does not use idempotent calls when calling the ckBTC ledger. It would be great to get confirmation from the ckBTC team on whether they wrote the canisters to rely on response delivery, or whether they intended to use multi-phase commits, response timeouts, and idempotent operations on every incoming and outgoing inter-canister call.
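To make the idempotency point concrete, here is a minimal sketch of what I mean by an idempotent mint. This is not the actual ckBTC code: the `Ledger` type, its `mint` method, and the idea of deriving the key from the UTXO outpoint are all hypothetical, and the inter-canister call is simulated with a plain function call. The only thing it shows is that, if the callee deduplicates on a caller-supplied key, a retry after a lost or failed response cannot mint twice.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a ledger canister that deduplicates mints by key.
#[derive(Default)]
struct Ledger {
    balances: HashMap<String, u64>,
    /// Idempotency key -> block index of the mint that already used it.
    seen: HashMap<String, u64>,
    next_block: u64,
}

impl Ledger {
    /// Mints `amount` to `account` at most once per `key`.
    /// A repeated call with the same key returns the original block index
    /// and leaves the balances untouched.
    fn mint(&mut self, key: &str, account: &str, amount: u64) -> u64 {
        if let Some(&block) = self.seen.get(key) {
            return block;
        }
        *self.balances.entry(account.to_string()).or_insert(0) += amount;
        let block = self.next_block;
        self.next_block += 1;
        self.seen.insert(key.to_string(), block);
        block
    }

    fn balance_of(&self, account: &str) -> u64 {
        self.balances.get(account).copied().unwrap_or(0)
    }
}

fn main() {
    let mut ledger = Ledger::default();
    // Hypothetical key derived from the UTXO outpoint (txid:vout).
    let key = "txid-abc:0";

    let first = ledger.mint(key, "alice", 50_000);
    // Simulate the minter never seeing the first response and retrying.
    let retry = ledger.mint(key, "alice", 50_000);

    assert_eq!(first, retry);
    assert_eq!(ledger.balance_of("alice"), 50_000);
    println!("balance after retry: {}", ledger.balance_of("alice"));
}
```

With something like this on the ledger side, the minter could safely call the mint again for any UTXO it has not yet marked as minted, whether or not the earlier response arrived.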