Actor Model fundamentals compromised?

That’s the thing. Actor Systems by definition are distributed and message driven, hence, do not have a call stack concept in the conventional sense. We are offered to think of Motoko actors in the context of Actor Model.

Shared Methods are types of communications accepted.
Return type is immutable which in the definition of serializable message that goes over the network.

I keep reasoning in terms of Actor Model and come to the point where it deviates - I need to be responsible for synchronization.

Want to understand the reason for that.
Hard to implement? Will be offered in the future?

Thank you guys. Great discussion

C.J

1 Like

Are you a software engineer? Virtually all programming languages are based on call stacks. Even Erlang has function calls. There is seriously no problem here. After playing around with Motoko for a while, the architecture should start making sense.

(Edit: removed harsh language.)

Can you define what do you mean by synchronization? And how is this problem solved in other lang, e.g. elixir? Just wanna make sure we’re on the same page to begin with.

@Nick, Yes I do write code and not just that for the last 20 years. For the last 7 - distributed systems is my interest.

Message based systems have traceable footprint but not the call stack since they involve multiple processes that distributed in time and space.

And the question is not about language but rather about conceptual paradigm that we are offered.

My questioning of the paradigm, which I am familiar with should not deserve comments that were edited after.

I think we can manage fruitful conversation, can we?

Appreciate it

C.J

1 Like

Synchronization of the actor state that documentation is referring to to make sure that while it awaits the state has not been changed.

Between suspension and resumption around the await , the state of the enclosing actor may change due to concurrent processing of other incoming actor messages. It is the programmer’s responsibility to guard against non-synchronized state changes.

Meaning that while the current message being processed it is possible for the state to be changed by other shared function - which is by definition is the processing of another message - while the previous has not been finished.

1 Like

Given your experience I guess it’s more efficient to point you to this resource:

Especially the Abstract Behavior section. There’s more details hidden there.

1 Like

It was also to my surprise when I first come across such property of motoko. After reading that doc I kinda have more understanding of what they’re really saying.

I cannot claim I understand everything (and I surely hope core team share more on the topic, cc @nomeata). To me the key take away is, message has a queue property, value of which is either Unordered or { from: A, to: B }, both A, B is principal of canisters.

The only guarantee of message ordering is that, two or more messages with same queue: { from, to } (that is between the same pair of canisters) are processed first-come-first-serve.

To me this sounds roughly a guarantee of non-surprising behavior in everyday use case.

Say canister A has only one foo public function, inside which it calls canister B and await multiple times. Now Alice and Bob both call A.foo, Alice comes first.

Although canister A will not block until finished handling Alice’s message (it will be able to serve Bob after encountering first await keyword) the above ordering guarantee ensures that the final result is effective the same as if canister A were blocking.

I think the idea here is that each actor-message is atomic & in the synchronization, but in the motoko language, the await keyword finishes that webassembly actor-message and creates a new message for the await call, using function-passing-callbacks.

2 Likes

You are probably all right, and the dissonance disappears when we notice that we have two layers of abstraction at play here.

The low layer is that of the actor model: Actors (or canisters) exchange messages, and we have all the nice atomicity guarantees we expect. This layer is also visible in the Internet Computer system: It’s the stuff that conensus, messaging, scheduling etc. is about.

On top of this layer, we built higher-level concept, and introduce the concept of a call. In particular, at this level we distinguish between messages that initiate calls, and those that respond to calls. This layer is implemented in the execution engine (e.g. to guarantee that every call will receive exactly one response, keeping track of outstanding responses). The atomicity guarantees of the lower layer still only apply to each message execution.

In that sense, Motoko actors are still actors, but what kind of messages they exchange when is additionally restricted by the concept of calls and call contexts. And, mostly for convenience, the await syntax allows you to program in this model a bit more conveniently than if you had to handle all responses via top-level functions as well (which would be the alternative, and actually has some benefits, e.g. the ability to upgrade while calls are in flight). But it’s just convenience, and as such obscures some of things happen underneath this layer, and sometimes that is dangerous.

So it is true that a serious developer on our platform will have to understand both layers, and understand which guarantees are provided where, and how the language and CDK they are using map to these concepts.

12 Likes

Great rundown. How does Rust compare to Motoko in this context?

Does Rust have the bility for the “upgrade while the call is in the flight” ?

@nomeata would a compelling minimal example of this behavior for Motoko be something like the following?

// Within the context of an `actor`
// which keeps some state that
// includes a `balance`
if balance == 0 {
  await ...;
  // Here we can’t assume that `balance`
  // is still `0` since other messages may
  // have mutated it in the meantime.
};
2 Likes

This is my point exactly. In the canonical Actor Model there is a place for a not-blocking call construct. However, it comes with a condition of being able to calculate the replacement behaviour which is absolutely isolated by definition.

In this case, however, the state is leaked/shared. Provided of course that I have clarity in understanding how Motoko deals with it.

C.J

I’m fluent in JS, and have some experience with Solidity and Elixir.

From JS POV, this behavior is very normal. Let’s map the logic into some concrete JS code:

class Actor {

  balance = 0

  async whenZeroBalance() {
    if (this.balance == 0) {
      await scrapeFund()
      console.log("current balance:", this.balance)
    }
  }

  // after 10sec, 1000 is magically added to balance.
  async scrapeFund() {
    await sleep(10000)
    this.balance += 1000
  }

  async topUp(amount) {
    this.balance += amount
  }
}

What happen if Alice calls whenZeroBalance() then Bob calls topUp(999)? We’ll see "current balance: 1999". This is intended, as we explicitly shared this.balance (@nerdoutcj yes, in this case it’s shared, you’re right) across multiple method calls. With this syntax we voluntarily share state.

However, we can opt to another model.

class Actor {
  _updateBalance(updater) {
    this.balance = updater(this.balance)
  }

  balance = 0
  
  async whenZeroBalance() {
    const balance = this.balance
    if (balance == 0) {
      const newBalance = await scrapeFund(balance)
      console.log("current balance:", newBalance)
      // sync state
      this._updateBalance((currBalance) => {
        if (currBalance == 0) {
          return newBalance
        } else {
          const delta = newBalance - balance
          return currBalance + delta
        }
      })
    }
  }
  
  // after 10sec, 1000 is magically added to balance.
  async scrapeFund(prevBalance) {
    await sleep(10000)
    return prevBalance + 1000
  }

  async topUp(amount) {
    this.balance += amount
  }
}

If we avoid referencing this.balance across await, and instead use the balance variable in closure, you don’t have to worry about it “surprisingly” changed after await.

What feels unnatural (surprising) to me at first sight of Motoko, is because of the mindset formed from past experience with Solidity. In Solidity each transaction is actually sync call, even across multiple contracts. Asynchrony doesn’t exit in Solidity’s model.

My thought about it is that, Motoko/Dfinity brings about a paradigm shift of computation model in blockchain, that’s a feature by design. And with power comes responsibility. We have concurrency and scaling ability, now we have to deal with state synchronization.

4 Likes

With my limited experience working with Elixir, my thought is Elixir being a FP lang without mutable data struct hides this kind of situation away. With immutable data, you’re force to pass things in closure, thus it’s simply impossible to get into race condition cus the current behavior is always referencing the latest closure. However, if a lang allow mutable data and async/await, then this situation is inherently inevitable.

(Elixir perhaps is unrelated to the topic, I’m just using it to illustrate my point since it’s the only actor model base lang I’m familiar with).

As nomeata pointed out, there are multiple layers at play here.

On one level, we have a simple form of actor (mapping to IC canisters). An actor communicates by message passing as you would expect, messages are received atomically, they can send an arbitrary number of follow-up messages without blocking on responses.

But in its raw form, this model is rather awkward to use in practice. If you have a sequence of messages and responses that you need to process in an interaction, then you’ll be forced to manually reify all the intermediate state (and control flow, including call chains) somehow and store it somewhere. This can be tedious and error-prone, and in the general case, amounts to a manual CPS-transformation of the entire program.

Async/await is best considered a layer of “syntactic sugar” on top of that, which does this transformation for you. An async function with awaits should not be thought of as the implementation of a single message, but an interaction involving a sequence of messages and responses, separated by await points.

And yes, this has the risk of luring you into a false sense of safety, where you forget that atomicity only extends to the next await. But that’s a trade-off that seems worth taking for the practical benefits.

In fact, we have been discussing ways in which the compiler or the type system could help detect erroneous state dependencies that cross await boundaries, and where the programmer would need to be explicit if they want to opt in into such a dependency. But that’s a tricky design space, and we do not have a satisfactory solution yet.

7 Likes

Yes! A perfect example.

It’s pretty much the same. The underlying execution model, as provided by the system, is the same, and the syntactic sugar in the form of async/await has the same properties.

Ah, upgrades-without-stopping … interesting topic, maybe worth it’s own thread. But here are some comments (and all of them apply to rust and motoko):

  • The system doesn’t stop you from upgrading a canister that isn’t “stopped”. But it isn’t always a good idea.

  • If you have a canister that never does outgoing calls, you can safely upgrade atomically without stopping, and have no downtime.

  • If you have a canister that you know at the moment has no outgoing calls, you can safely upgrade atomically without stopping, and have no downtime.

    Maybe your service provides lots of useful functionality without doing calls on it own, and only rarely does something that requires an outgoing call. Then you could add application logic where you instruct the canister (not the system!) to stop doing outgoing calls, wait for all outstanding to come back, and then upgrade (atomically and without stopping), without impeding the main functionality of the service.

  • If you have a conventional canister (Motoko or Rust), and you might have outstanding calls, you really really should not upgrade without stopping first.

    It’s not just that you might lose the response, but when the response comes back, the way things are set up right now, it could arbitrarily corrupt your canister state.

    The technical reason is that responses are delivered by invoking a Wasm function identified by a WebAssembly function table index, and neither Rust nor Motoko give you control over the location of functions in the table. So the new version of your canister might have a completely unrelated, internal function in that slot.

  • Theoretically, you can write canisters that you can upgrade while the call is in flight, if you make sure that the functions handling the callbacks are at the same position in the table. For example if you write your canister by hand in wasm, or beef up your Rust toolchain, or maybe some clever post-processing.

    I don’t think anyone has done or tried that so far. But the system conceptually supports this.

    In this model, you wouldn’t be using async and await, though, but you would use top-level named functions as callback handlers, and implement your service closer to the actor model, or maybe closer to a state machine. After all, you do want to handle the responses in the upgraded version, so you need to be more explicit about the flow here.

  • We have plans (but not high priority, unfortunately) to change or extend the System Interface to remove the problem that you can corrupt your state if you get this wrong, by delivering callbacks to named exported functions of the module (separate from the public methods, though).

    With that in place, you’ll be able to use write canisters that can be upgraded instantaneous and autonomously with in-flight outgoing calls at last in Rust/C/etc, and – after a bit more language design – hopefully also as an opt-in in Motoko.

8 Likes

If I have a Rust canister which has it’s own principal as the controller and no outstanding messages. Can I safely call the “upgrade canister” function on the management canister as long as you don’t do anything with the result?

Excellent question: It depends on what exactly you mean with “doing something with the result” means.

If on the raw system API level you pass an invalid table index (maxint) as the callback, you should be safe. This is the hack I suggested in my blog post for how to do one-way calls, and how recent versions of Motoko implement it. I didn’t check how to achieve this with the rust CDK, it may need to be patched.

If you (or your CDK) does pass a possible valid table index, and the next version has a unrelated function there, it wouldn’t be safe.

1 Like

If I were to to this without the CDK should I just copy the code of fn call_raw_internal cdk-rs/call.rs at main · dfinity/cdk-rs · GitHub. And replace

ic0::call_new(
            callee.as_ptr() as i32,
            callee.len() as i32,
            method.as_ptr() as i32,
            method.len() as i32,
            callback as usize as i32,
            state_ptr as i32,
            callback as usize as i32,
            state_ptr as i32,
        );

with

ic0::call_new(
            callee.as_ptr() as i32,
            callee.len() as i32,
            method.as_ptr() as i32,
            method.len() as i32,
            i32::MAX,
            state_ptr as i32,
            i32::MAX,
            state_ptr as i32,
        );

?

1 Like

Essentially yes! You wouldn’t need a state_ptr (there is no callback handler to receive that pointer), so you can pass 0 here and maybe remove some code related to state_ptr.

1 Like