I have several questions around the actor model, data structures and the functional programming aspects of the IC

I’ve been trying to fully comprehend how the IC works while learning Motoko. I still don‘t fully get a few basic parts though.

I do not have a formal computer science background, my apologies in case the questions are obvious to others.

Actor model vs Immutable variables

As far as I understand immutable variables allow programs to be executed concurrently as different functions cannot refer to the same state and thus prevent unintended mutations. There can still be the case where one process refers to an old state, however, this seems to be an easier problem to solve (if fully functional, the process can simply run again when realizing that there is a new state, correct?). So the IC is using immutable data structures as well as the actor model for communication between canisters. I don’t understand which problem the actor model is exactly solving and which problem immutable data structures are solving, and why both are needed?
In this talk: Community Conversations | Overview of Motoko, the Native Language of the IC - YouTube, Andreas Rossberg mentioned that the actor model is helping to prevent deadlocks and race conditions but isn‘t that already what the immutable data structures are preventing? It all makes intuitive sense but I’d like to really get it.

Messages

What are the messages between actors exactly? Where are they stored and what is the structure of them?

Asynchronicity vs Synchronicity

Generally, I also don‘t fully understand the debate around the synchronous vs asynchronous design of blockchains. Not sure what to ask here, I’m simply looking for a rough explanation on this, especially in relation to having subnets and the actor model on the canister level.

Functional programming

I would like to understand better when/if people would use immutable data structures “within” canisters/actors?

Stable Data Structures

Why can only immutable data structures be stable? Andreas Rossberg was mentioning that the structure of memory can change for mutable structures, what was meant by that? This one is certainly a lack of basic understanding on my side on how memory works…

Thank you very much for any help.

2 Likes

Bumping this because I’d love to hear answers too!

1 Like

I have some answers but I think the Motoko team (who know actor model very well) would be best.

so let ping them. Sorry I didn’t see this before!

2 Likes

Actor model vs Immutable variables

As far as I understand immutable variables allow programs to be executed concurrently as different functions cannot refer to the same state and thus prevent unintended mutations. There can still be the case where one process refers to an old state, however, this seems to be an easier problem to solve (if fully functional, the process can simply run again when realizing that there is a new state, correct?). So the IC is using immutable data structures as well as the actor model for communication between canisters. I don’t understand which problem the actor model is exactly solving and which problem immutable data structures are solving, and why both are needed?
In this talk: Community Conversations | Overview of Motoko, the Native Language of the IC - YouTube, Andreas Rossberg mentioned that the actor model is helping to prevent deadlocks and race conditions but isn‘t that already what the immutable data structures are preventing? It all makes intuitive sense but I’d like to really get it.

The actor model is a way to scale concurrent programming beyond the confines of single machine to a distributed setting where shared memory is too expensive to implement. Since actors (like separate machines) can only communicate by message passing, there is no need to implement the abstraction of shared memory which can be very difficult to provide across multiple machines. If messages could transmit mutable data, then a sender would expect to be able to observe changes to data made by receivers, and vice versa, re-introducing the need for shared memory. By restricting message payloads to immutable data, the data can be sent by just copying it, without having to preserve the identity of its mutable parts (because their aren’t any).

Maybe this helps: Actors and async data :: Internet Computer

Messages

What are the messages between actors exactly? Where are they stored and what is the structure of them?

At the level of the IC, messages are just binary data (byte sequences) that are stored on the chain up until the beginning of the last epoch - the IC, unlike other block chains, doesn’t not store all of its history since genesis. Roughly speaking a message will contain the address/principal of the receiver, the desired target method name and the binary payload of the method together with some record of the call context awaiting a response to that message.

Motoko and other languages that communicate via Candid (e.g. Rust canisters when compiled with the cdk), interpret that binary data as typed, immutable Candid values. Candid provides a high-level serialization/deserialization format for typed data that is transmitted, by the IC, as raw, untyped binary data, between canisters.

Asynchronicity vs Synchronicity

Generally, I also don‘t fully understand the debate around the synchronous vs asynchronous design of blockchains. Not sure what to ask here, I’m simply looking for a rough explanation on this, especially in relation to having subnets and the actor model on the canister level.

I’m not sure about blockchains, but for general concurrency and distributed programming I can say this:

Roughly speaking, asynchronous messaging scales better to a distributed setting because the sender of a message need not block waiting for the response to a message before doing other things. This helps to hide the latency of communication (which is high for the IC and blockchains in general) by allowing the sender to initiate and await the responses to several messages in parallel.

Functional programming

I would like to understand better when/if people would use immutable data structures “within” canisters/actors?

Both immutable and immutable data structures have their uses within canisters. Immutable ones tend to be easier to reason about can be an advantage when correctness is a priority or when the ability to revert to a previous version of a data-structure is useful. Mutable data structures can be more time and space efficient. Like other imperative functional languages such as ML and Scheme, Motoko supports both.

Stable Data Structures

Why can only immutable data structures be stable? Andreas Rossberg was mentioning that the structure of memory can change for mutable structures, what was meant by that? This one is certainly a lack of basic understanding on my side on how memory works…

Actually, that’s not the case for Motoko - perhaps Andreas was referring to something else.

In Motoko, stable types extend (immutable) shared types to additionally allow mutable arrays and mutable record fields. Like shared types these can’t contain (local) function values or objects, for the simple reason that the code referenced by a function value or object method will not be available or even be meaningful after the actor has been upgraded.

However, it’s true that shared data (stuff that can be sent in message), cannot be mutable, so that the (encoding of) that data can simply and easily be copied between actors without having to preserve the identity of any mutable parts (since there aren’t any) (see above).

5 Likes

Since actors (like separate machines) can only communicate by message passing, there is no need to implement the abstraction of shared memory which can be very difficult to provide across multiple machines.

I’d be interested in what message passing system actually does support mutable messages via shared memory (in a distributed setting). AFAIK commonly used systems like gRPC + Protobuf also don’t support this, although I could be wrong.

Generally, I also don‘t fully understand the debate around the synchronous vs asynchronous design of blockchains. Not sure what to ask here, I’m simply looking for a rough explanation on this, especially in relation to having subnets and the actor model on the canister level.

A really good resource I’ve found is The Internet Computer for Geeks whitepaper that was released recently. Section 1.4 is especially pertinent.

Notably, it uses the words “synchronous” and “asynchronous” to describe models of communication (as opposed to I/O operations in programming). Blockchains that assume asynchronous models of communication are more powerful than those that assume synchronous models, because in an asynchronous model an adversary can delay message delivery between replicas/nodes by any finite amount of time. The whitepaper claims that the IC assumes a partially synchronous model, whereas a PoW blockchain like Bitcoin assumes a fully synchronous model (thus the IC is more “powerful”). Really interesting stuff…

2 Likes

I’d be interested in what message passing system actually does support mutable messages via shared memory (in a distributed setting). AFAIK commonly used systems like gRPC + Protobuf also don’t support this, although I could be wrong.

I’m not aware of any messaging passing systems that support mutable data (apart from non-distributed, shared-memory ones), but was thinking about systems like FaRM https://www.usenix.org/system/files/conference/nsdi14/nsdi14-paper-dragojevic.pdf for implementations of distributed shared memory.

Sorry for the late reply. Thank you very much for the very helpful answer!! That makes a lot of sense :slight_smile:

1 Like

Thank you very much!! Based on your reply I stumbled on the following article, which is quite good at explaining it in more detail (in case someone else is interested): Synchrony and Timing Assumptions in Consensus Algorithms Used in Proof of Stake Blockchains | by zk - Zubin Koticha | Mechanism Labs | Medium

Do I understand correctly that the idea is to evaluate under which conditions the consensus mechanism keeps adding valid blocks? So it’s not about a malicious actor double-spending or anything like that but simply about under which conditions the consensus algorithm works as intended.
Is that correct?