I don’t see this working, honestly.
First, what are we locking, whole memory ranges or individual addresses? If the former, then a linked list might well be spread across the whole heap (some entries got allocated early, when very little heap was in use, some only recently, towards the end of the 4 GB heap). If the latter, then we would need a lock for every byte (or maybe word) of memory. You’ve just doubled memory usage and forced every single write to go though a consensus round (i.e. every memory write takes 1 second).
Second, you would also need to order memory reads, not just writes. And taking the “increment a counter” example, you don’t want to order all reads, then order all writes. If I repeatedly increment the counter, I need at the very least to put every “read + write” cycle behind a lock. But maybe I don’t want to increment the counter twice, read it back and find that its value has grown by 3. So then I have to hold on to the lock throughout the execution of my request. At this point I’m executing requests in sequence, with a lot of overhead (locking and unlocking every memory location I’m accessing) while keeping many cores idle, waiting to grab a lock, instead of executing transactions on other canisters.
Third, I don’t even know how I would write Rust (let alone Motoko) code that locks access to a given memory range. How do I figure out the memory range of a linked list or tree? How do I force the coder to lock every memory address it reads from? Some of those memory accesses may not even be obvious, because they’re done by library functions.
Fourth, with so many locks I’m virtually bound to deadlock the first time I run 2 transactions in parallel (because one thread will lock A then B, while the other will lock B then A and each will end up waiting for the other).
Fifth, every lock that a request fails to acquire would need to create a “call context” (similar to what we have now when making a canister call; stack and other data, probably locks held), so it can be resumed from that state whenever the lock can be acquired or the memory write is applied. That call context needs to be persisted, hashed, certified every single round. That is huge overhead.
AFAICT whatever this mechanism that allows multithreaded canisters is, it cannot include canister-controlled locking or it will be both inefficient and brittle. It would either need to be something fully automated, relying on orthogonal persistence (start the transactions in parallel and if they both touch the same memory address abort one of them (the “second”, whatever that means). Or involve different heaps (which is how heap vs. stable memory is currently implemented) and only allow transactions that touch different sets of heaps to run in parallel. But the former will likely always result in starting 2 transactions and aborting one. And the latter would require specialized language features as e.g. Rust doesn’t have a concept of multiple memory heaps.