Scalability of update calls in a common scenario

Actually, there’s a more interesting case:

Suppose A sends “ping” messages to B and C, and then updates its state to reflect the order in which they reply. If I understand correctly, Paul stated elsewhere that all of this can happen inside a single block/batch, as seems necessary for even vaguely acceptable performance.

The order in which B and C process the messages, the order in which A sees their replies, and therefore the final state of A all seem to be non-deterministic, depending on scheduling quirks.

However, if this computation is run by several independent DCs, they may produce different arbitrary orderings and so different results in A, which is not allowed.

This seems to mean that all participating DCs have to stay in lock-step not only for “external” messages coming into the subnet, but also for the ordering of every single internal message. So every single message requires subnet-global consensus across multiple DCs.
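
To make the concern concrete, here is a toy Python sketch. It is my own illustration and not how the platform actually works: `run_a`, `agreed_order` and the three-replica loop are all hypothetical names, and the "scheduler" is modelled as nothing more than the order in which the two replies reach A.

```python
from itertools import permutations

def run_a(reply_order):
    """Hypothetical actor A: its final state is the order in which replies arrive."""
    state = []
    for sender in reply_order:
        state.append(sender)  # A records each reply as it is seen
    return tuple(state)

# Each DC's local scheduler may deliver the two replies in either order,
# so without coordination both final states of A are reachable.
possible_states = {run_a(order) for order in permutations(["B", "C"])}
print(sorted(possible_states))

# If every DC instead applies one agreed ordering (assumed here to come
# out of consensus), all replicas end up with the same state.
agreed_order = ("B", "C")
replica_states = {run_a(agreed_order) for _dc in range(3)}
print(replica_states)
```

The only point of the sketch is that the replicas agree on A's state only if something fixes the reply order for all of them, which is exactly the per-message agreement I'm worried about.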

Am I missing something? If this is the plan, it seems really fatal for performance. Calls that would, on regular platforms, be sub-millisecond DC-local calls will take hundreds of milliseconds.