Austin is right. A subnet’s output (whether in the form of responses to ingress messages or canister-to-canister messages) is mostly limited by compute (IIRC you can only output a couple of GB per round before you run out of instructions).
You are also right about the block size. It is currently limited to 4 MB, because even before you get to 4 MB you run into limitations of the consensus algorithm and the speed of light: you simply cannot transmit much more than that around the world over the public internet with sub-second roundtrip latency (particularly considering that you need another couple of network roundtrips per consensus round). This means that a subnet’s ingress (not egress), including ingress messages, canister-to-canister messages and HTTP outcalls, is limited to 4 MB per block.
The Consensus team is actively working on putting only references to messages into blocks, so that (i) block size and latency are reduced and (ii) the referenced payloads can be arbitrarily large. For now, they’re focusing on (i) (and only on ingress messages): the payload size is still limited to 4 MB per block, but block size and latency are significantly reduced. Once they iron out all the rough edges, they will start looking into increasing the payload size and applying the same approach to other payload types (XNet messages, HTTP outcalls).
To answer your last couple of questions, you can max out your throughput the usual way: make concurrent calls instead of waiting for each roundtrip to complete. This happens implicitly if your calls are each triggered by different incoming calls; otherwise you need to issue all the calls at once and then await all of them together (a join).
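To make the difference concrete, here is a minimal sketch in plain Python `asyncio`, not canister code. `fake_call` and `ROUNDTRIP_SECONDS` are made-up stand-ins for a call and its roundtrip latency, not any IC API; the point is only the "issue everything, then join" pattern versus awaiting one call at a time.

```python
import asyncio
import time

# Hypothetical latency of one call roundtrip, chosen only for illustration.
ROUNDTRIP_SECONDS = 0.05


async def fake_call(payload: int) -> int:
    """Stand-in for an ingress or canister-to-canister call (not a real IC API)."""
    await asyncio.sleep(ROUNDTRIP_SECONDS)
    return payload * 2


async def sequential(payloads):
    # Each call waits for the previous roundtrip to complete.
    return [await fake_call(p) for p in payloads]


async def concurrent(payloads):
    # Issue all calls at once, then await (join) all of them together.
    return await asyncio.gather(*(fake_call(p) for p in payloads))


def timed(coro_fn, payloads):
    """Run coro_fn(payloads) and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(payloads))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    payloads = [1, 2, 3, 4, 5]
    _, seq_time = timed(sequential, payloads)
    _, conc_time = timed(concurrent, payloads)
    print(f"sequential: {seq_time:.2f}s, concurrent: {conc_time:.2f}s")
```

With N calls, the sequential version takes roughly N roundtrips while the concurrent one takes roughly one. In a canister you would get the same effect by creating all the call futures first and then joining them, rather than awaiting each call before issuing the next.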
The current maximum throughput is therefore 4 MB per block times the number of subnets. Beyond that, the only limits are each subnet’s ingress bandwidth and the network bandwidth of the internet and of the specific data centers.
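As a back-of-the-envelope sketch of that arithmetic: the 4 MB figure is the block limit mentioned above, while the function names and the `blocks_per_second` parameter are my own assumptions for illustration (plug in whatever block rate you actually observe on your subnets).

```python
# Per-block ingress limit discussed above, in MB.
BLOCK_LIMIT_MB = 4


def aggregate_ingress_mb_per_round(num_subnets: int) -> int:
    """Upper bound on ingress across all subnets for one block each."""
    return BLOCK_LIMIT_MB * num_subnets


def aggregate_ingress_mb_per_second(num_subnets: int, blocks_per_second: float) -> float:
    """Same bound per second; blocks_per_second is an assumed, measured input."""
    return aggregate_ingress_mb_per_round(num_subnets) * blocks_per_second


if __name__ == "__main__":
    # Example with made-up inputs: 10 subnets, 1 block per second each.
    print(aggregate_ingress_mb_per_round(10))
    print(aggregate_ingress_mb_per_second(10, 1.0))
```

This is only an upper bound: it assumes every block on every subnet is completely full of your payload, which you would only approach by spreading concurrent calls across subnets.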