Traffic was already considerable before 20:00 UTC. Some people had early access to claim their tokens (I believe there were two previous rounds before it was open to the public), and they, plus some eager regular users, were already putting significant load on the canister before 20:00 UTC. At some point that load exceeded the configured limit, whatever it was, and requests started being dropped.
The boundary node rate limit (which I don’t actually know, but I suppose is something like 1K QPS per subnet) is static. Whatever static limit you set there is bound to be inaccurate, because not all queries (or updates) are equal. E.g. in this case queries were rather lightweight (under 10K Wasm instructions each), whereas earlier this week and the week before we were seeing queries 4-5 orders of magnitude heavier (as in, up to 1-2B instructions, nearly 1 second of CPU time).
Whatever static limit you put in place, it is not going to cover both those extremes. What’s necessary is a feedback loop allowing boundary nodes to limit traffic based on actual replica load.
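To make the idea concrete, here is a very rough sketch of such a feedback loop: an AIMD-style controller that shrinks or grows the admitted QPS based on load reported back by the replicas. All names and thresholds are made up for illustration; this is not the actual boundary node implementation.

```python
# Hypothetical sketch: a boundary-node rate limit driven by replica load
# feedback instead of a static QPS number.

class AdaptiveRateLimiter:
    """AIMD controller: back off sharply when replicas report saturation,
    creep back up when they report headroom."""

    def __init__(self, initial_limit=1000.0, min_limit=50.0, max_limit=50000.0):
        self.limit = initial_limit  # admitted QPS
        self.min_limit = min_limit
        self.max_limit = max_limit

    def update(self, replica_load: float) -> float:
        """replica_load: 0.0 (idle) .. 1.0 (saturated), e.g. derived from
        execution queue depth or CPU utilization reported by replicas."""
        if replica_load > 0.9:
            # multiplicative decrease under overload
            self.limit = max(self.min_limit, self.limit * 0.5)
        elif replica_load < 0.7:
            # additive increase while there is headroom
            self.limit = min(self.max_limit, self.limit + 100.0)
        return self.limit
```

The point is not the specific constants but the shape: the effective limit tracks what the replicas can actually absorb, so both 10K-instruction and 1B-instruction workloads end up throttled appropriately.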
It took us a couple of days to piece together exactly what happened from metrics and logs. There was no action that could be taken ahead of time, particularly with no insight into how the respective dapp was put together (how many canisters, where, etc.).
A subnet is essentially a (replicated) VM. A canister is, for the purposes of this analysis, a single-threaded process as far as transactions are concerned, and a distributed service for read-only queries.
So a canister can execute (almost) as many transactions as can run sequentially on a single CPU core.
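The split described above can be illustrated with a toy model (purely illustrative, not how replicas are actually implemented): updates are drained sequentially from a single per-canister queue, while queries are plain reads that any replica can serve independently.

```python
from collections import deque

class ToyCanister:
    """Toy model: transactions serialize on one 'thread' per canister;
    read-only queries just read state and can run on any replica."""

    def __init__(self):
        self.state = 0
        self.update_queue = deque()  # updates execute one at a time, in order

    def submit_update(self, delta: int) -> None:
        self.update_queue.append(delta)

    def execute_round(self) -> None:
        """Drain queued updates sequentially (single-threaded execution)."""
        while self.update_queue:
            self.state += self.update_queue.popleft()

    def query(self) -> int:
        """Read-only; does not mutate state, so replicas can answer
        queries concurrently without coordination."""
        return self.state
```

This is why transaction throughput is capped by one core, while query throughput scales with the number of replicas (and query threads) available.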
And ~1K read-only QPS (which is what the subnet was handling, according to the replica HTTP metrics) is entirely reasonable if you consider (a) that there was no HTTP caching in place, so every single query needed to execute canister code; and (b) that single-threaded execution (including for queries) was in place on this subnet only, as a temporary mitigation for a different issue (high contention in the signal handler-based orthogonal persistence implementation).
And again, both query and transaction throughput depend on how much processing each request requires: returning 1 KB of static content is very different from scanning 100 MB of heap to compute a huge response. In that context “15k reqs/sec” is somewhat meaningless.
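A back-of-the-envelope calculation makes the spread obvious. Assuming (purely for illustration) a budget of ~2B Wasm instructions per second on one core, the two query weights mentioned earlier yield throughput figures five orders of magnitude apart:

```python
# Achievable single-core QPS under a fixed instruction budget.
# CORE_BUDGET is an assumption for illustration, not a measured value.

CORE_BUDGET = 2_000_000_000  # assumed Wasm instructions/sec on one core

def queries_per_second(instructions_per_query: int) -> float:
    """CPU-bound ceiling only; ignores networking, consensus, overhead."""
    return CORE_BUDGET / instructions_per_query

light_qps = queries_per_second(10_000)          # lightweight queries
heavy_qps = queries_per_second(1_000_000_000)   # 1B-instruction queries
spread = light_qps / heavy_qps                  # five orders of magnitude
```

No single static QPS number can be right for both ends of that range, which is exactly why a quoted “reqs/sec” figure says little on its own.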
A few of the things I can think of: query response caching (via HTTP expiration headers); spreading load across multiple canisters and/or subnets; and single-canister load testing, to see what a realistic expected throughput (given the instruction limits) should be. Some of these would benefit from additional tooling or protocol support; most can already be put into practice.
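For the "spreading load across multiple canisters" option, the simplest client-side approach is deterministic sharding: hash a stable key (e.g. the user's principal) to pick a canister, so each canister sees only a fraction of the traffic. A minimal sketch, with placeholder canister IDs:

```python
import hashlib

# Placeholder IDs; in practice these would be real canister IDs,
# possibly on different subnets.
CANISTERS = ["canister-a", "canister-b", "canister-c", "canister-d"]

def pick_canister(user_id: str) -> str:
    """Deterministically map a user to one canister so the same user
    always hits the same shard and load spreads evenly overall."""
    digest = hashlib.sha256(user_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(CANISTERS)
    return CANISTERS[index]
```

Determinism matters here: the same user always lands on the same shard, so per-user state can live in that shard without any cross-canister lookup on the hot path.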
Queries and query traffic are (for now) free. And a canister executing transactions on a single thread can only burn so many cycles, even at full tilt, within half an hour (which is how long the whole thing took before there was no more need to execute any transactions).
I hope this answers your questions.