In several of the developer Discord community calls there have been meandering discussions about how to guard canisters and canister methods against denial-of-service attacks. These discussions have not produced clearly defined solutions, so hopefully bringing this concern to the forums will accelerate the conversation.
While we have inspect_message and could guard a specific canister against an attack by having it accept messages only from a single principal, public canisters, or canisters that accept all authenticated principals, would still be susceptible to a denial-of-service attack unless throttling is configurable per principal (or for anonymous callers), either through the boundary nodes or at a high enough level of abstraction to protect the individual canister.
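To make the per-principal throttling idea concrete, here is a minimal Python sketch of the kind of caller gate an inspect_message-style hook could apply before a message reaches a canister's methods. This is illustrative only: the allowlisted principal IDs are made up, and this is not the actual IC CDK API.

```python
# Illustrative sketch of a per-caller gate, as an inspect_message-style
# hook might apply. Principal IDs in ALLOWED are hypothetical.
ALLOWED = {"principal-aaa", "principal-bbb"}  # hypothetical allowlist
ANONYMOUS = "2vxsx-fae"                       # the IC anonymous principal

def should_accept(caller: str, method: str) -> bool:
    """Reject anonymous callers outright; accept only allowlisted principals."""
    if caller == ANONYMOUS:
        return False
    return caller in ALLOWED
```

A real implementation would also need to consider rate, not just identity, which is where the per-principal throttling discussed above comes in.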
According to the “Architecture Overview” section of the Internet Identity repo's README, Internet Identity currently resides on a single canister. That canister is therefore a single point of failure for applications on the IC, and Internet Identity as a service as a whole is susceptible to being overwhelmed by a denial-of-service attack.
What is the Internet Identity application currently doing to safeguard against denial of service attacks?
If the app already guards against denial of service, how is it accomplishing this, and what can other developers learn from this approach in terms of designing single canisters to be resilient against denial of service attacks?
My understanding is that one of the promises of the Internet Computer is that app developers don’t have to worry about all these issues of what was called the conventional “shit stack”, and that the Internet Computer itself takes care of them (e.g. on the boundary nodes). How well that works I cannot say, though.
@nomeata is (as often) correct. Basically the Boundary Nodes (the actual computers you connect to and that turn HTTP calls into canister calls) generally rate limit requests – I am guessing IP-based but not 100% sure.
Moreover, the Boundary Nodes include a special hack for Internet Identity that rate-limits anchor creation even further (limiting the number of anchors created to N per hour). I’ll let a BN specialist expand on that if necessary.
Makes sense, so for the most part we’d have some protection against a single machine/IP attempting to lock down a canister.
Got it, that makes sense in terms of anchor creation - applications can also limit their user creation/per hour.
Moving beyond a single-IP attacker to a coordinated traffic event or attack - since there are already hundreds of thousands of identities, what would happen if:
1. ~100K identities attempted to authenticate/log in (via II) to a single IC application on mainnet at the same time?
2. This wasn’t just a spike in usage but an attack, and each of those identities tried to log into 50-100 applications on the IC at the same time?
3. The attacker has figured out an automated way of passing the captcha and has slowly been amassing Internet Identities, such that the same attack as (2) happens, but with 1 million or 10 million identities?
TLDR: Application canisters are protected by subnet-level rate limits. Rate-limiting beyond the subnet-level defaults has to be done by the canister itself. We are looking into providing canisters with all the info needed to make these rate-limiting decisions locally.
The boundary node infra is responsible for rate-limiting requests to the Internet Computer at the subnet level. The boundary node is oblivious to canister specifics and imposes rate limits in a general sense (reasonable defaults). These defaults will change as the platform matures and/or when we are under constrained conditions like a DDoS attack. The current defaults, per subnet per boundary node, are:
- 100 requests/second per IP (irrespective of request type)
- If we have N boundary nodes, the overall upper bound on queries and updates seen by a subnet is 500 * N and 50 * N requests/second, respectively
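As a quick sanity check on those defaults, the subnet-level upper bounds scale linearly with the number of boundary nodes in front of the subnet:

```python
# Back-of-the-envelope check of the defaults quoted above, assuming
# N boundary nodes, each capping a subnet at 500 queries/s and 50 updates/s.
def subnet_upper_bounds(n_boundary_nodes: int) -> tuple[int, int]:
    queries_per_sec = 500 * n_boundary_nodes
    updates_per_sec = 50 * n_boundary_nodes
    return queries_per_sec, updates_per_sec

# e.g. with 4 boundary nodes: at most 2000 queries/s and 200 updates/s
```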
Q. how does Internet Identity (II) handle DDOS?
II did face spam attacks in the past. To thwart this spam, additional rate limits were placed specifically on the II canister, such that only 1 II create request was allowed per minute per IP. This was possible only because we knew the mechanics of II, which let us decode the CBOR and apply the rate limits.
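The 1-create-per-minute-per-IP rule is essentially a fixed-window counter keyed by IP. A minimal sketch of that mechanism (illustrative only, not the actual boundary-node code; timestamps are passed in explicitly to keep it deterministic):

```python
from collections import defaultdict

class PerIpWindowLimiter:
    """Fixed-window limiter: at most `limit` requests per `window` seconds
    per IP, in the spirit of the 1-create/min/IP rule described above."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)          # ip -> count in current window
        self.window_start = defaultdict(float)  # ip -> start of current window

    def allow(self, ip: str, now: float) -> bool:
        # Start a fresh window if the current one has elapsed.
        if now - self.window_start[ip] >= self.window:
            self.window_start[ip] = now
            self.counts[ip] = 0
        if self.counts[ip] < self.limit:
            self.counts[ip] += 1
            return True
        return False
```

A production limiter would also evict stale IP entries and likely use a sliding window or token bucket to avoid boundary bursts, but the core decision is the same.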
Please note that II-like special handling cannot be extended to general canisters. In all likelihood, boundary rate limits will be relaxed in the future.
Very interesting - does the custom rate limiting by IP have any considerable impact on the performance of logins or query/update calls for individual users?
Also - how difficult would it be to expose rate limiting on an IP/Principal basis for IC developers?
I know inspect_message is currently being implemented for Motoko, but it would be awesome if this rate limiting by IP/Principal functionality could be exposed/integrated with the existing inspect_message API as an optional field.
It looks right now like inspect_message is being exposed as a system function - the in-progress PR by @claudio can be found here.
Would be great if DFINITY could take these learnings from the II canister, generalize them, and talk to the Rust/Motoko CDK teams to find the best language abstractions for exposing this functionality to developers.
I would imagine that for DeFi and certain applications where reliability and high availability are assumed, this is pretty high up on their list of priorities.
Just curious: do replicas perform any rate-limiting? Boundary nodes only handle ingress messages, but what if a single canister makes an excessive number of inter-canister calls within a subnet?
Hey, flow control on the Internet Computer is an involved topic. I’ll try to be brief here, but this is a good candidate for a blog entry.
Replicas and the network connecting them operate at disparate processing speeds. At any given time some replicas may be faster or slower than others (the asynchronous nature of the network). The IC needs to employ intelligent queuing/buffering to keep such nodes in sync despite the asynchronous nature of the underlying network. Flow control for the IC broadly has to take care of cross-canister messages as part of execution, as well as the following layers:
Ingress message flow control: a leaky bucket, where messages that don’t fit in the queue are dropped. This is fine, as the user agent can retry the message. Fairness, and thus no starvation, are the goals here. Once an ingress message makes it into the replica’s ingress pool, it can still be dropped, as it can expire before it is picked up by the block maker for consensus/execution. This is the only layer where we can tolerate message drops.
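That leaky-bucket behaviour (drop on overflow, expire if not picked up in time) can be sketched as follows. This is an illustrative model, not the replica's actual implementation:

```python
from collections import deque

class IngressQueue:
    """Sketch of the ingress pool: bounded capacity, dropped on overflow
    (the user agent can retry), and messages expire if the block maker
    does not pick them up within `ttl` seconds."""

    def __init__(self, capacity: int, ttl: float):
        self.capacity = capacity
        self.ttl = ttl
        self.q = deque()  # (arrival_time, message), oldest first

    def offer(self, msg, now: float) -> bool:
        self._expire(now)
        if len(self.q) >= self.capacity:
            return False  # dropped; caller may retry
        self.q.append((now, msg))
        return True

    def poll(self, now: float):
        """Block maker picks the oldest non-expired message, if any."""
        self._expire(now)
        return self.q.popleft()[1] if self.q else None

    def _expire(self, now: float):
        # Messages that sat in the pool longer than ttl are silently dropped.
        while self.q and now - self.q[0][0] > self.ttl:
            self.q.popleft()
```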
Consensus: the P2P layer implements fixed-size I/O queues per replica, so malicious replicas cannot overwhelm other replicas. These queues are sized (in size and count of messages) well enough that replicas operating within the protocol limits will never run out of queue space. If a replica genuinely falls behind (a transiently sluggish network), it can ask for retransmission to do a quick resync. If part of the subnet lags behind for longer periods of time, it will fail to do quick resyncs and eventually fall back to CUP-based state syncs. This queuing model works on the premise that CUP-based state syncs are orders of magnitude faster than actual consensus-based progress.
Message routing: flow control in message routing is similar in essence to P2P. XNet messaging employs the Stream Transmission Protocol (look up this description by David Derler); it’s very close to TCP’s ordered message delivery. Again, per-canister, per-subnet fixed-size message-routing queues make sure that no single canister/subnet overwhelms the IC’s network bandwidth. The goal in message routing is guaranteed, ordered delivery of messages across canisters. Since there is a promise of guaranteed delivery, something has to give if processing slows down. So if a canister makes an exorbitantly high number of cross-net messages, its execution may slow down as queues back up.
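The key contrast with the ingress layer is that nothing here may be dropped, so a full queue pushes back on the sender instead. A sketch of that backpressure idea (illustrative only, not the actual XNet implementation):

```python
from collections import deque

class XNetStream:
    """Sketch of a fixed-size, ordered cross-net stream with guaranteed
    delivery: when the queue is full, the send fails synchronously, so a
    canister flooding cross-net messages sees its own execution slow down."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.q = deque()
        self.next_index = 0  # monotonically increasing stream index

    def try_send(self, msg) -> bool:
        if len(self.q) >= self.capacity:
            return False  # backpressure: sender must wait, nothing is dropped
        self.q.append((self.next_index, msg))
        self.next_index += 1
        return True

    def deliver(self):
        """Receiver consumes messages strictly in stream order."""
        return self.q.popleft() if self.q else None
```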
This is a high-level description. Flow control is a well-researched academic topic, and the IC overall does a great job at it (compared to other blockchain tech out there). Cross-net messaging and subnets are among the key innovations of the IC.
Thanks @icme for starting this topic, I meant to write down some ideas on this topic for a while now, but I kept delaying it. Here’s a list that I had in my notes, some of the things have already been discussed in this topic.
authentication vs. authorization - the IC provides the first (e.g. principals) but the canister needs to implement the second.
“dumb” DDoS vs. “smart” DoS - the IC provides some help for the first, with the exception of inter-canister calls (e.g. a sufficiently well-funded attacker could DDoS any service). There are currently no libs for handling “smart” DoS.
smart DoS → analyze interfaces, create as much work as possible with as few requests as possible.
(ideas for lib)
4. inspect_message based ACL for web-based attacks. Should cover 90+% of low effort attacks (i.e. bots & co). Note that inspect_message does not work for inter-canister calls.
5. ACL at the entrypoint of every update to double-down on inspect_message ACL.
6. how important is it to trap early? Conditional ACL based on “states” and logging libs? (e.g. only ACL when canister is in “heavy_traffic” state, as defined by some logging metrics) Need some stats.
7. advanced authorization patterns - use spawned canisters to provide on-boarding, have the main canister only “talk” to authorized principals. Has anyone red-teamed this? How many messages is enough to “clog” a canister?
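Idea 5 above, repeating the inspect_message ACL at the entrypoint of every update, is the kind of duplication a small helper can centralize. A hedged sketch in Python (the allowlist and method names are hypothetical; in Motoko or Rust this would be a shared guard function or macro):

```python
import functools

AUTHORIZED = {"principal-aaa"}  # hypothetical allowlist

def require_authorized(update_fn):
    """Decorator that re-checks the ACL at the start of an update method,
    since inspect_message does not cover inter-canister calls."""
    @functools.wraps(update_fn)
    def wrapper(caller: str, *args, **kwargs):
        if caller not in AUTHORIZED:
            raise PermissionError(f"unauthorized caller: {caller}")
        return update_fn(caller, *args, **kwargs)
    return wrapper

@require_authorized
def transfer(caller: str, to: str, amount: int) -> str:
    # ... the actual update logic would go here ...
    return f"{caller} sent {amount} to {to}"
```

The point of the decorator is that the ACL logic lives in one place even though it runs on every update, which keeps the inspect_message check and the entrypoint check from drifting apart.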
The numbers provided by faraz are interesting. It would be nice if someone running heavy-traffic canisters in prod could publish some numbers on call rates, attack traces, and possible mitigations. We could use those to begin writing up some core concepts and start working towards a “better than nothing” ACL lib. The fact that inspect_message is only used for ingress messages is concerning, and most likely all the ACL logic will have to be duplicated in every update call, but it’s better to start somewhere and at least have protection against some attacks.
edit1: @Seb mentioned during the Friday call that there is some limit on inter-canister messages? Can you please expand on that? Any numbers or sources on this?