how does ICP handle DDOS
Thanks @dsharifi! BTW, is the current polling (V2) version of the update mechanism still going to be supported in the future? In some cases I would prefer it, especially in Java server->ICP scenarios, where Java developers can control their connection and thread pools. Latency is not such a big issue there.
There's rate limiting and message validation for inter-canister calls, but I'm less clear on calls from outside the IC network. Here's an interesting commit from last week that I just came across.
^ I'm interested to see this commit at a time when update call performance is being significantly improved. Can I ask what the rate limiting mechanism was prior to this, @rbirkner?
Good catch!
What the commit does is to introduce a limit of 1000 update calls per second per subnet and boundary node.
The subnets have a setting in their subnet record called max_ingress_messages_per_block (see here). This determines how many update calls can be put into a single block and is currently set to 1000 on all the subnets.
At the moment, we have in total around 20 boundary nodes and at least 4 boundary nodes in each region. That means even after introducing the rate limit, more than 20k update calls per second could make it to the subnet, which will be more than the subnet can handle.
The idea of this rate limit is that it shouldn't affect normal usage, but rejects update calls early if there are way too many of them. We have good monitoring and will of course adapt the rate limits if we realize they are set too low and reject too much.
Initially, there were much lower rate limits (see here and here), but they were removed around 1-2 years ago because they weren't considered necessary anymore. However, the couple of high-load incidents we had at the beginning of the year showed that there is no point in "letting" way too many update calls through, and that's why the limit got reintroduced.
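To make the idea of rejecting early concrete, here is a minimal sketch of a per-subnet limit. The thread does not show how the boundary nodes actually implement it, so the token-bucket approach, the class name `SubnetRateLimiter`, and the `allow` method are all assumptions made for illustration only.

```python
import time
from collections import defaultdict


class SubnetRateLimiter:
    """Hypothetical token bucket: at most `rate` update calls/s per subnet."""

    def __init__(self, rate=1000.0, clock=time.monotonic):
        self.rate = rate          # tokens refilled per second, also the bucket size
        self.clock = clock        # injectable so the limiter can be tested
        self.tokens = defaultdict(lambda: rate)  # each subnet starts with a full bucket
        self.last = {}            # last time each subnet's bucket was refilled

    def allow(self, subnet_id):
        now = self.clock()
        elapsed = now - self.last.get(subnet_id, now)
        self.last[subnet_id] = now
        # Refill in proportion to elapsed time, capped at one second's worth.
        self.tokens[subnet_id] = min(
            self.rate, self.tokens[subnet_id] + elapsed * self.rate
        )
        if self.tokens[subnet_id] >= 1.0:
            self.tokens[subnet_id] -= 1.0
            return True   # forward the update call to the subnet
        return False      # reject early at the boundary node
```

With roughly 20 boundary nodes each enforcing a limit like this independently, the aggregate ceiling per subnet is on the order of 20 x 1000 = 20,000 update calls per second, which matches the figure given above.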
Interesting. It would be nice to build rate-limiting features that leverage AI to figure out whether calls are a DDoS or valid calls from an app going viral. I am confident these kinds of tools will get built in the ICP ecosystem. Such a service could then be leveraged by canisters on an as-needed basis and turned on or off as required, i.e. DDoS prevention as a service provided by another canister, a kind of common good.
I love the enthusiasm, but I'm not convinced of the practicality of a solution like that (I'm not sure what blackbox AI would bring to the table over rigorous, predictable and efficient algorithms that are only as complex as they need to be).
Thanks @rbirkner, this was very informative and useful. Given that the block time is being cut in half (from 600ms to 300ms), this would mean that max_ingress_messages_per_block is effectively doubled in relative terms (not per block, but in terms of messages that can be written to blocks per second).
Can I ask for some more info about the rationale for having it set to 1000 (i.e. why it's not lower or higher)? I'm asking mostly out of curiosity.
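The doubling argument can be checked with a little arithmetic. This is only a back-of-the-envelope sketch: the function name `ingress_per_second` is made up for illustration, and it assumes the block time equals the quoted delay, which a real subnet under load will not achieve exactly.

```python
def ingress_per_second(max_ingress_messages_per_block, block_time_ms):
    """Upper bound on update calls per second that fit into blocks."""
    blocks_per_second = 1000.0 / block_time_ms
    return max_ingress_messages_per_block * blocks_per_second


before = ingress_per_second(1000, 600)  # roughly 1,667 messages/s
after = ingress_per_second(1000, 300)   # roughly 3,333 messages/s
print(round(before), round(after))
```

Under that assumption the per-second ceiling doubles while the per-block setting stays at 1000.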
As long as there is a use case for the V2 endpoint, we will continue to maintain it.
However, I would note that the V3 endpoint should be able to serve all the use cases a client might have. For example, if you want to submit a large batch of update calls, there is little overhead for the client in using the V3 endpoint. The boundary nodes that the client connects to support HTTP/2, which means that clients can send multiple requests concurrently over the same TCP connection to a single BN.
Update:
The proposal to change the initial_notary_delay_millis for the first subnet, io67a-2jmkw-zup3h-snbwi-g6a5n-rm5dn-b6png-lvdpl-nqnto-yih6l-gqe, has been executed, and we are now observing a stable block rate of 2.3 blocks/s!
Thanks Daniel. When I have our V3 implementation ready, I can test throughput with x parallel threads from a single JVM to see if there is any difference between V2 and V3. For many applications I would rather sacrifice latency than throughput. It would make a good document to help Java ICP developers choose the right approach. I will share our results.
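A comparison like that could be structured roughly as follows. This is a sketch in Python for brevity (the same shape maps onto a Java ExecutorService); `measure` and `fake_submit` are hypothetical names, and `fake_submit` is only a stand-in for a real V2 or V3 update call, so the numbers it produces say nothing about either endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure(submit, n_calls, n_threads):
    """Run `submit` n_calls times on n_threads.

    Returns (throughput in calls/s, mean per-call latency in seconds).
    """
    def timed_call(i):
        start = time.perf_counter()
        submit(i)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        latencies = list(pool.map(timed_call, range(n_calls)))
    wall = time.perf_counter() - wall_start
    return n_calls / wall, sum(latencies) / len(latencies)


def fake_submit(i):
    # Placeholder: replace with a real update call against V2 or V3.
    time.sleep(0.01)


if __name__ == "__main__":
    for threads in (1, 8):
        throughput, latency = measure(fake_submit, 40, threads)
        print(f"{threads} threads: {throughput:.0f} calls/s, {latency * 1000:.1f} ms mean")
```

Reporting throughput and per-call latency side by side makes the trade-off visible: adding threads should raise throughput without lowering the latency of any individual call.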
This is interesting... the state can be read by anyone? Is this true for all requests, or just anonymous requests?
Are the request IDs at least non-deterministic? (If so, I guess boundary nodes could still publish them, but they'd at least be harder to find.)
My understanding is that it is authorized by principal, so only anonymous requests for this specific state, the result of an ingress message update call (maybe also query call in replicated mode).
The networking team has run extensive tests and they show that the idle block rate improves significantly (basically what you observed after the proposal passed). However, as the load increases, the improvements diminish. That's why we are currently sticking to the 1000 update calls per second per boundary node. We are monitoring the situation and if we see that this limit is too low, we will of course increase it.
Hey @dsharifi, great to see the block rate's been holding steady on this canary subnet, and it's exciting to see open proposals to make this change on other subnets now.
It's interesting that the transactions per second shot up at the time of the subnet config upgrade, and held steady for a bit before dropping back down to typical levels. Are you able to provide some commentary on what's expected to have happened here (what caused a sustained high throughput of transactions before returning to mostly typical levels)?
I've put a bit of information together about each of these new proposals and collated them in this post, with a few more observations and questions: Subnet Management - General Discussion - Governance / NNS proposal discussions - Internet Computer Developer Forum (dfinity.org)
On a side note...
@dsharifi, would you consider providing a link to the relevant forum discussion in proposal summaries in the future? This is super useful as it allows voters in the NNS dapp to know where to turn for critical discussion and debate relating to the proposals (allowing them to cast a more considered vote).
As an example, "Change Subnet Membership" proposals have recently started providing a link in the proposal summary to the relevant subnet thread on this forum (thanks again @Sat), e.g. 132225, 132228. :)
Update: I've revisited via the dashboard and noticed that the URL field is provided and links to this topic (I was initially looking for this in the proposal summary). It doesn't look like the NNS dapp renders this URL when viewed on a mobile device (but this is arguably one of the most important pieces of information for voters who wish to make an informed decision).
Update:
NNS proposals to increase the block rate of the following subnets were executed a few hours ago. We now observe the following block rates:
Not blackbox AI, but fully audited and code-reviewed AI.
Any TAT (turnaround time) for bringing v3 to other subnets?
Where can I find the docs on how to make the call to the v3/.../call endpoint?
I attempted to implement the new endpoint by simply changing v2 to v3 and it results in a 404 not found error.
edit: I just noticed this post
Looks like the v3 route isn't ready just yet. @dsharifi do you have an estimate of when the new endpoint will be ready for production?