Reducing End-to-End Latencies on the Internet Computer

How does ICP handle DDoS?

Thanks @dsharifi! BTW, is the current polling (V2) version of the update mechanism still going to be supported in the future? In some cases I would prefer it, especially in Java server->ICP scenarios, where Java developers can control their connection and thread pools. Latency is not such a big issue there.

There's rate limiting and message validation for inter-canister calls, but I'm less clear on calls from outside the IC network. Here's an interesting commit from last week that I just came across →

chore: add a per-boundary-node rate-limit of 1000 update calls per se… · dfinity/ic@4039ea2 (github.com)

^ I'm interested to see this commit at a time when update call performance is being significantly improved. Can I ask what the rate-limiting mechanism was prior to this, @rbirkner?

1 Like

Good catch!

The commit introduces a limit of 1000 update calls per second per subnet and boundary node.

The subnets have a setting in their subnet record called max_ingress_messages_per_block (see here). It determines how many update calls can be put into a single block and is currently set to 1000 on all subnets.

At the moment, we have around 20 boundary nodes in total, with at least 4 boundary nodes in each region. That means that even after introducing the rate limit, around 20k update calls per second could still make it to a subnet, which is more than the subnet can handle.

The idea of this rate limit is that it shouldn't affect normal usage, but rejects update calls early if there are way too many of them. We have good monitoring and will of course adapt the rate limits if we realize they are set too low and reject too much.
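To illustrate the "reject early instead of queueing" idea, here is a minimal token-bucket sketch in Python. This is not the actual boundary-node implementation (which lives in the Rust `dfinity/ic` codebase); the class name and parameters are made up for illustration, with the rate set to the 1000 calls-per-second limit discussed above:

```python
import time

class TokenBucket:
    """Illustrative token bucket: allows up to `rate` calls per second,
    rejecting excess calls early instead of queueing them."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: reject the update call early

# With a 1000/s limit, a burst of 1500 calls sees roughly 500 rejected early.
bucket = TokenBucket(rate=1000.0, capacity=1000.0)
accepted = sum(bucket.allow() for _ in range(1500))
print(accepted)  # roughly 1000 (plus a few tokens refilled during the loop)
```

In practice, a boundary node would keep one such bucket per subnet, so that one overloaded subnet does not cause rejections for calls destined elsewhere.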

Initially, there were much lower rate limits (see here and here), but they were removed around 1–2 years ago because they weren't considered necessary anymore. However, the couple of high-load incidents we had at the beginning of the year showed that there is no point in "letting" way too many update calls through, and that's why the limit was reintroduced.

5 Likes

Interesting. It would be nice to build rate-limiting features that leverage AI to figure out whether calls are a DDoS or valid traffic from an app going viral. I am confident these kinds of tools will get built in the ICP ecosystem. Such a service could then be leveraged by canisters on an as-needed basis and turned on or off as required, a.k.a. DDoS prevention as a service provided by another canister, a kind of common good.

3 Likes

I love the enthusiasm, but I'm not convinced of the practicality of a solution like that (I'm not sure what blackbox AI would bring to the table over rigorous, predictable, and efficient algorithms that are only as complex as they need to be).

2 Likes

Thanks @rbirkner, this was very informative and useful. Given that the block time is being cut in half (from 600 ms to 300 ms), the max_ingress_messages_per_block is in relative terms effectively doubled (not per block, but in terms of messages that can be written to blocks per second).

Can I ask for some more info about the rationale for setting it to 1000 (i.e. why it's not lower or higher)? I'm asking mostly out of curiosity :slight_smile:
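To make the arithmetic explicit: with max_ingress_messages_per_block fixed at 1000, halving the block time roughly doubles a subnet's per-second ingress capacity. A quick sketch using the figures from this thread:

```python
MAX_INGRESS_MESSAGES_PER_BLOCK = 1000

def ingress_capacity_per_second(block_time_ms: float) -> float:
    """Upper bound on update calls a subnet can include per second,
    given a fixed per-block message limit."""
    blocks_per_second = 1000.0 / block_time_ms
    return MAX_INGRESS_MESSAGES_PER_BLOCK * blocks_per_second

print(ingress_capacity_per_second(600))  # ~1667 calls/s at a 600 ms block time
print(ingress_capacity_per_second(300))  # ~3333 calls/s at a 300 ms block time
```

This is only a ceiling on what a subnet's blocks can carry, not what it can sustainably process end to end.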

1 Like

As long as there is a use case for the V2 endpoint, we will continue to maintain it.

However, I would note that the V3 endpoint should be able to serve all the use cases a client might have. For example, if you want to batch a large number of update calls, there is little overhead for the client in using the V3 endpoint. The boundary nodes that the client connects to support HTTP/2, which means that clients can concurrently send multiple requests over the same TCP connection to a single BN.
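The batching pattern described above can be sketched with asyncio: issue all calls concurrently instead of sequentially, letting HTTP/2 multiplex them onto one connection. The `submit_update_call` coroutine below is a hypothetical stand-in for a real agent call (no actual network I/O here):

```python
import asyncio

async def submit_update_call(i: int) -> str:
    """Hypothetical stand-in for submitting one update call via an agent.
    Over HTTP/2, many such requests can be in flight on one TCP connection."""
    await asyncio.sleep(0.01)  # stands in for the network round-trip
    return f"call-{i} accepted"

async def submit_batch(n: int) -> list:
    # Issue all calls concurrently rather than awaiting them one by one;
    # an HTTP/2-capable client multiplexes these onto a single connection.
    return await asyncio.gather(*(submit_update_call(i) for i in range(n)))

results = asyncio.run(submit_batch(50))
print(len(results))  # 50
```

Because the 50 simulated round-trips overlap, the batch completes in roughly one round-trip time rather than fifty.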

2 Likes

Update:

The proposal to change the initial_notary_delay_millis for the first subnet, io67a-2jmkw-zup3h-snbwi-g6a5n-rm5dn-b6png-lvdpl-nqnto-yih6l-gqe, has been executed, and we are now observing a stable block rate of 2.3 blocks/s!

4 Likes

Amazing work :clap:

I love being able to observe these metrics and see the difference this has made :heart_eyes:

2 Likes

Thanks, Daniel. When I have our V3 implementation ready, I can try to test throughput with x parallel threads from a single JVM to see if there is any difference between V2 and V3. For many applications I would rather sacrifice latency than throughput. It would make a good document to help Java ICP developers choose the right approach. I will share our results.

1 Like

This is interesting… the state can be read by anyone? Is this true for all requests, or just anonymous requests?

Are the request IDs at least non-deterministic? (If so, I guess boundary nodes could still publish them, but they'd at least be harder to find.)

My understanding is that access is authorized by principal, so this applies only to anonymous requests for this specific state: the result of an ingress update call (and maybe also a query call in replicated mode).

1 Like

The networking team has run extensive tests, and they show that the idle block rate improves significantly (basically what you observed after the proposal passed). However, as the load increases, the improvements diminish. That's why we are currently sticking to 1000 update calls per second per boundary node. We are monitoring the situation, and if we see that this limit is too low, we will of course increase it.

7 Likes

Hey @dsharifi, great to see the block rate's been holding steady on this canary subnet, and it's exciting to see open proposals to make this change on other subnets now.

It's interesting that the transactions per second shot up at the time of the subnet config upgrade, and held steady for a bit before dropping back to typical levels. Are you able to provide some commentary on what's expected to have happened here (what caused a sustained high throughput of transactions before the return to mostly typical levels)?

2 Likes

I've put a bit of information together about each of these new proposals and collated them in this post, with a few more observations and questions → Subnet Management - General Discussion - Governance / NNS proposal discussions - Internet Computer Developer Forum (dfinity.org)

On a side note...

@dsharifi, would you consider providing a link to the relevant forum discussion in proposal summaries in the future? :pray: This is super useful as it allows voters in the NNS dapp to know where to turn for critical discussion and debate relating to the proposals (allowing them to cast a more considered vote).

As an example, 'Change Subnet Membership' proposals have recently started providing a link in the proposal summary to the relevant subnet thread on this forum (thanks again @Sat), e.g. 132225, 132228. :)

Update: I've revisited via the dashboard and noticed that the URL field is provided and links to this topic :+1: (I was initially looking for this in the proposal summary). It doesn't look like the NNS dapp renders this URL when viewed on a mobile device (but this is arguably one of the most important pieces of information for voters who wish to make an informed decision).

Update:
NNS proposals to increase the block rate of the following subnets were executed a few hours ago. We now observe the following block rates:

  • bkfrj → 2.57 blocks/s
  • fuqsr → 2.49 blocks/s
  • ejbmu → 2.45 blocks/s
  • csyj4 → 2.43 blocks/s
  • 6pbhf → 2.10 blocks/s
10 Likes

Not blackbox AI, but fully audited and code-reviewed AI.

Any ETA on bringing v3 to other subnets?

Where can I find the docs on how to make a call to the v3/.../call endpoint?
I attempted to implement the new endpoint by simply changing v2 to v3, and it results in a 404 Not Found error.
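For what it's worth, the v2 and v3 call paths should differ only in the version segment. A small sketch, assuming the path layout from the IC HTTP interface (the gateway host and canister ID below are just examples); a 404 would then mean the node you reached simply doesn't serve v3 yet:

```python
def call_url(gateway: str, canister_id: str, version: str = "v3") -> str:
    """Build the HTTP-gateway URL for a canister update call.
    v2 submits asynchronously (the client polls read_state for the result);
    v3 is the synchronous variant discussed in this thread."""
    return f"https://{gateway}/api/{version}/canister/{canister_id}/call"

# Example: the ICP ledger canister via a public gateway host.
print(call_url("icp-api.io", "ryjl3-tyaaa-aaaaa-aaaba-cai"))
# https://icp-api.io/api/v3/canister/ryjl3-tyaaa-aaaaa-aaaba-cai/call
```

The request body (a CBOR-encoded, signed call envelope) is unchanged between the two versions; only the path and the response handling differ.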

edit: I just noticed this post

It looks like the v3 route isn't ready just yet. @dsharifi, do you have an estimate of when the new endpoint will be ready for production?