Reducing End-to-End Latencies on the Internet Computer

How does ICP handle DDoS?

Thanks @dsharifi! BTW, is the current polling (V2) version of the update mechanism still going to be supported in the future? In some cases I would prefer it, especially in Java server->ICP scenarios, where Java developers can control their connection and thread pools. Latency is not such a big issue there.

There's rate limiting and message validation for inter-canister calls, but I'm less clear on calls from outside the IC network. Here's an interesting commit from last week that I just came across →

chore: add a per-boundary-node rate-limit of 1000 update calls per se… · dfinity/ic@4039ea2 (github.com)

^ I'm interested to see this commit at a time when update call performance is being significantly improved. Can I ask what the rate-limiting mechanism was prior to this, @rbirkner?

1 Like

Good catch!

The commit introduces a limit of 1000 update calls per second per subnet and boundary node.

The subnets have a setting in their subnet record called max_ingress_messages_per_block (see here). It determines how many update calls can be put into a single block and is currently set to 1000 on all subnets.

At the moment, we have around 20 boundary nodes in total, with at least 4 boundary nodes in each region. That means that even after introducing the rate limit, around 20k update calls per second could still make it to a subnet, which is more than the subnet can handle.

The idea of this rate limit is that it shouldn't affect normal usage, but rejects update calls early if there are way too many of them. We have good monitoring and will of course adapt the rate limits if we realize they are set too low and reject too much.
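To illustrate the "reject early instead of queueing" idea, here is a minimal token-bucket sketch in Python. This is not the actual boundary-node implementation (which lives in the Rust `dfinity/ic` codebase); the class name and parameters are made up for illustration, with the rate set to the 1000 calls-per-second limit discussed above:

```python
import time

class TokenBucket:
    """Illustrative token bucket: allows up to `rate` calls per second,
    rejecting excess calls early instead of queueing them."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: reject the update call early

# With a 1000/s limit, a burst of 1500 calls sees roughly 500 rejected early.
bucket = TokenBucket(rate=1000.0, capacity=1000.0)
accepted = sum(bucket.allow() for _ in range(1500))
print(accepted)  # roughly 1000 (plus a few tokens refilled during the loop)
```

In practice, a boundary node would keep one such bucket per subnet, so that one overloaded subnet does not cause rejections for calls destined elsewhere.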

Initially, there were much lower rate limits (see here and here), but they were removed around 1–2 years ago because they weren't considered necessary anymore. However, the couple of high-load incidents we had at the beginning of the year showed that there is no point in "letting" way too many update calls through, and that's why the limit was reintroduced.

5 Likes

Interesting. It would be nice to build rate-limiting features that leverage AI to figure out whether calls are a DDoS or valid traffic from an app going viral. I am confident these kinds of tools will get built in the ICP ecosystem. Such a service could then be leveraged by canisters on an as-needed basis and turned on or off as required, a.k.a. DDoS prevention as a service provided by another canister, a kind of common good.

3 Likes

I love the enthusiasm, but I'm not convinced of the practicality of a solution like that (I'm not sure what blackbox AI would bring to the table over rigorous, predictable, and efficient algorithms that are only as complex as they need to be).

2 Likes

Thanks @rbirkner, this was very informative and useful. Given that the block time is being cut in half (from 600 ms to 300 ms), the max_ingress_messages_per_block is in relative terms effectively doubled (not per block, but in terms of messages that can be written to blocks per second).

Can I ask for some more info about the rationale for setting it to 1000 (i.e. why it's not lower or higher)? I'm asking mostly out of curiosity :slight_smile:
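To make the arithmetic explicit: with max_ingress_messages_per_block fixed at 1000, halving the block time roughly doubles a subnet's per-second ingress capacity. A quick sketch using the figures from this thread:

```python
MAX_INGRESS_MESSAGES_PER_BLOCK = 1000

def ingress_capacity_per_second(block_time_ms: float) -> float:
    """Upper bound on update calls a subnet can include per second,
    given a fixed per-block message limit."""
    blocks_per_second = 1000.0 / block_time_ms
    return MAX_INGRESS_MESSAGES_PER_BLOCK * blocks_per_second

print(ingress_capacity_per_second(600))  # ~1667 calls/s at a 600 ms block time
print(ingress_capacity_per_second(300))  # ~3333 calls/s at a 300 ms block time
```

This is only a ceiling on what a subnet's blocks can carry, not what it can sustainably process end to end.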

1 Like

As long as there is a use case for the V2 endpoint, we will continue to maintain it.

However, I would note that the V3 endpoint should be able to serve all the use cases a client might have. For example, if you want to batch a large number of update calls, there is little overhead for the client in using the V3 endpoint. The boundary nodes that the client connects to support HTTP/2, which means that clients can concurrently send multiple requests over the same TCP connection to a single BN.
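The batching pattern described above can be sketched with asyncio: issue all calls concurrently instead of sequentially, letting HTTP/2 multiplex them onto one connection. The `submit_update_call` coroutine below is a hypothetical stand-in for a real agent call (no actual network I/O here):

```python
import asyncio

async def submit_update_call(i: int) -> str:
    """Hypothetical stand-in for submitting one update call via an agent.
    Over HTTP/2, many such requests can be in flight on one TCP connection."""
    await asyncio.sleep(0.01)  # stands in for the network round-trip
    return f"call-{i} accepted"

async def submit_batch(n: int) -> list:
    # Issue all calls concurrently rather than awaiting them one by one;
    # an HTTP/2-capable client multiplexes these onto a single connection.
    return await asyncio.gather(*(submit_update_call(i) for i in range(n)))

results = asyncio.run(submit_batch(50))
print(len(results))  # 50
```

Because the 50 simulated round-trips overlap, the batch completes in roughly one round-trip time rather than fifty.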

2 Likes

Update:

The proposal to change the initial_notary_delay_millis for the first subnet, io67a-2jmkw-zup3h-snbwi-g6a5n-rm5dn-b6png-lvdpl-nqnto-yih6l-gqe, has been executed, and we are now observing a stable block rate of 2.3 blocks/s!

4 Likes

Amazing work :clap:

I love being able to observe these metrics and see the difference this has made :heart_eyes:

2 Likes

Thanks, Daniel. When I have our V3 implementation ready, I can try to test throughput with x parallel threads from a single JVM to see if there is any difference between V2 and V3. For many applications I would rather sacrifice latency than throughput. It would make a good document to help Java ICP developers choose the right approach. I will share our results.

1 Like

This is interesting… the state can be read by anyone? Is this true for all requests, or just anonymous requests?

Are the request IDs at least non-deterministic? (If so, I guess boundary nodes could still publish them, but they'd at least be harder to find.)

My understanding is that access is authorized by principal, so this applies only to anonymous requests for this specific state: the result of an ingress update call (and maybe also a query call in replicated mode).

1 Like

The networking team has run extensive tests, and they show that the idle block rate improves significantly (basically what you observed after the proposal passed). However, as the load increases, the improvements diminish. That's why we are currently sticking to 1000 update calls per second per boundary node. We are monitoring the situation, and if we see that this limit is too low, we will of course increase it.

7 Likes

Hey @dsharifi, great to see the block rate's been holding steady on this canary subnet, and it's exciting to see open proposals to make this change on other subnets now.

It's interesting that the transactions per second shot up at the time of the subnet config upgrade, and held steady for a bit before dropping back to typical levels. Are you able to provide some commentary on what's expected to have happened here (what caused a sustained high throughput of transactions before the return to mostly typical levels)?

2 Likes

I've put a bit of information together about each of these new proposals and collated them in this post, with a few more observations and questions → Subnet Management - General Discussion - Governance / NNS proposal discussions - Internet Computer Developer Forum (dfinity.org)

On a side note...

@dsharifi, would you consider providing a link to the relevant forum discussion in proposal summaries in the future? :pray: This is super useful as it allows voters in the NNS dapp to know where to turn for critical discussion and debate relating to the proposals (allowing them to cast a more considered vote).

As an example, 'Change Subnet Membership' proposals have recently started providing a link in the proposal summary to the relevant subnet thread on this forum (thanks again @Sat), e.g. 132225, 132228. :)

Update: I've revisited via the dashboard and noticed that the URL field is provided and links to this topic :+1: (I was initially looking for this in the proposal summary). It doesn't look like the NNS dapp renders this URL when viewed on a mobile device (but this is arguably one of the most important pieces of information for voters who wish to make an informed decision).

Update:
NNS proposals to increase the block rate of the following subnets were executed a few hours ago. We now observe the following block rates:

  • bkfrj → 2.57 blocks/s
  • fuqsr → 2.49 blocks/s
  • ejbmu → 2.45 blocks/s
  • csyj4 → 2.43 blocks/s
  • 6pbhf → 2.10 blocks/s
10 Likes

Not blackbox AI, but fully audited and code-reviewed AI.

Any ETA on bringing v3 to other subnets?

Where can I find the docs on how to make a call to the v3/.../call endpoint?
I attempted to implement the new endpoint by simply changing v2 to v3, and it results in a 404 Not Found error.
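For what it's worth, the v2 and v3 call paths should differ only in the version segment. A small sketch, assuming the path layout from the IC HTTP interface (the gateway host and canister ID below are just examples); a 404 would then mean the node you reached simply doesn't serve v3 yet:

```python
def call_url(gateway: str, canister_id: str, version: str = "v3") -> str:
    """Build the HTTP-gateway URL for a canister update call.
    v2 submits asynchronously (the client polls read_state for the result);
    v3 is the synchronous variant discussed in this thread."""
    return f"https://{gateway}/api/{version}/canister/{canister_id}/call"

# Example: the ICP ledger canister via a public gateway host.
print(call_url("icp-api.io", "ryjl3-tyaaa-aaaaa-aaaba-cai"))
# https://icp-api.io/api/v3/canister/ryjl3-tyaaa-aaaaa-aaaba-cai/call
```

The request body (a CBOR-encoded, signed call envelope) is unchanged between the two versions; only the path and the response handling differ.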

edit: I just noticed this post

It looks like the v3 route isn't ready just yet. @dsharifi, do you have an estimate of when the new endpoint will be ready for production?