Subnets with heavy compute load: what can you do now & next steps

The aim is not to put a hard cap on batch / background workloads. It’s simply to balance (not even prioritize, I got a bit carried away there) interactive and batch workloads. So (to take a simplistic example) if you have 100 canisters with ingress messages and 300 canisters with heartbeats, you give 1 CPU core to the former and 3 CPU cores to the latter. Or something along those lines.
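
To make the proportional idea concrete, here is a minimal sketch (TypeScript, purely illustrative; this is not the replica's actual scheduler code, and all names and the reserve-one-core rule are assumptions):

```typescript
// Hypothetical sketch of a proportional split of CPU cores between
// "interactive" canisters (those with pending ingress messages) and
// "background" canisters (those only running heartbeats).

interface WorkloadSplit {
  interactiveCores: number;
  backgroundCores: number;
}

function splitCores(
  totalCores: number,
  interactiveCanisters: number,
  backgroundCanisters: number
): WorkloadSplit {
  const total = interactiveCanisters + backgroundCanisters;
  if (total === 0) {
    return { interactiveCores: 0, backgroundCores: 0 };
  }
  // Proportional split, always reserving at least one core for each
  // non-empty group so neither workload is starved outright.
  let interactiveCores = Math.round((interactiveCanisters / total) * totalCores);
  if (interactiveCanisters > 0) interactiveCores = Math.max(interactiveCores, 1);
  if (backgroundCanisters > 0) interactiveCores = Math.min(interactiveCores, totalCores - 1);
  return { interactiveCores, backgroundCores: totalCores - interactiveCores };
}

// The example from the post: 100 canisters with ingress messages and
// 300 canisters with heartbeats on a (hypothetical) 4-core subnet.
console.log(splitCores(4, 100, 300)); // { interactiveCores: 1, backgroundCores: 3 }
```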

Because the problem we were (and to some extent still are) seeing is that heartbeats are basically starving interactive requests. Before the scheduler improvements, we had subnets where it took thousands of rounds to go over all canisters, mostly executing heartbeats that did no work, while backlogs of ingress messages (which we had been able to handle just fine days before) built up into the thousands and timed out. Had the ingress messages only had to compete against other ingress messages, we would have had no user-visible issues, and it would have only marginally affected the throughput of heartbeats.

Regardless, this was just a discussion; nothing has materialized yet. We’re still thinking through the alternatives open to us, now that we also have a better idea of the issue at hand ourselves.

4 Likes

Hi @Manu,

It was working fine, but now this error has returned. Is yinp6-35cfo-wgcd2-oc4ty-2kqpf-t4dul-rfk33-fsq3r-mfmua-m2ngh-jqe not upgraded yet?

Still experiencing the ingress expiry issue with yinp6… Will the new replica version fix it?

That error message has nothing to do with ingress expiry due to subnet load (which would show up about 5 minutes after you submitted the message, not immediately). It simply says that you submitted an ingress message with an expiration time in the past. The wall-clock time on the replica you sent the ingress message to was 15:53:27 UTC, and it was willing to accept ingress messages with an ingress_expiry anywhere between 15:53:27 UTC and 15:58:57 UTC. The ingress message you submitted, however, had ingress_expiry equal to 15:53:00 UTC, i.e. it was already expired by your own request. You essentially told the replica “do this for me and give me the response 30 seconds ago, at the latest”.
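
As a minimal sketch of that check (TypeScript, illustrative only; the window constant matches the times quoted above, the date is a placeholder, and none of this is the replica’s actual implementation):

```typescript
// The replica only accepts an ingress message whose ingress_expiry falls
// inside a window starting at its current wall-clock time; an expiry in the
// past is rejected immediately, regardless of subnet load.

// In the example above the window spans 5.5 minutes (15:53:27 -> 15:58:57).
const MAX_INGRESS_EXPIRY_MS = 5 * 60 * 1000 + 30 * 1000;

function isExpiryAcceptable(ingressExpiryMs: number, replicaNowMs: number): boolean {
  return (
    ingressExpiryMs >= replicaNowMs &&
    ingressExpiryMs <= replicaNowMs + MAX_INGRESS_EXPIRY_MS
  );
}

// The case from the error message (date is hypothetical): replica wall time
// 15:53:27 UTC, submitted ingress_expiry 15:53:00 UTC => ~27 seconds in the past.
const replicaNow = Date.parse("2024-01-01T15:53:27Z");
const submittedExpiry = Date.parse("2024-01-01T15:53:00Z");
console.log(isExpiryAcceptable(submittedExpiry, replicaNow)); // false
```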

1 Like

What would you recommend (using the js agent)? Ingress expiry issue

Hi Xalkan,
@free is correct: the expiry set in the request you sent to the replica was too old.
What tools were you using that exhibited this behavior? dfx? The browser with Candid? Maybe it’s due to an old agent; we recently fixed some bugs related to this. If you’re using agent-js, check that it’s on 2.1.3.

1 Like

Hi Yvonne,

It’s a Next.js app using Juno’s core-peer (about to update it to its latest version, which includes agent-js v2.1.3 :crossed_fingers:).

Assuming your machine’s time is synced, the update to 2.1.3 should make the problem go away. Let us know if it persists.
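
For context, a rough sketch of why clock sync matters here (illustrative TypeScript, not the actual agent-js implementation; the 4-minute TTL and the function name are assumptions):

```typescript
// The agent derives ingress_expiry from the local clock (roughly "now + a few
// minutes"). If the machine's clock lags far enough behind the replica's, the
// computed expiry can already be in the past by the replica's reckoning.

function computeIngressExpiryMs(localNowMs: number, ttlMs = 4 * 60 * 1000): number {
  return localNowMs + ttlMs;
}

const replicaNowMs = Date.now();
const skewedLocalNowMs = replicaNowMs - 5 * 60 * 1000; // local clock 5 minutes behind

// With a badly skewed local clock the expiry lands before the replica's "now",
// which triggers exactly the "ingress_expiry ... is in the past" rejection.
console.log(computeIngressExpiryMs(skewedLocalNowMs) < replicaNowMs); // true
```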

2 Likes