Canister returns 504, too much traffic on subnet?

Hi, this is the developer from DFinance. We launched our testnet today and everything went smoothly until a while ago, when users started reporting very slow loading. I tried to call our backend canister with dfx, but got a 504 error. Is this because of too much traffic? I thought the IC has load balancing?

Canister ID: lf23w-ciaaa-aaaah-qaeya-cai
Subnet: pjljw-kztyl-46ud4-ofrj6-nzkhm-3n4nt-wi3jt-ypmav-ijqkt-gjf66-uae


I believe this is a problem with your subnet. Other projects have reported the same issues (Entrepot & ICPunks), maybe you're all on the same subnet?

Yes I believe so, we are in the same subnet.


gg, you broke the IC!


Haha, consider it a stress test: find the problems early so we can grow stronger.
It seems that the current subnet architecture cannot handle too much traffic.


Luckily the IC can scale and add nodes to the subnets :slight_smile:!


We've also launched a whitelist event that had multiple people logging in to our website, and we hit the exact same problem. It looks like we did a stress test :smiley:


About 9k users participated in our testnet event. For the first several hours it worked smoothly, then it got stuck.
[Screenshot: Screen Shot 2021-08-26 at 12.52.33 AM]


So you are the culprit ;). I'm glad that it happened today and not on the claiming day :D.

As far as we can tell, this has to do with the fact that subnet pjljw handles by far the most queries, combined with an issue we discovered recently: we only cache the compiled Wasm when executing updates, not queries. This means that until an update has been executed on a canister, every query compiles the Wasm from scratch. Because of the load, some queries are really slow, while others time out before they even get a chance to execute.
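The failure mode described here can be sketched as a simplified model (plain Python; `Replica`, `compile_wasm`, and the method names are illustrative stand-ins, not actual replica internals):

```python
# Simplified model of the caching bug described above: the compiled-Wasm
# cache is populated only on the update path, so queries that arrive
# before any update recompile the module every time.

class Replica:
    def __init__(self):
        self.compile_cache = {}   # module hash -> "compiled" module
        self.compilations = 0     # counts expensive compilations

    def _compile(self, wasm_hash):
        self.compilations += 1    # stands in for an expensive Wasm compile
        return f"compiled({wasm_hash})"

    def execute_update(self, wasm_hash):
        # Update path: compile once and cache the result.
        if wasm_hash not in self.compile_cache:
            self.compile_cache[wasm_hash] = self._compile(wasm_hash)
        return self.compile_cache[wasm_hash]

    def execute_query(self, wasm_hash):
        # Buggy query path: reads the cache but never fills it.
        cached = self.compile_cache.get(wasm_hash)
        return cached if cached is not None else self._compile(wasm_hash)

replica = Replica()
for _ in range(100):              # 100 queries before any update lands
    replica.execute_query("lf23w")
print(replica.compilations)       # -> 100: recompiled on every query

replica.execute_update("lf23w")   # a single update primes the cache
for _ in range(100):
    replica.execute_query("lf23w")
print(replica.compilations)       # -> 101: cache hits from then on
```

Under load, those repeated compilations are what make queries slow or time out until an update has run.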

The reason this started suddenly is that we upgraded the replica version on subnet pjljw a few hours ago, and the caches got purged. Why this behavior wasn't noticed until now (on previous replica upgrades) is unclear.

We're working on a fix, but it may take a while to deploy. In the meantime, the more canisters on pjljw that handle at least one update, the less contention among queries, so this should improve over time.


Thanks for the fast response. Is there anything we can do right now, or should we just wait?

Make sure to run an update query on each of your canisters. (o:

More seriously though, I don’t think there’s anything you can do. I’ll try to figure out if I can find a way to run a replicated query (i.e. run a query via an ingress message) on all canisters on the subnet, to prime the cache. Actually, now that I think of it, anyone could do it.
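The workaround suggested above, one update call per canister to prime the cache, could be scripted along these lines (a sketch: the canister ID and the `prime` method name are placeholders, and the commands are only echoed as a dry run; drop the `echo` to actually send the calls with dfx):

```shell
#!/bin/sh
# Send one update call to each of your canisters so the replica caches
# the compiled Wasm. Any update method will do; it only needs to trigger
# the update execution path once.

CANISTERS="lf23w-ciaaa-aaaah-qaeya-cai"   # add the rest of your canister IDs

for id in $CANISTERS; do
    # Dry run: print the command instead of executing it.
    echo dfx canister --network ic call "$id" prime '()'
done
```

A query forced through an ingress message (a replicated query) would have the same priming effect, which is what the post above proposes doing for the whole subnet.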


We have the same problem accessing our application, which runs in the same subnet.

IC Drive was also affected: the app was loading super slowly and our public file links were affected too. It seems fine now.

The issue still seems to be happening. I'm seeing net::ERR_ABORTED 504 with my asset canister right now as well.

Yes, it's happening again.


We noticed the issue happened again. The root cause has been identified and the fix is on the way. Apologies for the inconvenience and thanks for your patience.


Hi Shuo, do you know when it will be fixed? It's getting hard to test anything.

Thanks for asking. I've pinged the team to see who can give an update. Sorry I can't help directly, I'm not familiar with the current status.