504 (Gateway Time-out) while backfilling data

I was backfilling several hundred MB of data to the IC, running two parallel processes that backfilled data to two different canisters on the same subnet. Between 9 and 9:30am UTC I received these errors :point_down:

I don’t know exactly when the first error occurred, as I was tabbed away at the time, but when I came back to my script I saw the following errors:

On process one, the backfill had halted with this error:

Error: Server returned an error:
  Code: 504 (Gateway Time-out)
  Body: <html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx/1.21.3</center>
</body>
</html>

Shortly after, on process two, the backfill had halted with this error:

Error: Request timed out after 300000 msec:
  Request ID: e708d5d5fd8d5c7767d75ee50547f3f1039406268229dc1cfb3556847ad0b222
  Request status: processing

I will definitely account for retries next time :sweat_smile:, but I’m wondering why this may have occurred, and how frequently these 500-like errors happen.
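For posterity, here’s the rough shape of the retry wrapper I have in mind. This is just a sketch: `sendBatch` is a hypothetical stand-in for my actual canister update call, and the attempt count and backoff numbers are arbitrary.

```ts
// Sketch of a retry-with-exponential-backoff wrapper (sendBatch is a
// hypothetical stand-in for the actual canister update call).
async function withRetries<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const delayMs = 1000 * 2 ** (attempt - 1); // 1s, 2s, 4s, ...
      console.warn(`Attempt ${attempt} failed; retrying in ${delayMs}ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Usage:
// await withRetries(() => sendBatch(batch));
```

One caveat I’ll need to handle: the second error above says `Request status: processing`, so a timed-out update may still execute on the subnet. The batch insert has to be idempotent before it’s safe to blindly retry.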

Hi @icme,
Can you give some details about the process? What is it written in? How are you running it? Which agent are you using? Does it make one large request or many smaller requests?

The second snippet you have there looks like it’s coming from an agent - maybe a timeout after waiting for an update to complete.

From the boundary node metrics, it looks like the 504s are coming from one specific node:


Hi @raymondk, thanks for digging and finding the chart!

It’s written in Motoko.

I’m running it locally, with a Node script and an old version of the agent - I think 0.11.1.

With each API call, I was batch-sending 500 metadata records at a time (each record is quite small; you can see the size of each record by performing a query here), and then, within that single call, inserting all 500 records into my “db”.

As an estimate, I’d say each batch update call takes anywhere between 6 and 20 seconds to complete.
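To make the shape of the loop concrete, here’s roughly what the script does. `actor.insertBatch` and `MetadataRecord` are illustrative stand-ins for my actual canister method and record type, not a real CanDB or agent API:

```ts
type MetadataRecord = Record<string, unknown>; // placeholder shape
declare const actor: { insertBatch(batch: MetadataRecord[]): Promise<void> }; // hypothetical

const BATCH_SIZE = 500;

// Walk the records in batches of 500; each iteration is one update call
// that inserts the whole batch into the canister.
async function backfill(records: MetadataRecord[]): Promise<void> {
  for (let i = 0; i < records.length; i += BATCH_SIZE) {
    const batch = records.slice(i, i + BATCH_SIZE);
    await actor.insertBatch(batch);
    console.log(`Inserted ${Math.min(i + BATCH_SIZE, records.length)} of ${records.length}`);
  }
}
```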

One thing to note: when inserting/updating data, the CanDB client under the hood performs a combination of query and update calls in order to target the correct canister for the metadata being inserted, so a single misbehaving node could certainly have caused the update to fail. Interestingly, though, these two errors appeared on two different canisters, so my location must have been hitting the same boundary node for both requests.
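For anyone unfamiliar with that flow, it looks schematically like this. All names here (`indexActor`, `getCanisterForPK`, `createStorageActor`) are illustrative stand-ins, not the real CanDB client API:

```ts
// Schematic of the query-then-update routing described above; the
// declarations below are illustrative stubs, not the real CanDB client.
declare const indexActor: { getCanisterForPK(pk: string): Promise<string> };
declare function createStorageActor(id: string): { put(r: unknown): Promise<void> };

async function routedInsert(pk: string, record: unknown): Promise<void> {
  const canisterId = await indexActor.getCanisterForPK(pk); // query call
  const storage = createStorageActor(canisterId);
  await storage.put(record);                                // update call
}
```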

Apologies, I wasn’t clear. The graph shows a single “replica node” (not a boundary node) returning 504s - the error is just being bubbled up by the boundary node.

If you were making multiple calls within the same subnet, it’s likely you were getting errors from the same replica node even if the calls were to different canisters.
