Issues with Rosetta Node Syncing

Hello,

I’ve been running a Rosetta node for ICP, and for the past few days, I have been encountering several errors and warnings. Specifically, I am seeing the following log entries:

2024-12-17T04:06:43.746423Z ERROR rs/rosetta-api/icp/src/rosetta_server.rs:468: Error in syncing blocks: InternalError(false, Details { error_message: Some("Tip of the chain has index 17454839 but no block found at that index!"), extra_fields: {} })
Failed to initialize ledger client: InternalError(false, Details { error_message: Some("In block: An error happened during communication with the replica: error sending request for url (https://ic0.app/api/v2/canister/ryjl3-tyaaa-aaaaa-aaaba-cai/query)"), extra_fields: {} })

On the rare occasions when syncing completes, I am able to run queries. After a while, however, the queries stop working, and the only way to restore functionality is to restart the container. I’ve tried rebuilding the setup from scratch multiple times, but the issue persists.

Interestingly, the ICRC Rosetta node works fine without any issues!

Has anyone else experienced similar issues, or does anyone have any advice on how to resolve this?

Thanks in advance!

Does the rosetta node have a maximum call limit per second, minute, or hour?

If so, is it possible that the limit for the ICP ledger is lower than the one for ICRC ledgers?

Hi @champagnepapi,

Could you share which version of Rosetta you are using, and how you’re running it? If in a container, is it running under specific resource limits?

What are the requests you’re sending to it and how often?

Does the rosetta node have a maximum call limit per second, minute, or hour?

I don’t think there should be any differences there. There are differences in how ICP and ICRC Rosetta fetch blocks, but I am not sure these should result in different behaviour here.

However, I do see that one replica on the NNS subnet is behind, which may be causing the first error you are seeing. Rosetta first fetches the tip of the chain and verifies the certificate, then fetches the blocks up to the tip using (non-replicated) queries and verifies the chaining of blocks up to the tip. If a query hits the slow node, that node may not have all the blocks yet.
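To illustrate why a lagging replica can produce the "no block found at that index" error, here is a minimal sketch of tip-then-blocks verification. It is not Rosetta's actual code: the `Block` type, the SHA-256 hashing scheme, and the function names are assumptions made for illustration only.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Block:
    """Hypothetical, simplified block: real ledger blocks are encoded differently."""
    index: int
    parent_hash: Optional[str]
    payload: bytes


def block_hash(block: Block) -> str:
    """Hash the block together with its parent hash (illustrative scheme only)."""
    h = hashlib.sha256()
    h.update(str(block.index).encode())
    h.update((block.parent_hash or "").encode())
    h.update(block.payload)
    return h.hexdigest()


def verify_up_to_tip(blocks: Dict[int, Block], tip_index: int, tip_hash: str) -> int:
    """Walk back from the (already certified) tip and check every parent link.

    `blocks` stands in for whatever the queried replica returned. A replica that
    is behind simply lacks the newest indices, which surfaces as a LookupError.
    Returns the number of blocks verified.
    """
    expected = tip_hash
    for index in range(tip_index, -1, -1):
        block = blocks.get(index)
        if block is None:
            raise LookupError(
                f"tip of the chain has index {tip_index} but no block found at index {index}"
            )
        if block_hash(block) != expected:
            raise ValueError(f"hash mismatch at index {index}")
        expected = block.parent_hash
    return tip_index + 1
```

In this model the tip comes from a certified response while the blocks come from a plain query, so the two answers can disagree for a short time if the query lands on a replica that is behind.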

How frequently are you seeing these errors? I am assuming that the boundary nodes round-robin between the replicas when directing queries, so syncing may slow down a bit. Does your instance take longer to sync, or does it never catch up?

My Rosetta nodes run in Docker containers (using the latest image version from DFINITY) and are launched with the following commands:

ICP:

docker run \
    --interactive \
    --tty \
    --publish 8081:8081 \
    --detach \
    --restart always \
    --name rosetta-node-icp \
    --volume rosetta-data-icp:/rosetta/data \
    dfinity/rosetta-api \
    --mainnet

CHAT:

docker run \
    --interactive \
    --tty \
    --detach \
    --restart always \
    --name rosetta-node-chat \
    --publish 8082:8082 \
    --volume rosetta-data-chat:/rosetta/data \
    dfinity/ic-icrc-rosetta-api \
    --network-type mainnet \
    --ledger-id 2ouva-viaaa-aaaaq-aaamq-cai \
    --port 8082

Currently, both nodes are active and synchronized with the latest block. However, I am encountering another issue that I believe might be related to my main question:

I have two similar Python scripts for fetching transactions from the two nodes (one for ICP and one for CHAT). The script performs calls for ascending blocks in batches of ten, with a limit for each execution, and it runs every minute to save the transactions in a MySQL database. The problem is that while fetching from the CHAT node works without issues, for the ICP node, after a few hundred thousand blocks have been saved, the node stops responding. However, if I restart the container (or my server), the node responds again, and I can resume running my script.
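For reference, the batching described above could look roughly like the sketch below. The helper names, the per-run cap, and the network identifier values are assumptions for illustration (the identifier is the one commonly documented for ICP mainnet and should be checked against the node's /network/list response); this is not the original script.

```python
from typing import Dict, Iterator, List

# Assumption: values commonly documented for ICP mainnet; confirm via /network/list.
NETWORK_IDENTIFIER = {
    "blockchain": "Internet Computer",
    "network": "00000000000000020101",
}


def batch_ranges(start: int, tip: int, batch_size: int = 10,
                 max_blocks: int = 1000) -> Iterator[List[int]]:
    """Yield ascending block indices in batches, capped at max_blocks per run."""
    end = min(tip, start + max_blocks - 1)
    for first in range(start, end + 1, batch_size):
        last = min(first + batch_size - 1, end)
        yield list(range(first, last + 1))


def block_request(index: int) -> Dict[str, object]:
    """Build the JSON body for a Rosetta /block call for one block index."""
    return {
        "network_identifier": NETWORK_IDENTIFIER,
        "block_identifier": {"index": index},
    }
```

Each run would start from the last index saved in the database, take the tip from /network/status, and stop after `max_blocks` so that a single execution stays well inside the one-minute schedule.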

This is why I was asking if there are specific issues with the ICP node (given the errors I initially received) or if there are any call limits.

Thank you for your help.

Thanks for your responses so far. Could you clarify which calls you’re making to ICP Rosetta and what errors you’re seeing, both in the Rosetta logs and in your server logs? We haven’t seen this error before, so we want to make sure we have enough data to try reproducing it in case this is some kind of corner case.

The calls I’m making are to the following endpoints:

/network/status
/block
/account/balance

When the node stops responding, no message is received at all; it’s simply a blank response, as if it’s stuck in an infinite wait! In these cases, I don’t see anything unusual in the Rosetta logs or in the various server logs either.
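One way to turn the "infinite wait" into something observable is to give every call an explicit client-side timeout and log when it fires. The sketch below uses only the Python standard library; the function name, the default timeout, and the injectable `opener` parameter are assumptions for illustration, not part of any Rosetta client.

```python
import json
import socket
import urllib.error
import urllib.request
from typing import Any, Callable, Dict, Optional


def post_json(url: str, payload: Dict[str, Any], timeout_s: float = 10.0,
              opener: Callable[..., Any] = urllib.request.urlopen
              ) -> Optional[Dict[str, Any]]:
    """POST a JSON body and return the decoded JSON response.

    Returns None if the server does not answer within timeout_s, so the caller
    can tell a hung node apart from an HTTP or connection error (which raise).
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(request, timeout=timeout_s) as response:
            return json.loads(response.read().decode("utf-8"))
    except (socket.timeout, TimeoutError):
        return None
    except urllib.error.URLError as exc:
        # urlopen wraps connect-phase timeouts in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return None
        raise
```

A call such as `post_json("http://localhost:8081/network/status", body)` that returns None would then give a timestamp for exactly when the node stopped answering, which can be matched against the Rosetta container logs.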

Since I’m fetching the transaction history — in my case, from the volume — I was thinking that there might be bottlenecks in the reading and writing processes, which could be causing this issue. Also, as I’ve already mentioned, when I reboot the server, everything starts working again as before.

If that’s the case, once my database catches up with the blockchain in real time, this problem should disappear, right? What do you think?

One more thing I noticed: when I reboot my server to get the node working again, it sometimes takes a very long time to sync, and I see the following errors:

2024-12-28T14:51:24.209630Z INFO rs/rosetta-api/icp/src/main.rs:145: Starting ic-rosetta-api, pkg_version: 2.1.1
2024-12-28T14:51:24.209667Z INFO rs/rosetta-api/icp/src/main.rs:151: Listening on 0.0.0.0:8081
2024-12-28T14:51:24.209694Z INFO rs/rosetta-api/icp/src/main.rs:154: Internet Computer URL set to https://ic0.app/
2024-12-28T14:51:24.209754Z INFO rs/rosetta-api/icp/src/main.rs:211: Token symbol set to ICP
thread 'main' panicked at rs/rosetta-api/icp/src/main.rs:263:35:
Failed to initialize ledger client: InternalError(false, Details { error_message: Some("In block: An error happened during communication with the replica: error sending request for url (https://ic0.app/api/v2/canister/ryjl3-tyaaa-aaaaa-aaaba-cai/query)"), extra_fields: {} })
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2024-12-28T14:52:21.206471Z INFO rs/rosetta-api/icp/src/main.rs:145: Starting ic-rosetta-api, pkg_version: 2.1.1
2024-12-28T14:52:21.206532Z INFO rs/rosetta-api/icp/src/main.rs:151: Listening on 0.0.0.0:8081
2024-12-28T14:52:21.206568Z INFO rs/rosetta-api/icp/src/main.rs:154: Internet Computer URL set to https://ic0.app/
2024-12-28T14:52:21.206732Z INFO rs/rosetta-api/icp/src/main.rs:211: Token symbol set to ICP
thread 'main' panicked at rs/rosetta-api/icp/src/main.rs:263:35:
Failed to initialize ledger client: InternalError(false, Details { error_message: Some("In block: An error happened during communication with the replica: error sending request for url (https://ic0.app/api/v2/canister/ryjl3-tyaaa-aaaaa-aaaba-cai/query)"), extra_fields: {} })
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2024-12-28T14:53:17.981460Z INFO rs/rosetta-api/icp/src/main.rs:145: Starting ic-rosetta-api, pkg_version: 2.1.1
2024-12-28T14:53:17.981503Z INFO rs/rosetta-api/icp/src/main.rs:151: Listening on 0.0.0.0:8081
2024-12-28T14:53:17.981568Z INFO rs/rosetta-api/icp/src/main.rs:154: Internet Computer URL set to https://ic0.app/
2024-12-28T14:53:17.981621Z INFO rs/rosetta-api/icp/src/main.rs:211: Token symbol set to ICP
thread 'main' panicked at rs/rosetta-api/icp/src/main.rs:263:35:
Failed to initialize ledger client: InternalError(false, Details { error_message: Some("In block: An error happened during communication with the replica: error sending request for url (https://ic0.app/api/v2/canister/ryjl3-tyaaa-aaaaa-aaaba-cai/query)"), extra_fields: {} })
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2024-12-28T14:54:13.930862Z INFO rs/rosetta-api/icp/src/main.rs:145: Starting ic-rosetta-api, pkg_version: 2.1.1
2024-12-28T14:54:13.930893Z INFO rs/rosetta-api/icp/src/main.rs:151: Listening on 0.0.0.0:8081
2024-12-28T14:54:13.930918Z INFO rs/rosetta-api/icp/src/main.rs:154: Internet Computer URL set to https://ic0.app/
2024-12-28T14:54:13.930954Z INFO rs/rosetta-api/icp/src/main.rs:211: Token symbol set to ICP
thread 'main' panicked at rs/rosetta-api/icp/src/main.rs:263:35:
Failed to initialize ledger client: InternalError(false, Details { error_message: Some("In block: An error happened during communication with the replica: error sending request for url (https://ic0.app/api/v2/canister/ryjl3-tyaaa-aaaaa-aaaba-cai/query)"), extra_fields: {} })
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2024-12-28T14:55:10.063913Z INFO rs/rosetta-api/icp/src/main.rs:145: Starting ic-rosetta-api, pkg_version: 2.1.1
2024-12-28T14:55:10.063963Z INFO rs/rosetta-api/icp/src/main.rs:151: Listening on 0.0.0.0:8081
2024-12-28T14:55:10.064002Z INFO rs/rosetta-api/icp/src/main.rs:154: Internet Computer URL set to https://ic0.app/
2024-12-28T14:55:10.064062Z INFO rs/rosetta-api/icp/src/main.rs:211: Token symbol set to ICP

Meanwhile, the ICRC node is working smoothly! I need help …

It took about 10-15 minutes to sync again… with a volume!

While the ICRC (CHAT) node takes just a few seconds… I know it has a much smaller ledger, but I believe the ICP not syncing properly is not normal… plus the issue of unresponsive calls that happen from time to time once it’s synced.

That’s why I was asking if there are any call limits or perhaps an overload …

Hi @champagnepapi,

Thank you for your patience. I’m back looking into this now.
Could you confirm:

  • Which Rosetta version are you using?
  • What QPS is being sent to the server when it stops responding?
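
If the client does not already track its request rate, a rough QPS figure can be obtained by recording a timestamp per request and averaging over a sliding window. This is only a sketch; the class name and the 60-second default window are assumptions.

```python
import time
from collections import deque
from typing import Deque, Optional


class QpsMeter:
    """Average requests per second over the last window_s seconds."""

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        self._stamps: Deque[float] = deque()

    def record(self, now: Optional[float] = None) -> None:
        """Call once per request sent to the Rosetta node."""
        self._stamps.append(time.monotonic() if now is None else now)

    def qps(self, now: Optional[float] = None) -> float:
        """Drop timestamps older than the window and return the average rate."""
        now = time.monotonic() if now is None else now
        while self._stamps and self._stamps[0] <= now - self.window_s:
            self._stamps.popleft()
        return len(self._stamps) / self.window_s
```

Calling `record()` before every /network/status, /block, and /account/balance request and logging `qps()` alongside any timeout would show the load at the moment the node stops responding.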

ICP initial synchronization can in fact take a long time to finish (over an hour depending on the machine specs). If your restarts are also wiping out the already synchronized database, you should expect those delays.