EVM RPC HttpOutcallError: No consensus could be reached

Hi everyone,

I’m developing ReTransICP and am running into trouble with the EVM RPC canister:

[30467. 2024-07-09T09:55:30.780289441Z]: Timer has finished, running job 4 now.
[30468. 2024-07-09T09:55:40.790182797Z]: [TRAP]: Error: HttpOutcallError(IcError { code: SysTransient, message: "No consensus could be reached. Replicas had different responses. Details: request_id: 2881434, timeout: 1720519234770006900, hashes: [5087cf729dad3a200cda8acdfbe7e07a6822d082a82ec22ae9868a01704a5ce3: 14], [edd83ffbf69636a42b1698bcb4cf8e77c1b3fc170d4be70ab8f848019ef85610: 12]" })
[30469. 2024-07-09T10:10:50.751649035Z]: Failed to get the latest finalized block number: HttpOutcallError(IcError { code: SysTransient, message: "No consensus could be reached. Replicas had different responses. Details: request_id: 2881916, timeout: 1720520144606440478, hashes: [7fc6ca4c17c49614128f8a4bc1e5c45a0672a796d8e2da00de41f1d3ab432df9: 17], [98111562d863c9a00a2a62738f46b694c4c3e9f4a9de546d5c008287b09f43b8: 10]" })
[30470. 2024-07-09T10:41:53.430189118Z]: Failed to get the latest finalized block number: HttpOutcallError(IcError { code: SysTransient, message: "Canister http request timed out" })

I’m currently struggling to understand what causes this error and where it originates, so I can catch and handle it. Any hints?

No consensus could be reached. Replicas had different responses.

This error message occurs when calls to the same RPC endpoint from different subnet nodes return different results. The most common cause for eth_getBlockNumber is that the block number changes partway through the HTTP request. It’s also possible to encounter this error due to rate limits during high canister traffic.

Canister http request timed out

This also seems to be a temporary error resulting from high canister traffic on the fiduciary subnet.

In both of these cases, it’s sometimes necessary to wait for a few minutes and then retry the request.
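
If you want to automate that, a bounded retry loop in the canister is usually enough. A minimal sketch (call_rpc is a hypothetical async wrapper around your EVM RPC call that surfaces the error message as a String; the names are not from any real binding):

    async fn with_retries<T, F, Fut>(mut call_rpc: F, max_attempts: u32) -> Result<T, String>
    where
        F: FnMut() -> Fut,
        Fut: std::future::Future<Output = Result<T, String>>,
    {
        let mut last_err = String::new();
        for _ in 0..max_attempts {
            match call_rpc().await {
                Ok(value) => return Ok(value),
                // "No consensus could be reached" and request timeouts are
                // transient (SysTransient); the await yields, so chain state
                // can change between attempts.
                Err(e) if e.contains("No consensus") || e.contains("timed out") => last_err = e,
                // Anything else is treated as permanent.
                Err(e) => return Err(e),
            }
        }
        Err(last_err)
    }

For rate-limit errors it is gentler to space the attempts out, for example by scheduling the retry with ic_cdk_timers instead of looping back-to-back.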

I am also seeing this. “Waiting a few minutes” does not seem to change anything; the issue is quite persistent. I am using only one RPC service, on Arbitrum.

Ok((Consistent(Err(HttpOutcallError(IcError { code: SysTransient, message: "No consensus could be reached. Replicas had different responses. Details: request_id: 4020601, timeout: 1724163283271565462, hashes: [55b8b966a9d5d1cfd98d19d8ddceac5a9a6b5276d5fd5fa966a04fdb6e264bf1: 16], [6866bf29556e2638aa0c51ffa31c8811e4ec68a5340332fba34b4cf347ed9030: 5], [c45cdc7563848760da98df0f5ce3184cf527839eda5ac96d40bd5204602b3229: 4], [bdf01813346c43a5a39dbf2027930946372a1339c0fe688b4e6d33d7cd4cd7e6: 1], [62e48a5a1ffb7273ddba7644a091d013610c3998b5898efc88139bc4967c0075: 1]" }))),))

A consistent response saying no consensus was reached :thinking: :grinning:

Shouldn’t an inconsistent response be returned instead, so the app can proceed if it is fine with the inconsistency?

Most other eth functions can be tied to a block number to decrease this risk. But to establish a “starting point” for subsequent calls, you need to be able to get the latest block (or thereabouts) reliably.
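
If the per-provider results were available (the counterpart of the Consistent wrapper in the log above, when querying several providers), the app could for example proceed with the value that most providers agree on. A rough sketch, assuming the individual results have already been collected into a slice of comparable values:

    use std::collections::HashMap;

    // Pick the value returned by the majority of providers; None on an empty
    // input. The element type is a simplified stand-in for the canister's
    // Candid result types.
    fn majority_value<T: Eq + std::hash::Hash + Clone>(results: &[T]) -> Option<T> {
        let mut counts: HashMap<T, usize> = HashMap::new();
        for r in results {
            *counts.entry(r.clone()).or_insert(0) += 1;
        }
        counts.into_iter().max_by_key(|(_, n)| *n).map(|(value, _)| value)
    }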

Which RPC service and method(s) are you using? I’ll see if we can fix this if it’s possible for us to repro the issue.

The reason we (currently) can’t return an inconsistent error message here is that HTTPS outcall consensus happens at the protocol level of the Internet Computer. Essentially, each of the 28 fiduciary subnet nodes makes the HTTP request, and the IC compares all of the results. After post-processing the response, we receive either a consistent response or an error code indicating inconsistent results. It may be possible to further customize this behaviour in the future given enough community interest.

A consistent response saying no consensus was reached :thinking: :grinning:

This indicates that the results are consistent with each other between RPC providers (which is always the case when calling just one provider). I see what you’re saying though! :grin:

Thanks! I had this issue today with the following configs:

  • Arbitrum: BlockPi
  • Base: Ankr/BlockPi

The method used was eth_getBlockByNumber. The code is mostly copied from the EVM_RPC examples app.

The RPC responses are consistent, but the nodes did not reach consensus. That would mean some of the nodes did not get any response, right? So, a timeout or overload issue?

The canister is live, btw; the error can be reproduced by logging in and running a recipe, this one for instance: https://77ydd-naaaa-aaaal-qi2ja-cai.icp0.io/recipe/eth-positive-balance

This is usually caused by timeouts or rate limits on the individual JSON-RPC services. It’s also possible that a provider is returning a nonce or random number in each response, which can break the HTTPS outcall consensus.

In this case, it could be a temporary issue with the provider’s JSON-RPC API; otherwise, we might be able to adjust the EVM RPC canister to ignore or patch the problematic data. For eth_getBlockByNumber, we’ve addressed this by filtering unknown response fields, but during high traffic, the latest block returned for L2 chains might change quickly enough that the third-party API responds with a different block between the start and end of the HTTP outcalls.
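
For direct HTTPS outcalls, the standard mitigation for node-to-node differences is a transform function that normalizes the response before consensus is attempted, typically by dropping headers (dates, provider request IDs) and keeping only the body. A minimal sketch with ic-cdk (this applies to raw outcalls; the EVM RPC canister does its own normalization internally):

    use ic_cdk::api::management_canister::http_request::{HttpResponse, TransformArgs};

    // Strip response headers before consensus: headers like Date differ per
    // node and would otherwise break agreement on the response hash.
    #[ic_cdk::query]
    fn transform_rpc_response(args: TransformArgs) -> HttpResponse {
        HttpResponse {
            status: args.response.status,
            headers: vec![],
            body: args.response.body,
        }
    }

If a provider embeds a nonce or random number in the body itself, the transform would also need to parse the JSON and drop that field.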

Very cool project by the way! I was able to create a custom “Eth Zero Balance” recipe to try it out (since my wallet didn’t have any Eth) and managed to recreate the error. This will be useful for narrowing down and hopefully resolving the issue.

Thanks for looking into this; it is the last blocker preventing me from launching.

I am … almost … certain though that when I first saw this error it happened a bit further down in the flow of the run_create canister function. Here, when I run the queries in the recipe to estimate gas usage for creating the attestation:

These outcalls run through a caching proxy that ensures only one request is made to whichever API is being queried. It also ensures all the responses the nodes get are the same.

P.S. I do some in-canister logging that can be accessed via the logs() function here:

https://a4gq6-oaaaa-aaaab-qaa4q-cai.raw.icp0.io/?id=7w3i7-3iaaa-aaaal-qi2iq-cai

… and with the help of those I can confirm that the issue also happens for regular http_request outcalls.

Thanks for the info! I tried running “Zero Eth Balance” several more times and got successful EVM RPC eth_getBlockByNumber responses in the logs. I’m seeing “Error creating run” on the webapp with a 500 status code in the dev console, and it’s unclear what specifically causes this error. Is there an additional call to the EVM RPC canister which is perhaps crashing before writing the log message, or do you think this is an unrelated issue?

As a quick follow-up question, do you encounter the “No consensus could be reached” error message primarily while getting the latest block, or does it happen in other situations as well?

If the issue is specifically with the latest block, perhaps it could be worth using a conventional HTTP request to get the latest block and then calling eth_getBlockByNumber using EVM RPC with the known block number from the response.
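
A sketch of that first step as a plain outcall (assuming ic-cdk; the cycle amount and max_response_bytes are illustrative):

    use ic_cdk::api::management_canister::http_request::{
        http_request, CanisterHttpRequestArgument, HttpHeader, HttpMethod,
    };

    // Fetch the latest block number with a plain HTTPS outcall. The tiny
    // eth_blockNumber response keeps retries cheap, and follow-up EVM RPC
    // calls can then be pinned to the returned number.
    async fn latest_block_number(rpc_url: &str) -> Result<String, String> {
        let request = CanisterHttpRequestArgument {
            url: rpc_url.to_string(),
            method: HttpMethod::POST,
            headers: vec![HttpHeader {
                name: "Content-Type".to_string(),
                value: "application/json".to_string(),
            }],
            body: Some(
                br#"{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}"#.to_vec(),
            ),
            max_response_bytes: Some(1_000),
            transform: None, // production code would attach a header-stripping transform
        };
        let (response,) = http_request(request, 25_000_000_000)
            .await
            .map_err(|(code, msg)| format!("{code:?}: {msg}"))?;
        String::from_utf8(response.body).map_err(|e| e.to_string())
    }

Note that this outcall still goes through the same consensus step as any other, so the transform (and possibly a retry) is still advisable; the gain is the small, stable response and full control over the request.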

So, I had an issue with the recipe query proxy as well that most likely caused the error you saw. That is fixed now.

I had some consensus errors earlier today, but now the EVM RPC calls seem to go through. So I guess it was some load/quota issue before? I am running only one RPC provider per chain now to decrease the risk of issues. But the whole setup feels very shaky; I’m not sure I will be able to launch the project with the current state of things.

I’ll create some attestations over the coming days to see if it runs stably. If not, I will have to rethink the architecture and perhaps go back to just doing HTTPS outcalls for all EVM interactions.

Thanks for your help!

Hrm. EVM RPC calls fail again, now using one provider only. This time on Optimism/BlockPi.

Hi @kristofer, hi @rvanasa,

Thank you so much for your contributions. I now know I am not the only one seeing this issue, which isn’t a solution, but is still helpful.
Some background: my application connects to the Gnosis chain. I have used various public RPCs from Chainlist, all with similar results. In my case, the problems do not seem to be persistent, though: the canister runs for some time, polling the latest block and scanning for events, until at some point it fails. If I restart it, it works again for several hours.

After not having had the time to tinker with this for a while, I plan to look into it some more in the upcoming week. I look forward to sharing more information as it comes in.

@kristofer, have you tried catching this error and repeating the request? If so, can you share some code? I know it might not help in your case, but it might in mine, and I am struggling with Rust, which is still new to me.

You could definitely create a queue for your EVM calls and retry them if they fail. Here is an example of catching the error, even though using ic_cdk::trap is not the most graceful error handling.
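
Something like this, for instance (a sketch; the error types are minimal stand-ins paraphrased from the error shape in the logs above, and real code would use the bindings generated from the EVM RPC canister’s Candid file):

    // Minimal stand-in types mirroring HttpOutcallError(IcError { .. }).
    #[derive(Debug)]
    enum HttpOutcallError {
        IcError { code: String, message: String },
    }
    #[derive(Debug)]
    enum RpcError {
        HttpOutcallError(HttpOutcallError),
        // ...other variants elided
    }

    fn handle_block_result(result: Result<u64, RpcError>) -> Option<u64> {
        match result {
            Ok(block_number) => Some(block_number),
            Err(RpcError::HttpOutcallError(HttpOutcallError::IcError { code, message }))
                if message.contains("No consensus") || message.contains("timed out") =>
            {
                // Transient failure: log it and let the caller queue a retry.
                ic_cdk::println!("transient outcall failure ({code}): {message}");
                None
            }
            // Anything else is unexpected: trap, which rolls back this message.
            Err(other) => ic_cdk::trap(&format!("unrecoverable RPC error: {other:?}")),
        }
    }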

eth_getBlockByNumber will always be problematic with the current way of doing things. Since the IC sends many calls to the RPC provider, the likelihood is quite high that the latest block changes during the course of processing all those calls. And then you get an inconsistent response.

I would suggest you reconsider your architecture; I will. Does the canister really need to fetch the latest block number, or is that a value that can be submitted by the frontend that initiates the call? That is probably how I will solve things: the frontend gets the latest block number and makes gas estimations etc. by calling the Alchemy API directly. Those values are then submitted along with the canister call that does the actual “write”, in my case creating an EAS attestation. A malicious client could of course submit any cooked-up values, but the only thing that would lead to is the EAS transaction failing.
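
In code, that shape could look something like this (all names hypothetical):

    use candid::{CandidType, Deserialize};

    // Chain context fetched by the frontend (e.g. via the Alchemy API) and
    // passed in with the update call. The canister treats it as an untrusted
    // hint: a bogus value can only make the downstream EAS transaction fail.
    #[derive(CandidType, Deserialize)]
    struct TxContext {
        block_number: u64,
        max_fee_per_gas: u64,
        max_priority_fee_per_gas: u64,
    }

    #[ic_cdk::update]
    async fn create_attestation(ctx: TxContext) -> Result<String, String> {
        // ...sign and submit the EAS attestation pinned to ctx.block_number
        Err(format!("not implemented; would attest at block {}", ctx.block_number))
    }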

Another alternative would be, instead of using the EVM RPC canister, to make direct HTTPS outcalls from the canister to an EVM RPC API provider and route those calls through a proxy that caches identical calls and makes sure they all get the same response. Running such a proxy on Cloudflare, for instance, is straightforward; here is an example that I believe is mostly good to just run:
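
The same idea in a minimal standalone Rust sketch (not the Cloudflare version; this assumes the axum, tokio, reqwest, sha2, and hex crates, and the upstream URL is hypothetical):

    use axum::{body::Bytes, extract::State, routing::post, Router};
    use sha2::{Digest, Sha256};
    use std::collections::HashMap;
    use std::sync::{Arc, Mutex};
    use std::time::{Duration, Instant};

    const UPSTREAM: &str = "https://rpc.gnosischain.com"; // hypothetical upstream
    const TTL: Duration = Duration::from_secs(10);

    type Cache = Arc<Mutex<HashMap<String, (Instant, String)>>>;

    // Key the cache on a hash of the raw JSON-RPC body: identical requests
    // from all subnet nodes within the TTL window get byte-identical replies.
    async fn proxy(State(cache): State<Cache>, body: Bytes) -> String {
        let key = hex::encode(Sha256::digest(&body));
        if let Some((at, cached)) = cache.lock().unwrap().get(&key).cloned() {
            if at.elapsed() < TTL {
                return cached;
            }
        }
        let fresh = reqwest::Client::new()
            .post(UPSTREAM)
            .header("content-type", "application/json")
            .body(body.to_vec())
            .send()
            .await
            .expect("upstream request failed")
            .text()
            .await
            .expect("upstream body unreadable");
        cache.lock().unwrap().insert(key, (Instant::now(), fresh.clone()));
        fresh
    }

    #[tokio::main]
    async fn main() {
        let app = Router::new().route("/", post(proxy)).with_state(Cache::default());
        let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
        axum::serve(listener, app).await.unwrap();
    }

A production version would also coalesce concurrent in-flight requests for the same body (so only one upstream call is ever made) and return proper errors instead of panicking.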
