Error encountered in HTTPS Outcalls

I deployed a canister on the IC that implements HTTP Outcalls. The method appeared to function correctly for several days, but today it returned an error when I executed it.
opt "Rejection code 2, Canister http responses were different across replicas, and no consensus was reached";

Additionally, I encountered an error on the local replica today, despite all tests passing previously.
"Rejection code 1, Http body exceeds size limit of 785 bytes.";
I also tried increasing max_response_bytes by 2000, but the error remains the same: "Rejection code 1, Http body exceeds size limit of 2785 bytes.";


For the first error, it sounds like the response, as written, is not deterministic or not correctly transformed. The outcall goes out from multiple node machines, and each one applies the transform function to the response it gets. The transformed response must be exactly the same on all nodes, so you must make sure that is the case.
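To illustrate the point about the transform function, here is a minimal sketch of a transform that keeps only the status and body and drops all headers, since headers such as Date usually differ between requests. The types `HttpHeader` and `HttpResponse` and the helper `fingerprint` are hypothetical and only for illustration; the CDK you use (for example Azle) defines its own types and its own way of registering the transform.

```typescript
// Hypothetical shapes for illustration; real CDKs define their own types.
type HttpHeader = { name: string; value: string };
type HttpResponse = { status: number; headers: HttpHeader[]; body: Uint8Array };

// Keep only the parts that should be identical on every replica.
// All headers are dropped because values like Date differ per request.
function transform(raw: HttpResponse): HttpResponse {
  return { status: raw.status, headers: [], body: raw.body };
}

// Helper for this demo only: turns a response into a comparable string.
function fingerprint(r: HttpResponse): string {
  const headers = r.headers.map((h) => h.name + "=" + h.value).join(";");
  return r.status + "|" + headers + "|" + Array.from(r.body).join(",");
}

// Helper for this demo only: ASCII string to bytes.
function toBytes(s: string): Uint8Array {
  return new Uint8Array(s.split("").map((c) => c.charCodeAt(0)));
}

// Two replicas receive the same body but different Date headers.
const replicaA: HttpResponse = {
  status: 200,
  headers: [{ name: "Date", value: "Mon, 01 Jan 2024 00:00:00 GMT" }],
  body: toBytes('{"version":"1.0"}'),
};
const replicaB: HttpResponse = {
  status: 200,
  headers: [{ name: "Date", value: "Mon, 01 Jan 2024 00:00:01 GMT" }],
  body: toBytes('{"version":"1.0"}'),
};

const rawMatch = fingerprint(replicaA) === fingerprint(replicaB);
const transformedMatch =
  fingerprint(transform(replicaA)) === fingerprint(transform(replicaB));
console.log({ rawMatch, transformedMatch });
```

If the body itself contains changing fields (timestamps, request IDs), dropping headers is not enough and the transform would also need to normalize the body.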

For the other error, it sounds like the responses are also too big. Do you know what size you actually expect there?

The response size is less than 1,000 bytes.

My main concern is that I have not made any changes to my code, and it is working correctly on both networks. All tests are also passing. However, today I received these errors:

Could you try to provide more information?
With the existing information, it is very hard to tell what is going on...

For the second error, perhaps you didn't send enough cycles?

I'm experiencing the same issue with version 0.17.1, but with the message "Rejection code 1, Http body exceeds size limit of 3094 bytes." I've attempted to add more loops and increase 'max_response_bytes', but the error persists. The file where this error occurs can be found here: https://github.com/relinkd/ICP-Wallet-Scorer/blob/main/src/params/poap.ts. Are there any limitations in Azle regarding the maximum response size? Does version 18 differ in this regard?

There shouldn't be any in Azle, and I don't think it should matter from 0.17.x to 0.18.x.

Rejection code 1 is a SysFatal error, so it seems something is wrong at the "system" level. I'm not sure what limit you're hitting. @ulan, do you know who would know the low-level details of HTTP outcalls?

I am making HTTP requests to the API. The response size is 139 bytes, and I'm sending 100M cycles for this call. It appears that this API is functioning correctly on both networks. However, a few days ago, it encountered an error.

I noticed that when I include a User-Agent header, the method works properly (although I haven't tested this on mainnet yet, only locally). However, I'm not sure whether this header is necessary, as the method worked without it a few days ago.

This error persists on Mainnet
@yotam @ulan

I was able to reproduce the issue locally and on a small testnet. It seems that without the User-Agent header the response has status 403 (forbidden) and a non-deterministic body. When I set the User-Agent header to, for example, my-agent/1.0, it seems to work.
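As a rough sketch of the workaround described above, the request headers could be built so that a User-Agent is always present. The type `RequestHeader` and the function `withUserAgent` are hypothetical names for illustration, not part of any CDK; only the value my-agent/1.0 comes from the test above.

```typescript
// Hypothetical header shape for illustration; use the type your CDK provides.
type RequestHeader = { name: string; value: string };

// Returns a copy of `headers` with a User-Agent added, unless one is already set.
// Header names are compared case-insensitively.
function withUserAgent(headers: RequestHeader[], userAgent: string): RequestHeader[] {
  const alreadySet = headers.some((h) => h.name.toLowerCase() === "user-agent");
  return alreadySet
    ? headers.slice()
    : headers.concat([{ name: "User-Agent", value: userAgent }]);
}

const requestHeaders = withUserAgent(
  [{ name: "Accept", value: "application/json" }],
  "my-agent/1.0"
);
console.log(requestHeaders);
```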

Do I understand correctly that in your case the issue persisted on mainnet when setting a valid User-Agent header?

If yes, can you give me the following details so that I can try to reproduce it on mainnet:

  • Subnet your canister is deployed to.
  • dfx version used when deploying.
  • Specific User-Agent used to do the requests.

When I add the User-Agent header, the HTTP outcalls work locally, but they fail on the IC.
So for mainnet, I removed all headers when making the GET request.

However, the result is not consistent from call to call:

  1. Sometimes it works fine.
  2. Sometimes it results in "Rejection code 2, Canister http request timed out".
  3. Sometimes it results in "Rejection code 2, Canister http responses were different across replicas, and no consensus was reached".

I think issue 3 is caused by the API returning a 429 error when too many requests are made, and in this case, the response is non-deterministic. You can see in the script below that the hash differs for status 429. To gracefully handle this you could prune the body in the transform function for status code 429.

for i in {1..100}; do curl -s -o response.txt -w "%{http_code}" https://deces.matchid.io/deces/api/v1/version; echo -n " "; sha256sum response.txt | awk '{print $1}'; done
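The pruning suggested above for status 429 could look roughly like the sketch below. The types `OutcallHeader` and `OutcallResponse` and the function name `transformPruning429` are hypothetical and only for illustration; the actual transform signature depends on the CDK in use.

```typescript
// Hypothetical shapes for illustration; real CDKs define their own types.
type OutcallHeader = { name: string; value: string };
type OutcallResponse = { status: number; headers: OutcallHeader[]; body: Uint8Array };

// Drops all headers, and also drops the body when the API answers 429,
// because the 429 body was observed to differ from request to request.
function transformPruning429(raw: OutcallResponse): OutcallResponse {
  const body = raw.status === 429 ? new Uint8Array(0) : raw.body;
  return { status: raw.status, headers: [], body };
}

// Two rate-limited responses with different bodies (made-up bytes).
const limitedA: OutcallResponse = { status: 429, headers: [], body: new Uint8Array([1, 2, 3]) };
const limitedB: OutcallResponse = { status: 429, headers: [], body: new Uint8Array([9, 8, 7, 6]) };
// A successful response keeps its body.
const okResponse: OutcallResponse = { status: 200, headers: [], body: new Uint8Array([42]) };

const prunedA = transformPruning429(limitedA);
const prunedB = transformPruning429(limitedB);
const keptOk = transformPruning429(okResponse);
console.log(prunedA.body.length, prunedB.body.length, keptOk.body.length);
```

With this, replicas that all receive a 429 agree on an empty-bodied 429 response, which the canister can then handle explicitly. It does not help when some replicas receive a 200 and others a 429, since the status codes still differ.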

Issue 2 seems to be a connectivity issue between the IC nodes and matchid.io. How often does this occur compared to issues 1 and 3? Both kinds of errors are common for public APIs that have stricter rate limits, so I would suggest ensuring that your calling logic limits in-flight requests and implements a backoff strategy for failed requests.
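A minimal sketch of the two suggestions above (limiting in-flight requests and backing off after failures) is shown below. All names (`backoffDelayMs`, `InFlightLimiter`, `retryWithBackoff`) and the constants 500 ms and 8000 ms are assumptions chosen for illustration. The `sleep` function is injected because how a canister waits (for example via timers) depends on the CDK.

```typescript
// Exponential backoff: baseMs * 2^attempt, capped at maxMs. `attempt` starts at 0.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Caps the number of outcalls that are in flight at the same time.
class InFlightLimiter {
  private inFlight = 0;
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  // Returns true and reserves a slot if one is free, otherwise returns false.
  tryAcquire(): boolean {
    if (this.inFlight >= this.max) return false;
    this.inFlight += 1;
    return true;
  }

  // Frees a slot; call this when the outcall has finished or failed.
  release(): void {
    if (this.inFlight > 0) this.inFlight -= 1;
  }
}

// Retries `op` until it succeeds or `maxAttempts` attempts have been made,
// waiting with exponential backoff between attempts.
// `sleep` is supplied by the caller (an assumption; the mechanism is environment-specific).
async function retryWithBackoff<T>(
  op: () => Promise<T>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void>
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // Only wait if another attempt will follow.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, 500, 8000));
      }
    }
  }
  throw lastError;
}

const limiter = new InFlightLimiter(2);
const delays = [0, 1, 2, 3, 4, 5].map((a) => backoffDelayMs(a, 500, 8000));
console.log(delays);
```

A caller would check `limiter.tryAcquire()` before each outcall, wrap the outcall in `retryWithBackoff`, and call `limiter.release()` afterwards. Adding random jitter to the delay is a common refinement that is omitted here.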


@luckerninja Does any of this solve your problems?