Hi @infu
Sorry for taking so long, but now I finally want to provide more details on what went wrong:
What happened?
As you reported on August 8th, SEO was broken for many sites hosted on the Internet Computer. When looking at our logs, we realized that it had actually been broken since the end of July, which coincided with a boundary node release. So we thought we had found “the culprit”, but unfortunately it was not that simple: the outage was caused by a combination of a regression in the asset canister and a security fix on the boundary node.
Asset Canister Regression
The asset canister certifies all the assets it stores and serves a certificate alongside these assets in order to prove their authenticity. The service worker and icx-proxy (the service worker equivalent residing on the boundary node for SEO requests and raw) verify the certificate and reject any responses that don’t check out.
Response verification version 1 (which the vast majority of asset canisters use today) does not allow certifying multiple encodings (e.g., gzip, brotli, identity) at the same time, but only a single one. As a workaround, the asset canister provides certain assets gzipped, but serves the certificate for the identity encoding. Both the service worker and icx-proxy always check the certificate against the asset as served and against the unzipped version. If the certificate checks out for one of the two, the response is accepted.
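To make the workaround concrete, here is a minimal sketch of the "check the body as served, then the unzipped body" logic. It is a simplification under stated assumptions: real response verification validates a certificate and hash tree, whereas this sketch only compares a SHA-256 hash of the body against a hash assumed to have been extracted from a valid certificate. The function name body_matches_certified_hash is hypothetical and not part of the service worker or icx-proxy.

```python
import gzip
import hashlib
import zlib


def sha256(data: bytes) -> bytes:
    """Return the SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).digest()


def body_matches_certified_hash(body: bytes, certified_hash: bytes) -> bool:
    """Accept the body if either it or its gunzipped form matches the hash.

    Assumption: certified_hash was already extracted from a valid
    certificate; certificate validation itself is out of scope here.
    """
    # First attempt: the body exactly as it was served.
    if sha256(body) == certified_hash:
        return True

    # Second attempt: treat the body as gzip and check the decoded bytes.
    try:
        decoded = gzip.decompress(body)
    except (OSError, EOFError, zlib.error):
        # Not valid gzip data, so there is no second candidate to check.
        return False
    return sha256(decoded) == certified_hash
```

With this shape, a canister that certifies only the identity encoding can still serve the gzipped bytes: the first comparison fails, the second one succeeds, and the response is accepted.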
After all this background, I can finally get to the problem. As part of adding support for response verification v2, a regression was introduced: the asset canister would only return a certificate if the client included the certified encoding (identity) in the “Accept-Encoding” header of the HTTP request.
For the service worker, this was not an issue, as it always requests “Accept-Encoding: gzip, deflate, identity”. Since the service worker asks for identity, the asset canister always returned the certificate.
However, most crawlers only send “Accept-Encoding: gzip, deflate, br”. Since identity is missing, the asset canister did not return a certificate and the response did not pass the verification step.
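The difference between the two clients can be modelled in a few lines. This is only an illustration of the behaviour described above, not the asset canister's actual code (which is not written in Python); the function names are hypothetical.

```python
def accepted_encodings(accept_encoding: str) -> set:
    """Parse an Accept-Encoding header value into a set of encoding names.

    Quality values such as ";q=0.5" are dropped; this sketch only cares
    about which encodings are listed at all.
    """
    return {
        part.split(";")[0].strip().lower()
        for part in accept_encoding.split(",")
        if part.strip()
    }


def regressed_canister_returns_certificate(
    accept_encoding: str, certified_encoding: str = "identity"
) -> bool:
    """Model of the regression: a certificate is only returned when the
    client explicitly lists the certified encoding."""
    return certified_encoding in accepted_encodings(accept_encoding)


# Header sent by the service worker, as quoted above.
service_worker = "gzip, deflate, identity"
# Header sent by most crawlers, as quoted above.
crawler = "gzip, deflate, br"

print(regressed_canister_returns_certificate(service_worker))
print(regressed_canister_returns_certificate(crawler))
```

The first call yields True and the second False, which is exactly the split between browsers going through the service worker and crawlers.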
This regression was introduced with dfx 0.14.0. However, it was “dormant” because icx-proxy on the boundary nodes did not enforce certificate verification. That leads me to the other piece of the puzzle.
Boundary Node Security “Fix”
For historical reasons, icx-proxy does not enforce certification for most requests. This means that if a response does not come with a certificate, it is simply passed on to the client. However, if there is a certificate, it is verified and the response is rejected if it doesn’t check out.
In order to increase security, we aim to enforce certification on all endpoints, but we can’t actually do that yet, as many developers explicitly use raw to circumvent certification. As we are providing more and more libraries to help with certification, this might (and hopefully will) change.
As a security fix, we started to enforce certification on specific endpoints only, which included the SEO requests. This caused the asset canister regression to surface: responses from the asset canister to requests coming from crawlers and SEO bots did not include the corresponding certificate and were therefore rejected by icx-proxy, resulting in a 500 status code for the clients.
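The two icx-proxy behaviours described in this section can be summarised as a small decision function. This is a sketch of the policy as explained here, not icx-proxy's real implementation; the names Decision and proxy_decision and the enforce flag are assumptions made for illustration.

```python
from enum import Enum


class Decision(Enum):
    PASS = "pass"      # forward the response to the client
    REJECT = "reject"  # fail the request (a 500 in the SEO case)


def proxy_decision(
    has_certificate: bool, certificate_valid: bool, enforce: bool
) -> Decision:
    """Decide what to do with a canister response.

    enforce=False models the historical behaviour: a missing certificate
    is tolerated. enforce=True models the endpoints covered by the
    security fix: a missing certificate is treated as a failure.
    """
    if has_certificate:
        # A certificate that is present is always verified.
        return Decision.PASS if certificate_valid else Decision.REJECT
    # No certificate at all: the outcome depends on the enforcement mode.
    return Decision.REJECT if enforce else Decision.PASS
```

Under this model, a crawler request that came back without a certificate passes with enforce=False and is rejected with enforce=True, which is why the regression only became visible once enforcement was switched on for SEO requests.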
What we learned and fixed
As a first quick fix, we immediately reverted the security “fix” on the boundary nodes and disabled enforcing certification for SEO requests.
In the meantime, the asset canister has been fixed and will be released with dfx 0.15.0. In addition, we have been working on improving our end-to-end testing of the service worker/icx-proxy and the asset canister. Finally, we are setting up better monitoring for boundary node rollouts.
Once the new, fixed asset canister is in wide use, we will think about enforcing certification again for SEO requests, but keep a close eye on the metrics.
I hope my explanation sheds some light on what happened. I just want to thank you again, @infu, for reporting your observations, which really helped us get to the bottom of this. If you have any questions or something I explained is not clear, just let me know.