Boundary node HTTP response headers

GitHub issue: https://github.com/dfinity/certified-assets/issues/10

I’m getting near the end of implementing partial responses for asset canisters, which allow more advanced streaming of audio and video content. My solution is working well locally, and I’ve deployed the new asset canister to production.

The problem is that in production the partial responses no longer work. A few response headers are key to getting this to work: Content-Range and Accept-Ranges. I wonder if the boundary nodes are filtering these out. Is there some boundary node whitelist that needs to be updated? And if so, why does the local IC replica not also implement this whitelist?

I’d love to get the whitelist updated ASAP so I can continue testing my solution. Once this issue is addressed, anyone can start using the new asset canister even before my pull request is approved (if they want to test it out).

People I think can help: @yotam @diegop @ianblenke @jplevyak

3 Likes

The local replica does not route through the boundary node header filtering logic.
I’ll talk to the team and see about adding the Content-Range and Accept-Ranges headers.

2 Likes

Does the header filtering also filter out request headers?

These are the additional request headers I need:

  1. Range
  2. If-Range (I don’t have this implemented yet, but it’s in the HTTP spec)

These are the additional response headers I need:

  1. Accept-Ranges
  2. Content-Range

I also need to allow status codes 206 (Partial Content) and 416 (Range Not Satisfiable); I’m not sure whether those get filtered out somewhere as well.
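To make the mechanics concrete, here’s a rough, self-contained Rust sketch of the single-range handling these headers and status codes support. It’s only an illustration with simplified stand-in types, not the actual certified-assets code or my PR:

```rust
// Illustrative only: simplified stand-in for the asset canister's response type.
struct PartialResponse {
    status_code: u16,
    headers: Vec<(String, String)>,
    body: Vec<u8>,
}

/// Parse a header like "bytes=0-1023" and slice the asset accordingly.
/// Returns 206 with Content-Range on success, 416 if the range is unsatisfiable.
fn serve_range(asset: &[u8], range_header: Option<&str>) -> PartialResponse {
    let total = asset.len() as u64;

    // No Range header: return the whole asset, but advertise that ranges are accepted.
    let Some(range) = range_header else {
        return PartialResponse {
            status_code: 200,
            headers: vec![("Accept-Ranges".into(), "bytes".into())],
            body: asset.to_vec(),
        };
    };

    // Only the simple single-range "bytes=start-end" form is handled here.
    let parsed = range
        .strip_prefix("bytes=")
        .and_then(|r| r.split_once('-'))
        .and_then(|(start, end)| {
            let start: u64 = start.parse().ok()?;
            let end: u64 = if end.is_empty() {
                total.checked_sub(1)?
            } else {
                end.parse().ok()?
            };
            (start <= end && end < total).then_some((start, end))
        });

    match parsed {
        Some((start, end)) => PartialResponse {
            status_code: 206, // Partial Content
            headers: vec![
                ("Accept-Ranges".into(), "bytes".into()),
                ("Content-Range".into(), format!("bytes {start}-{end}/{total}")),
            ],
            body: asset[start as usize..=end as usize].to_vec(),
        },
        None => PartialResponse {
            status_code: 416, // Range Not Satisfiable
            headers: vec![("Content-Range".into(), format!("bytes */{total}"))],
            body: Vec::new(),
        },
    }
}
```

Without Content-Range and Accept-Ranges surviving the boundary nodes, it’s the 206 branch above that breaks in production.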

1 Like

It might help if someone writes up an actual specification for the HTTP gateway feature, so that we don’t have to probe implementation-defined behavior…

3 Likes

Can I get an update or an ETA on when we could get these headers added? Also, it seems the boundary nodes will limit a lot of functionality if we have to whitelist HTTP headers.

Why do you want range requests? Best practice for video and audio is to name the path by a hash of the content and codecs, break the content up into chunks named by time range and compression level, and use a separate file for each chunk. Range requests are generally uncacheable and would thus require hitting the canister, which would be very expensive and slow. See the Wikipedia article on HTTP Live Streaming for details.

2 Likes

Range requests seem to me to be fundamental to normally functioning audio and video on the web; otherwise, client players can’t stop and resume at arbitrary positions in the audio or video. The HTML audio and video elements make heavy use of range requests.

My podcast Demergence is basically broken when people stream it, because range requests aren’t implemented, and this shows up in players like Apple Podcasts as well. Range requests are well supported across client platforms.

Notice how you can’t skip around any of the audio elements here: https://ic3o3-qiaaa-aaaae-qaaia-cai.ic0.app/

Notice how you can’t jump around this video: https://rdbii-uiaaa-aaaab-qadva-cai.raw.ic0.app/canvas-timelapse.mp4

Here’s the issue in the certified-assets repo: https://github.com/dfinity/certified-assets/issues/10

And this was kicked off by this tweet: https://twitter.com/dominic_w/status/1467144071449915395

I need this feature for my podcast Demergence, and anyone else who wants to host audio or video will need range requests to work as well.

3 Likes

It also looks like HLS has very poor support in browsers, if I’m not mistaken (see the “hls” support table on Can I use).

1 Like

Range requests should not be served from the canister. Instead, we can build the range responses from a fully cached object at the boundary nodes. See “Smart and Efficient Byte-Range Caching with NGINX”.

1 Like

Agreed. Storing chunks would be optimal, especially for serving them from a cache. Currently, serving content from the boundary nodes is free, and it would scale much better than doing it in a canister.

If you serve/compute range responses from a canister, there would have to be multiple canisters storing the same content in order to serve content at scale, as opposed to just having the chunks cached at the boundary.

1 Like

The boundary nodes are going to cache the entire resource for every canister that range requests are used on? This doesn’t seem like it will scale well…and if this is the solution, why do we have the StreamingStrategy in the asset canister?

The reason for StreamingStrategy is that responses are limited in size. StreamingStrategy is not a cache breaker because the chunk boundaries are controlled by the canister rather than the client. The problem with Range is that it breaks caching, which means it will not scale.
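Roughly, the shape is something like this (a simplified sketch with illustrative names, not the exact candid interface): each callback returns one canister-chosen chunk plus a token for the next one, so the sequence of responses is the same for every client and can be cached.

```rust
// Simplified sketch of the StreamingStrategy idea; names are illustrative,
// not the exact certified-assets candid interface.
const CHUNK_SIZE: usize = 2 * 1024 * 1024; // stay under the per-response size limit

struct Token {
    key: String,  // which asset
    index: usize, // which chunk to return next
}

struct ChunkResponse {
    body: Vec<u8>,
    next: Option<Token>, // None once the last chunk has been sent
}

// The canister, not the client, decides where each chunk starts and ends,
// so every chunk response is deterministic and cacheable at the boundary.
fn next_chunk(asset: &[u8], token: Token) -> ChunkResponse {
    let start = (token.index * CHUNK_SIZE).min(asset.len());
    let end = (start + CHUNK_SIZE).min(asset.len());
    ChunkResponse {
        body: asset[start..end].to_vec(),
        next: (end < asset.len()).then(|| Token {
            key: token.key,
            index: token.index + 1,
        }),
    }
}
```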

1 Like

In the future we are planning to serve read-only queries geo-replicated from the boundary nodes, i.e. from the edge. This would allow custom Range logic in a scalable fashion, but we are not there yet.

1 Like

I’ll yield to you then, if partial content can be better implemented elsewhere. Could I get an ETA on this? I would like the functionality for myself and the listeners of Demergence, and there’s one other podcast trying to host on the IC right now.

If we could get the boundary nodes to allow those response headers, I could at least deploy my solution in the meantime, for my canister and for others who want the functionality sooner rather than later.

Also, it would have been ideal to know this before I put in all of the work on this PR: https://github.com/dfinity/cdk-rs/pull/199

How can we avoid this situation in the future? This issue has been open for a few months now: https://github.com/dfinity/certified-assets/issues/10, and the bounty specifically mentioned adding the functionality to the certified-assets canister: https://twitter.com/dominic_w/status/1467144071449915395

Sorry, I just found out about this. It is also not clear to me how you can do certified Range queries at all from the canister. You would have to return the certified chunks in full to the service worker and icx-proxy and have them form the range response, rather than sending it from the canister directly; so in any case you would not be able to serve the range directly from the canister. Your PR doesn’t seem to handle that, so it would not work in production, AFAICT.
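Very roughly, the gateway-side assembly would have to look something like the sketch below. This is purely hypothetical, not something icx-proxy or the service worker implements today, and it assumes the certified chunks have already been fetched and verified in full:

```rust
// Hypothetical sketch: cut the client's requested byte range out of chunks that the
// gateway has already fetched and certification-verified in full. The chunks,
// concatenated in order, make up the whole asset.
fn build_range_body(chunks: &[Vec<u8>], start: usize, end: usize) -> Vec<u8> {
    // `start` and `end` are inclusive byte offsets into the full asset.
    let mut body = Vec::with_capacity(end - start + 1);
    let mut offset = 0usize; // global offset of the current chunk's first byte
    for chunk in chunks {
        let chunk_end = offset + chunk.len();
        // Copy the overlap between this chunk and [start, end], if any.
        if chunk_end > start && offset <= end {
            let from = start.saturating_sub(offset);
            let to = (end + 1 - offset).min(chunk.len());
            body.extend_from_slice(&chunk[from..to]);
        }
        offset = chunk_end;
    }
    body
}
```

The 206 status and Content-Range header would then be attached to that body outside the canister.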

1 Like

It’s okay. I think in the future, if we have bounties like this, the developer should meet with the team to discuss first, or have someone sign off on the design in the proposed issue. Perhaps I should have been more proactive in gathering opinions from the team.

It should work fine in production as long as you use raw.ic0.app, which is what I’ve been doing for my audio anyway (just for the audio URLs; you can still serve a certified app from ic0.app, but any audio or video URLs would need to be served from raw.ic0.app). It’s a trade-off, but it’s working just fine for my audio now (you just have to download the file in your podcast player), and I don’t think the security is that necessary for streaming audio and video.

The same types of issues with streaming and certification are discussed here: Is it possible to make raw.ic0.app calls from ic0.app and reverse?

I’d much rather just use raw.ic0.app for audio and video files and have the much better streaming experience than wait for certification to get figured out.

Not to mention, I foresee canisters wanting to provide their own HTTP response headers, and I’m hoping we can allow those. It’s quite likely someone will run into this limitation again, so why not address it sooner rather than later?

1 Like

If you would like, we can talk about a design.

If we can return arbitrary HTTP response headers from our canisters, then I think most of my issues are resolved. Is this the design you’re talking about?

What’s the reason for the whitelist? I assume that’s the only thing blocking my canister from working… then I can wait for your much more robust and scalable partial-content implementation.

The whitelist exists because of security concerns. However, I am looking into an opt-out solution, which might be a flexible way to address this situation as well as others. That will have security implications, as does serving non-certified asset results, and the important thing is to make sure those implications are well understood by all parties.

1 Like