Boundary node http response headers

lastmjs · February 9, 2022, 9:56pm

I look forward to hearing more about the opt out solution, thanks for the discussion today!

jplevyak · February 9, 2022, 10:01pm

Thank you! Looking forward to a suite of solutions for all use-cases.

paulyoung · February 9, 2022, 10:29pm

This sounds like what I was proposing here, but for uploads:

github.com/dfinity/certified-assets

Advanced Video Streaming aka HTTP Range Requests

opened 04:24PM - 04 Dec 21 UTC

lastmjs

I'm attempting to do the work to claim this bounty: https://twitter.com/dominic_…w/status/1467144071449915395

It seems like implementing HTTP Range request functionality will achieve video streaming, and beyond that audio streaming and really any kind of file streaming. I'm not exactly sure what is in scope for this bounty, but I hope to receive guidance on what is acceptable along the way.

Tentatively I'll be following this guide: https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests

I'm not sure how much of it needs to be implemented, as some functionality might not be necessary to have excellent video streaming from most clients/browsers.

# Required functionality

- [x] Range header
- [x] 206 partial response
- [x] Content-Range header
- [x] Content-Length header
- [x] 416 response

# Possibly optional

- [x] Accept-Ranges header
- [ ] HEAD requests
- [ ] Multipart ranges
- [ ] If-Range header
- [ ] Certification
- [ ] Protection against denial of service (range numbers that are too big, too many ranges)
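For readers unfamiliar with the items on that checklist, here is a minimal sketch of single-range handling in Python. It is not the code from the issue or from any canister. The names `parse_range` and `respond` are made up for illustration, multipart ranges and If-Range are not handled, and a malformed header is treated the same as an unsatisfiable one (a stricter server would likely ignore a malformed header and return 200).

```python
import re
from typing import Dict, Optional, Tuple

# Matches "bytes=START-END", "bytes=START-" and "bytes=-SUFFIX" (single range only).
_RANGE = re.compile(r"bytes=(\d*)-(\d*)", re.ASCII)


def parse_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return an inclusive (start, end) pair, or None if the range cannot be served."""
    match = _RANGE.fullmatch(header.strip())
    if match is None:
        return None  # malformed, or a multipart range this sketch does not support
    first, last = match.groups()
    if first == "":
        # Suffix form: the last N bytes of the body.
        if last == "" or int(last) == 0 or size == 0:
            return None
        return (max(size - int(last), 0), size - 1)
    start = int(first)
    end = min(int(last), size - 1) if last else size - 1
    if start >= size or start > end:
        return None
    return (start, end)


def respond(body: bytes, range_header: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Build (status, headers, payload) for a request with an optional Range header."""
    size = len(body)
    if range_header is None:
        return 200, {"Accept-Ranges": "bytes", "Content-Length": str(size)}, body
    byte_range = parse_range(range_header, size)
    if byte_range is None:
        # 416 Range Not Satisfiable reports the full size so the client can retry.
        return 416, {"Content-Range": f"bytes */{size}"}, b""
    start, end = byte_range
    part = body[start:end + 1]
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{size}",
        "Content-Length": str(len(part)),
    }
    return 206, headers, part
```

Calling `respond(b"0123456789", "bytes=2-5")` yields a 206 with `Content-Range: bytes 2-5/10` and the four bytes `2345`. The denial-of-service item on the checklist would need extra limits that this sketch leaves out.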

Specifically:

I think the code from the certified assets crate that dfx uses would need to be used by icx-proxy to enable arbitrary-sized file uploads from other clients in general.

For web-based apps that probably needs changes to the service worker as well.

I was at least going to be advocating for this if not attempting it myself in the near future, assuming it was the right way to go about it.

paulyoung · February 9, 2022, 10:38pm

@jplevyak @lastmjs I’m mostly interested in uploads but they seem to be two sides of the same coin so I’d be happy to be involved in any design discussions

lastmjs · February 10, 2022, 3:19pm

This is interesting: https://github.com/dfinity/certified-assets/issues/10#issuecomment-1034287222

According to that little experiment there is no response header filtering occurring. So is there a whitelist? Is it on the request headers or the response headers? And I would like to know more about the security reasons for the whitelists, especially if it’s on the response headers. What security risks do they present? I’m doubting the security risks possible with response headers and would love to understand more.

lastmjs · February 10, 2022, 3:24pm

I would still say this can be accomplished now with the Rust agent (that’s what icx-proxy and I believe dfx use), and thus any agent could implement this. No changes are necessary to the asset canister or the boundary nodes.

But if we want a general solution, then arbitrary size ingress messages I think is the best path forward (which is on the tentative Foundation roadmap). I’m not sure this has much to do with the boundary nodes, unless they are part of the arbitrary size ingress messages solution.

Uploads aren’t cached and don’t need to be scaled out to many requesting clients, so I think it’s quite a different issue.

paulyoung · February 10, 2022, 3:24pm

@nomeata are you volunteering?

paulyoung · February 10, 2022, 3:28pm

I don’t know enough about this proposal yet but my concerns with this alone would be the amount of memory involved and/or how much work can be done in a single block.

lastmjs · February 10, 2022, 3:47pm

Yes I believe they are addressing the block issues with deterministic time slicing: https://dfinity.org/roadmap/?m=chromium

"ETA: Q1 2022

Currently, the amount of computation a canister can perform per call is limited by the block time. Deterministic time slicing allows for long(er) running, multi-round computations by suspending the execution at the end of one round and resuming it later."

paulyoung · February 10, 2022, 4:07pm

My concern with that is:

It seems like that could be avoided with streaming uploads.

lastmjs · February 10, 2022, 4:27pm

So I’ve been doing some tests in production, and I’m not finding any evidence of an http response header whitelist. I am able to return a 206 status code, Content-Range header, and even a custom header called Range-Request-Header.

I don’t think my Range http request header is coming through, though I’m not sure yet. It makes more sense to me to have an http request header whitelist than an http response header whitelist, so this is consistent with what I’m seeing.

lastmjs · February 10, 2022, 4:32pm

I think I’ve tracked down the final issue: The Range http request header is being filtered out before it reaches the canister. I’ve just tested this with a simplified canister in production.

@jplevyak it looks like I just need the Range request header to not be filtered out. Can we make an exception for the Range header and not filter it out?

jplevyak · February 10, 2022, 8:09pm

Range queries can’t be certified, so it doesn’t make sense to try to handle them in the “certified assets” canister. We could have some way of certifying an opt-out of certification, in which case they might make sense, although as I said above they would be uncacheable, which makes them unscalable and thus less widely useful. I will talk to the security and other folks to see if we can pass the Range header to the backend.

We have an automatic chunking system for uploads and downloads. Using that to read certified chunks in the service worker and icx-proxy, and constructing the Range response there, seems like the most promising path and the most in keeping with the spirit of the Twitter request, as certification is an important security feature. Currently the chunk size is set to a constant 1_900_000. Generally we probably want to have this as metadata, but using that constant should be fine for now. Using the get_chunk() query method of the existing certified assets canister would allow those calls to be cached as well, which would scale well to many clients once we can set the cache control headers (we are working on that).

The net is that I think the work for this feature is actually in the service worker and icx-proxy.
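The chunk-based approach described in this post could look roughly like the Python sketch below. It only shows the arithmetic: which chunks cover a requested byte range, and how to slice them into the Range response body. The `get_chunk` parameter here is a hypothetical callable standing in for the canister's get_chunk() query, whose real signature is not shown in this thread, and verification of each chunk's certification is left out. The names `chunks_for_range` and `read_range` are made up for illustration.

```python
from typing import Callable, List

CHUNK_SIZE = 1_900_000  # the constant chunk size mentioned in the post


def chunks_for_range(start: int, end: int, chunk_size: int = CHUNK_SIZE) -> range:
    """Indices of the chunks that cover the inclusive byte range [start, end]."""
    return range(start // chunk_size, end // chunk_size + 1)


def read_range(
    get_chunk: Callable[[int], bytes],  # hypothetical: returns the chunk at an index
    start: int,
    end: int,
    chunk_size: int = CHUNK_SIZE,
) -> bytes:
    """Assemble the bytes for [start, end] from fixed-size chunks."""
    parts: List[bytes] = []
    for index in chunks_for_range(start, end, chunk_size):
        chunk = get_chunk(index)  # a real client would verify certification here
        base = index * chunk_size  # absolute offset of this chunk's first byte
        lo = max(start - base, 0)
        hi = min(end - base, chunk_size - 1) + 1
        parts.append(chunk[lo:hi])
    return b"".join(parts)
```

Because each chunk is fetched by index regardless of the exact range requested, the same chunk responses could be reused across clients, which is the caching benefit the post points to.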

lastmjs · February 10, 2022, 8:33pm

Makes sense. I would like the Range header passed through so I can deploy my own canister, which I and others want sooner rather than later. I’ll just maintain a fork of the certified-assets canister until the better solution is implemented.

I would keep in mind that certification isn’t always necessary/desired, and the asset canister will be useful even without certification. I’ve viewed the certified-assets canister more as just an assets canister, and certification is a nice added feature, but isn’t appropriate for all use cases.

For example, in podcasting, audio files are hosted anywhere on the Internet, and the RSS feed is what podcast players consume to fetch the audio on client devices. If we force podcast audio to be certified in the client, then it would make the IC a bad place to host podcasts, because most podcast players would not be able to play the audio.

jplevyak · February 10, 2022, 8:48pm

I agree that we need a way to opt out of certification for things like dynamically generated content, relays, incompatible clients, etc. The podcast players could go through an icx-proxy translation layer, e.g. raw.ic0.app, which could read the certified data and, after verifying the certification, format it for non-service-worker targets such as standalone podcast players. Would that work for the use case of the podcast players you are talking about? We are trying to balance security and usability, and I am trying to probe the edges of that balance.

levi · February 10, 2022, 8:50pm

The whole point of the advanced video/audio streaming is that it is certified on the blockchain. Anyone can set up uncertified streaming now without the certified-assets canister. It is possible to set up audio/video streaming with certification now through the certified assets canister by this method:

lastmjs · February 10, 2022, 8:51pm

I believe that would work for the podcasting use case if there is a URL exposed for non-certified uses. .raw has been working well for me so far.

lastmjs · February 10, 2022, 8:57pm

I don’t think the discussion is about how to implement the streaming anymore, it seems clear it’s best done in a much more scalable/secure manner outside of the assets canister.

Unfortunately it looks like we’ll have to wait a while to get this working, so I just want the boundary nodes to not filter out Range headers since my fork of the certified-assets canister has an implementation of partial responses that will work for certain use cases.

lastmjs · February 10, 2022, 8:58pm

You still need to implement partial responses; my solution does this from a canister.

I don’t care where it’s done (except I want the best solution possible) it just needs to be done. I have a working canister implementation that myself and others would benefit from.

Again, I just need Range headers. Just ignore my fork of the assets canister, it’s just there temporarily for whoever wants to use it.

nomeata · February 10, 2022, 9:50pm

There are certification schemes that support that, using suitable hash functions, and we did consider them back then before we settled on this certification MVP because we needed something simple. But I wouldn’t say it’s impossible, we could extend our protocol here. See, e.g.

https://tools.ietf.org/search/draft-thomson-http-mice-02
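To illustrate the general idea of a hash scheme under which a partial response can be checked, here is a minimal Python sketch using a plain hash list. This is an assumption made purely for illustration: it is neither the construction from the draft linked above nor the IC's certification protocol, and the names `chunk_hashes`, `root_hash` and `verify_chunk` are made up. The point is only that if a single root value is certified, a client can verify one chunk without downloading the whole file.

```python
import hashlib
from typing import List


def chunk_hashes(data: bytes, chunk_size: int) -> List[bytes]:
    """SHA-256 digest of each fixed-size chunk of the file."""
    return [
        hashlib.sha256(data[offset:offset + chunk_size]).digest()
        for offset in range(0, len(data), chunk_size)
    ]


def root_hash(hashes: List[bytes]) -> bytes:
    """Single value to certify: the hash of the concatenated chunk hashes."""
    return hashlib.sha256(b"".join(hashes)).digest()


def verify_chunk(chunk: bytes, index: int, hashes: List[bytes], certified_root: bytes) -> bool:
    """Check one chunk against the certified root, given the list of chunk hashes."""
    if root_hash(hashes) != certified_root:
        return False  # the hash list itself does not match what was certified
    if not 0 <= index < len(hashes):
        return False
    return hashlib.sha256(chunk).digest() == hashes[index]
```

A hash list grows linearly with the number of chunks. A tree-shaped scheme would shrink the proof sent with each chunk, at the cost of more bookkeeping.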

We do - that’s the raw.ic0 URL, isn’t it? And there I would have indeed expected headers to be passed through.
