I look forward to hearing more about the opt out solution, thanks for the discussion today!
Thank you! Looking forward to a suite of solutions for all use-cases.
This sounds like what I was proposing here, but for uploads:
I think the code from the certified assets crate that dfx uses would need to be used by icx-proxy to enable arbitrary-sized file uploads from other clients in general.
For web-based apps that probably needs changes to the service worker as well.
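To make the upload idea concrete, here is a minimal sketch of client-side chunking, assuming a fixed per-message size limit. The `MAX_CHUNK_BYTES` value is borrowed from the chunk size mentioned elsewhere in this thread and may not be the real ingress limit. `send_chunk` is a hypothetical stand-in for whatever call an agent would make to store one chunk; it is not an actual dfx, icx-proxy, or asset canister API.

```rust
/// Assumed upper bound for one uploaded chunk, in bytes.
/// Borrowed from the chunk size quoted in this thread; treat it as a placeholder.
const MAX_CHUNK_BYTES: usize = 1_900_000;

/// Splits `data` into chunks of at most `MAX_CHUNK_BYTES` and hands each one
/// to `send_chunk`, a hypothetical callback that stores the chunk and returns
/// an identifier for it. The identifiers are returned in upload order so the
/// caller can later commit them as one asset.
fn upload_in_chunks<F>(data: &[u8], mut send_chunk: F) -> Vec<u64>
where
    F: FnMut(&[u8]) -> u64,
{
    data.chunks(MAX_CHUNK_BYTES)
        .map(|chunk| send_chunk(chunk))
        .collect()
}
```

The callback keeps the sketch independent of any particular agent library: a real client would replace it with its own update call and error handling.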
I was at least going to be advocating for this if not attempting it myself in the near future, assuming it was the right way to go about it.
This is interesting: https://github.com/dfinity/certified-assets/issues/10#issuecomment-1034287222
According to that little experiment there is no response header filtering occurring. So is there a whitelist? Is it on the request headers or the response headers? And I would like to know more about the security reasons for the whitelists, especially if it’s on the response headers. What security risks do they present? I’m doubting the security risks possible with response headers and would love to understand more.
I would still say this can be accomplished now with the Rust agent (that’s what icx-proxy and I believe dfx use), and thus any agent could implement this. No changes are necessary to the asset canister or the boundary nodes.
But if we want a general solution, then arbitrary size ingress messages I think is the best path forward (which is on the tentative Foundation roadmap). I’m not sure this has much to do with the boundary nodes, unless they are part of the arbitrary size ingress messages solution.
Uploads aren’t cached and don’t need to be scaled out to many requesting clients, so I think it’s quite a different issue.
@nomeata are you volunteering?
I don’t know enough about this proposal yet but my concerns with this alone would be the amount of memory involved and/or how much work can be done in a single block.
Yes I believe they are addressing the block issues with deterministic time slicing: https://dfinity.org/roadmap/?m=chromium
"ETA: Q1 2022
Currently, the amount of computation a canister can perform per call is limited by the block time. Deterministic time slicing allows for long(er) running, multi-round computations by suspending the execution at the end of one round and resuming it later."
My concern with that is:
It seems like that could be avoided with streaming uploads.
So I’ve been doing some tests in production, and I’m not finding any evidence of an HTTP response header whitelist. I am able to return a 206 status code, a Content-Range header, and even a custom header. However, I don’t think my Range HTTP request header is coming through, though I’m not sure yet. A whitelist on request headers makes more sense to me than one on response headers, so this is consistent.
I think I’ve tracked down the final issue: the Range HTTP request header is being filtered out before it reaches the canister. I’ve just tested this with a simplified canister in production.
@jplevyak it looks like I just need the Range request header to not be filtered out. Can we make an exception for the Range header and not filter it out?
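For readers unfamiliar with partial responses, here is a minimal sketch of what a backend has to do once it actually receives the Range header: parse the requested byte range and build the matching Content-Range value for a 206 response. It handles only a single `bytes=` range. The function names are illustrative and are not taken from the certified-assets canister or from my fork.

```rust
/// Parses a single-range `Range` header value such as "bytes=0-499",
/// "bytes=500-" or "bytes=-200" against a resource of `total_len` bytes.
/// Returns the inclusive (start, end) byte positions, or None if the value
/// is malformed, uses several ranges, or cannot be satisfied.
fn parse_range(header: &str, total_len: u64) -> Option<(u64, u64)> {
    let spec = header.trim().strip_prefix("bytes=")?;
    if spec.contains(',') || total_len == 0 {
        return None; // multiple ranges are out of scope for this sketch
    }
    let last = total_len - 1;
    let (start_s, end_s) = spec.split_once('-')?;
    let (start, end) = match (start_s.trim(), end_s.trim()) {
        ("", "") => return None,
        // Suffix form: the final N bytes of the resource.
        ("", suffix) => {
            let n = suffix.parse::<u64>().ok()?;
            if n == 0 {
                return None;
            }
            (total_len.saturating_sub(n), last)
        }
        // Open-ended form: from `start` to the end of the resource.
        (s, "") => (s.parse::<u64>().ok()?, last),
        // Closed form: clamp the end to the last available byte.
        (s, e) => (s.parse::<u64>().ok()?, e.parse::<u64>().ok()?.min(last)),
    };
    if start > end {
        return None;
    }
    Some((start, end))
}

/// Formats the `Content-Range` header value that accompanies a 206 response.
fn content_range(start: u64, end: u64, total_len: u64) -> String {
    format!("bytes {}-{}/{}", start, end, total_len)
}
```

For a 1000-byte resource, `parse_range("bytes=500-", 1000)` yields `Some((500, 999))`, and the response would carry `Content-Range: bytes 500-999/1000`.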
Range queries can’t be certified, so it doesn’t make sense to try to handle them in the “certified assets” canister. We could have some way of certifying an opt-out of certification, in which case they might make sense, although as I said above they would be uncacheable, which makes them unscalable and thus less widely useful. I will talk to the security and other folks to see if we can pass the Range header to the backend. We have an automatic chunking system for uploads and downloads, and we can use that to read certified chunks in the service worker and icx-proxy and construct the Range response there. That seems like the most promising path and the most in keeping with the spirit of the Twitter request, as certification is an important security feature. Currently the chunk size is set to a constant 1_900_000; generally we probably want to have this as metadata, but using that constant should be fine for now. Using the get_chunk() query method of the existing certified assets canister would also allow those calls to be cached, which would scale well to many clients once we can set the cache control headers (we are working on that). The net is that I think the work for this feature is actually in the service worker and icx-proxy.
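As a rough sketch of the chunk-based approach described above, the following shows how a proxy or service worker could map a requested byte range onto fixed-size chunks and stitch the response body together. It assumes every chunk except possibly the last is exactly 1_900_000 bytes, as stated in the post. The `fetch` closure is a hypothetical stand-in for a get_chunk() query followed by certificate verification; its signature is an assumption, not the canister's actual interface.

```rust
/// Chunk size quoted in the discussion; assumed constant for every asset.
const CHUNK_SIZE: u64 = 1_900_000;

/// The part of one chunk needed to serve a byte range:
/// bytes `from..to` (end exclusive) of chunk number `index`.
#[derive(Debug, PartialEq)]
struct ChunkSlice {
    index: u64,
    from: u64,
    to: u64,
}

/// Lists the chunk slices covering the inclusive byte range `start..=end`.
/// The caller must ensure `start <= end` and that `end` is inside the asset.
fn chunks_for_range(start: u64, end: u64) -> Vec<ChunkSlice> {
    let first = start / CHUNK_SIZE;
    let last = end / CHUNK_SIZE;
    (first..=last)
        .map(|index| ChunkSlice {
            index,
            from: if index == first { start % CHUNK_SIZE } else { 0 },
            to: if index == last { end % CHUNK_SIZE + 1 } else { CHUNK_SIZE },
        })
        .collect()
}

/// Builds the body of a partial response by fetching each needed chunk.
/// `fetch` is a hypothetical callback returning the verified bytes of a chunk.
/// Slicing panics if a returned chunk is shorter than the range requires.
fn assemble_range<F>(start: u64, end: u64, fetch: F) -> Vec<u8>
where
    F: Fn(u64) -> Vec<u8>,
{
    let mut body = Vec::with_capacity((end - start + 1) as usize);
    for slice in chunks_for_range(start, end) {
        let chunk = fetch(slice.index);
        body.extend_from_slice(&chunk[slice.from as usize..slice.to as usize]);
    }
    body
}
```

Because each chunk is fetched whole, the individual chunk queries stay cacheable and verifiable even though the Range response itself is assembled outside the canister.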
Makes sense. I would still like the Range header passed through so that I can deploy my own canister, which I and others want sooner rather than later. I’ll just maintain a fork of the certified-assets canister until the better solution is implemented.
I would keep in mind that certification isn’t always necessary/desired, and the asset canister will be useful even without certification. I’ve viewed the certified-assets canister more as just an assets canister, and certification is a nice added feature, but isn’t appropriate for all use cases.
For example, in podcasting, audio files can be hosted anywhere on the Internet, and the RSS feed is what podcast players consume to fetch the audio on client devices. If we force podcast audio to be certified in the client, it would make the IC a bad place to host podcasts, because most podcast players would not be able to play the audio.
I agree that we need a way to opt out of certification for things like dynamically generated content, relays, incompatible clients, etc. The podcast players could go through an icx-proxy translation layer, e.g. raw.ic0.app, which could read the certified data and, after verifying the certification, format it for non-service-worker targets such as standalone podcast players. Would that work for the use case of the podcast players you are talking about? We are trying to balance security and usability, and I am trying to probe the edges of that balance.
The whole point of the advanced video/audio streaming is that it is certified on the blockchain. Anyone can set up uncertified streaming now without the certified-assets canister. It is also possible to set up audio/video streaming with certification now through the certified assets canister, by this method:
I believe that would work for the podcasting use case if there is a URL exposed for non-certified uses.
.raw has been working well for me so far.
I don’t think the discussion is about how to implement the streaming anymore, it seems clear it’s best done in a much more scalable/secure manner outside of the assets canister.
Unfortunately it looks like we’ll have to wait a while to get this working, so I just want the boundary nodes to not filter out Range headers since my fork of the certified-assets canister has an implementation of partial responses that will work for certain use cases.
You still need to implement partial responses, my solution does this from a canister.
I don’t care where it’s done (except I want the best solution possible) it just needs to be done. I have a working canister implementation that myself and others would benefit from.
Again, I just need Range headers. Just ignore my fork of the assets canister, it’s just there temporarily for whoever wants to use it.
There are certification schemes that support that, using suitable hash functions, and we did consider them back then before we settled on this certification MVP because we needed something simple. But I wouldn’t say it’s impossible, we could extend our protocol here. See, e.g.
We do - that’s the raw.ic0 URL, isn’t it? And there I would have indeed expected headers to be passed through.