Clearing up confusion over dynamically storing & serving large assets

I’m sure more and more of us will need to dynamically store media assets like mp4s and jpegs in IC canisters. (For example, users upload photos… for large static assets, just use the asset canister.) But there are two relevant and huge limitations of the current IC replica implementation:

  1. Request and response messages are limited to ~2 MB
  2. Canister heap memory is limited to 4 GB (there is a proposal to increase this to 8 GB for stable memory)

I’ve been tripped up over how to deal with this, so I’ve written a short doc on what I’ve learned…

To deal with 1, you need to figure out how to put your asset into the canister (the “put”) and how to get asset back out (the “get”).

In terms of the put, you need to divide up your asset blob into chunks in the frontend, and then repeatedly call a “putChunk” canister method to save the chunk in some data structure like a Motoko array. There’s no way around this. The dfx SDK does this chunking under the hood when you upload static assets to an asset canister at deploy time. However, there’s no official JS library for this, so you’ll have to roll your own.

In terms of the get, you need to first decide whether you want to fetch the asset using plain old HTTP (which can used in HTML directly like in <img src="...">) or through canister RPC calls using the JS agent.

Fetching with plain old HTTP looks like this:

Whereas fetching with RPC calls looks like this:

Benefits of plain old HTTP:

  • can embed asset URLs directly in HTML or wherever HTTP URLs can be used
  • leverages existing HTTP infra like image caching in browsers
  • simpler to implement on the frontend (no need for stitching together the chunks, although you already need chunking logic regardless for the put)

Benefits of RPC calls:

  • simpler to implement on the backend (no need for streaming callbacks with tokens)
  • technically, get_chunk requests can be made in parallel, which is not the case for HTTP
  • easier to test locally (no need to run icx-proxy locally)

If you decide to go the plain old HTTP route, you can reuse the asset canister source or use it for inspiration when building your own canister. Personally, it seems like plain old HTTP is the way to go…

Regardless with how you dealt with 1, you still need to deal with 2, which is that a canister has bounded memory. Until inter-canister queries are available, you’ll likely need to dynamically spin up new storage canisters as existing ones fill up. The flow would be something like: 1) query a registry canister to get the canister ID of the desired asset (or chunk), 2) query the canister ID (using either plain old HTTP or RPC calls as described before). This is a good example of how to spin up new canisters on demand.

Does this sound right?