Optimal upload chunk size

For those who have implemented file storage using chunked uploads in a canister, what chunk size have you defined for your uploads?

Currently I have set my chunk size to 700 KB but, while I know there is an upper limit, I have a feeling that increasing the value could improve performance. I’m curious whether anyone has conducted experiments to determine the optimal chunk size and would appreciate any insights or recommendations on this matter.

const chunkSize = 700000; // <----- I'm looking to find the best upper size

for (let start = 0; start < data.size; start += chunkSize) {
  const chunk: Blob = data.slice(start, start + chunkSize);

  await uploadChunk({
    batchId,
    chunk,
    actor
  });
}

The limit for ingress messages is 2 MB IIRC, so that is what I use for chunking.


Did you set exactly 2 MB, or slightly below the threshold?

Exactly 2 MB.


Coolio. I’ll give it a try, thanks for the feedback!

Someone should just make an NPM package for this so this question never needs to be asked again.


I would not use an npm package for this, I have to say. Generally speaking, I avoid third-party JavaScript dependencies as much as possible when they are not necessary.

That said, I have developed a JS library for Juno (https://github.com/buildwithjuno/juno-js) where I should improve this chunk size :wink:.

In the asset canister sync code we use 1900000 bytes per chunk. Seems to be working all right.


Thanks Severin. In the asset canister, do you parallelize the upload, or do you process the chunks one after the other?

At a quick glance I don’t see any parallelisation code, but it should be fine to add some.

Thanks for the feedback. I don’t either, but since I plan to look into improving the performance, I was thinking about parallelizing too. Will have a look.
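
Something along these lines is what I have in mind. This is only a minimal sketch: it assumes the same uploadChunk, batchId and actor as in my snippet above, and maxConcurrency is just a placeholder value to tune, not a measured optimum.

// Sketch: upload chunks in waves with a bounded number of concurrent calls.
// Assumes the same uploadChunk({ batchId, chunk, actor }) helper as above.
const chunkSize = 1_900_000; // bytes per chunk, as used by the asset canister sync code
const maxConcurrency = 10;   // placeholder value, tune for your canister

const uploadChunks = async (data: Blob): Promise<void> => {
  // Slice the blob into chunks up front.
  const chunks: Blob[] = [];
  for (let start = 0; start < data.size; start += chunkSize) {
    chunks.push(data.slice(start, start + chunkSize));
  }

  // Upload at most `maxConcurrency` chunks at a time.
  for (let i = 0; i < chunks.length; i += maxConcurrency) {
    await Promise.all(
      chunks
        .slice(i, i + maxConcurrency)
        .map((chunk) => uploadChunk({ batchId, chunk, actor }))
    );
  }
};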

With the 2 MiB limit, it took me 1+ hours to upload about 2 GiB of data into the local replica!!

That sounds very slow; even with 2 seconds per call it should not take that long…
If you want faster local operations you can run dfx start with --artificial-delay 10, which should speed this up massively. By default it emulates the consensus time of a subnet, which is around 1 s. With that flag it should be ~100x faster.


@yrgg Maybe you’ve done some experiments related to this?

ic-asset does parallelize the upload to the asset canister, see sdk/semaphores.rs at master · dfinity/sdk · GitHub


Uploading large files to a canister is extremely inefficient. Can we make a proposal: let sha256(data) go through consensus and then directly upload and store the large data in the canister?

As @mnl already said, there is some parallelisation happening already. sha256(data) won’t work for files larger than a certain size because the hashing would take too many instructions for a single call.

Yeah we’ve messed around with this a little bit.

We uploaded this 3:36:46 video https://app.portal.one/watch/DFINITY/d337d189-64f6-40cd-9c30-735709aa06db the other week and IIRC it was about 10 GB total. That’s four separate resolutions (1080, 720, 480, and 360), with each resolution split into 6,504 individual 2-second-long chunks. The entire upload for all of that took around 3:30:00 (about T1), meaning the upload time was about the same as the length of the video.

We upload to asset canisters using a Node.js server per resolution, with each resolution sharded across 5 different canisters, and we throttle the number of concurrent calls per canister to 10 so as not to fill up the subnet/canister ingress queue. In addition, we set our max chunk size to 1.8 MB (1_887_436 bytes).

We also spawn the asset canisters on the lowest capacity subnets currently available. T1 is a pretty decent upload speed and good enough for livestreaming if you wanted to.

https://p5qyc-gaaaa-aaaai-qa6yq-cai.raw.ic0.app/?playlist=https://kt3ak-tqaaa-aaaap-qbdea-cai.raw.icp0.io/watch/d337d189-64f6-40cd-9c30-735709aa06db
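
In TypeScript terms, the per-canister throttling boils down to something like the following. This is a simplified sketch rather than our actual code; uploadChunk, batchId and actor are placeholders for whatever your actor exposes.

// Simplified sketch of throttling concurrent calls per canister.
// The Semaphore and the call names are illustrative, not production code.
class Semaphore {
  private queue: Array<() => void> = [];
  private available: number;

  constructor(max: number) {
    this.available = max;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No free slot: wait until release() hands one over.
    await new Promise<void>((resolve) => this.queue.push(resolve));
  }

  release(): void {
    const next = this.queue.shift();
    if (next !== undefined) {
      next(); // pass the slot directly to the next waiter
    } else {
      this.available++;
    }
  }
}

// At most 10 concurrent calls per canister, as described above.
const perCanisterLimit = new Semaphore(10);

const throttledUpload = async (chunk: Blob): Promise<void> => {
  await perCanisterLimit.acquire();
  try {
    await uploadChunk({ batchId, chunk, actor }); // placeholder call
  } finally {
    perCanisterLimit.release();
  }
};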


Just chiming in: when we upload assets to an asset canister, those calls don’t undergo consensus, right? Or do they?

I have also worked on something similar, where I made a Motoko asset canister implementation, but uploading a 500 MB file takes more than half an hour. I assume there is no way to optimize this in a Motoko asset canister, as all chunks get uploaded one after another! Am I missing something? Is there any way to optimise that?

Uploading (read: changing the state of the canister) goes through consensus. http_request is done as a non-replicated query call and doesn’t need consensus.

This is not something the asset canister itself controls. It’s a matter of what the uploader does. The receiver can’t control the speed at which the uploader sends chunks.

The uploader can send multiple calls at once. The messages will be processed serially, but consensus can schedule multiple messages per round, and IIRC a lot of canisters should easily be able to handle O(100) messages per second.