Optimal upload chunk size

For those who have implemented file storage using chunked uploads in a canister, what chunk size have you defined for your uploads?

Currently I have set my chunk size to 700 KB but, while I know there is an upper limit, I have a feeling that increasing the value could improve performance. I’m curious whether anyone has conducted experiments to determine the optimal chunk size and would appreciate any insights or recommendations on this matter.

const chunkSize = 700000; // <----- I'm looking to find the best upper size

for (let start = 0; start < data.size; start += chunkSize) {
  const chunk: Blob = data.slice(start, start + chunkSize);

  await uploadChunk({
    batchId,
    chunk,
    actor
  });
}

The limit for ingress messages is 2 MB IIRC, so that is what I use for chunking.


Did you set exactly 2 MB, or slightly below the threshold?

Exactly 2 MB.


Coolio. I’ll give it a try, thanks for the feedback!

Someone should just make an NPM package for this so this question never needs to be asked again.


I would not use an npm package for this, I have to say. Generally speaking, I avoid third-party JavaScript dependencies as much as possible when they are not necessary.

That said, I have developed a JS library for Juno (https://github.com/buildwithjuno/juno-js) where I should improve this chunk size :wink:.

In the asset canister sync code we use 1900000 bytes per chunk. Seems to be working all right.


Thanks Severin. In the asset canister, do you parallelize the upload, or do you process the chunks one after the other?

At a quick glance I don’t see any parallelisation code, but it should be fine to add some.

Thanks for the feedback. I don’t either, but since I plan to look into improving the performance, I was thinking about parallelizing too. Will have a look.
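
Something along these lines is what I have in mind. This is only a minimal sketch: it assumes the same uploadChunk, batchId and actor as in my snippet above, and maxConcurrency is just a placeholder value to tune, not a measured optimum.

// Sketch: upload chunks in waves with a bounded number of concurrent calls.
// Assumes the same uploadChunk({ batchId, chunk, actor }) helper as above.
const chunkSize = 1_900_000; // bytes per chunk, as used by the asset canister sync code
const maxConcurrency = 10;   // placeholder value, tune for your canister

const uploadChunks = async (data: Blob): Promise<void> => {
  // Slice the blob into chunks up front.
  const chunks: Blob[] = [];
  for (let start = 0; start < data.size; start += chunkSize) {
    chunks.push(data.slice(start, start + chunkSize));
  }

  // Upload at most `maxConcurrency` chunks at a time.
  for (let i = 0; i < chunks.length; i += maxConcurrency) {
    await Promise.all(
      chunks
        .slice(i, i + maxConcurrency)
        .map((chunk) => uploadChunk({ batchId, chunk, actor }))
    );
  }
};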

With the 2 MiB limit, it took me 1+ hours to upload about 2 GiB of data into the local replica!!

That sounds very slow; even with 2 seconds per call it should not take that long…
If you want faster local operations you can run dfx start with --artificial-delay 10, which should speed this up massively. By default it emulates the consensus time of a subnet, which is around 1 s. With that flag it should be ~100x faster.


@yrgg Maybe you’ve done some experiments related to this?

ic-asset does parallelize the upload to the asset canister, see sdk/semaphores.rs at master · dfinity/sdk · GitHub


Uploading large files to a canister is extremely inefficient. Can we make a proposal: let sha256(data) go through consensus and then directly upload and store the large data in the canister?

As @mnl already said, there is some parallelisation happening already. sha256(data) won’t work for files larger than a certain size because the hashing would take too many instructions for a single call.

Yeah we’ve messed around with this a little bit.

We uploaded this 3:36:46 video https://app.portal.one/watch/DFINITY/d337d189-64f6-40cd-9c30-735709aa06db the other week and IIRC it was about 10 GB total. That’s four separate resolutions (1080, 720, 480, and 360), with each resolution split into 6,504 individual 2-second-long chunks. The entire upload for all of that took around 3:30:00 (about T1), meaning the upload time was about the same as the length of the video.

We upload to asset canisters using a Node.js server per resolution, with each resolution sharded across 5 different canisters, and we throttle the number of concurrent calls per canister to 10 so as not to fill up the subnet/canister ingress queue. In addition, we set our max chunk size to 1.8 MB (1_887_436 bytes).

We also spawn the asset canisters on the lowest capacity subnets currently available. T1 is a pretty decent upload speed and good enough for livestreaming if you wanted to.

https://p5qyc-gaaaa-aaaai-qa6yq-cai.raw.ic0.app/?playlist=https://kt3ak-tqaaa-aaaap-qbdea-cai.raw.icp0.io/watch/d337d189-64f6-40cd-9c30-735709aa06db
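
In TypeScript terms, the per-canister throttling boils down to something like the following. This is a simplified sketch rather than our actual code; uploadChunk, batchId and actor are placeholders for whatever your actor exposes.

// Simplified sketch of throttling concurrent calls per canister.
// The Semaphore and the call names are illustrative, not production code.
class Semaphore {
  private queue: Array<() => void> = [];
  private available: number;

  constructor(max: number) {
    this.available = max;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No free slot: wait until release() hands one over.
    await new Promise<void>((resolve) => this.queue.push(resolve));
  }

  release(): void {
    const next = this.queue.shift();
    if (next !== undefined) {
      next(); // pass the slot directly to the next waiter
    } else {
      this.available++;
    }
  }
}

// At most 10 concurrent calls per canister, as described above.
const perCanisterLimit = new Semaphore(10);

const throttledUpload = async (chunk: Blob): Promise<void> => {
  await perCanisterLimit.acquire();
  try {
    await uploadChunk({ batchId, chunk, actor }); // placeholder call
  } finally {
    perCanisterLimit.release();
  }
};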


Just chiming in: when we upload assets to an asset canister, those calls don’t undergo consensus, right? Or do they?

I have also worked on something similar, where I made a Motoko asset canister implementation, but uploading a 500 MB file takes more than half an hour. I assume there is no way to optimize this in a Motoko asset canister, as all chunks get uploaded one after another! Am I missing something? Is there any way to optimise that?

Uploading (read: changing the state of the canister) goes through consensus. http_request is done as a non-replicated query call and doesn’t need consensus.

This is not something the asset canister itself controls. It’s a matter of what the uploader does. The receiver can’t control the speed at which the uploader sends chunks.

The uploader can send multiple calls at once. The messages will be processed serially, but consensus can schedule multiple messages per round, and IIRC a lot of canisters should easily be able to handle O(100) messages per second.