For those who have implemented file storage using chunked uploads in a canister, what chunk size have you defined for your uploads?
Currently I have set my chunk size to 700 KB, and while I know there is an upper limit, I have a feeling that increasing the value could improve performance. I'm curious whether anyone has run experiments to determine the optimal chunk size and would appreciate any insights or recommendations on this matter.
const chunkSize = 700000; // <-- I'm looking to find the best upper size
for (let start = 0; start < data.size; start += chunkSize) {
  const chunk: Blob = data.slice(start, start + chunkSize);
  await uploadChunk({
    batchId,
    chunk,
    actor
  });
}
I would not use an npm package for this, I have to say. Generally speaking, I avoid third-party JavaScript dependencies as much as possible when they are not necessary.
Thanks for the feedback. I don't use them either, but since I plan to look into improving the performance, I was thinking about parallelizing the uploads too. Will have a look.
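Something along these lines is what I have in mind, just a minimal sketch building on the snippet above and assuming the canister can accept chunks in any order within a batch (the concurrency of 4 is an arbitrary value for illustration):

// Upload chunks in small parallel batches instead of strictly one at a time.
// uploadChunk, batchId and actor are the same names as in the snippet above;
// the concurrency of 4 is an arbitrary illustration, not a recommendation.
const chunkSize = 700000;
const concurrency = 4;

const chunks: Blob[] = [];
for (let start = 0; start < data.size; start += chunkSize) {
  chunks.push(data.slice(start, start + chunkSize));
}

for (let i = 0; i < chunks.length; i += concurrency) {
  // Send up to `concurrency` chunks at once and wait for the whole batch.
  // If the canister relies on chunks arriving strictly in order, each call
  // would also need an explicit chunk index or byte offset.
  await Promise.all(
    chunks
      .slice(i, i + concurrency)
      .map((chunk) => uploadChunk({ batchId, chunk, actor }))
  );
}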
That sounds very slow; even at 2 seconds per call it should not take that long…
If you want faster local operations, you can run dfx start with --artificial-delay 10, which should speed this up massively. By default it emulates the consensus time of a subnet, which is around 1 s. With that flag it should be ~100x faster.
Uploading large files to a canister is extremely inefficient. Could we make a proposal: only sha256(data) goes through consensus, and then the large data is uploaded and stored directly in the canister?
As @mnl already said, there is some parallelisation happening already. sha256(data) won't work for files above a certain size, because the hashing would take too many instructions for a single call.
We uploaded this 3:36:46 video https://app.portal.one/watch/DFINITY/d337d189-64f6-40cd-9c30-735709aa06db the other week and IIRC it was about 10 GB total. That's four separate resolutions (1080, 720, 480, and 360), with each resolution split into 6,504 individual 2-second-long chunks. The entire upload for all of that took around 3:30:00 (about T1), meaning the upload time was about the same as the length of the video.
We upload to asset canisters using a Node.js server per resolution, with each resolution sharded across 5 different canisters, and we throttle the number of concurrent calls per canister to 10 so as not to fill up the subnet's/canister's ingest queue. In addition, we set our max chunk size to 1.8 MB (1_887_436 bytes).
We also spawn the asset canisters on the lowest-capacity subnets currently available. T1 is a pretty decent upload speed and good enough for livestreaming if you wanted to.
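To make the throttling concrete, here is a rough sketch of the idea (not our actual uploader code): cap the number of in-flight calls per canister with a small worker pool, where sendChunk stands in for whatever update call the asset canister exposes (for example the uploadChunk from the snippet earlier in the thread).

// Rough sketch of throttled uploads: at most `maxInFlight` chunk calls are
// pending per canister at any time. `sendChunk` is a placeholder for the
// actual asset-canister call; in practice it would also need the chunk's
// index or byte offset so chunks that complete out of order can be reassembled.
async function uploadThrottled(
  data: Blob,
  sendChunk: (chunk: Blob) => Promise<void>,
  maxInFlight = 10,
  maxChunkSize = 1_887_436 // ~1.8 MB, as mentioned above
): Promise<void> {
  // Split the blob into chunks up front.
  const chunks: Blob[] = [];
  for (let start = 0; start < data.size; start += maxChunkSize) {
    chunks.push(data.slice(start, start + maxChunkSize));
  }

  // Run `maxInFlight` workers, each pulling the next chunk off the queue,
  // so no more than `maxInFlight` calls are ever pending at once.
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < chunks.length) {
      const chunk = chunks[next++];
      await sendChunk(chunk);
    }
  };

  await Promise.all(Array.from({ length: maxInFlight }, () => worker()));
}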
Just chiming in: when we upload assets to an asset canister, those calls don't undergo consensus, right? Or do they?
I have also worked on something similar, where I made a Motoko asset canister implementation, but uploading a 500 MB file takes more than half an hour. I assume there is no way to optimise this in a Motoko asset canister, as all chunks get uploaded one after another. Am I missing something? Is there any way to optimise that?
Uploading (read: changing the state of the canister) goes through consensus. http_request is done as a non-replicated query call and doesn't need consensus.
This is not something the asset canister itself controls. It’s a matter of what the uploader does. The receiver can’t control the speed at which the uploader sends chunks.
The uploader can send multiple calls at once. The messages will be processed serially, but consensus can schedule multiple messages per round, and IIRC a lot of canisters should easily be able to handle O(100) messages per second.