I’ve seen a couple of hash libraries and they were both written and adapted by you all, so I’m sorry for the tag, but I think you all are most qualified to answer.
If I do:
let h = sha256();
h.sum(2048000 bytes);
h.sum(2048000 bytes);
h.sum(2048000 bytes);
h.sum(4567 bytes);
Will my hash be the same as if I did
let h = sha256();
h.sum(6148567 bytes);
where the 6,148,567 bytes are the same in both instances (just chunked in the first)? If not, is there a certain chunk size for which chunking would produce the same hash (I’d imagine some 2^x)?
Use case: As files come into a canister in chunks, I want to keep track of both the hash of each chunk and a hash of the whole file.
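Roughly what I have in mind, as a sketch only: the SHA.New()/write/sum([]) calls are taken from the snippets later in this thread (aviate-labs style), while the import path and the receiveChunk/finishFile names are made up for illustration.

import Buffer "mo:base/Buffer";
import SHA "mo:crypto/SHA256"; // assumed import path; adjust to your package setup

// inside the actor:
let chunkHashes = Buffer.Buffer<[Nat8]>(0); // one digest per uploaded chunk
var fileHash = SHA.New();                   // running hasher for the whole file

public shared func receiveChunk(chunk : [Nat8]) : async [Nat8] {
  let chunkHasher = SHA.New();   // fresh hasher for just this chunk
  chunkHasher.write(chunk);
  let chunkDigest = chunkHasher.sum([]);
  chunkHashes.add(chunkDigest);
  fileHash.write(chunk);         // the same bytes also feed the whole-file hasher
  chunkDigest
};

public shared func finishFile() : async [Nat8] {
  fileHash.sum([])               // finalize once, after the last chunk has arrived
};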
let z = SHA.New();
z.write(gb3);
let result2 = z.sum([]);
and
let h = SHA.New();
h.write(gb1);
h.write(gb1);
h.write(gb1a);
let result = h.sum([]);
give the same hash when I write them out.
The issue is that I’m only able to write 64,000 bytes at a time with the current aviate-labs SHA256 function. I’m not sure if @timo 's version that is hanging around is faster or not. It looks like I can do 128,000 bytes per round if I hold the SHA accumulator in memory and write and sum each time (see progressive_write_sum).
This needs to be better. It looks like the current one uses a bunch of array functions… maybe those could be switched to Buffer?
I’m assuming this works much better in Rust, as the asset canister hashes the items as they come in and adds them to the certified asset tree (unless it is just trusting the hash passed in and relying on the service worker to invalidate it on the way out).
Can anyone who understands the asset canisters confirm or deny? I’m trying to compute the SHA-256 of large files coming into the Origyn_nft so that we can certify them on the way out.
In the effort to back up stable memory (consisting of chunks of 1024 Wasm pages), I hit the execution limit at around 20-ish Wasm pages when computing the SHA-256. So I use a “loop” through a timer, iterating 16 Wasm pages at a time (Wasm page size = 65,536 bytes). That lets me do roughly 1,048,576 bytes per round, give or take … in Rust.
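In Motoko terms, the same resumable idea might look roughly like the sketch below. Assumptions: the SHA.New()/write/sum([]) API used elsewhere in this thread, ExperimentalStableMemory for the reads, and something external (a timer or a caller) driving the rounds; the names hashNextPages/nextPage are made up.

import ExperimentalStableMemory "mo:base/ExperimentalStableMemory";
import Blob "mo:base/Blob";
import Nat64 "mo:base/Nat64";

// inside the actor; assumes the SHA module used elsewhere in this thread is imported as SHA
var pageHasher = SHA.New();       // running hash over stable memory
var nextPage : Nat64 = 0;         // how many pages have been hashed so far
let pageSize : Nat64 = 65536;     // Wasm page size in bytes

// Hash the next `pages` Wasm pages; call repeatedly (e.g. from a timer)
// until it returns true, then finalize with pageHasher.sum([]).
public shared func hashNextPages(pages : Nat64, totalPages : Nat64) : async Bool {
  var i : Nat64 = 0;
  while (i < pages and nextPage < totalPages) {
    let bytes = ExperimentalStableMemory.loadBlob(nextPage * pageSize, Nat64.toNat(pageSize));
    pageHasher.write(Blob.toArray(bytes));
    nextPage += 1;
    i += 1;
  };
  nextPage >= totalPages
};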
Yes, but once you have called h.sum you can’t call h.write again because the state has changed (“is finalized”).
If you are thinking of getting both hashes, chunk and whole, for free (i.e. at once, without doubling the work), then it won’t work. But I don’t fully understand the requirements of your use case, so I can’t tell if there’s a workaround.
You mean because of the cycle limit or some other limit of the library? In terms of cycles, you should be able to get roughly 2 MB hashed in one round.
There was also a performance hit on the writes, as the code was using Array.tabulate. Basically, don’t do that.
There are likely more updates that would be helpful here, taking out more Array.tabulate calls, especially for the HMAC hash, as I didn’t do much there.
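For anyone following along, this is the general shape of the problem (a generic illustration, not the library’s actual internals): rebuilding an array on every write is a full copy each time, so accumulating n bytes costs O(n²) overall, while a Buffer grows its backing store geometrically and stays roughly O(n).

import Array "mo:base/Array";
import Buffer "mo:base/Buffer";

// Slow: every call copies the whole accumulator into a fresh array.
func appendSlow(acc : [Nat8], chunk : [Nat8]) : [Nat8] {
  Array.tabulate<Nat8>(acc.size() + chunk.size(), func (i : Nat) : Nat8 {
    if (i < acc.size()) { acc[i] } else { chunk[i - acc.size()] }
  })
};

// Faster: the Buffer amortizes growth, so adding a byte is O(1) on average.
func appendFast(acc : Buffer.Buffer<Nat8>, chunk : [Nat8]) {
  for (b in chunk.vals()) { acc.add(b) };
};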
I can now write up to 12 blocks of 2 MB data into a Hash object per round and then call sum, and it seems to reliably return (tested up to 144 blocks in total).
The tests pass, but they don’t check much past a few bytes of data, and I’d recommend a test that checks the validity of the hash for a very large file (> 2 MB).
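Something along these lines, as a sketch: it builds ~3 MB of deterministic data, hashes it both one-shot and in 64,000-byte chunks through the same SHA.New()/write/sum([]) API, and checks that the two digests agree. A known reference digest from an external tool (e.g. sha256sum over the same bytes) should be compared as well; that value is left out here rather than invented.

import Array "mo:base/Array";
import Buffer "mo:base/Buffer";
import Nat "mo:base/Nat";
import Nat8 "mo:base/Nat8";

// assumes SHA.New(), write([Nat8]), sum([Nat8]) as used elsewhere in this thread
func largeChunkedVsOneShot() : Bool {
  // ~3 MB of deterministic test data
  let data = Buffer.Buffer<Nat8>(3_000_000);
  var i = 0;
  while (i < 3_000_000) {
    data.add(Nat8.fromNat(i % 251)); // arbitrary repeating pattern
    i += 1;
  };
  let bytes = data.toArray();

  // one-shot hash
  let whole = SHA.New();
  whole.write(bytes);
  let expected = whole.sum([]);

  // chunked hash: 64,000-byte slices fed one at a time
  let chunked = SHA.New();
  var start = 0;
  while (start < bytes.size()) {
    let s = start;
    let len = Nat.min(64_000, bytes.size() - s);
    chunked.write(Array.tabulate<Nat8>(len, func (j : Nat) : Nat8 { bytes[s + j] }));
    start += len;
  };
  let actual = chunked.sum([]);

  actual == expected // also compare `expected` against an externally computed digest
};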
I assume DTS is running on the Motoko Playground. I can write 24 MB into a Hash object in one round. I do the actual sum in another function. See the Motoko Playground link and these functions:
public shared func store4() : async () {
  // Fill gbPrgressive (a [Nat8] actor-level var declared elsewhere) with
  // 2,048,000 bytes of a repeating pattern, giving ~2 MB of test data.
  var tracker = 0;
  var subtracker : Nat8 = 0;
  let b1 = Buffer.Buffer<Nat8>(1);
  while (tracker < 2048000) {
    b1.add(subtracker);
    // reset before incrementing so the Nat8 never overflows
    if (subtracker == 255) subtracker := 0;
    subtracker += 1;
    tracker += 1;
  };
  gbPrgressive := b1.toArray();
};
var continual_sum = SHA.New();
var progressive_tracker = 0;

public shared func progressive_write(number : Nat) : async Nat {
  // Write `number` copies of the ~2 MB test array into the running hasher.
  for (i in Iter.range(0, number - 1)) {
    continual_sum.write(gbPrgressive); // add 2 more MB
    progressive_tracker += 1;
  };
  progressive_tracker
};

public shared func progressive_sum() : async [Nat8] {
  let result = continual_sum.sum([]); // finalize and get the hash
  continual_sum := SHA.New();         // reset for the next run
  progressive_tracker := 0;
  result;
};
To simulate, go to Motoko Playground - DFINITY, deploy, and then call store4, then progressive_write, then progressive_sum.
It is possible I’ve completely bumbled this, but the tests are passing up to some non-trivial number of bytes… it would be great to have a large test added.