Progressive SHA256?

@timo @quint

I’ve seen a couple of hash libraries, and they were both written or adapted by you two, so sorry for the tag, but I think you’re the most qualified to answer.

If I do:

let h = sha256();
h.sum(2048000 bytes);
h.sum(2048000 bytes);
h.sum(2048000 bytes);
h.sum(4567 bytes);

Will my hash be the same as if I did

let h = sha256();
h.sum(6148567 bytes);

Where the 6,148,567 bytes (3 × 2,048,000 + 4,567) are the same in both instances (just chunked in the first). If not, is there a certain chunk size at which chunking would produce the same hash? (I’d imagine some 2^x.)

Use case: As files come into a canister in chunks, I want to keep track of both the hash of each chunk and a hash of the whole file.


Answering my own question here:

It looks like the answer is yes.

let z = SHA.New();
z.write(gb3);
let result2 = z.sum([]);

and 

let h = SHA.New();
h.write(gb1);
h.write(gb1);
h.write(gb1a);
let result = h.sum([]);

give the same hash when I write them out.

The issue is that I’m only able to write 64,000 bytes at a time with the current aviate-labs SHA256 function. I’m not sure whether the one from @timo that is floating around is faster or not. It looks like I can do 128,000 bytes per round if I hold the SHA accumulator in memory and write and sum each time (see progressive_write_sum).

This needs to be better. It looks like the current one uses a bunch of array functions…maybe those can be switched to Buffer?

I’m assuming this works much better in Rust, as the asset canister hashes the items as they come in and adds them to the certified asset tree (unless it is just trusting the hash passed in and relying on the service worker to invalidate it on the way out).

Can anyone who understands the asset canisters confirm or deny? I’m trying to SHA256 large files coming into the Origyn_nft so that we can certify them on the way out.


In an effort to back up stable memory (comprising chunks of 1024 WASM pages), I hit the execution limit at around 20-ish WASM pages when computing the sha256. So I use a “loop” through a timer, iterating 16 WASM pages at a time (WASM page size = 65,536 bytes). That way I can do roughly 1,048,576 bytes per round, give or take… in Rust.

The Cargo dep is:
sha2 = "0.10.6"
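
For the Motoko side of the thread, a rough sketch of the same timer-driven chunking pattern, assuming the SHA.New()/write/sum API used in the posts above, a recent mo:base Timer (setTimer with the system capability), and the data sitting in experimental stable memory. All names and import paths here are illustrative, not this post’s actual Rust code:

import Timer "mo:base/Timer";
import StableMemory "mo:base/ExperimentalStableMemory";
import Blob "mo:base/Blob";
import Nat64 "mo:base/Nat64";
import SHA "mo:crypto/SHA2"; // import path assumed; adjust to your setup

actor {
  let pageSize : Nat64 = 65_536;  // Wasm page size in bytes
  let pagesPerTick : Nat64 = 16;  // ~1 MB of hashing per tick

  var hasher = SHA.New();
  var page : Nat64 = 0;
  let totalPages : Nat64 = 1024;  // size of the region being hashed

  // Hash one slice of stable memory, then reschedule until done.
  func tick() : async () {
    var n : Nat64 = 0;
    while (n < pagesPerTick and page < totalPages) {
      let bytes = StableMemory.loadBlob(page * pageSize, Nat64.toNat(pageSize));
      hasher.write(Blob.toArray(bytes));
      page += 1;
      n += 1;
    };
    if (page < totalPages) {
      ignore Timer.setTimer<system>(#seconds 0, tick); // schedule the next slice
    };
  };

  public shared func start() : async () {
    ignore Timer.setTimer<system>(#seconds 0, tick);
  };
};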


Yes: a streaming hash absorbs its input in internal 64-byte blocks, so how the writes are chunked doesn’t affect the digest. But once you have called h.sum you can’t call h.write again because the state has changed (“is finalized”).

If you are thinking of getting both hashes, chunk and whole, for free (i.e. at once, without doubling the work), then it won’t work. But I don’t fully understand the requirements of your use case, so I can’t tell whether there’s a workaround.
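
If you do need both, the workaround is simply to run two hashers side by side, at the cost of hashing every byte twice. A minimal sketch, using the SHA.New()/write/sum API from the posts above (import path assumed):

import SHA "mo:crypto/SHA2"; // import path assumed

actor {
  // Long-lived hasher for the whole file; survives across chunk uploads.
  var whole = SHA.New();

  // Hash one incoming chunk and feed the same bytes to the whole-file
  // hasher. Every byte is hashed twice; that is the doubled work.
  public shared func onChunk(chunk : [Nat8]) : async [Nat8] {
    let per = SHA.New(); // fresh hasher, finalized for this chunk only
    per.write(chunk);
    whole.write(chunk);
    per.sum([])
  };

  // Finalize the whole-file hasher exactly once, after the last chunk.
  public shared func finish() : async [Nat8] {
    whole.sum([])
  };
};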

You mean because of the cycle limit or some other limit of the library? In terms of cycles you should be able to get around 2 MB hashed in one round.

I see crypto.mo/SHA2.mo at main · aviate-labs/crypto.mo · GitHub has the same quadratic complexity bug that motoko-sha/SHA256.mo at master · enzoh/motoko-sha · GitHub has, from which it was probably derived.

The problem is in this line: crypto.mo/SHA2.mo at daddf4e03e2f224f2b27b1119f95c4791364767f · aviate-labs/crypto.mo · GitHub,
where the majority of the whole message to be hashed gets copied. Instead of copying arrays around, it has to work by moving a pointer to positions in the original array.

A fix is here: Timo/fix quadr complexity by timohanke · Pull Request #11 · enzoh/motoko-sha · GitHub
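
Schematically, the difference is this (an illustrative sketch, not either library’s actual code; compress is a hypothetical stand-in for the compression function):

import Array "mo:base/Array";

// Stand-in for the SHA-256 compression function over one 64-byte block.
func compress(_data : [Nat8], _offset : Nat) {};

var pending : [Nat8] = [];

// Buggy pattern: every write rebuilds the unprocessed remainder as a new
// array, so most of the message gets copied again and again; the total
// copying is quadratic in the message length.
func writeQuadratic(data : [Nat8]) {
  pending := Array.append<Nat8>(pending, data);
  while (pending.size() >= 64) {
    compress(pending, 0);
    pending := Array.tabulate<Nat8>(
      pending.size() - 64,
      func(i : Nat) : Nat8 { pending[i + 64] } // copies the whole remainder
    );
  };
};

// Fixed pattern: advance an index through the caller's array and read the
// blocks in place; only a trailing partial block (< 64 bytes) would ever
// need buffering.
func writeLinear(data : [Nat8]) {
  var i = 0;
  while (i + 64 <= data.size()) {
    compress(data, i);
    i += 64;
  };
};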


Great data point. I’ll have to take a look and see if there is any optimization that can be done.

Thank you! I’ll pull the fix into @quint’s library and see how the performance improves.

There was also a performance hit on the writes, as it was using Array.tabulate. Basically, don’t do that. :joy:

There are likely more updates to remove additional Array.tabulate calls that would be helpful here, especially for the HMAC hash, as I didn’t do much there.
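
For anyone else tuning Motoko code: rebuilding an immutable array on every append (e.g. with Array.tabulate) makes n appends cost O(n²) copying in total, while Buffer grows geometrically and amortizes to O(1) per add. A toy illustration (helper names are hypothetical):

import Array "mo:base/Array";
import Buffer "mo:base/Buffer";

// O(n) copying per call: the whole array is rebuilt to add one byte,
// so n appends cost O(n^2) in total.
var slow : [Nat8] = [];
func appendSlow(b : Nat8) {
  slow := Array.tabulate<Nat8>(
    slow.size() + 1,
    func(i : Nat) : Nat8 { if (i < slow.size()) slow[i] else b }
  );
};

// Amortized O(1) per call: Buffer over-allocates and grows in place.
let fast = Buffer.Buffer<Nat8>(0);
func appendFast(b : Nat8) { fast.add(b) };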

I can now write up to 12 blocks of 2 MB data into a Hash object per call and then call sum, and it generally seems to always return (up to 144 blocks tested in total).

The tests pass, but they don’t check much past a few bytes of data, and I’d recommend a test that checks the validity of the hash for a very large file (> 2 MB).
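
Something along these lines would do it; a sketch assuming the same New/write/sum API (import path assumed) that compares the chunked digest of ~3 MB of generated data against the one-shot digest:

import SHA "mo:crypto/SHA2"; // import path assumed
import Array "mo:base/Array";
import Nat "mo:base/Nat";
import Nat8 "mo:base/Nat8";

// ~3 MB of deterministic test data.
let big = Array.tabulate<Nat8>(3_000_000, func(i : Nat) : Nat8 {
  Nat8.fromNat(i % 256)
});

// One-shot hash of the whole message.
let oneShot = SHA.New();
oneShot.write(big);
let expected = oneShot.sum([]);

// The same message written in 64,000-byte chunks.
let chunked = SHA.New();
var off = 0;
while (off < big.size()) {
  let len = Nat.min(64_000, big.size() - off);
  chunked.write(Array.tabulate<Nat8>(len, func(i : Nat) : Nat8 { big[off + i] }));
  off += len;
};

// The two digests must agree byte for byte.
assert (chunked.sum([]) == expected);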

I’ve created a pull request: Performance Improvements for write and sum for SHA2 by skilesare · Pull Request #2 · aviate-labs/crypto.mo · GitHub

You mean you can hash 24 MB in one message? With DTS, or?

I assume DTS is running on Motoko Playground. I can write 24 MB into a hash-handling object in one round; I do the actual sum in another function. See the Motoko Playground link and these functions:


// Import paths assumed; SHA is the aviate-labs SHA2 module.
import Buffer "mo:base/Buffer";
import Iter "mo:base/Iter";
import SHA "mo:crypto/SHA2";

actor {
  var gbProgressive : [Nat8] = []; // ~2 MB of generated test data

  // Build 2,048,000 bytes of cycling test data.
  public shared func store4() : async () {
    var tracker = 0;
    var subtracker : Nat8 = 0;
    let b1 = Buffer.Buffer<Nat8>(1);

    while (tracker < 2048000) {
      b1.add(subtracker);
      if (subtracker == 255) subtracker := 0; // wrap before the Nat8 overflows
      subtracker += 1;
      tracker += 1;
    };

    gbProgressive := b1.toArray();
  };

  var continual_sum = SHA.New();
  var progressive_tracker = 0;

  // Write `number` copies of the 2 MB test array into the running hasher
  // (number must be >= 1).
  public shared func progressive_write(number : Nat) : async Nat {
    for (i in Iter.range(0, number - 1)) {
      continual_sum.write(gbProgressive); // add 2 MB more
      progressive_tracker += 1;
    };
    progressive_tracker
  };

  // Finalize the running hash and reset the state for the next run.
  public shared func progressive_sum() : async [Nat8] {
    let result = continual_sum.sum([]); // get the hash
    continual_sum := SHA.New();
    progressive_tracker := 0;
    result;
  };
};

To simulate, go to Motoko Playground - DFINITY, deploy, and then call store4, then progressive_write, then progressive_sum.

It is possible I’ve completely bumbled this. But the tests are passing up to some non-trivial number of bytes…it would be great to have a large test added.

I have written a new cycle-optimized Sha2 library. You can find it here: GitHub - research-ag/motoko-lib: Motoko general purpose libraries

Benchmark will follow.


Benchmarked were these libraries:

  1. research-ag/motoko-lib
  2. enzoh/motoko-sha
  3. aviate-labs/crypto.mo
  4. timohanke/motoko-sha2

Since the originals of 2. and 3. still have the quadratic-complexity bug in them, I used these PRs for them:

  1. For 2.: Timo/fix quadr complexity by timohanke · Pull Request #11 · enzoh/motoko-sha · GitHub
  2. For 3.: Performance Improvements for write and sum for SHA2 by skilesare · Pull Request #2 · aviate-labs/crypto.mo · GitHub

The cycle cost of sha256 per block (64 bytes) based on the average taken over 1,000 blocks is:

research-ag/motoko-lib  34.6k
enzoh/motoko-sha        51.9k
aviate-labs/crypto.mo   48.2k
timohanke/motoko-sha2   49.8k

The cycle cost for the empty message (demonstrating per-message overhead) is:

research-ag/motoko-lib  37k
enzoh/motoko-sha        96k
aviate-labs/crypto.mo   107k
timohanke/motoko-sha2   503k

The cycle cost of sha512 per block (128 bytes) based on the average taken over 1,000 blocks is:

research-ag/motoko-lib  53.9k

The numbers for motoko-lib translate to 541 cycles/byte for sha256 (34.6k / 64) and 421 cycles/byte for sha512 (53.9k / 128).
