We have published an optimized Motoko implementation of all SHA2 functions: https://mops.one/sha2
The package provides sha256, sha224, sha512, sha384, sha512-256, sha512-224.
The possible input types are Blob, [Nat8] and Iter<Nat8>.
The most important performance metric is the number of wasm instructions used per block (aka chunk) of the input message. For sha256 this has been reduced by a factor of at least 2.5x compared to other existing implementations. For long messages of type Blob, sha256 now uses 295 instructions per byte hashed.
When hashing small messages the per-message overhead is important as well. This can be measured for example by comparing the instructions needed to hash the empty message. In this metric we see a reduction of 5.2x for sha256.
Finally, the amount of garbage created by heap allocations can be important. We measured that previous implementations of sha256 created at least 6x as much garbage as the size of the input message. This is down to 1.5x now.
Would it be possible to add an incremental mode that lets you process chunks across rounds? I’m contemplating needing to hash a very large file, or even an entire directory of files to get a hash.
Yes, it has that. It has the typical interface with a Digest class where you can do multiple writes to the Digest and then ask for the sum at the end. You can write types Blob, [Nat8] or from an Iter<Nat8> and you can mix types across multiple writes.
The library has been updated for use with moc 0.9.8 by taking advantage of the NatX conversions between adjacent X (e.g. Nat32 ↔ Nat16 ↔ Nat8).
This brought a decrease in instructions of 3% across all functions, Sha256 and Sha512.
Most notably, the conversions made it worthwhile to store state and message data in Nat16 words instead of Nat32 words. This then allowed us, for Sha256 at least, to eliminate all heap allocations. Indeed, heap allocation (i.e. garbage creation) is now independent of the message size. We can hash from input type Blob of arbitrary length with a constant heap allocation of 1,008 bytes (for instantiating a class). This change then allowed a further reduction in instructions for Sha256 of 4%.
The new version is 0.0.4. Compared to 0.0.2 we see a total decrease in instructions per byte of 7% for Sha256 and 20% for the empty message.