We have published an optimized Motoko implementation of all SHA2 functions: https://mops.one/sha2
The package provides sha256, sha224, sha512, sha384, sha512-256, sha512-224.
The possible input types are Blob, [Nat8] and Iter<Nat8>.
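The six functions fall into two families. sha224 and sha256 work on 32-bit words and 64-byte blocks. sha384, sha512, sha512-256 and sha512-224 work on 64-bit words and 128-byte blocks. As a quick cross-check of digest and block sizes, here is a sketch using Python's standard `hashlib` module, not the Motoko package. sha512-256 and sha512-224 are left out because hashlib only exposes them through `hashlib.new()` when the underlying OpenSSL build supports them. The helper name `variant_sizes` is hypothetical.

```python
import hashlib

# Constructors for the four SHA-2 variants that Python's hashlib guarantees.
# SHA-512/256 and SHA-512/224 are only reachable via hashlib.new() when the
# linked OpenSSL build provides them, so they are not listed here.
SHA2_VARIANTS = {
    "sha224": hashlib.sha224,
    "sha256": hashlib.sha256,
    "sha384": hashlib.sha384,
    "sha512": hashlib.sha512,
}


def variant_sizes():
    """Return {name: (digest size, block size)} in bytes for each variant."""
    return {
        name: (ctor().digest_size, ctor().block_size)
        for name, ctor in SHA2_VARIANTS.items()
    }


if __name__ == "__main__":
    for name, (digest, block) in variant_sizes().items():
        print(f"{name}: {digest}-byte digest, {block}-byte blocks")
```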
The most important performance metric is the number of wasm instructions used per block (aka chunk) of the input message. For sha256 this has been reduced by at least 2.5x compared to other existing implementations. For long messages of type Blob, sha256 now uses 295 instructions per byte hashed.
When hashing small messages, the per-message overhead is important as well. This can be measured, for example, by comparing the instructions needed to hash the empty message. By this metric we see a 5.2x reduction for sha256.
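As a rough conversion between the per-byte figure and the per-block metric: for long messages, where per-message overhead is negligible, the per-block cost is approximately the per-byte cost times the block size, which is 64 bytes for sha256.

```latex
\text{instructions per block} \approx \text{instructions per byte} \times \text{block size}
  = 295 \times 64 = 18{,}880 \quad \text{(sha256)}
```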
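Part of the reason even the empty message has a noticeable cost is that SHA-2 padding always produces at least one full block to compress. The Python helper below (hypothetical name `padded_blocks`) computes the number of compressed blocks from the padding rule in the SHA-2 standard: one 0x80 byte, zero bytes, then the bit length in an 8-byte field (sha224/sha256) or a 16-byte field (the sha512 family).

```python
def padded_blocks(message_len: int, block_size: int = 64, length_field: int = 8) -> int:
    """Number of blocks compressed for a message of message_len bytes.

    SHA-2 padding appends a single 0x80 byte, then zero bytes, then the message
    length in bits as a big-endian integer of length_field bytes (8 for
    SHA-224/256, 16 for SHA-384/512 and SHA-512/t). The padded length is the
    smallest multiple of block_size that is at least message_len + 1 + length_field.
    """
    min_len = message_len + 1 + length_field
    return (min_len + block_size - 1) // block_size


if __name__ == "__main__":
    for n in (0, 55, 56, 64):
        print(n, "bytes ->", padded_blocks(n), "sha256 block(s)")
```

So any message of up to 55 bytes costs one sha256 compression, the same as the empty message, and for such short inputs the fixed per-message work is a large share of the total.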
Finally, the amount of garbage created by heap allocations can be important. We measured that previous implementations of sha256 created at least 6x as much garbage as the size of the input message. This is down to 1.5x now.
Would it be possible to add an incremental mode that lets you process chunks across rounds? I may need to hash a very large file, or even an entire directory of files, to get a single hash.
Yes, it has that. It has the typical interface with a Digest class: you can do multiple writes to the Digest and then ask for the sum at the end. You can write data of type Blob or [Nat8], or from an Iter<Nat8>, and you can mix types across multiple writes.
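The Digest pattern described here is the same one Python's `hashlib` exposes through `update()` and `digest()`. The sketch below uses hashlib as a stand-in (it is not the Motoko package's API, whose method names are not reproduced here) to show that feeding a message in several writes of mixed input types gives the same digest as hashing it in one go, and how a large file can be streamed through a single hash state. The helper names `hash_chunks` and `hash_stream` are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Iterable, Union

# A chunk may be bytes, a bytearray, a list of ints, or any iterator of ints.
Chunk = Union[bytes, bytearray, Iterable[int]]


def hash_chunks(chunks: Iterable[Chunk]) -> bytes:
    """Feed chunks of mixed types into one SHA-256 state and return the digest."""
    h = hashlib.sha256()
    for chunk in chunks:
        # bytes() copies bytes/bytearray and consumes lists or iterators of ints,
        # loosely mirroring the Blob, [Nat8] and Iter<Nat8> inputs.
        h.update(bytes(chunk))
    return h.digest()


def hash_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> bytes:
    """Hash a large binary stream (e.g. an open file) without loading it whole."""
    h = hashlib.sha256()
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        h.update(block)
    return h.digest()


if __name__ == "__main__":
    mixed = hash_chunks([b"ab", [0x63], iter([])])
    print(mixed == hashlib.sha256(b"abc").digest())
    print(hash_stream(io.BytesIO(b"abc" * 1000), chunk_size=7).hex())
```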
The library has been updated for use with moc 0.9.8 by taking advantage of the NatX conversions between adjacent X (e.g. Nat32 ↔ Nat16 ↔ Nat8).
This brought a 3% decrease in instructions across all functions, for both Sha256 and Sha512.
Most notably, the conversions made it worthwhile to store state and message data in Nat16 words instead of Nat32 words. This then allowed us, for Sha256 at least, to eliminate all heap allocations. Indeed, heap allocation (i.e. garbage creation) is now independent of the message size. We can hash from input type Blob of arbitrary length with a constant heap allocation of 1,008 bytes (for instantiating a class). This change then allowed a further reduction in instructions for Sha256 of 4%.
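To make the Nat32 ↔ Nat16 ↔ Nat8 conversions concrete, here is a hypothetical Python sketch of the bit arithmetic involved in keeping a 32-bit SHA-256 word as two 16-bit halves and packing message bytes into 16-bit words. It only illustrates splitting and joining; how the package actually lays out its Nat16 state, and why that avoids heap allocation in Motoko, is not shown here. All function names are illustrative.

```python
from typing import Tuple

MASK16 = 0xFFFF


def split32(word: int) -> Tuple[int, int]:
    """Split a 32-bit word into its (high, low) 16-bit halves."""
    return (word >> 16) & MASK16, word & MASK16


def join32(hi: int, lo: int) -> int:
    """Recombine (high, low) 16-bit halves into one 32-bit word."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def bytes_to_halves(b0: int, b1: int, b2: int, b3: int) -> Tuple[int, int]:
    """Pack four big-endian message bytes into two 16-bit words."""
    return (b0 << 8) | b1, (b2 << 8) | b3


if __name__ == "__main__":
    h0 = 0x6A09E667  # first SHA-256 initial hash value
    hi, lo = split32(h0)
    print(hex(hi), hex(lo))
    print(join32(hi, lo) == h0)
```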
The new version is 0.0.4. Compared to 0.0.2 we see a total decrease in instructions per byte of 7% for Sha256 and 20% for the empty message.
We have optimized SHA2 further and cut the number of instructions approximately in half. This was made possible by the Blob random access that was introduced in moc 0.14.8. Having random access available allowed various improvements that are particularly noticeable for large messages.
The new release is 0.1.3, and the combined improvements since 0.1.1, i.e. since we started using Blob random access, add up to about a 50% reduction in instructions. The heap allocations for sha512 have also been largely eliminated, as was already the case for sha256. You can see benchmarks here: Mops • Motoko Package Manager
The instructions per byte are now 164 for sha256 and 131 for sha512.
It looks like you get index-level access. I don't want to be greedy, but don't you think range syntax and an optimization for it would help as well?
I.e. blob[start, end] => Iter or Array
(I'm guessing the second would allocate some memory.) This is probably an under-the-hood Motoko question, but I'm curious what the most efficient loop over a blob would be, and whether a lower-level implementation would be significantly better. I guess the first question we could benchmark.
Yes, I expect further improvements from future extensions in the compiler/language.
To put the current state in perspective, according to canister profiling we are now at 2x the instructions used by the Rust version of sha256. That is 164 instructions per byte for Motoko vs 82 for Rust. I expect that we can close the gap further.
For short messages, the Motoko implementation is already faster, which is also why the certified map application performs better in Motoko than in Rust.
I followed up with another release, 0.1.4, which utilises the explodeNat32/64 functions introduced with moc 0.14.9. This speeds up the final sum calculation, which results in an improvement on the order of 7% (sha256) and 14% (sha512) for short messages.
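The final sum in SHA-2 serializes the internal state words in big-endian order: 32-bit words for sha256 (truncated for sha224) and 64-bit words for the sha512 family (truncated for sha384 and the sha512/t variants). The Python sketch below shows that word-to-bytes step for 32-bit words, which is the kind of work a helper like explodeNat32 can take over. The function names are hypothetical, and the most-significant-byte-first order is what SHA-2 output requires, not a statement about the exact Motoko signature of explodeNat32.

```python
from typing import Iterable, Tuple


def explode32(word: int) -> Tuple[int, int, int, int]:
    """Split a 32-bit word into four bytes, most significant byte first."""
    return (word >> 24) & 0xFF, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF


def state_to_digest(state: Iterable[int]) -> bytes:
    """Serialize 32-bit state words into big-endian digest bytes (sha256 style)."""
    out = bytearray()
    for word in state:
        out.extend(explode32(word))
    return bytes(out)


if __name__ == "__main__":
    print(state_to_digest([0x6A09E667, 0xBB67AE85]).hex())
```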
Hey, I just wanted to try using this library in my custom mops package (which has some utilities)… When I tried to publish my updated package, it just failed the tests.
The minimum moc version required for sha2 0.1.3 is moc 0.14.8.
And for sha2 0.1.4 it is moc 0.14.9.
These minimum requirements are specified in the [toolchain] section of the sha2 package's mops.toml, for example:
[toolchain]
moc = "0.14.8"
You seem to be running an older version of moc. If you are using dfx, then you probably still have an older version, because dfx may not have caught up with the latest moc release yet. In that case, you have to install a newer moc manually.
You can set the minimum required version of moc in the [requirements] section, and mops will show a warning if the user's moc version is lower than the required one.