Hash Collisions

skilesare · March 26, 2022, 2:01am

If I have a Hash (of the Text.hash and/or Principal.hash variety), and I have an account ID variant that has either a #Principal(Principal) or #AccountID(Text), what is the chance of collision?

I know it is a Nat32 which is upwards of 4 Billion possible values, but that doesn’t seem that big when we start talking about web-scale, especially if we start talking about hashing variants that could explode the possible structures.
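For a rough sense of scale, the standard birthday approximation can be sketched as below. This is a minimal illustration, not something taken from the base library: it assumes the hash behaves like a uniformly random function over its output range, and the function names are hypothetical.

```python
import math

def collision_probability(n_keys: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_keys hashes collide.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2N)), where
    N = 2**hash_bits, and assumes uniformly distributed hash values.
    """
    space = 2.0 ** hash_bits
    return -math.expm1(-n_keys * (n_keys - 1) / (2.0 * space))

def keys_for_probability(p: float, hash_bits: int) -> float:
    """Approximate number of keys at which the collision chance reaches p."""
    space = 2.0 ** hash_bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))

if __name__ == "__main__":
    for bits in (32, 64):
        n = keys_for_probability(0.5, bits)
        print(f"{bits}-bit hash: 50% collision chance near {n:,.0f} keys")
```

Under that uniformity assumption, a 32-bit hash reaches a 50% chance of at least one collision at roughly 77,000 keys, far below the 4 billion possible values.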

Now that we have a Crypto Library, should we replace all the base references to Hash.Hash with a more robust function like SHA-224 or SHA-256 (4 bytes vs. 28 or 32)? Even an 8-byte hash would drastically reduce the collision chance.
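As a sketch of what an 8-byte hash over the account ID variant could look like, the snippet below truncates SHA-256 and mixes the variant tag into the input so that a #Principal and an #AccountID with the same underlying bytes do not hash alike. It is written in Python with the standard `hashlib` module purely for illustration; `wide_hash` and its parameters are hypothetical and not part of any Motoko library.

```python
import hashlib

def wide_hash(tag: str, payload: bytes, n_bytes: int = 8) -> int:
    """Hash a tagged value to an n_bytes-wide integer (hypothetical helper).

    The tag (e.g. "Principal" or "AccountID") is length-prefixed and fed in
    before the payload, so different variants never share the same input.
    Assumes the encoded tag is shorter than 256 bytes.
    """
    tag_bytes = tag.encode("utf-8")
    h = hashlib.sha256()
    h.update(len(tag_bytes).to_bytes(1, "big"))  # length prefix for the tag
    h.update(tag_bytes)
    h.update(payload)
    # Keep only the leading n_bytes of the 32-byte digest.
    return int.from_bytes(h.digest()[:n_bytes], "big")

if __name__ == "__main__":
    raw = b"example-account-bytes"
    print(hex(wide_hash("Principal", raw)))
    print(hex(wide_hash("AccountID", raw)))
```

Truncating a cryptographic digest keeps its uniformity, so the collision chance then depends only on the number of bytes kept.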

skilesare · July 11, 2022, 12:36pm

Hey @claudio, @icme, @rossberg, @matthewhammer…maybe we should consider this and refactor some of the initial examples? I’m still using some 32-bit hashes around my code and they make me nervous. They are there because I followed the samples when I was getting started and never refactored.

matthewhammer · July 11, 2022, 5:13pm

Oh, now that you mention it, I wonder much of the same myself.

Specifically, how do we balance the utility of a high-quality hash against the cost incurred by computing it in Motoko?

In particular, I worry about cycle limits, and when they could arise and be surprising. The latest updates to the HashMap in base avoid re-hashing upon growing, so that helps a lot of otherwise problematic, common cases. But it does not solve the upgrade event, when that structure would require complete rehashing to rebuild the object in the new Wasm memory (recall objects are not stable, as many of us are painfully aware).

Perhaps we can do something better there too, by saving hashes in stable memory too, and then the concerns about hash cost are minimized?

For the solution of saving hashes in stable memory it makes even more sense to use a very canonical, well-understood hash function, even if more expensive. (makes sense to me, and feels somewhat unavoidable)
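To make the idea concrete, here is a minimal sketch, in Python rather than Motoko, of a map that stores each key's hash next to the entry and rebuilds itself from those saved hashes without calling the hash function again. The class and method names (`CachedHashMap`, `to_stable`, `from_stable`) are hypothetical, and the "stable" form is just a flat list standing in for whatever would be written to stable memory.

```python
from typing import Any, Callable, List, Optional, Tuple

Entry = Tuple[int, Any, Any]  # (cached hash, key, value)

class CachedHashMap:
    """Bucketed map that keeps each key's hash alongside the entry."""

    def __init__(self, hash_fn: Callable[[Any], int], n_buckets: int = 16):
        self._hash_fn = hash_fn
        self._buckets: List[List[Entry]] = [[] for _ in range(n_buckets)]

    def _insert(self, h: int, key: Any, value: Any) -> None:
        bucket = self._buckets[h % len(self._buckets)]
        for i, (eh, ek, _) in enumerate(bucket):
            if eh == h and ek == key:
                bucket[i] = (h, key, value)  # overwrite existing key
                return
        bucket.append((h, key, value))

    def put(self, key: Any, value: Any) -> None:
        self._insert(self._hash_fn(key), key, value)

    def get(self, key: Any) -> Optional[Any]:
        h = self._hash_fn(key)
        for eh, ek, ev in self._buckets[h % len(self._buckets)]:
            if eh == h and ek == key:
                return ev
        return None

    def to_stable(self) -> List[Entry]:
        """Flatten to a list of (hash, key, value) triples for saving."""
        return [entry for bucket in self._buckets for entry in bucket]

    @classmethod
    def from_stable(cls, entries: List[Entry],
                    hash_fn: Callable[[Any], int],
                    n_buckets: int = 16) -> "CachedHashMap":
        """Rebuild from saved triples, reusing the stored hashes."""
        m = cls(hash_fn, n_buckets)
        for h, key, value in entries:
            m._insert(h, key, value)
        return m
```

With this layout the rebuild cost is independent of how expensive the hash function is, at the price of storing a few extra bytes per entry. It only works if the saved hashes come from the same function the rebuilt map uses for lookups, which is one reason a canonical hash matters here.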

rbolog · July 30, 2022, 7:24am

Hello,

I was also looking for an alternative for hash. I opted to implement xxHash.

More information on the quality of the XXH32 and XXH64 algorithms

The repo gitlab
