Hash Collisions

If I have a Hash (of the Text.hash and/or Principal.hash variety), and I have an account ID variant that has either a #Principal(Principal) or #AccountID(Text), what is the chance of collision?

I know it is a Nat32 which is upwards of 4 Billion possible values, but that doesn’t seem that big when we start talking about web-scale, especially if we start talking about hashing variants that could explode the possible structures.

Now that we have a Crypto Libary, should we replace all the base references to Hash.Hash to a more robust function like SHA224 or 256? (4 bytes vs 24 or 36?). Even an 8 byte hash would drastically reduce the collision chance.

2 Likes

Hey @claudio, @icme, @rossberg, @matthewhammer…maybe we should consider this and refactor some of the initial examples? I’m still using some 32-bit hashes around my code and they make me nervous. They are there because I followed the samples when I was getting started and never refactored.

2 Likes

Oh, now that you mention it, I wonder much of the same myself.

In particular, how to balance the utility of using a high-quality hash versus the cost incurred by doing so, in Motoko.

In particular, I worry about cycle limits, and when they could arise and be surprising. The latest updates to the HashMap in base avoid re-hashing upon growing, so that helps a lot of otherwise problematic, common cases. But it does not solve the upgrade event, when that structure would require complete rehashing, to reform the new object, in the new Wasm memory (recall objects are not stable, as many of us are painfully aware :slight_smile: ).

Perhaps we can do something better there too, by saving hashes in stable memory too, and then the concerns about hash cost are minimized?

For the solution of saving hashes in stable memory it makes even more sense to use a very canonical, well-understood hash function, even if more expensive. (makes sense to me, and feels somewhat unavoidable)

2 Likes

Hello,

I was also looking for an alternative for hash. I opted to implement xxHash.

More information concerning quality of algorithm XXH32 & XXH64

The repo gitlab

1 Like