Losing Precision when Hashing a sha256 Nat

Example: Motoko Playground - DFINITY

I’m taking the sha256 of a string and converting that into a Nat for storage/transport.

When I try to use it in a hash map, I get a “losing precision” error.

I’m guessing this comes from the Hash.hash function, which takes a Nat but immediately tries to convert it to a Nat32. I’m guessing I can Nat.rem my sha256 Nat by 2^32 - 1 (4_294_967_295) and be OK, but I wanted to check that I’m not doing something wrong here.
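
Something like the following is what I have in mind (a rough, untested sketch; bytesToNat and hashBigNat are placeholder names, and it assumes the sha256 digest is already available as a [Nat8] array from some sha256 package):

```motoko
// Rough sketch (untested). Placeholder names; assumes the sha256 digest
// is already in hand as a [Nat8] array from some sha256 package.
import Array "mo:base/Array";
import Hash "mo:base/Hash";
import HashMap "mo:base/HashMap";
import Nat "mo:base/Nat";
import Nat8 "mo:base/Nat8";
import Nat32 "mo:base/Nat32";

// Fold the 32-byte digest into a single Nat (big-endian).
func bytesToNat(digest : [Nat8]) : Nat {
  Array.foldLeft<Nat8, Nat>(digest, 0, func (acc, b) { acc * 256 + Nat8.toNat(b) })
};

// Reduce the big Nat modulo 2^32 (rather than 2^32 - 1) so every 32-bit
// value is reachable and Nat32.fromNat cannot trap; the result is a
// Hash.Hash, which is just a Nat32.
func hashBigNat(n : Nat) : Hash.Hash {
  Nat32.fromNat(n % 4_294_967_296)
};

// A map keyed by the full Nat, hashed via the truncated value.
let map = HashMap.HashMap<Nat, Text>(16, Nat.equal, hashBigNat);
```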

Generally, I’m kind of nervous about the HashMaps in Motoko because they seem to push Nat32 as a reliable maximum space to hash into. This makes sense for one individual hash map under current Wasm limits, but if we really want ‘web scale’, collections with over 4.2 billion entries will be common (think logs), and collisions across a 4.2-billion space seem likely.

Was there some reasoning behind not using at least 64-bit Nats, which would easily push collisions and saturation pretty far out (18_446_744_073_709_551_615, roughly 18 quintillion)? 256 bits is probably overkill, but it is also pretty compatible with ETH and some other crypto concepts.

My understanding of the Nat32 there, and the absence of Nat64, is that the extra bits don’t help given the memory limits on what tables can be created today. However, I take your other point about wanting a long-term hash that is canonical and not tied to any of today’s limits.

256 bits is probably overkill, but it is also pretty compatible with ETH and some other crypto concepts.

That’s actually what I would tend toward.

Disclaimer: I’m not an expert in hashing. Far from it. But like everyone, I use it all of the time and care about it.

With that (lack of) expertise disclosed, I would say that what is in Motoko’s base library today is not sufficient for the serious hashing you are asking about (a long-term representation of the hash for use outside the hash table, in messages, and in logs that “live forever”, etc.).

OTOH, I think the sha256 hash as either a byte array or a hex-encoded text string (for human readability in web UIs, for instance) could be a reasonable representation. I’m not aware of an existing Motoko library that converts between those two forms (e.g. “16cb4410094421fe4418c01598aff8fc5a962692f507173fbfb56a249dd23cff” versus those 256 bits as a 32-byte array), but that is the direction that seems most natural to me for long-term design.
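
For illustration, the digest-to-hex direction might look something like this (a rough, untested sketch, not a vetted library; toHex is just a placeholder name):

```motoko
// Rough sketch (untested): hex-encode a 32-byte digest, e.g.
// [0x16, 0xcb, 0x44, ...] -> "16cb44...".
import Array "mo:base/Array";
import Nat8 "mo:base/Nat8";
import Text "mo:base/Text";

let hexDigits : [Char] = ['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f'];

func toHex(digest : [Nat8]) : Text {
  Array.foldLeft<Nat8, Text>(digest, "", func (acc, byte) {
    let b = Nat8.toNat(byte);
    acc # Text.fromChar(hexDigits[b / 16]) # Text.fromChar(hexDigits[b % 16])
  })
};
```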

I’m taking the sha256 of a string and converting that into a Nat for storage/transport.

I’m curious, why do you say that it’s “overkill” if the other concern is scaling to some indeterminate size without an artificial limit (imposed by bad/short hashing with lots of collisions)? Seems reasonable to me, FWIW.

Maybe it isn’t overkill! That is what ETH uses, after all. Looks like we need to write these libraries. /rolls_up_sleeves

The next question would be: does the foundation want these in base, or does it want external development and packages? This is more of a philosophical question than a technical one.

This is a great question – I would say there’s a natural dependency on Enzo’s sha256 package, which already lives outside of base, so maybe this should as well?

@claudio @rossberg WDYT of this discussion, BTW?
