I was showing my friend (who has worked in silicon valley and been a computer coder for 30 years for large tech companies) how DW was talking about how LLM’s will be able to run on the IC because of this shift to 64-bit WASM and his response was,
“That doesn’t make any sense. To make LLMs go faster, you need to lower the bitrate. It is called quantization. Sounds like fluff that doesn’t mean anything to me. But what do I know. Moving to 64bit was is trivial and doesn’t really haven anything to do with blockchain. 32bit vs 64bit is just the size of integers that can be held in any specific register and only helps if you are on a 64bit cpu. Not very interesting unless it is 15 years ago.”
Can someone from the DFINITY team explain?
64bit will open more than 4GB of a heap memory.
1 Like
Your friend probably misunderstood what 64-bit Wasm is. That’s not adding 64-bit data (that existed from day 1), but solely 64-bit address space. That means more than 4G of memory, which wasn’t a lot. (There are both technical and historical reasons why Wasm was restricted to that originally, but I won’t bore you with the details.)
Quantization is rather unrelated to that and refers to the size of (float) data. For machine learning, precision is largely irrelevant, so you want to trade it for throughput via smaller data types. With current Wasm, 32-bit floats are the minimum. There is a proposal for adding 16-bit floats, but given that support for those is rare and vastly incomplete and incoherent across current CPUs, they are probably not mature enough to be added to Wasm yet.
3 Likes