Text Compression in Motoko

rossberg · January 17, 2022, 11:04am

Text values are internally represented as (ropes of) UTF-8, so ASCII characters take 1 byte each and other characters take 2 to 4 bytes. If the rope consists of a single piece, the conversion to Blob does nothing at runtime; otherwise it copies the pieces and concatenates them into one contiguous Blob.
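To make the rope behaviour concrete, here is a minimal Python model of the idea. It is not Motoko's runtime code, and the names `Concat`, `text`, `concat` and `to_blob` are hypothetical. It assumes a rope is either one flat UTF-8 buffer or a concatenation node, and that converting to a blob returns a single buffer unchanged but joins several buffers into a fresh one.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Concat:
    # Hypothetical concatenation node: just points at its two halves.
    left: Rope
    right: Rope


# A rope is either a flat UTF-8 piece or a concatenation node.
Rope = Union[bytes, Concat]


def text(s: str) -> Rope:
    # A fresh literal is a single UTF-8 piece.
    return s.encode("utf-8")


def concat(a: Rope, b: Rope) -> Rope:
    # Concatenation builds a node in O(1); no bytes are copied here.
    return Concat(a, b)


def pieces(r: Rope) -> list[bytes]:
    # Collect the leaves left to right (iteratively, to avoid deep recursion).
    out: list[bytes] = []
    stack: list[Rope] = [r]
    while stack:
        node = stack.pop()
        if isinstance(node, Concat):
            stack.append(node.right)
            stack.append(node.left)
        else:
            out.append(node)
    return out


def to_blob(r: Rope) -> bytes:
    if isinstance(r, bytes):
        # Single piece: hand back the existing buffer, no copy at all.
        return r
    # Several pieces: copy them into one new contiguous buffer.
    return b"".join(pieces(r))
```

For a single-piece value, `to_blob(flat) is flat` holds, while a concatenated value such as `concat(text("héllo, "), text("wörld"))` is flattened into one new buffer equal to `"héllo, wörld".encode("utf-8")`.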

FWIW, contemporary Unicode has a 21-bit value space (code points up to U+10FFFF). Hence, representations that use 2-byte units (UTF-16) are still variable-length, because code points above U+FFFF need two units. They therefore combine the disadvantages of a 1-byte representation (no random access) with those of a 4-byte representation (wasted space). In most cases they are only used for legacy reasons, such as in older languages and APIs.
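A quick way to check these claims is to encode a few characters in UTF-8, UTF-16 and UTF-32 with Python's standard codecs. The helpers `encoded_sizes` and `utf16_units` are illustrative names, not part of any Motoko API. The `-le` codec variants are used so that no byte-order mark is added to the counts.

```python
from __future__ import annotations

# Highest Unicode code point; it needs exactly 21 bits.
MAX_CODE_POINT = 0x10FFFF


def encoded_sizes(s: str) -> dict[str, int]:
    # Byte length of s in each encoding ("-le" variants emit no BOM).
    return {
        "utf-8": len(s.encode("utf-8")),
        "utf-16": len(s.encode("utf-16-le")),
        "utf-32": len(s.encode("utf-32-le")),
    }


def utf16_units(s: str) -> list[int]:
    # Split the UTF-16 encoding of s into its 16-bit code units.
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
```

For ASCII such as `"A"`, UTF-16 already doubles the size of UTF-8 (2 bytes vs 1). An emoji like `"😀"` (U+1F600) still needs two UTF-16 units (`0xD83D, 0xDE00`), so indexing UTF-16 by code point requires a scan just as with UTF-8.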
