Text Compression in Motoko

Text values are represented internally as (ropes of) UTF-8, so ASCII characters take 1 byte each and other characters take 2 to 4 bytes. If the rope consists of a single piece, the conversion to Blob does nothing at runtime; otherwise it copies and concatenates the pieces.
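To make the character/byte distinction concrete, here is a minimal sketch that assumes the conversion in question is `Text.encodeUtf8` from the base library. The names `single` and `joined` are just for illustration, and whether `joined` is actually a multi-piece rope at runtime is an implementation detail that Motoko code cannot observe.

```motoko
import Debug "mo:base/Debug";
import Nat "mo:base/Nat";
import Text "mo:base/Text";

// Illustrative values only. `joined` is built by concatenation, so it
// may be held as a multi-piece rope internally (not observable here).
let single : Text = "hello";
let joined : Text = "hello, " # "wörld";

// Convert to Blob. Per the explanation above, this is free for a
// single-piece text and a copy-and-concatenate otherwise.
let b1 : Blob = Text.encodeUtf8(single);
let b2 : Blob = Text.encodeUtf8(joined);

// Text.size counts characters; Blob.size counts bytes.
Debug.print(Nat.toText(single.size()) # " characters, " # Nat.toText(b1.size()) # " bytes");
Debug.print(Nat.toText(joined.size()) # " characters, " # Nat.toText(b2.size()) # " bytes");

// Decoding returns an option, because not every Blob is valid UTF-8.
assert (Text.decodeUtf8(b2) == ?joined);
```

The two counts match for the pure-ASCII text and differ for the one containing "ö", which takes 2 bytes in UTF-8.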

FWIW, contemporary Unicode has a 21-bit value space. Hence, representations that use 2-byte units (such as UTF-16) combine the disadvantages of a 1-byte representation (no random access, since some characters need two units) with those of a 4-byte representation (wasted space). In most cases they are only used for legacy reasons, as in older languages and APIs.
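As a rough illustration of that trade-off, the sketch below counts how many bytes a text would need in UTF-8, UTF-16, and UTF-32. The helpers `utf8Bytes`, `utf16Bytes`, and `sizes` are hypothetical names written for this example, not part of the base library; they only apply the standard encoding length rules to each code point.

```motoko
import Char "mo:base/Char";

// UTF-8: 1 to 4 bytes depending on the code point.
func utf8Bytes(cp : Nat32) : Nat {
  if (cp < 0x80) 1 else if (cp < 0x800) 2 else if (cp < 0x10000) 3 else 4
};

// UTF-16: one 2-byte unit, or a surrogate pair (4 bytes) above U+FFFF.
func utf16Bytes(cp : Nat32) : Nat {
  if (cp < 0x10000) 2 else 4
};

// Total encoded size as (UTF-8, UTF-16, UTF-32) byte counts.
func sizes(t : Text) : (Nat, Nat, Nat) {
  var u8 = 0;
  var u16 = 0;
  var u32 = 0;
  for (c in t.chars()) {
    let cp = Char.toNat32(c);
    u8 += utf8Bytes(cp);
    u16 += utf16Bytes(cp);
    u32 += 4; // UTF-32 is fixed width, hence random access
  };
  (u8, u16, u32)
};
```

For ASCII text such as `"hello"`, UTF-16 needs twice the space of UTF-8, and as soon as a character above U+FFFF appears it is no longer fixed width either, so indexing by position still requires a scan.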
