For principals we use a textual encoding that looks like this: em77e-bvlzu-aq. It is defined in the spec and can encode any blob of data.
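To make "can encode any blob" concrete, here is a minimal Python sketch of the encoding as I read it in the interface spec: prepend a 4-byte big-endian CRC32 of the blob, Base32-encode without padding, lowercase the result, and group it into chunks of five characters separated by dashes. The function names `encode_blob` and `decode_blob` are my own and not part of any library, so treat this as an illustration rather than a reference implementation.

```python
import base64
import zlib


def encode_blob(blob: bytes) -> str:
    """Encode an arbitrary blob in the principal-style textual format (sketch)."""
    # 4-byte big-endian CRC32 checksum goes in front of the data.
    checksum = zlib.crc32(blob).to_bytes(4, "big")
    # Base32 (RFC 4648 alphabet), padding stripped, lowercased.
    b32 = base64.b32encode(checksum + blob).decode("ascii").rstrip("=").lower()
    # Group into chunks of 5 characters joined by dashes.
    return "-".join(b32[i:i + 5] for i in range(0, len(b32), 5))


def decode_blob(text: str) -> bytes:
    """Decode the textual format back to the blob, verifying the checksum."""
    b32 = text.replace("-", "").upper()
    # Restore the padding that the encoder stripped.
    b32 += "=" * (-len(b32) % 8)
    raw = base64.b32decode(b32)
    if len(raw) < 4:
        raise ValueError("too short to contain a checksum")
    checksum, blob = raw[:4], raw[4:]
    if zlib.crc32(blob).to_bytes(4, "big") != checksum:
        raise ValueError("checksum mismatch")
    return blob


if __name__ == "__main__":
    # Round-trip the example string from the top of this post.
    blob = decode_blob("em77e-bvlzu-aq")
    print(blob.hex(), encode_blob(blob))
```

Note that nothing in these two functions knows or cares what the blob means. That is exactly why the question below arises.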
There is now the very real possibility that the same encoding gets used for things other than principals. The first such use is going to be for account identifiers from the ICRC-1 standard, and I expect that other uses will soon follow.
Our principal encoding has a popular, similar “sister”: Bitcoin’s bech32. If we look at the use of bech32 then we can see where it is going. Besides Bitcoin addresses, bech32 is used to encode private keys and lightning invoices, and probably other things. If the principal encoding becomes the “bech32 of the IC” then we can expect it to encode principals, account identifiers for various standards (not only ICRC-1), identifiers relevant to certain extensions of ICRC-1, private keys, invoices in payment flows, and other things that we do not foresee.
The question arises whether the encoding should also encode what the encoded blob’s purpose is. By inspecting an encoded string, should we be able to tell whether the data is meant to specify a principal, an account identifier, a private key, an invoice, or something else? Or should the encoded data just be a blob, with only the context in which it is exchanged defining its use?
If the answer is the former then we have to define how the intended use is encoded. For example, if there is a tag byte in the data then we need to track the tags in use in a registry, and projects that use the encoding must respect each other’s tags so that they don’t overlap.
If we look at precedent from outside the IC then we see that bech32 encodes the type of encoded data in a human-readable prefix. The older Base58 encoding also has tags and version bytes to distinguish Bitcoin addresses and private keys. Ethereum addresses, on the other hand, are raw hex data without any identifying data pieces.
I am writing this post now because there is a fork in the road ahead. We are approaching the fork with the first use of the encoding outside of principals by ICRC-1. The reason that both roads ahead are still open is that principals have a “type byte” which sits in the last byte position of the data and has a value between 01 and 04. So we can decide to
a) define a scheme that, beginning with the last byte, allows one to determine what the meaning of the encoded data is (only the values 01-04 are already taken), or
b) not care and leave the meaning entirely to the context in which the encoding is exchanged.
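To illustrate what option a) could look like in practice, here is a rough Python sketch that dispatches on the last byte of the decoded blob. The class names for 01 to 04 are how I read the principal classes in the interface spec. Everything else is an assumption for illustration only: the `HYPOTHETICAL_TAGS` registry, its value `0x7F`, and the function `classify` are made up and not part of any proposal.

```python
# Last-byte values already taken by principals (01-04).
PRINCIPAL_CLASSES = {
    0x01: "opaque",
    0x02: "self-authenticating",
    0x03: "derived",
    0x04: "anonymous",
}

# HYPOTHETICAL registry of further tags. Under option a) this would have to be
# agreed upon by everyone using the encoding. The value below is invented.
HYPOTHETICAL_TAGS = {
    0x7F: "icrc1-account",
}


def classify(blob: bytes) -> str:
    """Guess the meaning of a decoded blob from its last byte (option a sketch)."""
    if not blob:
        return "empty blob (no tag byte)"
    tag = blob[-1]
    if tag in PRINCIPAL_CLASSES:
        return f"principal ({PRINCIPAL_CLASSES[tag]})"
    if tag in HYPOTHETICAL_TAGS:
        return HYPOTHETICAL_TAGS[tag]
    return f"unknown tag 0x{tag:02x}"


if __name__ == "__main__":
    print(classify(bytes.fromhex("abcd01")))
    print(classify(bytes.fromhex("abcd7f")))
    print(classify(bytes.fromhex("abcdee")))
```

Under option b) there would be no such function at all: the caller already knows from context what it asked for and treats the decoded bytes accordingly.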
I think that anyone on the IC, not only people involved in ICRC-1, must be aware of this and share their opinion. In particular, if we go with a) then the scheme must be agreed upon because it only works if all future uses respect the scheme.