Announcing "Token Standard" as topic of the first meeting of the Ledger & Tokenization Working Group

On the textual encoding of subaccounts.

@mariop @dieter.sommer @roman-kashitsyn @jorgenbuilder

Hi all, following the discussions in the working group, I would like to make a case for a single textual encoding that satisfies the following properties:

(1) Uniqueness. No two valid textual encodings should specify the same combination of <principal>:<subaccount>.
(2) Conciseness. The textual encoding should allow short sub-accounts where possible, e.g. those beginning 0, 1, etc.
(3) Checked. Incorrectly copied textual representations, e.g. with a few digits altered from a valid textual encoding, should be considered invalid.
(4) Readable. Easy for a person to read. (This is primarily (2), but also includes a delimiter to separate principals and subaccounts.)

I propose the following format, which I believe satisfies these properties better than the currently proposed format.

The new format is given by the following function:

CHECKSUM[<Principal>:TRUNCATE(HEX(<sub-account>))]

CHECKSUM is a checksum function that capitalises characters based on the hash of the input. As is done with Ethereum addresses (EIP-55), we capitalise symbols of the output based on the hash of the non-capitalised (non-checked) textual representation. See how this is implemented in Ethereum here. What's nice about this is that non-checked textual representations simply don't include the capitalisation, so we can have checked Principals as well as checked Principal:subaccount pairs, unlike in the previously proposed format.
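To make the CHECKSUM step concrete, here is a minimal Python sketch of capitalisation-based checking. It is only an illustration: the choice of SHA-256, the rule "uppercase a letter when the matching hash nibble is 8 or more", and the names checksum_case and is_checksummed are all my assumptions, not part of the proposal (Ethereum itself uses Keccak-256).

```python
import hashlib


def checksum_case(text: str) -> str:
    """Return `text` with letters re-capitalised according to a hash.

    Assumptions (not from the proposal): SHA-256 as the hash, and a letter
    is uppercased when the hash nibble at the same position is >= 8.
    """
    lowered = text.lower()
    digest_hex = hashlib.sha256(lowered.encode("ascii")).hexdigest()
    out = []
    for i, ch in enumerate(lowered):
        # Reuse nibbles cyclically if the string is longer than the digest.
        nibble = int(digest_hex[i % len(digest_hex)], 16)
        out.append(ch.upper() if ch.isalpha() and nibble >= 8 else ch)
    return "".join(out)


def is_checksummed(text: str) -> bool:
    """True if the capitalisation of `text` matches its own checksum."""
    return checksum_case(text) == text


# Usage: check an encoding built from its non-checked (lowercase) form.
encoded = checksum_case("4kydj-ryaaa-aaaag-qaf7a-cai:1")
print(encoded, is_checksummed(encoded))
```

Because only the capitalisation carries the check, lowercasing a checked string always gives back the non-checked representation.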

HEX(<sub-account>) is the hexadecimal representation of the sub-account byte array.

The TRUNCATE function removes all leading zeros, which are redundant. For instance, the byte array with hex representation 000...001 is represented as the string 1, which satisfies the uniqueness property.
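A minimal Python sketch of HEX followed by TRUNCATE, plus the inverse that pads the zeros back. The 32-byte subaccount length and the function names truncated_hex and expand_hex are assumptions made for the example.

```python
SUBACCOUNT_LEN = 32  # assumption: subaccounts are 32-byte arrays


def truncated_hex(subaccount: bytes) -> str:
    """HEX then TRUNCATE: lowercase hex with all leading zeros removed."""
    if len(subaccount) != SUBACCOUNT_LEN:
        raise ValueError("subaccount must be exactly 32 bytes")
    # The all-zero subaccount becomes the empty string.
    return subaccount.hex().lstrip("0")


def expand_hex(text: str) -> bytes:
    """Inverse of truncated_hex: pad with leading zeros back to 32 bytes."""
    if len(text) > 2 * SUBACCOUNT_LEN:
        raise ValueError("subaccount hex is too long")
    # bytes.fromhex raises ValueError on non-hex characters.
    return bytes.fromhex(text.rjust(2 * SUBACCOUNT_LEN, "0"))


# Usage: the subaccount 000...001 is written as "1".
one = bytes(31) + b"\x01"
print(truncated_hex(one))
```

Stripping whole hex digits rather than whole bytes means a subaccount ending in 0x01 is written as 1 and not 01.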

As an example, suppose the following is a valid textual encoding. For comparison, we will then show some invalid representations obtained by making slight alterations to it.

4Kydj-ryaAa-aaAaG-qaf7A-cai:1

Now for some invalid representations. Suppose we changed the subaccount to 2: then 4Kydj-ryaAa-aaAaG-qaf7A-cai:2 is invalid because it violates the checksum; the valid encoding for subaccount 2 would have some other capitalisation, say 4Kydj-rYAaa-aaAag-Qaf7A-cai:2. Now suppose we also added an extra 0 to the subaccount: 4Kydj-rYAaa-aaAag-Qaf7A-cai:02. This is invalid because it contains an extra 0 before the 2 and would break the uniqueness property.

Note also that 4Kydj-ryaAa-aaAaG-qaf7A-cai:0 is invalid, as the zero subaccount is already represented by the principal alone: 4Kydj-ryaAa-aaAaG-qaf7A-cai.
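The uniqueness rules in these examples can be sketched as a small parser that splits on the delimiter and rejects non-canonical subaccounts. This is an illustrative Python sketch only: split_account is a hypothetical name, the 64-digit limit assumes 32-byte subaccounts, and the checksum is deliberately not verified here.

```python
HEX_DIGITS = set("0123456789abcdef")
MAX_HEX_LEN = 64  # assumption: 32-byte subaccounts, two hex digits per byte


def split_account(text: str) -> tuple[str, str]:
    """Split `<principal>[:<subaccount>]`, rejecting non-canonical forms.

    Returns (principal_text, subaccount_hex); the hex part is "" when the
    subaccount is omitted. The checksum capitalisation is not checked here.
    """
    principal, sep, sub = text.partition(":")
    if not principal:
        raise ValueError("missing principal")
    if not sep:
        return principal, ""  # bare principal: zero subaccount
    sub = sub.lower()  # capitalisation only carries the checksum
    if not sub or len(sub) > MAX_HEX_LEN or not set(sub) <= HEX_DIGITS:
        raise ValueError("subaccount must be 1 to 64 hex digits")
    if sub[0] == "0":
        # Covers both ":0" (use the bare principal) and ":01" (use ":1").
        raise ValueError("subaccount must not start with 0")
    return principal, sub


# Usage: a canonical encoding splits cleanly on the delimiter.
print(split_account("4Kydj-ryaAa-aaAaG-qaf7A-cai:1"))
```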

I think this representation would give the most readable principal:account identifiers, whilst also enabling checksums for both the <principal-id>:<subaccount> and plain <principal-id> formats.

Note that, compared to the old representation, we don't need to encode the length of the principal, because a delimiter separates the principal from the subaccount. I think this is the better option because the resulting format is easier for humans to read.
