Announcing "Token Standard" as topic of the first meeting of the Ledger & Tokenization Working Group

Dear community!

For tomorrow’s (November 29) meeting of the ledger and tokenization WG we would like to propose an agenda with the following topics:

  • ICRC-2 recap
    • Hum on current ICRC-2 proposal
    • Initiating the voting if there is rough consensus
  • WG governance
    • Feedback from the governance WG
    • Next steps

If you have any other suggestions for topics, please let me know.


Dear WG members and interested parties!

This is the “official” channel for the Ledger and Tokenization WG, the other one we had is “closed” to not fragment our online presence too much.


Problem with the Zoom link, please use this one:

Due to a problem with the Zoom link that could not be fixed in time, the meeting of November 29th could not be held. The problem has been addressed now and a new invitation has been created. Please delete all old invitations and use the one in the Working Group Calendar (Google Calendar link).

We propose to move the meeting from today to next week, same day and time, and remain on the current 2-week schedule after that.

Next meeting: December 6, 2022, 18:00-19:00 CET.

Note that the Zoom link has changed.
Please always use the Zoom link from the calendar above!

Sincere apologies for this and also for the multiple calendar emails you have received.

Dear Working Group

tl;dr
Next WG meeting tomorrow, Dec 6, 2022, 18:00-19:00 CET/UTC+1

Because of the hiccup with Zoom last week, the WG meeting originally planned for last week will take place tomorrow, December 6, 2022, 18:00-19:00 UTC+1. We would like to move forward with the WG agenda and bring some items to a conclusion, so we have quite a full program.

Please always use the Zoom link in the shared WG calendar:

Proposed agenda

  • ICRC-2 proposal (ICRC-2)
    • Walkthrough over the proposal
    • Hum on current ICRC-2 proposal
    • Initiating the voting if we have rough consensus
  • Working Group governance (ICRC-0)
    • Feedback from the governance WG
    • Call for feedback
    • Core working group
  • Other
    • Voting participation
    • Working group composition
    • Upcoming items

Slides

ICRC-2 slides

Let us know on the forum or on Discord or send a private note if you would like something else to be also discussed in this meeting.

From the abandoned topic of the working group, by @Embark, reposting here:

Dear working group members!

Here are the agenda and slides for the ledger and tokenization WG meeting tomorrow.

Agenda:

  • Communication & collaboration channels
  • Working group composition
  • Proposed change to voting
  • Working group governance (ICRC-0)
  • Textual Encoding format for ICRC-1 account addresses
  • ICRC-2 proposal (ICRC-2)
  • If time: ICRC-3

Slides:

Please always use the Zoom link in the shared WG calendar:


The WG vote on the ICRC-2 proposal has been opened. Core WG members, please vote on GitHub. The vote is open for one week.

On the textual encoding of subaccounts.

@mariop @dieter.sommer @roman-kashitsyn @jorgenbuilder

Hi all, following the discussions in the working group, I would like to make a case for a single textual encoding that satisfies the following properties:

(1) Uniqueness. No two valid textual encodings should specify the same combination of <principal>:<subaccount>.
(2) Concise. The textual encoding should enable short sub-accounts if possible, e.g. those beginning 0,1 etc.
(3) Checked. Incorrectly copied textual representations, e.g. with a few digits altered from a valid textual encoding should be considered invalid.
(4) Readable. Easy for a person to read. (This is primarily (2), but it also includes a delimiter to separate principals and subaccounts.)

I propose the following format, which I believe satisfies these properties better than the current proposed format.

New format is given by the following function:

CHECKSUM[<Principal>:TRUNCATE(HEX(<sub-account>))]

CHECKSUM is a checksum function that capitalises based on the hash of the input. As is done with Ethereum addresses, we capitalise symbols of the output based on the hash of the non-capitalised, non-checked textual representation. See how this is implemented in Ethereum here. What’s nice about this is that non-checked textual representations simply don’t include the capitalisation, so we can have checked Principals as well as checked Principal:subaccount pairs, unlike in the previously proposed format.

HEX(<sub-account>) is the hexadecimal representation of the sub-account byte array.

The truncate function removes all redundant leading zeros. For instance, the byte array with hex representation 000...001 is represented as the string 1 to satisfy the uniqueness property.

As an example, suppose the following is a valid textual encoding, for purposes of comparison we will show some invalid representations made by making slight alterations.

4Kydj-ryaAa-aaAaG-qaf7A-cai:1

Now, to give examples of invalid representations: suppose we changed the subaccount to 2. Then 4Kydj-ryaAa-aaAaG-qaf7A-cai:2 is invalid because it violates the checksum; the checksum would actually yield some other capitalisation, say 4Kydj-rYAaa-aaAag-Qaf7A-cai:1. Now suppose we also added an extra 0 to the subaccount: 4Kydj-rYAaa-aaAag-Qaf7A-cai:01. This is invalid because it contains an extra 0 before the 1 and would break the uniqueness property.

Note also that 4Kydj-ryaAa-aaAaG-qaf7A-cai:0 is invalid, as this account can be represented as 4Kydj-ryaAa-aaAaG-qaf7A-cai.
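
The truncation and uniqueness rules can be illustrated with a minimal Python sketch. It covers only the non-checked (all-lowercase) form and leaves the capitalisation checksum aside. The function names and the 32-byte subaccount length are assumptions made for this example, not part of the proposal text.

```python
SUBACCOUNT_LEN = 32  # assumption: subaccounts are 32-byte arrays


def truncate_hex(subaccount: bytes) -> str:
    """Hex of the subaccount with leading zeros removed ('' if all zero)."""
    return subaccount.hex().lstrip("0")


def encode_unchecked(principal: str, subaccount: bytes) -> str:
    """Non-checked form: the bare principal for the all-zero subaccount."""
    suffix = truncate_hex(subaccount)
    return principal if not suffix else f"{principal}:{suffix}"


def decode_unchecked(text: str) -> tuple[str, bytes]:
    """Parse the non-checked form, rejecting non-canonical spellings."""
    principal, sep, suffix = text.partition(":")
    if not sep:
        return principal, bytes(SUBACCOUNT_LEN)
    # ':0', ':01' and a trailing ':' would give one account several spellings.
    if not suffix or suffix[0] == "0":
        raise ValueError("non-canonical subaccount")
    if len(suffix) > 2 * SUBACCOUNT_LEN or any(
        c not in "0123456789abcdef" for c in suffix
    ):
        raise ValueError("malformed subaccount")
    return principal, bytes.fromhex(suffix.zfill(2 * SUBACCOUNT_LEN))
```

With these rules, `...-cai:1` decodes to the subaccount ending in `01`, while `...-cai:0` and `...-cai:01` are rejected, so every account has exactly one spelling.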

I think this representation would give the nicest most readable principal:account-identifiers, whilst also enabling check-sums for both the <principal-id>:<subaccount> and vanilla <principal-id> formats.

Note compared to the old representation, we don’t need to know the length of the principal if we use a delimiter to separate the principal and subaccount. This is a better option because it results in a format that is easier to read by humans.


@Maxfinity, thank you for continuing the discussion from yesterday’s meeting. The above might also be interesting to @timo , @jorgenbuilder, @cryptoschindler, @hpeebles, @witter, @jzxchiang, @avi, and @mparikh.
The current standard proposal is here (slides 5 and 6).
Let’s try to conclude on this before the holiday break!


So subaccount 0 would look like this?

4Kydj-ryaAa-aaAaG-qaf7A-cai:

To clarify. Property (4) means “parseable by the human eye”, right? So a human can see what the principal is.

Some considerations on the checksum:

  • This type of checksum makes it easier for receivers to skip the validation. We have seen this on Ethereum. People were thinking their exchange would check the checksum on their withdrawal address but they didn’t and funds got lost. The problem is that both the capitalized and non-capitalized versions are valid at least somewhere. It may not be as bad for us as it is for Ethereum because the principal is checksummed in itself, so the worst that can go wrong is the subaccount.
  • On Ethereum it wasn’t planned, they simply forgot to introduce a checksum and came up with the capitalization later as a hack.
  • base32 was designed to be written in a single case, to avoid similar-looking characters. For example, I and L are allowed, and both the lower-case combination il and the upper-case combination IL are clear. But a mixed-case Il isn’t. (Those characters are Il.) If we expect nobody to type addresses by hand, then this may not be an issue.
  • Principals were designed to be easy to visually compare for equality. Partial capitalization may lose this property. For example compare these two:
    4Kydj-ryaAa-aaAaG-qaf7A-cai
    4kydj-ryaaa-aaaag-qaf7a-cai
    vs these two:
    4kydj-ryaaa-aaaag-qaf7a-cai
    4kydj-ryaaa-aaaag-qaf7a-cai
    So the question is if this proposal even gives us property (4) or not.

I have argued before that a checksum for subaccount isn’t needed because we can recover from failures. If we believe it is needed and we want property 4 as well, then can’t we leave the principal untouched and do an extra checksum somewhere related to the subaccount (for the price of making it longer)?


Good point: capitalisation of specific letters, say I or L, could be avoided by the algorithm.

Personally, I think that because the receiving address is checked, it doesn’t need to look exactly the same for you to have a high degree of confidence that both receiving addresses are correct. Say you had derived:

4Kydj-ryaAa-aaAaG-qaf7A-cai:1
from
4kydj-ryaaa-aaaag-qaf7a-cai

Well, if you had made a mistake in copying any part of the principal, then the check would fail. So this is still kind of safe, and the fact the strings look vaguely similar should give a user enough confidence when using their apps.

On your other point about whether checked sub-accounts should be optional: I would argue that checked sub-accounts would be a necessity. You could use a specific sub-account for receiving payments with each sub-account dedicated to a different user - in this case and many others, payments to the wrong sub-account are not recoverable.

So the question should be where to put the check for the sub-account. Suppose we put it at the end, this would also be ok. However, I think you lose the property that a user can create their own subaccount address string by simply appending a 1, as the user now has to use some encoding to generate a check. You could also make the checksum optional, and I think that would also be ok.

See below for an example:

4kydj-ryaaa-aaaag-qaf7a-cai:1
4kydj-ryaaa-aaaag-qaf7a-cai:1:aa-bc

I am sure it’s safe enough but I am worried about the user experience. Say a user is shown a deposit address in a pop-up window (4Kydj-ryaAa-aaAaG-qaf7A-cai:1). Now we are asking the user to compare that to the canister id that he sees somewhere else (e.g. in his address book, where it is all lower case) to make sure he pays the right canister. Even if it is safe, it’s still too confusing for the user.

But you lose that with the capitalization, too. The user has to run a tool to compute the capitalization.

The capitalization checksum idea is interesting. But since the principal is already checksummed, and we’d like the user to be able to easily compare visually, why don’t we just cap checksum the subaccount part?

Then we would have:
4kydj-ryaaa-aaaag-qaf7a-cai:1
4kydj-ryaaa-aaaag-qaf7a-cai:3fCe35D21Aa8

which is 1) unambiguous about which principal you’re transferring to, 2) easily human-constructible in simple cases, and 3) checksummed when machine-generated long subaccounts are used.

And for this to be canonical 4kydj-ryaaa-aaaag-qaf7a-cai:0 is always written as such.

Hi @Maxfinity, the current textual account representation does have a checksum(principal + subaccount)

@timo, @benji I would be happy with this suggestion, just capitalise the subaccount. Most receiving addresses in code would be machine generated, and these short sub-accounts could be used for one-off payments between users (can be manually checked if things go wrong).

Also like @timo’s original suggestion but with an extra check bit for security.

Of course, if we primarily use the approve, transfer_from payment flow, rather than paying to receiving addresses that need subaccounts, it becomes less of a necessity to check the subaccounts. Well, subaccounts are much less needed in general here, but the former payment flow is much more common on the IC so…

@Maxfinity, thank you for your last-minute change proposal w.r.t. the textual representation. @roman-kashitsyn, @benji, and I just had a good discussion on the proposal and what has been discussed in the forum by @Maxfinity, @timo and others.

Here’s a summary of our findings.

The properties we want to achieve

  • A textual encoding of any non-reserved principal is a valid textual encoding of the default account of that principal on the ledger.
  • The decoding function is injective (i.e., different valid encodings correspond to different accounts). This property enables applications to use text representation as a key, for example in a map.
  • Protection against copy-paste errors or typos
  • Human readability (particularly the ability to identify the subaccount with the naked eye)

Approach

We concluded that the most suitable representation meeting those properties is the one presented in the following examples (note that the principal contains a checksum over itself):

  • 4kydj-ryaaa-aaaag-qaf7a-cai (default subaccount = principal)
  • 4kydj-ryaaa-aaaag-qaf7a-cai:1 (simple subaccount, no checksum on subaccount)
  • 4kydj-ryaaa-aaaag-qaf7a-cai:3fCe35D21Aa8 (complex subaccount, contains checksum over the whole 2-tuple through the case of letters in the hexadecimal representation of the subaccount)

Informal specification

Let f be the textual encoding function specified as follows, where || is string concatenation. Let principal be a principal in textual representation and subaccount a subaccount in byte array representation.

f(principal, subaccount) := principal || “:” || chk(principal || “:” || hex(subaccount), subaccount)

chk(a, b) is a checksum function that capitalises the hexadecimal representation of b based on the SHA-256 hash of a. The hexadecimal representation of b is canonicalized, i.e., it comprises only the characters [0…9, a…f]. The input string a is hashed with SHA-256 to obtain h, and each character with index i in the hexadecimal representation of b is printed in uppercase in the result if the 4*i-th bit of h is 1, and in lowercase otherwise. Digits have no case and are copied to the result unchanged. I.e., we capitalise the letter symbols of b in the output based on the hash h of a. This is analogous to Ethereum address checksums.

In words

  • The encoding is created from the principal, followed by a colon, followed by the subaccount
  • The subaccount is defined as follows:
    • Take the subaccount in hexadecimal representation with leading zeroes stripped
    • Compute a checksum over the principal and subaccount and represent it through the case of the letters in the hexadecimal representation of the subaccount (the principal remains untouched)
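
The informal specification above can be sketched in Python as follows. This is only an illustration under stated assumptions: bits of the hash are counted most-significant-bit first (as in EIP-55), the index i runs over the stripped hexadecimal subaccount, subaccounts are 32 bytes, and the all-zero subaccount is written as the bare principal. The names encode_account and is_valid are made up for this example, and the validity of the principal itself is not checked.

```python
import hashlib


def encode_account(principal: str, subaccount: bytes) -> str:
    """f(principal, subaccount) from the informal specification (sketch)."""
    hex_sub = subaccount.hex().lstrip("0")  # leading zeroes stripped
    if not hex_sub:
        return principal  # default account: the principal alone
    # a = principal || ":" || hex(subaccount), hashed with SHA-256
    digest = hashlib.sha256(f"{principal}:{hex_sub}".encode("ascii")).digest()
    cased = []
    for i, ch in enumerate(hex_sub):
        bit_index = 4 * i  # assumption: bit 0 is the MSB of the first byte
        bit = (digest[bit_index // 8] >> (7 - bit_index % 8)) & 1
        cased.append(ch.upper() if bit else ch)  # digits are unaffected
    return principal + ":" + "".join(cased)


def is_valid(text: str) -> bool:
    """Accept a text only if re-encoding the decoded account reproduces it."""
    principal, sep, suffix = text.partition(":")
    if not sep:
        return True  # bare principal; its own checksum is not checked here
    if not suffix or suffix[0] == "0" or len(suffix) > 64:
        return False
    try:
        subaccount = bytes.fromhex(suffix.lower().zfill(64))
    except ValueError:
        return False
    return encode_account(principal, subaccount) == text
```

Because validation re-encodes and compares, any change to the principal, the subaccount digits, or the letter casing is rejected unless the altered text happens to match its own checksum.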

Question / discussions

  • Do we need to include the “:” in the input to the checksumming, as a “domain separator”? It does no harm, but it is not clear whether it is really needed. If not, it should be removed.
  • Why does Ethereum EIP-55 use the 4*i-th bit and not just the i-th? For a good hash function, this should not make any difference.
  • Should we rather do the hashing on the byte-array representations? Might be cleaner, it would be an easy change that we need to discuss.

What does this achieve?

  • This checksum is part of the resulting subaccount only if the hexadecimal representation of the subaccount contains letters. I.e., for “simple” subaccounts like 1, 2, 1234, etc. there is no checksum, as digits don’t have a case. For subaccounts derived through a hash function, e.g., SHA-224 or SHA-256, we have an expected ~21 or ~24 bits of checksum, respectively (6 of the 16 hexadecimal symbols are letters), expressed through the casing of the letters, which catches copy-paste errors with high probability.
  • We leave the principal untouched, so it can be easily compared via eyeballing.
  • We have a checksum over everything if the subaccount is not a “simple” subaccount. Having checksums over complex, long subaccounts addresses the requirements raised in the forum thread. Not having checksums over short, simple subaccounts seems OK in the light of the discussions.
  • Having upper/lowercase in the subaccount has not been seen as an issue so far (but also not explicitly addressed). @timo?
  • Users can still create simple subaccounts on their own as they are not checksummed.

We think that this approach is the best compromise we can make given the discussion we have had so far in the forum. It has checksums where helpful, but skips them where we think that they are less required. We think this is a clear improvement over the previous proposal.

Please let us know what you think about going forward with this proposal. If we do not hear objections, someone needs to spec it properly and then we can open a vote on it. At least it seems that what we have now is strictly better than what we had before.


Is the checksum optional? That is, can a user opt-out and just write everything lower case if he wishes and forego the benefits of the checksum?

Do we need the checksum to include the principal? Isn’t it sufficient if it is computed from the subaccount alone?

The principal’s internal checksum is based on CRC32. Instead of introducing a new function, sha256, would it make sense to use crc32 again to reduce code dependencies and maybe it is also faster?
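
For reference, the principal's own checksum can be sketched as below, following my understanding of the principal textual format: a big-endian CRC32 of the raw bytes is prepended, the result is base32-encoded in lower case without padding, and dashes are inserted every five characters. The function names are made up for this example.

```python
import base64
import zlib


def principal_to_text(blob: bytes) -> str:
    """Textual form of a principal: base32(crc32(blob) || blob), grouped by 5."""
    checksum = zlib.crc32(blob).to_bytes(4, "big")
    b32 = base64.b32encode(checksum + blob).decode("ascii").rstrip("=").lower()
    return "-".join(b32[i:i + 5] for i in range(0, len(b32), 5))


def principal_checksum_ok(text: str) -> bool:
    """True if the text decodes and re-encodes to exactly the same string."""
    b32 = text.replace("-", "").upper()
    padded = b32 + "=" * (-len(b32) % 8)  # restore base32 padding
    try:
        raw = base64.b32decode(padded)
    except ValueError:
        return False
    if len(raw) < 4:
        return False
    return principal_to_text(raw[4:]) == text
```

Both functions rely only on zlib and base64 from the standard library, which illustrates the point about code dependencies: a CRC32-based subaccount checksum would reuse machinery that principal handling needs anyway.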

We have no checksum for small subaccount ids and more than 20 bits for large ones. But what about the middle sizes? I am sure there are applications where subaccount ids are generated sequentially and handed out.

For a 4-byte size (8 characters) we have on average 3 bits of checksum. For an 8-byte size (16 characters) we have on average 6 bits of checksum. Compare that to an IBAN, which has ~6.5 bits.
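
These averages can be reproduced with a few lines of Python, assuming every hexadecimal character of the subaccount is uniformly random and ignoring the stripping of leading zeroes. Each character is then a letter (a to f) with probability 6/16, and each letter carries one bit of checksum through its case. The function name is made up for this example.

```python
from fractions import Fraction

# 6 of the 16 hexadecimal symbols (a-f) have a case and can carry one bit.
LETTER_FRACTION = Fraction(6, 16)


def expected_checksum_bits(subaccount_bytes: int) -> Fraction:
    """Expected number of case-carrying letters in the hex representation."""
    hex_chars = 2 * subaccount_bytes
    return hex_chars * LETTER_FRACTION


for size in (4, 8, 28, 32):
    print(size, "bytes:", float(expected_checksum_bits(size)), "bits")
```

Under these assumptions the same formula yields 21 and 24 bits for 28-byte (SHA-224) and 32-byte (SHA-256) subaccounts.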

If we consider that number of bits fine then I think we are better off adding a fixed length of 1 byte (2 characters) to the subaccount id, i.e. instead of

4kydj-ryaaa-aaaag-qaf7a-cai:3fCe35D21Aa8

write one of these (if the checksum is 1b):

4kydj-ryaaa-aaaag-qaf7a-cai:3fce35d21aa8.1b
4kydj-ryaaa-aaaag-qaf7a-cai:3fce35d21aa81b
4kydj-ryaaa-aaaag-qaf7a-cai:3fce35d21aa8:1b
4kydj-ryaaa-aaaag-qaf7a-cai:3fce35d21aa8-1b
4kydj-ryaaa-aaaag-qaf7a-cai:1b.3fce35d21aa8
4kydj-ryaaa-aaaag-qaf7a-cai:1b3fce35d21aa8
4kydj-ryaaa-aaaag-qaf7a-cai:1b:3fce35d21aa8
4kydj-ryaaa-aaaag-qaf7a-cai:1b-3fce35d21aa8

It avoids confusion for users who aren’t used to the capitalization. Moreover, with a 4 byte subaccount 1 out of 4 subaccount ids will be all lower case or all upper case. Then the user will wonder if what he is looking at is checksummed or not. These things are avoided by adding dedicated checksum characters.
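
A fixed-length checksum of this kind could look like the following sketch. Several choices here are assumptions made only for illustration and were not settled in the discussion: the checksum is the low byte of a CRC32, it covers the principal, the colon and the stripped lowercase hex subaccount, and it is appended after a "." separator (the first of the variants listed above). The function names are also made up.

```python
import zlib


def encode_with_check_byte(principal: str, subaccount: bytes) -> str:
    """Append a two-character checksum to the lowercase subaccount."""
    hex_sub = subaccount.hex().lstrip("0")
    if not hex_sub:
        return principal  # default account: bare principal, no suffix
    body = f"{principal}:{hex_sub}"
    check = zlib.crc32(body.encode("ascii")) & 0xFF  # assumption: low CRC32 byte
    return f"{body}.{check:02x}"


def verify_check_byte(text: str) -> bool:
    """Recompute the checksum over everything before the final '.'."""
    body, sep, check = text.rpartition(".")
    if not sep or len(check) != 2:
        return False
    expected = zlib.crc32(body.encode("ascii")) & 0xFF
    return check == f"{expected:02x}"
```

With dedicated checksum characters, every checksummed id carries the same 8 bits regardless of how many letters its hexadecimal digits contain, and a reader can tell at a glance whether a checksum is present.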
