Announcement: HPL - a ledger for 10k tps

Today we are announcing the HPL project (high-performance ledger). The goal is to implement a ledger on the IC that can receive and process 10,000 transactions per second, initiated by different users and submitted in individual ingress messages.

To achieve this we have developed a multi-canister architecture for the ledger: multiple aggregators deployed on different subnets and a single ledger canister on another subnet. On every heartbeat, the aggregators forward the transaction requests they have collected in batches to the ledger, where they get processed.
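
As a mental model, the flow looks roughly like the sketch below. This is illustrative TypeScript only: the real aggregators are Motoko canisters, and none of these names come from the actual interface.

```typescript
// Conceptual sketch only: each aggregator collects individually submitted
// transaction requests and forwards them in batches on every heartbeat.
type TxRequest = { from: string; to: string; assetId: number; amount: bigint };

class AggregatorSketch {
  private queue: TxRequest[] = [];

  // One ingress message from a user ends up here.
  submit(tx: TxRequest): void {
    this.queue.push(tx);
  }

  // Heartbeat handler: drain the queue into one batch and forward it to the
  // ledger in a single inter-canister call (several may be in flight at once).
  onHeartbeat(sendToLedger: (batch: TxRequest[]) => Promise<void>): void {
    if (this.queue.length === 0) return;
    const batch = this.queue.splice(0, this.queue.length);
    void sendToLedger(batch);
  }
}
```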

We publish it here: https://github.com/research-ag/hpl-io. The repository contains:

  • .did file of the aggregator, with comments
  • .did file of the ledger, with comments
  • README: a high-level overview of the goals and characteristics of the ledger
  • README: description of how external clients (frontends, wallets) submit transactions and track progress by querying both aggregator and ledger (see the sketch below)
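
As a rough illustration of that two-sided tracking flow, here is a hypothetical client sketch. The method and status names are placeholders; the real query names and status variants are defined in the published .did files.

```typescript
// Hypothetical interfaces; the real names are in the aggregator/ledger .did files.
type Tx = { from: string; to: string; amount: bigint };
type TxId = bigint;

interface AggregatorApi {
  submit(tx: Tx): Promise<TxId>;                                   // update call
  txStatus(id: TxId): Promise<"queued" | "forwarded">;             // query
}
interface LedgerApi {
  txStatus(id: TxId): Promise<"awaited" | "processed" | "failed">; // query
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function trackTransfer(agg: AggregatorApi, ledger: LedgerApi, tx: Tx) {
  const id = await agg.submit(tx);
  // Poll the aggregator until the batch containing our request was forwarded...
  while ((await agg.txStatus(id)) === "queued") await sleep(250);
  // ...then poll the ledger for the final processing result.
  let status = await ledger.txStatus(id);
  while (status === "awaited") {
    await sleep(250);
    status = await ledger.txStatus(id);
  }
  return status; // "processed" | "failed"
}
```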

We would be grateful for review, comments and questions.

Please note that comments in the .did files and the README complement each other. We currently do not have a single document explaining everything.

The ledger and aggregators have been implemented and are deployed on 12 subnets, one for the ledger and 11 for aggregators.

To play around with the live deployment we have a demo frontend. It is rudimentary but useful, and much better than the raw Candid UI. It is deployed here: https://debug.hpl.live/. Please see the comments in the .did files for an explanation of the functions exposed in this demo.

First experience shows that the latency from clicking “Send” on a transfer to the final confirmation of processing in the ledger is around 6-8 seconds most of the time; sometimes it is less, sometimes more. The latency is due to the extra inter-canister, cross-subnet hop that is required.

We collect extensive metrics and provide a Grafana dashboard with various panels showing:

  • Ledger:
    • transaction rate (tps)
    • batch rate
    • number of registered accounts, owners, assets
    • number of processed transactions
    • heap memory
  • Aggregators:
    • high watermark of queue size
    • number of concurrent batches in flight
    • batch return time

The dashboard can be seen here: http://dashboard.hpl.live/

For example, it can be seen from the dashboard that most of the time there are 5 batches concurrently in flight from the same aggregator, but sometimes this number can peak at several times that.

It also shows that most of the time the response to a batch comes back after 6-7 seconds, but this number can also spike to several times that.

There are various other existing pieces of this project that are not yet ready to be published (but publication will follow):

  • Motoko implementation of the aggregator
  • Motoko implementation of the ledger
  • client library (typescript) for transaction submission
  • demo frontend source code

And of course there are many, many more pieces for which work is either in progress or hasn’t started yet.

Have fun with it!

And again, we would be grateful for review and comments on the published interface.

48 Likes

Batching multiple requests into a single request is a rather cool approach. I assume that to get this throughput you’d still need to deploy on multiple subnets so as not to run into the message throughput limit of a specific subnet.

Edit: I see now in the second paragraph that it’s indeed multi-subnet.

Great job! I would love to dive into the implementation once it’s open source.

Please fix the link - it has an extra colon at the end.

2 Likes

That’s a really cool project.
Is this ledger intended for any particular token, or is it just a general library for now?

Also, how does this effort relate to standardization efforts (ICRC-1/ICRC-2)?

1 Like

Yes, ingress message throughput per subnet is the bottleneck. With the 11 deployed aggregators I suspect we can get to 5k tps. We can try that out in the next couple of days.

1 Like

It’s a multi-token ledger. Anyone can create a new token on it with a canister call.

It cannot be ICRC-1 compatible for various reasons (explained in the README).

It has a different, more general concept of “virtual accounts” to achieve the functionality of ICRC-2 and more (also see README).

3 Likes

What was the reason behind virtual accounts? Did you want to make it more restrictive, or does it have a scalability purpose? I guess if A agrees to accept from B, the virtual account is in B’s canister, making the transaction happen without cross-canister calls.
The restriction may be good for NFTs, where you don’t want to receive NFT ads in your wallet/gallery.

Hi @timo,

This would be a great topic for the Scalability & Performance WG. The next session will be Thursday, November 16th, 5:30 pm CET. Would you be able to present there?

(cc @abk )

6 Likes

No, the restriction of not allowing unsolicited deposits is not there for scalability reasons. It is there simply because I think allowing such deposits is bad: it opens up attacks (spam, tainted funds, unwanted airdrops and taxes, transaction history poisoning, etc.).

4 Likes

Most interesting DeFi development on the IC since ICRC-1!

2 Likes

Why is there a 10k tps (20 aggregator subnets) limit? Wouldn’t you be able to deploy aggregators on an unlimited number of subnets? Is it because of the tps limit of the single subnet holding the ledger? If so, would you not be able to shard the ledger across different subnets to scale further?

1 Like

The 20 subnets (10,000 tps) is a current assumption, and more will probably be possible in the future. My understanding is that since the ledger is a single canister, the number of processes is bounded by 970/s per subnet, so the achievable rate should vary greatly depending on how much batch processing can be done.
I assume that there will be delays in finality, etc. when HPL exceeds 10,000 tps, but what are the tradeoffs in seeking scale?

2 Likes

If we want to scale this past 10k tps, I guess the finality delays would be inevitable. But once you shard and load-balance the ledger, the tps should be unbounded then, right?

1 Like

The 10k was a goal, not a limit. And as a goal it is somewhat arbitrary. It represents a certain order of magnitude that is both non-trivial to reach and has practical relevance, because VISA’s global transaction throughput is said to be of that order (lower on average, higher at peak).

To reach the goal, 20 subnets with 500 tps of ingress throughput each are needed. Currently I cannot even deploy on 20 subnets, because when you create a new canister on a random subnet, you end up on one of only 11 subnets. If someone gives me canister ids on other subnets, I can deploy aggregators there as well.

The next limit, after horizontally scaling the number of aggregators to 20 and beyond, is the 2 MB block size limit of the subnet that hosts the single ledger. If a transaction occupies 100 bytes, that creates a limit of 20k transactions per block, i.e. roughly 20k tps at about one block per second. But we can certainly introduce some compression techniques, such as shortening reused principals, to increase that number.

The next limit after that would be processing time in the single ledger. The code uses around 40k wasm instructions per transaction processed. At an execution round limit of 5 billion instructions, that means 125k transactions per round can be processed. But this number has to be taken with a grain of salt. Not all wasm instructions are equal in terms of real-world time, and pushing to actually use 5B instructions per round could significantly increase the round time (from the ~1 s it is at lower loads). We haven’t done any experiments measuring real-world time. Moreover, the 40k instructions per transaction were measured with relatively empty data structures; that number may increase as well.

In summary, what these calculations show is that after applying some margins, doing low-hanging-fruit optimizations, and horizontally scaling the aggregators, 20k tps should be possible. All blockchains combined don’t have that kind of demand, and it’s close to VISA’s global peak-time demand. Therefore I don’t see a need to shard the ledger. But as you say, sharding the ledger would also be possible; I just don’t see the need at the moment that would justify the added complexity.
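
For the record, here are the back-of-envelope numbers above, spelled out. All inputs are the estimates from this post, not measured system guarantees:

```typescript
// Estimates only; none of these are measured system guarantees.
const ingressTpsPerSubnet = 500;        // ingress throughput per aggregator subnet
const aggregatorSubnets = 20;
const ingressLimit = ingressTpsPerSubnet * aggregatorSubnets;   // = 10,000 tps

const blockBytes = 2_000_000;           // 2 MB block size limit (ledger subnet)
const bytesPerTx = 100;                 // assumed encoding size per transaction
const blocksPerSecond = 1;              // roughly one block per second
const blockLimit = (blockBytes / bytesPerTx) * blocksPerSecond; // = 20,000 tps

const instrPerRound = 5_000_000_000;    // execution round limit
const instrPerTx = 40_000;              // measured with near-empty data structures
const execLimit = instrPerRound / instrPerTx;                   // = 125,000 tx/round

console.log({ ingressLimit, blockLimit, execLimit });
```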

What needs to be sharded, though, is the transaction history archive and the indexing canisters. We don’t currently have an archive canister. 10k tps is on the order of 1 billion transactions per day, so the storage would fill up very quickly. If anyone is interested in working on the archive, please let me know! The archive can be in any language, Rust or Motoko included.

7 Likes

I like how radically different this is and how it raises interesting questions. Perhaps it will be the next-gen ledger. Anvil is similarly multi-canister and multi-token (both NFT and FT) and allows users to create tokens (though not publicly in the dapp). The similarities end there. I think the added complexity for clients, both on- and off-chain, hindered adoption and also made me want to change the architecture to something that didn’t require complicated client libraries. Being able to call r2pvs-tyaaa-aaaar-ajcwq-cai.myFunc(...) is just too good and simple to move away from.

A btw question: is it possible to make HPL DoS-resistant? The aggregators accept transactions and store them, but they don’t know whether these transactions come from accounts with any tokens in them, so a fee can’t be taken until the ledger processes them? That would allow someone to flood the aggregators with fake transactions that HPL has to process for free?

I wonder what the benefits of HPL are versus this:
We have the same ICRC-1-compatible ledgers, but imagine there are 20 ledger canisters on different subnets. Each one has its own separate memory and account-to-balance map. When you transfer from one account in ledger A to another account in ledger A, the transaction gets executed without cross-canister calls. When you transfer from an account in ledger A to an account in ledger B, the ledgers securely communicate to execute the transaction.
Each account resolves to a specific ledger. This could be done by hashing the account id, so that without any additional queries a function can calculate which ledger is supposed to host a particular account (see the sketch below).
The throughput should be ~20x bigger than what one ledger can do. Similar to HPL?
But how do we make this work without requiring additional client libraries that first fetch a routing table and hash account ids to finally connect to the target canister?
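
A minimal sketch of that account-to-ledger mapping, assuming a fixed shard list and an arbitrary stable hash (FNV-1a here; any hash works as long as all clients and canisters agree on it). The canister ids are placeholders:

```typescript
// Hypothetical shard table: one ledger canister id per subnet.
const LEDGER_SHARDS: string[] = [
  "<canister-id-0>",
  "<canister-id-1>",
  // ... up to 20 entries
];

// Deterministic account -> ledger mapping: no routing table, no extra query.
function shardFor(accountId: string): string {
  let h = 0x811c9dc5; // 32-bit FNV-1a
  for (const byte of new TextEncoder().encode(accountId)) {
    h = Math.imul(h ^ byte, 0x01000193) >>> 0;
  }
  return LEDGER_SHARDS[h % LEDGER_SHARDS.length];
}
```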

I think the best solution would be to have ‘routers’ which can take the position of another canister but are not ordinary canisters and are not inside any subnet (or perhaps are in all of them?). They could be a set of functions with the same interface inputs as the target function. They can use the inputs for calculations and then return a canister id; the system then forwards the call to that canister id. I’m not sure where the router’s place is and how it would be made secure, but if a place can be found for it, it would solve a lot of scalability problems without changing everything (a minimal sketch follows below).
There would be no changes needed in any canister like DEXs or frontends/wallets. We could still use ICRC-1, and anyone could scale from a one-canister ledger, once it’s not enough, to a multi-subnet ledger by replacing the canister with a router and installing more ledgers. Those ledgers would also be the ledgers we use right now, except they have to add a few lines that handle transfers to accounts hosted in other ledgers.
This would also allow other types of canisters to scale: other services, databases, or asset canisters.
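
Reduced to its essence, the proposed router is just a pure function with the same parameters as the target method that returns a canister id for the system to forward the call to. A hypothetical signature, reusing `shardFor` from the sketch above:

```typescript
// Hypothetical router: same inputs as the target method, returning only the
// canister id; the system would forward the original call there.
type CanisterId = string;

function routeIcrc1Transfer(fromAccount: string): CanisterId {
  return shardFor(fromAccount); // deterministic, so no per-call state needed
}
```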

3 Likes

The relevance for DeFi could be that this is a multi-token ledger.

The two things go hand in hand: with high throughput you make room for multiple tokens, and conversely, if you have a lot of room then how do you fill it other than with multiple tokens?

The benefits of a multi-token ledger that I see for DeFi are:

  • atomic swaps between multiple tokens (especially relevant in the asynchronous environment that is the IC; see the sketch below)
  • unified integrations for wallets, hardware wallets, DEXs, CEXs, etc.
  • reduced friction for creating new tokens
  • reduced friction for wrapping and bridging (inside IC or cross-chain)
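
To illustrate the first point: because all assets live in one canister’s state, both legs of a swap can be committed or rejected together in a single call. A hypothetical shape, not the actual HPL interface:

```typescript
// Hypothetical multi-token transaction: all legs apply atomically or not at all.
type Leg = { from: string; to: string; assetId: number; amount: bigint };

interface MultiTokenLedger {
  submitTx(legs: Leg[]): Promise<"processed" | "failed">;
}

// Token-A-for-token-B swap between alice and bob as one atomic transaction,
// with no cross-canister two-phase protocol involved:
const swap: Leg[] = [
  { from: "alice", to: "bob", assetId: 0, amount: 100n },
  { from: "bob", to: "alice", assetId: 1, amount: 250n },
];
```
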
4 Likes

What do you mean by processes and where does the number 970/s come from?

Yes, that’s an interesting question. My plan was to create a “credit system”: principals hold credits in the ledger, and those credits are communicated from the ledger to the aggregators. So the aggregators mirror the credit balance, with a delay of course. Say I have 1000 credits and there are 10 aggregators. Then each aggregator will allow me to submit 100 concurrent transaction requests, so the space I can occupy in the queue is limited. It is like a quota that is proportional to my credit. When the aggregator learns that a transaction has been processed by the ledger, it clears it from the queue, freeing the occupied space (this part is already implemented). With the credit system it would then also free my quota and allow me to submit more concurrent transaction requests.

There are various variations of this. Credit can refer to an absolute number of transactions, i.e. credit counts down and has to be replenished, or it can refer to a rate, i.e. it doesn’t count down and just defines the quota that can be occupied at any point in time.

There are also variations on how credit can be obtained. Either it can be bought outright or it can be earned over time by past successful transactions that lead to a fee payment in the ledger.

There can be a certain quota for “uncredited” transactions that can be used by new users who do not have any credit yet or don’t want to bother with it. Then the most a DoS attacker can do is exhaust that quota, i.e. interrupt the service for users who don’t have credit, but not for the others.
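
Here is a sketch of the rate-style variant described above, illustrative only (the shared pool for uncredited users is left out, and none of the names are from the actual implementation):

```typescript
// Illustrative quota logic for one aggregator; not the actual implementation.
class CreditGate {
  private credit = new Map<string, number>();   // mirrored from the ledger, delayed
  private inFlight = new Map<string, number>(); // queue slots occupied per principal

  constructor(private numAggregators: number) {}

  // Rate-style credit: a caller may occupy queue space proportional to its
  // credit, split evenly across the aggregators. Credit does not count down.
  tryAdmit(caller: string): boolean {
    const quota = (this.credit.get(caller) ?? 0) / this.numAggregators;
    const used = this.inFlight.get(caller) ?? 0;
    if (used >= quota) return false; // quota exhausted: reject the request
    this.inFlight.set(caller, used + 1);
    return true;
  }

  // When the ledger reports the transaction as processed, the aggregator
  // clears it from the queue and thereby frees the caller's quota again.
  release(caller: string): void {
    const used = this.inFlight.get(caller) ?? 0;
    if (used > 0) this.inFlight.set(caller, used - 1);
  }
}
```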

2 Likes

Here is the answer to my earlier question. My rough understanding is that there is a consensus limit on transaction processing.
The finality time of 8 seconds seems a bit long. Is there any way to improve on this?

Really interesting post. I’ll definitely be following along.

Do you have any thoughts on history (ICRC-3) type functions and how these would work? For example, from the perspective of the 221Bravo dev: how can token histories on HPL be searched, and how can we automate the process of ‘finding’ new tokens that are added?

Ideally anyone tracking the ledger shouldn’t have to change structs or enums to pick up new variants of tokens.

Side thought: I’m not completely sold on the idea of a variable fee depending on transaction size. This seems like it’s getting into tokenomics a bit. From a technological standpoint there isn’t any cost difference between processing a 1 million transfer and a 0.0001 transfer, so why should the 1 million pay more?