Hi Rosti. Thanks for offering to increase the limit. But I am not currently hitting the rate limit. Each boundary node allows 300 ingress messages per second per target subnet. My load script distributes the load equally over 10 boundary nodes. That is way below the limit.
Say there are N target subnets (= number of aggregators) and M boundary nodes, and I send a total of R tps. Then R/(N*M) must be < 300. In my case R = 5,000, N = 11, M = 10, so R/(N*M) ≈ 45, which is still very far below the rate limit.
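For reference, a quick back-of-the-envelope check of those numbers in TypeScript:

```ts
// Numbers from the test described above.
const totalTps = 5_000;       // R: total requests per second sent by the load script
const targetSubnets = 11;     // N: target subnets (= number of aggregators)
const boundaryNodes = 10;     // M: boundary nodes the load is spread over
const limitPerSubnet = 300;   // ingress messages per second, per boundary node, per target subnet

const perNodePerSubnet = totalTps / (targetSubnets * boundaryNodes);
console.log(perNodePerSubnet.toFixed(1));          // "45.5"
console.log(perNodePerSubnet < limitPerSubnet);    // true, well below the limit
```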
Yes, if the rate limit was increased then I could simplify my load script and I could send the whole load to one boundary node instead of 10. But it is actually a more realistic simulation of real-world usage if the traffic goes through different boundary nodes. Also, I am not sure if that would help me. It is hard in practice from a networking perspective to get that many requests per second to a single server reliably and at a constant rate. I am afraid that it would result in higher fluctuations in my tps rates. Going through 10 boundary nodes evens out the natural fluctuations in the speed and quality of the network connection.
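For illustration only, this is roughly how the spreading over boundary nodes can be done; the hosts and the submission helper below are placeholders, not the actual load script:

```ts
// Hypothetical boundary node hosts; the real load script and endpoints may differ.
const boundaryNodeHosts = [
  "https://bn1.example.com",
  "https://bn2.example.com",
  // ... 10 hosts in total
];

let next = 0;

// Round-robin over the hosts so each boundary node receives ~R/M requests per second,
// which also evens out fluctuations of any single network path.
function pickBoundaryNode(): string {
  const host = boundaryNodeHosts[next];
  next = (next + 1) % boundaryNodeHosts.length;
  return host;
}

// sendIngress is a placeholder for however the load script actually submits an
// ingress message (e.g. an agent instance configured with `host: pickBoundaryNode()`).
```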
This statement refers to a consensus bottleneck which still exists regardless of what the boundary nodes do. This limit comes from how many individual messages can be gossiped per second between the nodes of a subnet. Because they are individual messages there is some limit regardless of how small they are. I believe this limit is < 1,000 but I have not actually tried to push it that far. The most I tried was 450.
Looking at boundary node metrics there was an increase of 429 responses exactly when you were running your load. That's why I thought you were hitting the boundary node limits. Looking more carefully I do see some 429 responses coming from replicas. More specifically, I see ~200k 429 responses starting at 10pm on the 27th until 6am on the 28th.
There is no explicit limit on the ingress messages per second we gossip. So the number of messages really depends on bandwidth, latency, finalization rate, and on whether the code actually pipelines/multiplexes messages well.
Most likely the replicas were not gossiping fast enough, so the ingress artifact pools got full, hence the 429s, but some metrics are inconclusive.
I will take that statement back. Block making should be independent from gossiping. So even if gossiping completely fails, due to being overloaded or anything else, block making should still work. It just means that ordering isn't fair anymore and the variance in latency goes up significantly. But throughput shouldn't be affected by gossiping. Throughput should indeed only depend on message size. So maybe it is possible to get >1000 ingress messages per second into a subnet, just not in a very fair and responsive way.
Thank you for your response regarding latency.
I would expect enterprise-scale transaction processing to be possible if the ledger is scalable. I don't see any issues compared to the Solana and Layer 2 solutions that Visa is testing. What are the advantages over these?
What is the use case for HPL? Are you thinking of adapting it for enterprises, or of building a huge ledger system including ckBTC, ckETH, and other ICRC tokens wrapped in HPL?
Also, is it a technical limitation that you can't transfer money to/from other accounts directly, only through virtual accounts? Or is it meant to address the drawback that Internet Identity authentication creates a separate principal per application, so users cannot have a common account?
That's only "validation", which means it is inside the execution layer. It doesn't include any networking / consensus. They made a new C++ implementation of the validator and to show how optimised it is they compare the pure validation (aka execution) speed.
But the real bottleneck is always networking, which makes the cited number of 600k irrelevant for throughput. It is relevant if validators want to catch up and re-validate the entire history of the chain, which is probably why they optimized it.
For networking / consensus you will see a wide range of throughput numbers thrown around but they are not always comparable. A big differentiator is fairness. For example, if I just go round-robin through a set of block makers in a pre-determined way then the user can directly submit to the next upcoming block maker (or the next N block makers as a back-up). I don't need any gossiping of individual transactions between block makers. They can focus on sending entire block proposals, which are essentially large batches of user transactions, instead of individual transactions. But this is susceptible to front-running. If instead you gossip individual transactions between all block makers and only determine the next block maker a fraction of a second before the block is made then you have a much harder problem, but are less susceptible to front-running.
Gossiping is expensive and slow and the cost increases significantly with the number of messages. It is a huge difference if I gossip 10k individual messages immediately as they come in, from the one node who received the message from the user to all other nodes, or if I collect them all in one place (a sequencer) and then send all 10k in one batch to other validators. The two problems don't even compare. Throughput with batching is practically unlimited when compared to a protocol based on gossiping individual messages. For the latter 1,000 tps is already an achievement.
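To make the difference concrete, here is a rough, purely illustrative message-count comparison (the numbers are made up and real protocols differ in many details):

```ts
const txPerSecond = 10_000;  // user transactions arriving per second
const nodes = 13;            // replicas / validators in the subnet

// Gossiping each transaction individually: every incoming tx gets relayed
// from the receiving node to all other nodes as its own protocol message.
const gossipSendsPerSecond = txPerSecond * (nodes - 1);

// Batching at a sequencer / block maker: only whole block proposals are sent,
// say one block per second containing all 10k transactions.
const blocksPerSecond = 1;
const batchSendsPerSecond = blocksPerSecond * (nodes - 1);

console.log({ gossipSendsPerSecond, batchSendsPerSecond }); // 120000 vs 12
```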
That being said, applications are also not always comparable. For a ledger like the one we are building here, fairness of ordering is probably not so important, because we don't usually double-spend our own transactions. Nor would anyone be interested in front-running a payment that I make. But for a DEX it is very important.
What is the question exactly? Do you mean what the advantages of HPL are over what Visa is doing? Do you have a link to Visa's projects?
The latter. One place where all other tokens can be wrapped, plus native ones can live, and where they can be atomically swapped. And one place which, when integrated into a wallet, hardware wallet, exchange, or other service, makes all tokens immediately integrated as well.
It is not a technical limitation, it is a deliberate one. And it has nothing to do with II. I have plans for direct transfers as well without virtual accounts.
What is measured here is end-to-end time for making a transfer, i.e. the time from initiating the transfer to receipt of the result, both measured in the frontend. The highest level of detail is given by the heatmap in which even the outliers can be seen. We also show the average and various percentiles (50%, 75%, 90%, 95%). All three together should give quite a complete picture.
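As a minimal sketch (in TypeScript, with makeTransfer standing in for the actual HPL client call), the measurement amounts to timestamping in the frontend around the call and then aggregating the samples:

```ts
// makeTransfer is a placeholder for the call that initiates the transfer and
// resolves with its result; both timestamps are taken in the frontend.
declare function makeTransfer(): Promise<unknown>;

async function measureOnce(): Promise<number> {
  const t0 = performance.now();
  await makeTransfer();
  return performance.now() - t0; // end-to-end latency in milliseconds
}

// Nearest-rank percentile over collected samples, e.g. percentile(samples, 0.95).
function percentile(samples: number[], q: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.min(sorted.length - 1, Math.floor(q * sorted.length))];
}
```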
We have selected 5 aggregators that we will continue to work with. The dashboard shows a combined number for all of them, i.e. what we see is essentially an average over all aggregators. But we also have a couple of graphs for each individual aggregator including average latency and heatmap (not percentiles). Average latency should be enough to compare the aggregators against each other and the individual heatmaps allow us to see any concentration of outliers which usually point to a communication problem of the subnet hosting the aggregator.
Finally, for comparison, we also measure "ping time" which is a direct update call to each of the canisters (aggregators and ledger). This is the latency that we would experience if the HPL did not use xnet hops, e.g. if it was a standard ICRC1 canister. We see that average ping time is usually around 3.2 seconds whereas average HPL latency is usually around 6 seconds when there is no load on the system.
I assume you are using either agent-rs or agent-js. Both agents continuously poll the state in order to return the result to the user.
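To illustrate what that polling looks like from the caller's side, here is a minimal agent-js sketch (idlFactory, the canister id, and the method name are placeholders):

```ts
import { HttpAgent, Actor } from "@dfinity/agent";
// Placeholder: the candid interface of the target canister, e.g. generated by dfx.
import { idlFactory } from "./my_canister.did.js";

async function callUpdate() {
  const agent = new HttpAgent({ host: "https://ic0.app" });
  const actor: any = Actor.createActor(idlFactory, {
    agent,
    canisterId: "aaaaa-aa", // placeholder canister id
  });
  // For an update call the agent first submits the request and then repeatedly
  // polls read_state until a certified result is available; that polling loop
  // is part of the end-to-end latency of every update call today.
  return actor.someUpdateMethod();
}
```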
In the coming weeks we will propose an addition to the HTTPS interface that will include something like a synchronous call endpoint. The semantics will be that the user sends an update request and the HTTPS request returns when the request has been executed by the execution layer, so no polling is required.
This low-hanging fruit will automatically shave off ~1 sec of the e2e latency.
No, we are using a special HPL client library which in turn uses components from agent-js but it's not just a wrapper around it. We submit the transfer to the aggregator and then "poll" the ledger but with query calls. So the polling that you mention isn't happening here, or if it is, then it is not in the critical path.
But we would still like to use the new feature in the HPL client library when it's available. Even if it won't give us end-to-end gains. Thanks for pointing it out!
EDIT: The supplemental "ping time" test uses agent-js. We have to adjust the polling interval to make for a fairer comparison.
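For context, the shape of that query-call polling is roughly the following; submitToAggregator and queryTxStatus are hypothetical stand-ins for the HPL client library calls, not its real API:

```ts
// Hypothetical stand-ins for the HPL client library; names and types are made up.
declare function submitToAggregator(tx: unknown): Promise<string>;
declare function queryTxStatus(txId: string): Promise<"queued" | "processed" | "failed">;

async function transferAndAwait(tx: unknown): Promise<void> {
  const txId = await submitToAggregator(tx); // update call to the aggregator
  const deadline = Date.now() + 30_000;
  // Poll the ledger with cheap query calls until the transfer has been processed.
  while (Date.now() < deadline) {
    const status = await queryTxStatus(txId); // query call to the ledger
    if (status !== "queued") return;
    await new Promise((r) => setTimeout(r, 250)); // short poll interval
  }
  throw new Error("timed out waiting for the ledger to process the transfer");
}
```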
In the context of a multi-token ledger, if the flat fee is to be fair (i.e. equal) across tokens then it requires knowledge of the value or exchange rate of all tokens. We would need to know at least the exchange rate of each token to one common token (say, for example, ICP). Then we can set a constant fee denominated in the common token and convert it to flat fees denominated in each of the other tokens. And the flat fees indeed adapt, as you suggested, as the exchange rate moves.
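As a small sketch of that conversion (the tokens, rates and fee level are made-up example values):

```ts
// Constant fee denominated in the common token (here assumed to be ICP).
const feeInCommonToken = 0.0001;

// Example exchange rates: how much of the common token one unit of each token is worth.
const ratesToCommon: Record<string, number> = {
  ckBTC: 5_000,
  ckETH: 300,
  SOME_TOKEN: 0.02,
};

// Flat fee per token = fee in the common token / value of one token unit in the common token.
// Recomputing this whenever the rates move makes the flat fees adapt automatically.
const flatFees = Object.fromEntries(
  Object.entries(ratesToCommon).map(([token, rate]) => [token, feeInCommonToken / rate]),
);

console.log(flatFees);
```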
For the time being, without having those mechanisms available, the percentage fee based on tx value is the easiest way to have any kind of fee at all that is fair across tokens.
The next easiest will be to denominate the fee in a dedicated fee token regardless of which token is being transacted. But it has the drawback that users need to hold the fee token.
After that in terms of complexity comes a flat fee denominated in the transacted asset based on knowledge of exchange rates.