Hi Rosti. Thanks for offering to increase the limit. But I am not currently hitting the rate limit. Each boundary node allows 300 ingress messages per second per target subnet. My load script distributes the load equally over 10 boundary nodes. That is way below the limit.
Say there are N target subnets (= number of aggregators) and M boundary nodes, and I send a total of R tps. Then R/(N*M) must be < 300. In my case R = 5,000, N = 11, M = 10, so R/(N*M) ≈ 45, which is still very far below the rate limit.
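For reference, a quick back-of-the-envelope check of those numbers in TypeScript:

```ts
// Numbers from the test described above.
const totalTps = 5_000;       // R: total requests per second sent by the load script
const targetSubnets = 11;     // N: target subnets (= number of aggregators)
const boundaryNodes = 10;     // M: boundary nodes the load is spread over
const limitPerSubnet = 300;   // ingress messages per second, per boundary node, per target subnet

const perNodePerSubnet = totalTps / (targetSubnets * boundaryNodes);
console.log(perNodePerSubnet.toFixed(1));          // "45.5"
console.log(perNodePerSubnet < limitPerSubnet);    // true, well below the limit
```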
Yes, if the rate limit was increased then I could simplify my load script and I could send the whole load to one boundary node instead of 10. But it is actually a more realistic simulation of real-world usage if the traffic goes through different boundary nodes. Also, I am not sure if that would help me. It is hard in practice from a networking perspective to get that many requests per second to a single server reliably and at a constant rate. I am afraid that it would result in higher fluctuations in my tps rates. Going through 10 boundary nodes evens out the natural fluctuations in the speed and quality of the network connection.
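For illustration only, this is roughly how the spreading over boundary nodes can be done; the hosts and the submission helper below are placeholders, not the actual load script:

```ts
// Hypothetical boundary node hosts; the real load script and endpoints may differ.
const boundaryNodeHosts = [
  "https://bn1.example.com",
  "https://bn2.example.com",
  // ... 10 hosts in total
];

let next = 0;

// Round-robin over the hosts so each boundary node receives ~R/M requests per second,
// which also evens out fluctuations of any single network path.
function pickBoundaryNode(): string {
  const host = boundaryNodeHosts[next];
  next = (next + 1) % boundaryNodeHosts.length;
  return host;
}

// sendIngress is a placeholder for however the load script actually submits an
// ingress message (e.g. an agent instance configured with `host: pickBoundaryNode()`).
```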
This statement refers to a consensus bottleneck which still exists regardless of what the boundary nodes do. This limit comes from how many individual messages can be gossiped per second between the nodes of a subnet. Because they are individual messages there is some limit regardless of how small they are. I believe this limit is < 1,000 but I have not actually tried to push it that far. The most I tried was 450.
Looking at boundary node metrics there was an increase of 429 responses exactly when you were running your load. That's why I thought you were hitting the boundary node limits. Looking more carefully I do see some 429 responses coming from replicas. More specifically, I see ~200k 429 responses starting at 10pm on the 27th until 6am on the 28th.
There is no explicit limit on the ingress messages per second we gossip. So the number of messages really depends on bandwidth, latency, finalization rate, and on whether the code actually pipelines/multiplexes messages well.
Most likely the replicas were not gossiping fast enough, so the ingress artifact pools got full, hence the 429s, but some metrics are inconclusive.
I will take that statement back. Block making should be independent from gossiping. So even if gossiping completely fails, due to being overloaded or anything else, block making should still work. It just means that ordering isn't fair anymore and the variance in latency goes up significantly. But throughput shouldn't be affected by gossiping. Throughput should indeed only depend on message size. So maybe it is possible to get >1000 ingress messages per second into a subnet, just not in a very fair and responsive way.
Thank you for your response regarding latency.
I would expect enterprise-scale transaction processing to be possible if the ledger is scalable. I don't see any issues compared to the Solana and Layer 2 solutions that Visa is testing. What are the advantages over these?
What is the use case for HPL? Are you thinking of adapting it for enterprises, or of building a huge ledger system including ckBTC, ckETH, and other ICRC tokens wrapped in HPL?
Also, is it a technical limitation that you can't transfer money to/from other accounts directly, only through virtual accounts? Or is it meant to address the drawback that Internet Identity authentication creates a separate principal per application, so users cannot have a common account?
That's only "validation", which means it is inside the execution layer. It doesn't include any networking / consensus. They made a new C++ implementation of the validator and to show how optimised it is they compare the pure validation (aka execution) speed.
But the real bottleneck is always networking, which makes the cited number of 600k irrelevant for throughput. It is relevant if validators want to catch up and re-validate the entire history of the chain, which is probably why they optimized it.
For networking / consensus you will see a wide range of throughput numbers thrown around but they are not always comparable. A big differentiator is fairness. For example, if I just go round-robin through a set of block makers in a pre-determined way then the user can directly submit to the next upcoming block maker (or the next N block makers as a back-up). I don't need any gossiping of individual transactions between block makers. They can focus on sending entire block proposals, which are essentially large batches of user transactions, instead of individual transactions. But this is susceptible to front-running. If instead you gossip individual transactions between all block makers and only determine the next block maker a fraction of a second before the block is made then you have a much harder problem, but are less susceptible to front-running.
Gossiping is expensive and slow and the cost increases significantly with the number of messages. It is a huge difference if I gossip 10k individual messages immediately as they come in, from the one node who received the message from the user to all other nodes, or if I collect them all in one place (a sequencer) and then send all 10k in one batch to other validators. The two problems don't even compare. Throughput with batching is practically unlimited when compared to a protocol based on gossiping individual messages. For the latter 1,000 tps is already an achievement.
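To make the difference concrete, here is a rough, purely illustrative message-count comparison (the numbers are made up and real protocols differ in many details):

```ts
const txPerSecond = 10_000;  // user transactions arriving per second
const nodes = 13;            // replicas / validators in the subnet

// Gossiping each transaction individually: every incoming tx gets relayed
// from the receiving node to all other nodes as its own protocol message.
const gossipSendsPerSecond = txPerSecond * (nodes - 1);

// Batching at a sequencer / block maker: only whole block proposals are sent,
// say one block per second containing all 10k transactions.
const blocksPerSecond = 1;
const batchSendsPerSecond = blocksPerSecond * (nodes - 1);

console.log({ gossipSendsPerSecond, batchSendsPerSecond }); // 120000 vs 12
```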
That being said, applications are also not always comparable. For a ledger like the one we are building here, fairness of ordering is probably not so important, because we don't usually double-spend our own transactions. Nor would anyone be interested in front-running a payment that I make. But for a DEX it is very important.
What is the question exactly? Do you mean what the advantages of HPL are over what Visa is doing? Do you have a link to Visa's projects?
The latter. One place where all other tokens can be wrapped, plus native ones can live, and where they can be atomically swapped. And one place which, when integrated into a wallet, hardware wallet, exchange, or other service, makes all tokens immediately integrated as well.
It is not a technical limitation, it is a deliberate one. And it has nothing to do with II. I have plans for direct transfers as well without virtual accounts.
What is measured here is end-to-end time for making a transfer, i.e. the time from initiating the transfer to receipt of the result, both measured in the frontend. The highest level of detail is given by the heatmap in which even the outliers can be seen. We also show the average and various percentiles (50%, 75%, 90%, 95%). All three together should give quite a complete picture.
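As a minimal sketch (in TypeScript, with makeTransfer standing in for the actual HPL client call), the measurement amounts to timestamping in the frontend around the call and then aggregating the samples:

```ts
// makeTransfer is a placeholder for the call that initiates the transfer and
// resolves with its result; both timestamps are taken in the frontend.
declare function makeTransfer(): Promise<unknown>;

async function measureOnce(): Promise<number> {
  const t0 = performance.now();
  await makeTransfer();
  return performance.now() - t0; // end-to-end latency in milliseconds
}

// Nearest-rank percentile over collected samples, e.g. percentile(samples, 0.95).
function percentile(samples: number[], q: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.min(sorted.length - 1, Math.floor(q * sorted.length))];
}
```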
We have selected 5 aggregators that we will continue to work with. The dashboard shows a combined number for all of them, i.e. what we see is essentially an average over all aggregators. But we also have a couple of graphs for each individual aggregator including average latency and heatmap (not percentiles). Average latency should be enough to compare the aggregators against each other and the individual heatmaps allow us to see any concentration of outliers which usually point to a communication problem of the subnet hosting the aggregator.
Finally, for comparison, we also measure "ping time" which is a direct update call to each of the canisters (aggregators and ledger). This is the latency that we would experience if the HPL did not use xnet hops, e.g. if it was a standard ICRC1 canister. We see that average ping time is usually around 3.2 seconds whereas average HPL latency is usually around 6 seconds when there is no load on the system.
I assume you are using either agent-rs or agent-js. Both agents continuously poll the state in order to return the result to the user.
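To illustrate what that polling looks like from the caller's side, here is a minimal agent-js sketch (idlFactory, the canister id, and the method name are placeholders):

```ts
import { HttpAgent, Actor } from "@dfinity/agent";
// Placeholder: the candid interface of the target canister, e.g. generated by dfx.
import { idlFactory } from "./my_canister.did.js";

async function callUpdate() {
  const agent = new HttpAgent({ host: "https://ic0.app" });
  const actor: any = Actor.createActor(idlFactory, {
    agent,
    canisterId: "aaaaa-aa", // placeholder canister id
  });
  // For an update call the agent first submits the request and then repeatedly
  // polls read_state until a certified result is available; that polling loop
  // is part of the end-to-end latency of every update call today.
  return actor.someUpdateMethod();
}
```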
In the coming weeks we will propose an addition to the HTTPS interface that will include something like a synchronous call endpoint. The semantics will be that the user sends an update request and the HTTPS request returns when the request has been executed by the execution layer, so no polling is required.
This low-hanging fruit will automatically shave off ~1 sec of the e2e latency.
No, we are using a special HPL client library which in turn uses components from agent-js but it's not just a wrapper around it. We submit the transfer to the aggregator and then "poll" the ledger but with query calls. So the polling that you mention isn't happening here, or if it is, then it is not in the critical path.
But we would still like to use the new feature in the HPL client library when it's available. Even if it won't give us end-to-end gains. Thanks for pointing it out!
EDIT: The supplemental "ping time" test uses agent-js. We have to adjust the polling interval to make for a fairer comparison.
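For context, the shape of that query-call polling is roughly the following; submitToAggregator and queryTxStatus are hypothetical stand-ins for the HPL client library calls, not its real API:

```ts
// Hypothetical stand-ins for the HPL client library; names and types are made up.
declare function submitToAggregator(tx: unknown): Promise<string>;
declare function queryTxStatus(txId: string): Promise<"queued" | "processed" | "failed">;

async function transferAndAwait(tx: unknown): Promise<void> {
  const txId = await submitToAggregator(tx); // update call to the aggregator
  const deadline = Date.now() + 30_000;
  // Poll the ledger with cheap query calls until the transfer has been processed.
  while (Date.now() < deadline) {
    const status = await queryTxStatus(txId); // query call to the ledger
    if (status !== "queued") return;
    await new Promise((r) => setTimeout(r, 250)); // short poll interval
  }
  throw new Error("timed out waiting for the ledger to process the transfer");
}
```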
In the context of a multi-token ledger, if the flat fee is to be fair (i.e. equal) across tokens then it requires knowledge of the value or exchange rate of all tokens. We would need to know at least the exchange rate of each token to one common token (say, for example, ICP). Then we can set a constant fee denominated in the common token and convert it to flat fees denominated in each of the other tokens. And the flat fees indeed adapt, as you suggested, as the exchange rate moves.
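As a small sketch of that conversion (the tokens, rates and fee level are made-up example values):

```ts
// Constant fee denominated in the common token (here assumed to be ICP).
const feeInCommonToken = 0.0001;

// Example exchange rates: how much of the common token one unit of each token is worth.
const ratesToCommon: Record<string, number> = {
  ckBTC: 5_000,
  ckETH: 300,
  SOME_TOKEN: 0.02,
};

// Flat fee per token = fee in the common token / value of one token unit in the common token.
// Recomputing this whenever the rates move makes the flat fees adapt automatically.
const flatFees = Object.fromEntries(
  Object.entries(ratesToCommon).map(([token, rate]) => [token, feeInCommonToken / rate]),
);

console.log(flatFees);
```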
For the time being, without having those mechanisms available, the percentage fee based on tx value is the easiest way to have any kind of fee at all that is fair across tokens.
The next easiest will be to denominate the fee in a dedicated fee token regardless of which token is being transacted. But it has the drawback that users need to hold the fee token.
After that in terms of complexity comes a flat fee denominated in the transacted asset based on knowledge of exchange rates.