Let's solve these crucial protocol weaknesses

Alas, I am not technical. But rest assured, I have common sense.

@lastmjs, invite the AO founder to this forum to read the thread! I’m sure he would be thrilled about a technical discussion of AO on another blockchain’s forum.

2 Likes

I have another perspective on @lastmjs: I’m not sure what he’s doing in the dev community if he’s so frustrated. Why not just leave and go build on AWS, and build with off-chain “smart contracts” on AO?

1 Like

@lastmjs does and has done heaps for the IC developer community, including creating and maintaining Canister Development Kits for TypeScript and Python on the IC, promoting the IC to new developers and then supporting them, being a public voice for the IC to outsiders, and coming back with critiques to challenge insiders. Feel free to criticise his (or anyone’s) approach here, but maybe some respect for his efforts and commitment to the IC platform and community is also fair and due.

Although some of us may have found his initial post here challenging and very direct, you can’t fault the effect it had in drawing strong but considered responses from various people with different takes on this, nor the value of those responses as a whole in moving this platform forward.

We need more of these strong forum threads, not fewer.
I still remember some from 2023 that went on this long but devolved into personal accusations and sniping that harmed (in my opinion) the IC community in the way they were conducted. This is the opposite of that.

17 Likes

Regarding instruction limits, can’t we have universal chunking / automatic instruction recursion?

When a process’s time limit is reached and it hasn’t finished, it would store a snapshot and re-call itself in another thread. This is “normal” computer science that could be built into the protocol layer, making life easier for devs and enabling new things to be built. Alternatively, we could look at this functionality as a map-reduce for instructions. There are different implementations we could do, but in general it allows for a seamless instruction stream. Streams are a common paradigm in functional programming and near-real-time data processing, so it is fitting to bring this natural practice over.

Possibly, all defined instructions would have the ability to opt in and configure the details (recursion specifics and a maximum recursion depth). This would create the possibility of a long-running stream of instructions.

It’s like a heartbeat or breath for the actor model. IMO, limiting stops are philosophically fitting, and technically it’s likely possible to work around them and thus solve the core problem. It would allow us to scale processes and increase what is possible on ICP. It would also keep cycle calculation trivial, as you still have finite instructions that cost some cycles. And finally, it would allow us to keep what is already working (all on-chain!) and add a new layer of functionality on top.
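
To make the idea concrete, here is a minimal sketch of how a canister could approximate such a chunked, self-recalling computation at the application layer today (the actor name, batch size, and depth parameter are illustrative, and this is not a description of an actual protocol-level mechanism). Each self-call goes through the message queue, so every chunk gets a fresh instruction limit, and the depth parameter bounds the recursion as suggested above.

// Hypothetical Motoko sketch: sum a large array in bounded chunks.
actor Chunker {
  stable var items : [Nat] = [];
  stable var index : Nat = 0;
  stable var total : Nat = 0;

  // Process at most `batch` items, then re-call ourselves.
  // The await is a self-message, so other messages can interleave between chunks.
  public func processAll(depth : Nat) : async Nat {
    let batch = 1_000;
    var n = 0;
    while (index < items.size() and n < batch) {
      total += items[index];
      index += 1;
      n += 1;
    };
    if (index < items.size() and depth > 0) {
      await processAll(depth - 1)   // bounded recursion depth, as proposed above
    } else {
      total
    }
  };
}

Because each chunk is a separate message with its own bounded instruction count, cycle costs stay easy to reason about while the overall computation can span many rounds.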

1 Like

The issue here is the programming model (not that it makes this impossible, just more dangerous). If another process slots in during your recursion, it could change something you are computing over out from underneath you. (We already have to handle this with async calls, so it isn’t insurmountable.)

If they can’t slot in between your calc rounds then your entire canister is blocked by this one execution until it finishes.

1 Like

If we consider an actor to preserve and guarantee the ordering of messages/processes, then with “auto recursion” it would be fine if it is blocking. Or it’s not necessarily blocking; it just has a preceding instruction added to the front of the stack, queueing other instructions. Depending on the point of view, if we didn’t do this and instead added the recursion to the end of the stack, then the instruction as a whole wouldn’t preserve its order.

Also, instructions should have depth limits to prevent deadlocks. I’m aware this brings more complexity, but at the same time we need long-running processes, and I feel it could be more natural, and in other respects easier, if we have a stream of finite instructions instead of infinite instructions.

1 Like

Ok… yes… so we have blocking. I get that this is ok for something like a workload-balanced compute canister where you can distribute load across different canisters (although I’m not sure what multiple canisters all running long workloads would look like on a subnet). But one of Jordan’s initial gripes was that he was having trouble running Express.js as a web server. A web server that can serve one concurrent page at a time isn’t very useful. Nor is a SQL canister that can’t serve queries or updates while it is indexing. I think there are great answers for sequential processing, but the actor model just has a tough time mimicking the multi-threaded compute that we’re used to on an AWS server (without purposefully written software that abstracts away the ‘roundness’ of the underlying IC protocol).

2 Likes

Note: I am posting this not as Diego, the person, but on behalf of the DFINITY R&D Team.

Thank you, everyone, for both posting and reading. A lot of folks have joined this thread to ask for clarification, respond to queries, and propose ideas, but they have mostly been doing so as individuals. We thought it would be helpful to post what we (the org) are currently thinking.

TLDR:

There are various points brought up. Two obvious themes from reading this thread are:

  1. We (at DFINITY) need to publish our latest thinking on the R&D projects DFINITY is planning so we can explain our reasoning, get feedback, and align expectations. We include a high-level view of this roadmap in this post.
  2. There are a lot of ICP community members who deeply care about the R&D and wish to make it better. We recognize and appreciate it!

What we saw

Here are some of the points that folks have brought up as issues with the current state of the protocol:

  1. Instruction limits
  2. Memory limits
  3. High latencies
  4. Message size limits
  5. Storage limits
  6. High costs
  7. Rigid network architecture
  8. Centralizing DAO governance

First, I realize it can come off as nitpicky, but because forums are a written medium, it is critical to get every word right. That is why I would take issue with some points like “high costs”, since costs are fairly low relative to the rest of Web3… but we can understand if people want to lower them!

Similarly, we can understand if people want to raise the instruction and memory limits. This follows the course of computer science, of course: make it faster, bigger, cheaper. No one in the ICP community thinks the IC is perfect “as is”… that is why the NNS exists.

The only question is really about priorities and trade-offs. I will use Jordan (@lastmjs) as an example for a second since he is the OP of this thread: I know he hits the instruction limit often. I know he works around it. The hard part (and one which I think requires many voices for input) is to know how many people this impacts, how urgent it is, what is not happening because of current limits, or how that compares with some unforeseen security issue X, etc… And of course the process is messy and qualitative.

So, without going point by point through this large thread, we thought we’d present a relevant part of an updated R&D roadmap that is to be published in a few weeks.

R&D Roadmap

First of all, we would like to point out that the current R&D roadmap on the Internet Computer Web page is somewhat outdated and incomplete and has not been maintained well recently.

Also, its horizon is mid-term, and it contains neither our visionary long-term roadmap items nor many of the community-requested items.

We are currently in the process of defining a new Internet Computer Technical Roadmap that is much more comprehensive and longer-term than the current roadmap, and also much more inclusive of community-requested features. This new roadmap addresses (at least a good part of) the criticism voiced in this thread. We cannot provide the whole new roadmap proposal yet, as it is still a work in progress, but we would like to give some examples of items that are on the new roadmap and can help resolve some (many) of the issues raised in this forum topic.

Past Achievements

Before we present the relevant parts of the new roadmap, let us briefly revisit some of the ICP protocol improvements that have already been deployed and that already address some of the issues raised in this forum topic.

  • Increase of Stable Memory Limit to 32GiB (the initial value was 8GB)
  • Increase of Stable Memory Limit to 48GiB
  • Increase of Stable Memory Limit to 96GiB
  • Increase of Stable Memory Limit to 400GiB
  • Support of 450GB replicated storage
  • Support of 750GB replicated storage
  • New HTTPS outcalls pricing (with lower cost per call and byte)
  • Deterministic Time Slicing (DTS) (increases the instruction limit by an order of magnitude)
  • Network Scalability: State Sync, Certification, and XNet
  • Optimizations throughout the protocol stack

These improvements show that we are all dedicated to improving the protocol and reaching the ultimate goal of a “crypto cloud.” Also, some of those improvements are hard for users/developers to “see” or “feel” unless they hit a limit earlier and benefited from the changes. As another example, substantial effort has been put into an update of the P2P layer, resulting in a further 20% latency decrease and thereby an increased block rate. This looks like something small, but it is a huge gain in an already performant system where we are slowly approaching inherent limits (e.g., global message propagation time).

The Story Behind the New Technical Roadmap

Let us also give some background on why we have been working on a new roadmap. The current roadmap contained only items that were already in the pipeline of the R&D teams, had been discussed technically to a certain level of detail, and had been decided upon for implementation as one of the next items. It does not contain the long-term visionary items that are not yet well defined in scope or technically verified to be implementable. It has been clear to leadership at the Foundation that this needs to change and that the public technology roadmap needs to reflect the long-term vision for the Internet Computer Protocol and the needs of our (technical) community. Also, it should not be constrained to items that have already been analyzed in detail, but rather be visionary and contain items that we (both the community and the Foundation) want, even without a detailed analysis having been done, accepting the risk that we find out later that a roadmap item may not be realizable for some (technical) reason.

People within DFINITY have listened a lot to the community on various channels and included many of the community’s asks in the new roadmap (this forum thread is an example of such a channel and of that listening). This is reflected on the one hand by the overall composition of the roadmap, which strives to build the crypto cloud of the future, and on the other hand by very concrete items from the community that have made it into the roadmap. Once the full roadmap is published in a few weeks, you will recognize many of your own inputs there in one form or another. We have listened. We are listening.

A Selection From the New Technical Roadmap

Despite all the achievements so far, we are all well aware that a lot still remains to be done in order to get where we want to be. Besides realizing new features, the new roadmap contains many items that target improving the performance of ICP, in terms of both better throughput and scalability and reduced latency. Developer experience (DX) is another crucial topic, where we have deployed a feedback board, including voting, for actively listening to the community. We may want to have built-in voting capabilities for the new roadmap to allow people to express their preferences on priorities. That’s an idea for later, though.

Let us next present some relevant items from the new roadmap that will help achieve the goals. We present them in the context of the respective points (in bold) of criticism raised in this forum topic.

1. Instruction limits

  • Increasing instruction limits for query and update calls (incl. increasing the number of rounds DTS can span)

2. Memory limits, storage limits

  • Increase Stable Memory Limit to 400GiB (we’re almost there!)
  • Add support for 1TB replicated storage (in progress)
  • Add support for 3TB replicated storage
  • Add a Motoko Incremental Garbage Collector (required to manage large heaps, in progress)
  • Add orthogonal persistence for Motoko (to provide the DX for using large memories in Motoko, in progress, demoed already in GR&D)
  • Implement higher-level stable memory libraries (to provide the DX for using large memories in other languages)
  • Deploy a blob storage protocol extension (increasing storage capacity per subnet using erasure coding instead of n-fold replication)
  • Implement fast blob streaming (download)
  • Implement fast blob streaming (upload)

3. Latencies

  • Reduce P2P latency (this immediately reduces latency of consensus and, by implication, of XNet calls which involve consensus twice, once for the request and once for the response)
  • Reduce call latencies (reduce call latencies for the most-used call patterns that negatively affect dapp responsiveness)
  • Improve chain-key ECDSA throughput and latency
  • Implement low-latency threshold ECDSA signing

4. Message size limits

  • Increase block size and throughput
  • Implement chunked upload of large Wasm files in dfx (in progress)
  • Allow for small guaranteed-response messages (this is not about increasing limits, but requiring less reservation of space for the response, therefore allowing for considerably scaling messaging as a whole)
  • Implement libraries for large data transfers (abstract away some of the message limits)

5. Lowering Cost

  • Implement low-replication subnets for LLMs

Note: Cost is already very low compared to other networks, and there are initiatives to lower it further, also as part of other features, e.g., lowering storage cost through blob storage, low-replication subnets for LLMs, or improved throughput for chain-key / t-ECDSA signatures at lower cost.

6. Softening limitations of the network architecture

  • Add canister migration
  • Add subnet splitting V2 (simple proposal-driven interface)
  • Add autonomous capacity management
  • Allow for same-subnet placement for canisters
  • Implement messaging model enhancements
  • Implement improved inter-subnet message routing
  • Implement XNet composite queries

Potentially, lower-replication subnets or even single-node subnets (already discussed elsewhere on the forum in the context of “gaming subnets”) may be of interest to some in the future, giving developers more choice than just the two replication factors (app subnets and fiduciary subnets) they can currently choose from.

7. DAO governance

The voting power distribution has become more decentralized over time; DFINITY’s direct voting power has decreased from just under 40% around launch to just under 20% now. Of course, a large number of NNS participants have currently configured their neurons to follow DFINITY, which adds a large pool of indirect voting contribution. One main requirement for this to change is the availability of reliable alternative options to follow. CodeGov is a great initiative that has reliably (and transparently) voted on the replica-code-upgrade proposal topic; more such initiatives will be needed before the total (i.e., direct and indirect) voting contribution changes significantly.

8. More relevant items from the community to improve developer experience

  • Implement HTTPS outcalls V2 (various optimizations and new modi operandi for HTTPS outcalls: IPv4 support, single-node outcalls, fire-and-forget outcalls)
  • Implement HTTPS outcalls V3 (HTTPS outcalls for queries)
  • Support REST- & JSON-centric interfaces
  • Standardize canister response codes
  • Build a file system on ICP
  • Build a simple file upload library
  • Enable simple asset creation (e.g., token ledgers created via proposal)
  • Allow for Motoko-written interactive Web UIs running in Wasm
  • Build more Motoko libraries, or support the community in doing so
  • Implement language-interoperability support in Motoko so that Rust (and other) code can be used in Motoko-based canisters

The above is only a small part of the updated Internet Computer Technical Roadmap that is to be published in the coming weeks. As you can see, these items already address important parts of the criticism in this forum topic, and it is unfortunate that we have not been able to publish the roadmap yet, as this could have avoided (some of) the current discussions. Priorities of the items are yet to be determined together with the community. So, clearly, DFINITY agrees on the importance of some of the points raised and has already started taking action.

A Reflection on the Forum Discussion

For the most part, the forum discussion is a productive and technically driven one, and it also shows that people feel there is still lots of work to be done to improve the protocol. We fully acknowledge this and would like to make clear that we share the vision of ICP being the decentralized crypto cloud of the future, but we have not reached all the goals yet. We have started the journey and have made tremendous progress so far, to mention just some key points below:

  • ICP is a world computer that you can program in Rust, Motoko, TypeScript and Python (many thanks here to Jordan for his great work!), C, C++, or anything else that compiles to Wasm. Every instruction is executed and every bit of memory is deterministically replicated throughout a globally decentralized network, providing a high degree of data integrity. Thus, a lot has already been achieved in terms of abstracting away the decentralized nature of the system.
  • ICP currently supports 750GB of replicated state per subnet that behaves like random access memory, with almost 40 subnets being deployed.
  • A subnet can execute around 8 billion instructions for update calls per second (assuming currently 4 cores executing update calls on a replica) and a much larger number of query instructions per second.
  • Subnets have been largely abstracted away in the programming model, except for the extra latency incurred in XNet calls compared to calls on the same subnet. That extra latency is caused by the two rounds of consensus required to secure the request and the response, which gives XNet communication the desired property of being certified through threshold signatures.
  • ICP integrates trustlessly with Web2 and other blockchain ecosystems, using advanced chain-key cryptography. This is a feature not many can claim to support in today’s Web3 landscape.

The journey to realizing the full vision of the world computer is ongoing and requires continued hard work and relentless execution of a dense R&D agenda, for example:

  • Further improving throughput and latency of the different call patterns on ICP.
  • Adding further network management capabilities, such as canister migration, decentralized capacity management etc.
  • Improving the DX.
  • Increasing limits, and where we hit boundaries, offering libraries to abstract away the limits, as far as possible.

We rely on you, our community, to challenge the state of the art of ICP and provide us with input on what you want to see built in the future, so that together we can realize the vision of the world computer. We acknowledge that publishing the updated roadmap earlier might have made many of the current discussions unnecessary, but there have been some delays in bringing the new roadmap to completion.

I very much liked the depth of a post by Austin (@skilesare). Let me quote one essential and broadly applicable thought from it:

As long as we try to make the IC about replacing existing things we don’t like with something on the IC with THE SAME ARCHITECTURE, we’re doomed to fail.

This is something absolutely crucial to keep in mind when thinking about limitations of ICP – any Web3 system works vastly differently than Web2 and this does need to be considered when implementing Web3 applications and when thinking about performance.

Conclusions

We hope that this post and the preview on the new technical roadmap it presents help establish confidence that the criticism raised by the community is being addressed already by the Foundation and that much of the community input for potential protocol improvements that has been voiced on the forum, Twitter, and other channels has been considered already in the upcoming new technical roadmap. The Foundation is listening, has always been, and benefits a lot from the community inputs.

Quoting from Jordan’s (@lastmjs) original post here:

Wisdom is of course required to weigh these concerns with the many other concerns.

This could not be expressed any better! It is really crucial for everyone to keep in mind that the roadmap items addressing the most burning issues cannot all be implemented at once because of finite engineering resources. Rather, this is a longer-term process that will require continued discourse between the Foundation and the community to find the right priorities and balance between the items, and relentless execution in implementing the most valuable improvements to the protocol.

53 Likes

Fully agree with this. There are new opportunities and use cases in IC that are waiting to be discovered. It just takes time to learn, experiment, and adapt.

TLDR:

  1. The average user doesn’t understand what they are voting on.
  2. The safest investment is to follow DFINITY.
  3. There is not sufficient infrastructure in place to support DFINITY neuron competitors.
  4. Trust takes time and effort to build up.

This probably deserves its own thread, but I think there are critical developments needed to address this item. The centralization issue comes up a lot. I’ll briefly outline here what I think are some problems that might be more obvious to a user than to a dev.

  1. The average user doesn’t understand what they are voting on.
  • Democracy assumes an informed public. Many people are staking for rewards but have no idea what individual proposals mean or why they should care. The proposals are simply too technical.
  • What they DO care about is maximizing voting rewards, which brings me to…
  2. The safest investment is to follow DFINITY.
  • DFINITY always votes. Other neurons are a gamble. They might vote consistently this month but lose interest next month, get distracted by life, have a serious hardship, or any one of many problems. If you want to maximize your investment and you can’t vote manually on every proposal, you follow DFINITY.
  • This is a practical rather than a theoretical problem. This could be solved with more organization, but…
  3. There is not sufficient infrastructure in place to support DFINITY neuron competitors.
  • This is a complex issue, and I understand DFINITY envisions this happening organically, but I think it is unlikely to happen without changes in either the governance structure or substantial support for a democratic opposition.
  • The Named Neuron option IMO was a helpful step in this direction. It fosters in a small way the representative democracy that will be necessary due to point #1. However…
  4. Trust takes time and effort to build up.
  • Most users are likely happy with what DFINITY is doing. They trust the team and appreciate their work and expertise. There’s little reason to risk new leadership.
  • There aren’t clear alternatives that could fill DFINITY’s role in developing the IC, and there won’t be for some time.

Even though DFINITY’s direct voting power has decreased, I suspect it will maintain a controlling vote for years to come. DFINITY’s finances and expertise are not easy to replace, even with crowdsourcing. But I think some concerns could be allayed by further developing this part of the roadmap.

6 Likes

@diegop amazing that you are already planning to address all the concerns raised! This matters a lot to the community.

I hope to see in the roadmap (even if not a top priority) something like:

These would make colluding much harder and provide even more solid foundations to the IC itself.

7 Likes

If we separate reads and writes into different threads, then it should be fine. We should always read fast; that is, this process shouldn’t be long-running (although in practice reads can be complex). For complex reads we should have a caching/indexing L2 (in the Web2 world: Cassandra, Elasticsearch, etc.). It’s a background job that updates the DB while reads stay fast. This layer is inevitable if we want smart data and AI. And it’s fine when writes are blocking. They should be, logically: we should preserve the ordering of changes.

So we could still have finite instructions, but with auto-recursive blocking/queuing writes. And we need an indexing L2 capability on top of raw data for near-real-time data insights. Essentially it’s an evolution of composite queries.
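
For illustration, here is a rough Motoko sketch of that split (the actor, the heartbeat-driven batch size, and the derived value are all made up for this example): writes append to an ordered log, queries are answered from a precomputed value, and a background job folds new entries into that value in bounded batches, playing the role of the L2 index.

import Buffer "mo:base/Buffer";

actor ReadWriteSplit {
  let log = Buffer.Buffer<Text>(0);   // write path: ordered, append-only
  var totalChars : Nat = 0;           // read path: derived "index" / cache
  var foldedUpTo : Nat = 0;           // how much of the log the cache covers

  // Writes stay cheap and preserve ordering.
  public func write(entry : Text) : async () {
    log.add(entry);
  };

  // Reads never do the heavy work; they serve the precomputed value.
  public query func cachedTotal() : async Nat {
    totalChars
  };

  // Background job: advance the cache over at most `batch` entries per round.
  system func heartbeat() : async () {
    let batch = 100;
    var n = 0;
    while (foldedUpTo < log.size() and n < batch) {
      totalChars += log.get(foldedUpTo).size();
      foldedUpTo += 1;
      n += 1;
    };
  };
}

In practice the derived layer could just as well live in a separate indexing canister, which is closer to the Cassandra/Elasticsearch analogy above.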

Thanks for this summary @diegop & DFINITY, and for all of the thoughtful and constructive feedback throughout this thread. It’s been a joy to read through!

To dig into a few points around the utility of that increased memory that currently affect my development decisions, app roadmap planning, and personal developer experience (DX):

Improved Orthogonal Persistence abstractions

I am absolutely thrilled by this development, and noticed there was a lot of interest in the Motoko orthogonal persistence abstraction during @luc-blaeser’s Global R&D presentation - not just from Motoko developers, but from Rust and Azle devs as well; it’s a very powerful DX improvement. As DFINITY continues to raise memory limits, the ease of putting that extra memory to use is just as important. Which brings me to:

Continuing to improve the performance of stable memory

I know that DFINITY has made a huge amount of progress on improving the utility of stable memory over the past year, from the improvements to the ic-stable-structures library by @ielashi & team, to the performance improvements to stable memory by @abk & team. :clap: Thanks for this work guys.

That being said, I know there are still some noticeable performance disparities in access patterns between heap and stable memory for certain data structures (like BTrees), due to the many comparisons required when navigating a balanced tree structure and the frequent need to jump back and forth between stable memory and compute (even with the improvements that have been made).

When Motoko orthogonal persistence is implemented, I’d imagine there are going to be a lot of teams using stable memory all of a sudden, so continuing to improve the performance of repeatedly scanning and computing over stable memory will have a big impact on a large number of teams and projects.

Wasm64

On the same thread of making that extra memory easier to use, I also see that Wasm64 isn’t mentioned in this new technical roadmap. I understand that it’s a big undertaking, but allowing heap (main) memory to access the full memory available to a canister would open up more general “computing” purposes, like Elasticsearch, bigger data analysis (400GB isn’t quite big data yet :sweat_smile:), and everyone’s favorite, AI :grin:.

If we get to a good place with both stable and heap memory able to access the full canister memory address space, I can then envision most applications being architected with a metadata canister and a compute canister, where the metadata canister uses stable memory and stores a significant amount of data while performing simple reads and writes. The compute canister, using heap memory (Wasm64), receives periodic input updates from the metadata canister and runs various algorithms to compute additional values (for feeds, to update property weights, etc.), and then in turn may update static values back on the metadata canister. For this type of architecture, the compute canister would also be called directly for requests that require a large amount of compute.
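
As a rough illustration of that split (purely hypothetical interfaces; all type and function names here are made up), the two canisters could expose something like the following, with the compute canister periodically pulling inputs and writing derived values back:

module {
  public type Item = { id : Nat; weight : Float };

  // Metadata canister: stable-memory-backed store, simple reads and writes.
  public type MetadataCanister = actor {
    itemsSince : (lastSeen : Nat) -> async [Item];   // pulled periodically by the compute canister
    putWeights : ([(Nat, Float)]) -> async ();       // derived values written back
  };

  // Compute canister: heap-heavy (ideally Wasm64) algorithms over those inputs.
  public type ComputeCanister = actor {
    recomputeFeed : () -> async ();                  // also callable directly for compute-heavy requests
  };
}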

The benefits of stable memory being:

  • orthogonal persistence
  • quick, safe, and cheaper upgrades
  • performant but not with performance equal to heap/main memory (or is this possible?)

And the benefits of heap memory being:

  • Higher compute performance and in turn, lower compute cost
  • Given increased compute performance, more compute is possible before a message hits the instruction limit
6 Likes

This is fine for queries, but more difficult for updates that need to read the state to validate, do something that takes a while, and then continue. It’s the weekend, so I’ll not spend too much time coming up with an example :joy:.

say:

// sketch: inside a shared(msg) update method of this canister
let result = if (validcaller(msg.caller)) {
  ?(await do_a_long_thing())   // crosses an await point; other messages may interleave here
} else { null };

// revalidate: the caller may have been invalidated while we were awaiting
switch (result) {
  case (?r) {
    if (validcaller(msg.caller)) {
      state.result := r;
    } else {
      await roll_back_a_long_thing();
    };
  };
  case null {};
};

There isn’t a great way to keep from doing a bunch of work and wasting cycles in this instance if the caller is invalidated during do_a_long_thing. It is just a thing you have to do in your code and hope for the best. (You can lock the valid caller collection, perhaps.)

There is also the possibility of reading ‘dirty’ data while do_a_long_thing is running.

In general, async is hard and we need good patterns to make it easier, especially if we’re going to have long-running processes that update state in chunks.
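
To make the “lock the valid caller collection” idea a bit more concrete, here is a rough sketch (the `busy` flag, the placeholder helpers, and the names are illustrative, not an established pattern): refuse to start the long-running work while another run is in flight, and release the flag whether the awaited call succeeds or throws.

import Principal "mo:base/Principal";

actor Guarded {
  stable var result : Nat = 0;
  var busy = false;

  // Placeholder check standing in for the real valid-caller collection.
  func validcaller(p : Principal) : Bool {
    not Principal.isAnonymous(p)
  };

  // Stands in for the real long-running work.
  func do_a_long_thing() : async Nat {
    42
  };

  public shared (msg) func guarded_long_thing() : async () {
    assert (validcaller(msg.caller));
    assert (not busy);          // refuse to start while another run is in flight
    busy := true;
    try {
      result := await do_a_long_thing();
      busy := false;
    } catch (e) {
      busy := false;            // release the flag if the awaited call throws
      throw e;
    };
  };
}

This rejects re-entrant calls outright; a friendlier variant could queue them instead, but that reintroduces the ordering questions discussed earlier in the thread.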

2 Likes
  1. Rigid network architecture (subnets static, canister unable to choose replication/security with flexibility, can’t move between replication factors, homogenous hardware required)

This is what I pay the most attention to, because it is also the most intuitive and easiest to see. As of now, there is no direct answer to this one, and no possible solution is mentioned.

The answer I want to see is not how much the team has done (we can learn that from multiple channels), but a direct and positive answer to this point.

Dear Dfinity team (through dear @diegop),

Just a remark about this.

When I decided to invest in blockchains, Dfinity was far ahead of other teams in giving me confidence as an investor. Why? Because I am a scientist and I could see the scientific ethos of the team.

You were not promising the moon or selling idealistic dreams, but realistic and nevertheless revolutionary goals. I told myself: “This team is serious. It shows in the way they write and constrain their promises. They are scientifically prudent, and investing in them is a rational decision because they are clearly working rationally.”

I understand that the scientific ethos can be damaging from a marketing perspective (because investors are not used to analysing projects through this kind of criterion and will more easily invest in sellers of dreams), but I just wanted to share that I think you really must keep, as much as possible, this spirit that characterizes you all, because in the long run it is your strength.

I just would not want to start seeing lists of visionary and sexy aspirations emerge that eventually do not result in concrete realizations. We already have more than enough projects doing this. And even if you are careful to warn that you will envision items that are potentially not reachable, those unachieved items will stay and accumulate into a list of unrealized aspirations, sounding like dreams made by people who do not correctly estimate either their capacities or what is doable.

I know that in science we need to allow ourselves some liberty to dream and be visionary in order to move forward and reach intermediate results, even if at the end of the journey we realize that we cannot reach the initial goal. Often, it is only by dreaming that the intermediate results are obtained. But even if they must dream, junior researchers are asked to stay realistic within their dream when they design their roadmap. I know you will act “realistically idealistic”, but I wanted to let you know that your prudent and scientific ethos has been highly appreciated, that it is a strength, and that some people support you in preserving this ethos as much as possible even if you decide to allow yourselves to be more visionary.

21 Likes

Quick update on the DTS instruction limit.

I merged the change that increases the instruction limit for updates, timers, and heartbeats to 40B (2x the previous 20B limit): feat: Increase the instruction limit for update calls to 40B · dfinity/ic@8e51868 · GitHub

This should be included in the next replica release (to be deployed on the mainnet next week).

We are close to shipping the configurable Wasm memory limit feature (we were waiting for it in order to increase the limits).

14 Likes

Are there any plans regarding the query instruction limits? Currently, there’s a 5B limit on queries. We’ve recently faced issues and spent a considerable amount of time optimizing our Motoko code to ensure a query fits within this limit. It would be helpful to understand the long-term direction in this area.
I assume the current limit represents a compromise to allow a certain number of requests per second on the subnet. Increasing the limit from 5B could, in theory, reduce the network’s throughput capacity.

4 Likes

Increasing the query limit requires some form of query charging. We are thinking about the idea that has been proposed by the community:

  • add a field in the canister settings to enable (opt-in) to query charging.
  • if that field is set, then the canister starts paying for queries and can get the higher instruction limit.
13 Likes
  1. Rigid network architecture (subnets static, canister unable to choose replication/security with flexibility, can’t move between replication factors, homogenous hardware required)

Any plan to fix this one?

See above:

At the moment there is already ongoing work to make the “Swiss subnet” (i.e. a geographically constrained, reduced replication subnet) possible. One specific issue is preventing a potentially corrupted subnet (more likely with lower replication / lower decentralization) from minting cycles and sending them to other subnets.

There is indirectly related work already happening on supporting references in blocks instead of full payloads, and on canister messaging scalability, both of which will eventually allow for higher throughput and lower latencies, opening more possibilities for canister migration / grouping and autonomous capacity management.

Regarding the specific issues of static subnets and the requirement for homogeneous hardware, my personal opinion (and it’s just that, so take it for what it is) is that they are core to the design of the protocol, so they will be difficult to change.

I guess it also depends on what you mean, exactly. Specifically, “homogeneous hardware” as in “equally powerful machines” is needed in order to avoid long tail latency: if half the machines on a subnet are really slow, then they dictate the speed of the subnet and they waste capacity on the other half that could do more work. If you mean “homogeneous hardware” as “a limited set of tested and certified hardware configurations”, that is partly the same as above (e.g. a 5.6 TB SSD is not the same as any other 5.6 TB SSD) and partly it makes it possible to manage a fleet of thousands of machines without excessive manual intervention (e.g. reasonable BIOS settings enforced automatically).

Same with “static subnets”: it could be taken to mean “NNS proposal required to change membership”, something that I’m sure everyone would like replaced with some sort of automatic process to monitor and replace unhealthy / misbehaving replicas (which may or may not be the “autonomous capacity management” mentioned above); if you mean “periodically rotating replicas in and out of subnets” my personal opinion is that it (1) doesn’t help much with security and requires a lot of bandwidth and (2) once you have the automatic process above, it could be trivially implemented as part of that. If you mean it as (something I’ve seen mentioned every now and then) hiding the identity of the nodes making up a subnet, that’s totally unrealistic; the boundary nodes need to know the membership; and even if they didn’t, one could follow the traffic: either you send every huge payload to every single replica on every subnet, in order to hide which ones actually need it; or you can track which ones are getting the payload (if you have access to a single replica). So it’s security through obscurity at best.

To summarize, work is being done, partly on some of the issues themselves, partly in preparation for addressing others. And there are ideas and plans to address most of the issues that can and should be addressed. It will just take time and effort.

6 Likes