Me too. I am wondering whether it is possible to manage data across multiple subnets within a single application on the Internet Computer (ICP). Apparently tagging canisters to a GDPR-related subnet is feasible. How does this association of canisters with particular subnets work? I could be totally mistaken, but perhaps such tags could eventually also be used to address performance issues when scaling up.
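From what I've gathered, the association happens at canister-creation time: the NNS Cycles Minting Canister (CMC) accepts an optional subnet selection (by subnet type, or by a specific subnet id) when it creates a canister on your behalf. Here's a rough sketch of my understanding in Rust; every type and field name below is my reading of the CMC's Candid interface and may well be out of date, so please verify against the published cmc.did:

```rust
// Sketch: creating a canister on a chosen subnet through the NNS Cycles
// Minting Canister (CMC). All type/field names are my reading of cmc.did
// and should be verified before use.
use candid::{CandidType, Deserialize, Principal};

#[derive(CandidType, Deserialize)]
enum SubnetSelection {
    // Pin the new canister to one specific subnet (e.g. a "GDPR" subnet).
    Subnet { subnet: Principal },
}

#[derive(CandidType, Deserialize)]
struct CanisterSettings {
    controllers: Option<Vec<Principal>>,
}

#[derive(CandidType, Deserialize)]
struct CreateCanisterArg {
    settings: Option<CanisterSettings>,
    subnet_type: Option<String>,
    subnet_selection: Option<SubnetSelection>,
}

#[derive(CandidType, Deserialize, Debug)]
enum CreateCanisterError {
    Refunded { refund_amount: u128, create_error: String },
}

async fn create_on_subnet(
    cmc: Principal,    // the NNS CMC canister id
    subnet: Principal, // target subnet id
    cycles: u128,      // cycles attached to pay for creation
) -> Result<Principal, String> {
    let arg = CreateCanisterArg {
        settings: None,
        subnet_type: None,
        subnet_selection: Some(SubnetSelection::Subnet { subnet }),
    };
    // create_canister on the CMC must be called with cycles attached.
    let (res,): (Result<Principal, CreateCanisterError>,) =
        ic_cdk::api::call::call_with_payment128(cmc, "create_canister", (arg,), cycles)
            .await
            .map_err(|(code, msg)| format!("call rejected: {:?}: {}", code, msg))?;
    res.map_err(|e| format!("creation failed: {:?}", e))
}
```

I believe recent dfx versions expose the same choice through flags on `dfx ledger create-canister`, but I haven't verified that.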
I would like to develop a large-scale application on the Internet Computer (ICP) that stores multiple terabytes of data (100 TB+) along with backup capabilities. My main concern is whether ICP can handle this scale of data efficiently, particularly regarding the scalability issues that have recently emerged, as well as cost-efficiency. Given the potential challenges around cycle consumption and subnet limitations, would this approach remain practical and cost-effective in the long term, or should alternative storage solutions be considered to mitigate excessive costs and performance bottlenecks?
I would appreciate an answer focusing on the efficiency of ICP in such a large-scale use case, especially in terms of scalability and cost management, covering both pros and cons. Thank you for your valuable insights.
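For context, here's my own back-of-envelope on raw storage cost, which I'd appreciate someone sanity-checking. It assumes the commonly quoted rate of roughly 127,000 cycles per GiB-second on a 13-node subnet, and 1 trillion cycles ≈ 1 XDR ≈ $1.33; both numbers may be out of date:

```rust
// Back-of-envelope: yearly storage cost of 100 TB on a 13-node subnet.
// ASSUMPTIONS: ~127_000 cycles per GiB-second, 1T cycles ~= 1 XDR ~= $1.33.
// Please check the current fee table; these numbers change over time.
fn main() {
    let gib = 100.0 * 1024.0; // 100 TB, treated as 100 * 1024 GiB
    let cycles_per_gib_second = 127_000.0;
    let seconds_per_year = 365.25 * 24.0 * 3600.0;

    let cycles_per_year = gib * cycles_per_gib_second * seconds_per_year;
    let usd_per_year = cycles_per_year / 1e12 * 1.33;

    // Prints roughly: 410398 T cycles/year ~= $545829/year
    println!(
        "{:.0} T cycles/year ~= ${:.0}/year",
        cycles_per_year / 1e12,
        usd_per_year
    );
}
```

If that's in the right ballpark, storage alone runs around half a million dollars per year at 100 TB, before backups or message costs, which is part of why I'm asking about alternatives.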
The IC can’t currently store 100TB, see this comment above:
Hi everyone,
We’ll have the next meeting this Thursday (October 24) at the usual time. Per @icme’s suggestion, we’ll discuss some of the performance issues that have come up over the last few months, as well as fixes and possible future improvements. Feel free to come with questions as well!
Thursday October 24th at 5:30 pm CEST Zoom link
Ahead of the meeting on Thursday, I wanted to bring up a question regarding the future of ICP’s storage capabilities. Is there an ongoing effort to increase its storage capacity, and if so, do we anticipate ICP being able to efficiently store terabytes of data within the next 1-2 years?
I’m also curious about the roadmap for ICP storage subnets. Are there any expected timelines for their release, or updates on how they might address scalability and cost-efficiency challenges, especially for large-scale applications?
Looking forward to the discussion and any insights during the meeting!
Hi ADK,
Thank you for setting up this call.
I wanted to share a bit about the data architecture we’re planning. You can find a brief overview in the link below. I’m open to any feedback you might have, though I don’t necessarily expect any—it would just be a nice bonus.
Looking forward to joining tomorrow!
Best,
skai8888
Link to planned data architecture:
Looks ok to me, but maybe some dapp developers have other opinions. I’m not sure I follow where the canister boundary is, though - for example, at the “User Level”, is this a single canister holding the id of each user in a “Group”?
By the way, we also have a calendar event for the WG meetings. If you’d like to be added, feel free to DM me your email.
Here’s the recording.
Hi @abk, following the discussion in the topic Subnets with heavy compute load, I wanted to ask whether the schedule of the meetings is already marked in some calendar, or whether the actual date & time is just announced here prior to each event. Thanks!
Hi @plsak, if you DM me an email I can add you to the calendar (or anyone else who’s interested)!
Thanks for the response! Done for me. I’m also trying to involve more people - my reach is tiny and this is a very technical and specific topic, so I don’t expect much, but I gave it a shot.
Does anyone in this group also have an interest in joining the DeAI working group meetings? They occur on Thursdays at 6pm European time so there’s 30 minutes of overlap. What would people think about moving this meeting up 30 minutes so that it’s possible to attend both?
Ok, I’m taking the likes on the previous post as interest in moving the time up to avoid overlapping with the DeAI meetings. Our next meeting will have @ielashi giving some demos on how to use canbench!
Thursday November 21st at 5:00 pm CET Zoom link
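If you'd like to try it ahead of time, a benchmark is just an annotated function in your canister crate, roughly like this (a sketch from my memory of the canbench-rs README, so treat the names as approximate and check there):

```rust
// Sketch of a canbench benchmark (per my memory of the canbench-rs README).
// Add `canbench-rs` as a dev dependency, annotate a function, then run the
// `canbench` CLI, which executes the wasm and reports instructions consumed.
use canbench_rs::bench;

fn fibonacci(n: u64) -> u64 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

#[bench]
fn fibonacci_20() {
    // canbench measures the instructions used by this function body and
    // tracks regressions against a committed results file.
    fibonacci(20);
}
```

As I recall, the `canbench` CLI is driven by a small canbench.yml that points at your built wasm, and it compares each run against a stored baseline so regressions show up in CI.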
Hi everyone,
We’ll skip the meeting this week because a lot of people are out anyway, but we’ll be back in January. As a reminder, all previous talk recordings are available here.
Hi all,
This Thursday I’ll be giving an overview of the ICP design and explaining some of the performance properties and tradeoffs. Please come with questions! Ideally this will help improve our mental models and intuitions for ICP.
Thursday January 16th at 5:00 pm CET Zoom link
The recording from last week has been added.
@abk thank you for the presentation and recording, it’s very helpful
Wanted to ask: is it possible to also share the slides, for quick future reference?
Thanks!
Great presentation @abk - a few questions after watching last week’s recording:
- You mentioned that query throughput increases with the number of nodes in a subnet. However, in practice on mainnet, are the boundary nodes aware of how the ingress load is distributed among nodes, and do they then automatically distribute the load evenly across the nodes in the subnet? Or would the boundary nodes still route requests to the closest node regardless of load? In that case, in a scenario where requests are perfectly geographically distributed, throughput would increase with the number of nodes in a subnet, but realistically this wouldn’t make a difference if all the requests were coming from a single location, say New York City.
- Queries have a “timeout” maximum of 5B instructions, or a single round of consensus, correct? Would this then prevent the state that a query operates on from diverging from the most recently certified state by more than one round of consensus?
- If there are multiple versions of a canister’s state being executed against, this doubles the canister’s sandbox memory footprint, correct? In the context of a simultaneous query and update call, this would mean a canister occupies 2x memory on a node. Is there a case, such as one involving a composite query, that would allow a canister to occupy more than 2x sandbox memory at a point in time?
- On the subject of wanting to process more than 4 GB of memory in a single message: it’s not that developers need to touch every bit of the memory loaded (except in cases like MapReduce/LLMs), but we’d want to load in, dynamically traverse, and then update various parts of a data structure (e.g., a BTree) to find and update specific values within that round of execution (see the sketch below for the access pattern I mean). If the issue is sandbox memory capacity, would growing the sandbox capacity to equal the max state size of a subnet (1 TB) solve this issue?
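To make that concrete, here's roughly how we'd express that traversal today with ic-stable-structures, where the BTree lives in stable memory and only the touched nodes cross into the wasm heap (a sketch from memory of the crate's API, worth verifying against its docs):

```rust
// Sketch: a BTreeMap kept in stable memory via ic-stable-structures, so a
// message can traverse/update a structure far larger than the 4 GB wasm heap,
// touching only the nodes it actually reads or writes.
use std::cell::RefCell;

use ic_stable_structures::memory_manager::{MemoryId, MemoryManager, VirtualMemory};
use ic_stable_structures::{BTreeMap, DefaultMemoryImpl};

type Memory = VirtualMemory<DefaultMemoryImpl>;

thread_local! {
    static MEMORY_MANAGER: RefCell<MemoryManager<DefaultMemoryImpl>> =
        RefCell::new(MemoryManager::init(DefaultMemoryImpl::default()));

    // Keys/values are fixed-size u64s for simplicity; real data would use
    // types implementing the crate's Storable trait.
    static MAP: RefCell<BTreeMap<u64, u64, Memory>> = RefCell::new(BTreeMap::init(
        MEMORY_MANAGER.with(|m| m.borrow().get(MemoryId::new(0))),
    ));
}

#[ic_cdk::update]
fn update_value(key: u64, value: u64) -> Option<u64> {
    // Finds and updates one entry; only the BTree nodes along the lookup
    // path are read from stable memory into the heap.
    MAP.with(|m| m.borrow_mut().insert(key, value))
}

#[ic_cdk::query]
fn get_value(key: u64) -> Option<u64> {
    MAP.with(|m| m.borrow().get(&key))
}
```

The pain point I'm asking about is the case where even the per-message working set, or structures we'd like to keep in the wasm heap for speed, outgrow what the sandbox currently allows.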
No problem, I’ve added the slides to the doc.