Scaled Storage is a generic distributed hash table tailored for the Internet Computer. It can scale to a practically unlimited number of canisters, with a worst case of one inter-canister call and typically a best case of zero. The client never needs prior knowledge of all the canisters holding data; it only needs the canister id of any one of them. - The repo: GitHub - scroobius-pip/scaled_storage
Congrats to all the authors. Please use the thread below to promote your project and to share whether you plan to do any more with it.
Thank you, @skilesare & @dfinity for making this happen! I don’t have any plans for continuing the first quickstart, but I do have a prototype for the second bounty that I hope to present later this week.
I think that having open quickstart / bounties like these is great, as it gives an opportunity to pick & choose what areas you’d like to focus on, with some added benefit of potential rewards. Hope to see many more from both ICDevs and Dfinity!
This is awesome, thanks to everyone who contributed!
I hate to be that person, but I’m having trouble understanding the similarities and differences between these 5 solutions. They all seem to be using the same approach with one primary canister + multiple secondary (bucket) canisters.
However, I’m sure there are many details that matter.
For example, I’m curious how these solutions handle:
- cycles with secondary canisters
- upgrading secondary canisters
- what happens when the primary canister runs out of memory
Can some of the authors comment on this and some of the advantages and drawbacks of their scaling solution? This is a really important piece of technology that will be foundational in the new Internet, and I’d like there to be a bit more transparency and dialogue on what is being presented here. Thanks!
This would be a great discussion to have! I’m also interested in what is on your “I’ll deal with that later” list. There are probably lots of things we could learn from and problems we could solve with other bounties. I bet there was a lot of overlap.
You may find more detailed discussions on each of the designs here
Many of the designs are actually quite different - there are a few diagrams in the posts as well that lay out how the multi-canister communication and flow of information is set up.
Great questions! Here’s a couple of quick thoughts:
-cycles would be easy to handle from the Index canister. Each Bucket has a drain_cycles fn that sends excess cycles to the Index, so that covers one direction. Covering the classical other direction, from Index to Buckets, could be done either via a c2c check or through the management canister. Since the Index canister is a controller of each Bucket canister, the Index can loop through and query the management canister for the cycle balances of all its Buckets.
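A minimal sketch of that Index-side loop, assuming ic-cdk’s management-canister bindings (canister_status requires the caller to be a controller). The watermark constants and the bucket list are illustrative, not from any of the submitted projects:

```rust
use candid::{Nat, Principal};
use ic_cdk::api::management_canister::main::{
    canister_status, deposit_cycles, CanisterIdRecord,
};

const LOW_WATERMARK: u128 = 1_000_000_000_000; // 1T cycles, illustrative
const TOP_UP_AMOUNT: u128 = 2_000_000_000_000; // illustrative

async fn top_up_buckets(buckets: &[Principal]) {
    for &bucket in buckets {
        // Query the management canister for this Bucket's status.
        if let Ok((status,)) = canister_status(CanisterIdRecord { canister_id: bucket }).await {
            if status.cycles < Nat::from(LOW_WATERMARK) {
                // deposit_cycles attaches cycles from the Index's own balance.
                let _ = deposit_cycles(CanisterIdRecord { canister_id: bucket }, TOP_UP_AMOUNT).await;
            }
        }
    }
}
```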
-upgrading buckets: that’s something I haven’t covered at all, and it would be awkward given the quick & dirty way I dealt with loading the Bucket wasm (it’s copied into the Index wasm, not the best choice in hindsight). I’d go for another approach if upgrading Buckets were needed. That is partly why I also pass a “hardcoded” dfx principal to each Bucket on creation: with a dfx identity as controller, Buckets could be upgraded with code from a utility tool.
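For reference, here is roughly what an Index-driven Bucket upgrade could look like via the management canister, assuming the Index is a controller and already holds the new Bucket wasm (e.g. uploaded by a utility tool rather than baked into the Index wasm). Note that CanisterInstallMode::Upgrade is a unit variant in older ic-cdk releases; newer releases take upgrade options:

```rust
use candid::Principal;
use ic_cdk::api::management_canister::main::{
    install_code, CanisterInstallMode, InstallCodeArgument,
};

async fn upgrade_bucket(bucket: Principal, wasm: Vec<u8>) -> Result<(), String> {
    install_code(InstallCodeArgument {
        mode: CanisterInstallMode::Upgrade,
        canister_id: bucket,
        wasm_module: wasm,
        arg: vec![], // candid-encoded upgrade arg, empty here
    })
    .await
    .map_err(|(code, msg)| format!("upgrade failed: {:?} {}", code, msg))
}
```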
-OOM Index - I touched on this a bit in the dedicated topic. I’m not sure what the best solution is, and I have a hunch it depends a lot on what the business logic dictates. It’s possible for a dapp with reasonable usage to never hit the 4-8 GB limit of a single Index canister. It’s also possible that if it does, demand stays relatively close to that limit (say 2x or 3x Index canisters). In that case, one could simply hardcode the Index ids in the frontend and “ask” them sequentially if the first one doesn’t have an answer. I think the “easy” way out would be an index of indexes, but then every client makes at least 2 indexing calls, while in the hardcoded version some would make only one. Some unlucky ones would make more than 2.
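A hedged client-side sketch of that “hardcoded Index ids” fallback, written with the ic-agent crate: try each known Index in order until one answers. The `lookup` method name and its candid signature are hypothetical, not part of any of the presented designs:

```rust
use candid::{Decode, Encode, Principal};
use ic_agent::Agent;

async fn lookup_key(
    agent: &Agent,
    indexes: &[Principal], // hardcoded list shipped with the frontend
    key: &str,
) -> Option<Principal> {
    for index in indexes {
        let arg = Encode!(&key).ok()?;
        // Query this Index; fall through to the next one on error or miss.
        if let Ok(bytes) = agent.query(index, "lookup").with_arg(arg).call().await {
            // Assumed reply: Some(bucket_id) if this Index owns the key.
            if let Ok(Some(bucket)) = Decode!(&bytes, Option<Principal>) {
                return Some(bucket);
            }
        }
    }
    None
}
```

The trade-off described above falls out of this loop directly: clients that guess the right Index first pay one query, while unlucky ones pay one per Index tried, whereas an index-of-indexes flattens everyone to a fixed two.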