Complete: ICDevs.org Bounty #20 - QuickStart Dapp - Scaling With Canisters - 200 ICP, 100 ICP, 50 ICP - Multiple winners

Hmm, that’s an awesome repo, but it seems to be missing a license?

Perhaps @hpeebles can help with that?

Also, I salute the first rust bounty from ICDevs! May there be many more in the future :slight_smile:

Feel free to check it out: https://dft.delandlabs.com/AutoScalingStorage

The Developer Grant Program is currently open to any and all participants who meet the eligibility requirements (which can be found in the Submittable application) and pass KYC.

Sorry, to clarify: are you asking for a solution similar to the currently defunct BigMap?

That would certainly qualify! We are targeting apps as opposed to utilities, but an app that used a generalized solution would certainly be welcome!

Hello,
I have created an application for the Bounty and want to apply.
Here is the link to GitHub: hoosan/auto-scaling-notes
I would be happy to receive your feedback.

Awesome. I will review it! Do you have a demo deployed anywhere?

Yes, the app is deployed here: https://yflxa-iaaaa-aaaai-acfja-cai.ic0.app

Hi, when’s the deadline? I’m still working on my DHT-style solution.

We will likely keep this open for a while, but the sooner you get us a solution, the better. We will likely shut things down after we have six or so submissions.

Hey @skilesare, great timing with this bounty. I had “research canister scaling” on my to-do list as a grant application, so I took a stab at it. Here’s a brief description of the project. There’s also a screen capture of the front-end demoing the main concepts of the app.

Architecture: I went for the full scaling solution, where clients upload content to many buckets. The buckets create indexes based on the content they receive and periodically (every 5s for the demo, but configurable) send their index to the Index canister. The Index canister instructs the front-end to upload new content to particular canisters based on the indexing strategy (more below). The Index canister also serves an index of canister IDs to the front-end, based on the indexes received from the Buckets. Thus the front-end first requests an “index” for #dogs, then queries all the canisters where #dogs content was recorded, and displays the list.
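To make the two-round-trip read path concrete, here is a minimal in-memory sketch of what the Index canister's tag map might look like. It is not the submission's code: the `Index` struct, `receive_bucket_index`, `buckets_for`, and the use of plain `String`s as canister IDs are all hypothetical stand-ins, and a real canister would use `candid::Principal` and inter-canister calls rather than direct method calls.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical stand-in for a canister principal; a real canister would
// use `candid::Principal` and be driven by `ic_cdk` calls instead.
type CanisterId = String;

/// Illustrative Index canister state: hashtag -> buckets that hold it.
#[derive(Default)]
struct Index {
    tags: HashMap<String, BTreeSet<CanisterId>>,
}

impl Index {
    /// Merge one bucket's periodic report (the tags it currently stores).
    /// This sketch only ever adds; a real index would also drop tags a
    /// bucket stops reporting.
    fn receive_bucket_index(&mut self, bucket: &str, tags: &[&str]) {
        for tag in tags {
            self.tags
                .entry(tag.to_string())
                .or_default()
                .insert(bucket.to_string());
        }
    }

    /// First front-end round trip: which buckets hold `tag`?
    fn buckets_for(&self, tag: &str) -> Vec<CanisterId> {
        self.tags
            .get(tag)
            .map(|set| set.iter().cloned().collect())
            .unwrap_or_default()
    }
}

fn main() {
    let mut index = Index::default();
    index.receive_bucket_index("bucket-a", &["#dogs", "#cats"]);
    index.receive_bucket_index("bucket-b", &["#dogs"]);
    // Second round trip (not modelled): query each returned bucket for #dogs.
    println!("{:?}", index.buckets_for("#dogs"));
}
```

A `BTreeSet` is used only so that repeated reports from the same bucket are deduplicated and the returned list has a stable order.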

To demo “user access” I chose a trivial top-to-bottom approach. “Moderators” are added to the Index canister (based on a principal) and the Index sends the “moderator list” to every Bucket. By default the Buckets only serve content created by the requester, or by Anonymous. If a moderator is added, however, they can view every piece of content uploaded to that Bucket.
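The access rule above can be sketched as a single filter on the Bucket side. This is an illustrative model under stated assumptions, not the project's implementation: `Bucket`, `Entry`, `set_moderators`, and `visible` are hypothetical names, principals are plain strings, and `2vxsx-fae` is the textual form of the IC anonymous principal.

```rust
use std::collections::HashSet;

// Hypothetical principal stand-in (a real canister would use candid::Principal).
type Principal = String;
// Textual form of the IC anonymous principal.
const ANONYMOUS: &str = "2vxsx-fae";

struct Entry {
    owner: Principal,
    tag: String,
    body: String,
}

#[derive(Default)]
struct Bucket {
    entries: Vec<Entry>,
    // Moderator list as last pushed down by the Index canister.
    moderators: HashSet<Principal>,
}

impl Bucket {
    /// Replace the moderator list with the one sent by the Index canister.
    fn set_moderators(&mut self, mods: Vec<Principal>) {
        self.moderators = mods.into_iter().collect();
    }

    /// Entries for `tag` that `caller` may see: moderators see everything,
    /// everyone else sees their own entries plus anonymous ones.
    fn visible(&self, caller: &Principal, tag: &str) -> Vec<&str> {
        let is_mod = self.moderators.contains(caller);
        self.entries
            .iter()
            .filter(|e| e.tag == tag)
            .filter(|e| is_mod || &e.owner == caller || e.owner == ANONYMOUS)
            .map(|e| e.body.as_str())
            .collect()
    }
}

fn main() {
    let mut bucket = Bucket::default();
    bucket.entries.push(Entry { owner: "alice".into(), tag: "#dogs".into(), body: "a1".into() });
    bucket.entries.push(Entry { owner: "bob".into(), tag: "#dogs".into(), body: "b1".into() });
    println!("{:?}", bucket.visible(&"alice".to_string(), "#dogs"));
    bucket.set_moderators(vec!["mod".to_string()]);
    println!("{:?}", bucket.visible(&"mod".to_string(), "#dogs"));
}
```

Keeping the check inside the Bucket means a read needs no inter-canister call; the cost is that a Bucket's view of the moderator list is only as fresh as its last update from the Index canister.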

I’ve used “entries” here to denote pieces of content. This could be anything that we can track and quantize on a bucket. It could be megabytes for file storage, live sessions for proxy canisters, users served for game realms, etc. The 20-entry limit was chosen just so we can see the spawning in a live environment.

Implementation: The study is written in rust, with a react mock-up for front-end.

The canisters rely on heartbeat() functionality, as I wanted to also test this at scale. There are a ton of ways to optimize the flows, and the architecture could probably support having heartbeat() functionality just on the Index canister. Or, better yet, a dedicated heartbeat-enabled canister for the entire project.

Code organization: I tried to be as non-opinionated as possible. There are a lot of great production-ready projects out there that one could choose to draw inspiration from for file organization. I chose to keep it as simple as possible, so that people reading the code focus on the IC stuff and not on implementation details. A lot of things could obviously be optimized and better organized.

Both the Index and Bucket canisters have 4 main source files:
  • lib.rs deals with canister settings and IC-related calls (queries and updates);
  • businesslogic.rs deals with … business logic. Here lies the main impl for most of the functionality;
  • env.rs is an adaptation of @hpeebles’ starting-project and provides helpers for the CDK API;
  • lifetime.rs deals with pre- and post-upgrade hooks and the heartbeat function.

The front-end is simply thrown together to demo the canister workflows, nothing to write home about.

Key things to note when playing with the demo (some of the things can be hopefully seen in the video below):

  1. There are two indexing strategies implemented: FillFirst and BalancedLoad. The default is BalancedLoad. The front-end first requests a list of can_ids to upload content to, and then calls the first canister in the list. We can imagine the Index canister using a multitude of indexing strategies, based on business needs (an interesting one would be grouping content so that as few canisters as possible need to be queried when displaying it).

  2. The Index canister maintains a number of metrics. The main thing to notice is the relation between Free Slots, Desired Free Slots and Planned Slots. Free Slots are computed every 5 seconds. When Free Slots drops below the desired number, a new bucket is planned and added to the spawn queue. Planned Slots are counted alongside Free Slots so that we don’t spawn too many canisters when the spawning process takes ~4-5 seconds and heartbeat() gets called multiple times in the meantime.

  3. When sending multiple tags in a short time, notice that even though the indexing strategy wants to add content to the least-loaded canister, re-indexing happens only once every 5 seconds, so multiple entries will be sent to the same canister (this turns out not to be a problem in larger deployments).

  4. A moderator added to the Index canister is propagated to all the Buckets on the next heartbeat update. This needs to be as close to synchronous as possible: a real-life implementation would also need to remove moderators, and fast propagation prevents weird edge cases where a removed moderator could still edit content on slow-to-update Buckets.
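The two strategies and the slot bookkeeping in items 1 and 2 can be sketched as plain functions. The source only names FillFirst and BalancedLoad, so the behaviour below is an assumption: FillFirst is read as “keep writing to the earliest bucket that still has room” and BalancedLoad as “prefer the least-loaded bucket”. `BucketLoad`, `upload_targets`, and `buckets_to_spawn` are hypothetical names.

```rust
/// Per-bucket load as last reported to the Index canister (hypothetical shape).
struct BucketLoad {
    id: String,
    used: u32,
    capacity: u32,
}

impl BucketLoad {
    fn free(&self) -> u32 {
        self.capacity.saturating_sub(self.used)
    }
}

enum Strategy {
    FillFirst,
    BalancedLoad,
}

/// Ordered bucket IDs for the front-end; it uploads to the first one.
fn upload_targets(buckets: &[BucketLoad], strategy: &Strategy) -> Vec<String> {
    // Only buckets with at least one free slot are candidates.
    let mut open: Vec<&BucketLoad> = buckets.iter().filter(|b| b.free() > 0).collect();
    match strategy {
        // Assumed FillFirst: keep list (creation) order, fill the earliest bucket first.
        Strategy::FillFirst => {}
        // Assumed BalancedLoad: least-used bucket first (stable sort keeps ties in order).
        Strategy::BalancedLoad => open.sort_by_key(|b| b.used),
    }
    open.into_iter().map(|b| b.id.clone()).collect()
}

/// Heartbeat-side planning: how many new buckets to queue so that
/// free + planned slots reaches the desired free-slot target.
/// Counting planned slots stops repeated heartbeats from re-queuing
/// buckets that are already being spawned. `slots_per_bucket` must be > 0.
fn buckets_to_spawn(free: u32, planned: u32, desired: u32, slots_per_bucket: u32) -> u32 {
    let available = free + planned;
    if available >= desired {
        return 0;
    }
    let missing = desired - available;
    (missing + slots_per_bucket - 1) / slots_per_bucket // ceiling division
}

fn main() {
    let buckets = vec![
        BucketLoad { id: "a".into(), used: 15, capacity: 20 },
        BucketLoad { id: "b".into(), used: 3, capacity: 20 },
        BucketLoad { id: "c".into(), used: 20, capacity: 20 },
    ];
    println!("{:?}", upload_targets(&buckets, &Strategy::FillFirst));
    println!("{:?}", upload_targets(&buckets, &Strategy::BalancedLoad));
    let free: u32 = buckets.iter().map(BucketLoad::free).sum();
    println!("first heartbeat queues {}", buckets_to_spawn(free, 0, 25, 20));
    println!("next heartbeat queues {}", buckets_to_spawn(free, 20, 25, 20));
}
```

The second `buckets_to_spawn` call shows why Planned Slots matter: once one bucket’s worth of slots is planned, a heartbeat that fires before the spawn completes queues nothing extra.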

Play around with the demo, and let me know if there are any questions.

Note: please do not use this code as is in production!!! This was approached as a scalability study, and it is not optimized, and not tested well enough to be production ready. Feel free to use the code as you want, but please make sure it’s tested and stable before using this with real stakes!

Repo:

Video Demo:

Great job! Thanks for the research, explanation, great video, and well organized project and code. If it was up to me I’d give you the grand prize :rofl: (pokes @skilesare)

This gives me a bunch of new ideas I’d like to test out :slight_smile:

A few follow up questions:

1. I see you’re storing a HashMap<hashtag, List<canisterId>> in the index canister, and that the frontend is making two rounds of calls: one to query the index canister to retrieve the list of storage canisters that contain a particular hashtag, and then one to query all of the storage canisters containing that specific hashtag. Did you explore any ideas of how you might scale out the index canister if the HashMap fills up?

I could see potentially having that “HeartBeat” canister you mentioned earlier turn into a management canister that holds your slot metrics and can spin up new index canisters as a second level of indirection, but now you’re scaling by putting additional levels of indirection in front of the user. If you take this route you now have 3 rounds of front-end calls - one to the management canister holding the index canisters, one to all the index canisters asking for canisters with your hashtag, and then finally to the storage canisters. For now, this multiple query approach makes a lot of sense since inter-canister queries are blocked and inter-canister updates are very slow.

I wonder if you were able to ponder this point of scaling out the index canister itself, potentially without adding additional levels of indirection.

2. The tradeoff made to achieve scalable key-value storage sacrifices sortability and some query depth (say, getting the latest entries sorted by timestamp, or getting all #cat posts geotagged in a particular region). This also comes into play when I want to update a specific post that I’ve given the #cat hashtag, or when someone else tries to like my post: with no unique identifier, which cat post gets updated?

Did you give any thought to how one might auto scale while still providing sortability and additional query depth?

Thank you for the kind words :blush:

For the first question, that’s literally on my whiteboard right now :slight_smile: I have two diagrams, one with an index of indexes and one with simply two index canisters. There are a couple of benefits to simply having two Index canisters, maybe sorted by “frequently used stuff”. I believe that 8 GB ought to be enough (famous last words), and in the worst case the clients will sometimes query both index canisters (instead of always making two queries). I guess we’ll have to grow and see…
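The “two Index canisters” idea can be sketched from the client’s point of view: try the index that holds frequently used tags first, and fall back to the second one only on a miss. This is a hypothetical illustration, not code from the submission; `TwoLevelLookup` and `lookup` are invented names, and the two `HashMap`s stand in for what would really be two separate query calls to two canisters.

```rust
use std::collections::HashMap;

type CanisterId = String;

/// Hypothetical client-side view of two Index canisters: a "hot" one for
/// frequently used tags and a "cold" one for everything else.
struct TwoLevelLookup {
    hot: HashMap<String, Vec<CanisterId>>,
    cold: HashMap<String, Vec<CanisterId>>,
}

impl TwoLevelLookup {
    /// Returns the buckets holding `tag` and how many index queries it took.
    fn lookup(&self, tag: &str) -> (Vec<CanisterId>, u32) {
        // Common case: one query to the hot index answers it.
        if let Some(buckets) = self.hot.get(tag) {
            return (buckets.clone(), 1);
        }
        // Worst case: the client also has to ask the cold index.
        (self.cold.get(tag).cloned().unwrap_or_default(), 2)
    }
}

fn main() {
    let index = TwoLevelLookup {
        hot: HashMap::from([("#dogs".to_string(), vec!["bucket-a".to_string()])]),
        cold: HashMap::from([("#axolotl".to_string(), vec!["bucket-z".to_string()])]),
    };
    println!("{:?}", index.lookup("#dogs"));
    println!("{:?}", index.lookup("#axolotl"));
}
```

Unlike an index of indexes, this adds no extra round trip for hot tags; the price is a second query only for tags that live in the cold index.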

Your second question is a bit more complicated, and I don’t have a good answer at this point. There are a lot of factors that go into how you decide to index your data, and obviously there are tradeoffs one way or the other. My hope for my project is that it’s not a critically real-time system, so I have the benefit of being able to schedule tasks, and as long as they “eventually” complete, the system should still work. In other words, I might be able to have many indexing tasks work on each Bucket, and the main Index would just point towards them. It’s a kind of both distributed storage and distributed computing. The tradeoff is, of course, that the front-end needs to do more queries. Looking at existing projects, that seems to be a good tradeoff to make, as the IC seems uniquely positioned to serve this use case with ease. When in doubt, we’ll “simply” add another layer of caching, trading space for compute time.

Hey, as promised I updated my submission with docs, some diagrams, cleaner code and an instructional tutorial. I will remove the old reply. Thanks! I hope it’s not too late.

The summary is from GitHub:
The purpose of this project is to create canisters that have shared ownership. However, this is different from allowing multiple controllers to control a canister in the default sense. The problem is that some default canister functions give full access to anyone with control (i.e. there is only one level of control). A quick analogy is a joint bank account: a married couple can have both of their names on it, but each party is able to withdraw all of the funds. In a divorce, things can become messy if one party decides to withdraw everything quickly, since both parties have full control. Instead, it would be nice to create a joint bank account where people can vote on money spent, split amounts evenly, divvy up the percentage owned, etc. In essence, if the default setup for multiple users with control over a canister is like a legacy joint bank account, this program is meant to add extra sharing functionality to canisters!

I achieve this by creating a primary canister that creates canisters. This canister retains default control of the new canisters, and any customized ownership is tracked in a list in the secondary canisters. Users interact with the secondary canisters through method calls made by the primary one. Essentially, the primary canister has an AssocList (dictionary) that uses users’ principal IDs as keys and stores a buffer of secondary canister IDs per principal ID. This achieves project goal 1.

Primary canister provides indexing information such that a client can distribute parallel calls across secondary canisters directly.

Principal IDs/users can join and unjoin secondary canisters as they please (custom membership rules, a limit on the number of principal IDs/users, etc. could be added; for now it is just join and unjoin). However, the secondary canisters are not called by the principal IDs/users themselves but by the primary canister. If we remove all control from the primary canister after enough development, the system can be quite trustless, as default control would essentially be “black holed”. This achieves project goal 2.
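As a rough illustration of the primary canister’s bookkeeping, the sketch below models “principal ID -> list of secondary canister IDs” with join and unjoin. It is a simplification under stated assumptions: the original is written in Motoko with an AssocList and Buffers, and it also keeps ownership lists inside the secondary canisters, which this sketch does not model. `Primary`, `create_secondary`, `join`, `unjoin`, and `secondaries_of` are hypothetical names, and IDs are plain strings.

```rust
use std::collections::HashMap;

type Principal = String;
type CanisterId = String;

/// Hypothetical model of the primary canister's state. A HashMap plays the
/// role of the Motoko AssocList (principal -> Buffer of secondary IDs).
#[derive(Default)]
struct Primary {
    memberships: HashMap<Principal, Vec<CanisterId>>,
    secondaries: Vec<CanisterId>,
}

impl Primary {
    /// Record a newly created secondary canister (creation itself not modelled).
    fn create_secondary(&mut self, id: &str) {
        self.secondaries.push(id.to_string());
    }

    /// Join a known secondary; joining twice is a no-op.
    fn join(&mut self, user: &str, secondary: &str) -> Result<(), String> {
        if !self.secondaries.iter().any(|s| s == secondary) {
            return Err(format!("unknown secondary {secondary}"));
        }
        let list = self.memberships.entry(user.to_string()).or_default();
        if !list.iter().any(|s| s == secondary) {
            list.push(secondary.to_string());
        }
        Ok(())
    }

    /// Leave a secondary; unknown users or memberships are ignored.
    fn unjoin(&mut self, user: &str, secondary: &str) {
        if let Some(list) = self.memberships.get_mut(user) {
            list.retain(|s| s != secondary);
        }
    }

    /// Indexing info for the client: which secondaries to call in parallel.
    fn secondaries_of(&self, user: &str) -> Vec<CanisterId> {
        self.memberships.get(user).cloned().unwrap_or_default()
    }
}

fn main() {
    let mut primary = Primary::default();
    primary.create_secondary("node-1");
    primary.create_secondary("node-2");
    primary.join("alice", "node-1").unwrap();
    primary.join("alice", "node-2").unwrap();
    println!("{:?}", primary.secondaries_of("alice"));
    primary.unjoin("alice", "node-1");
    println!("{:?}", primary.secondaries_of("alice"));
}
```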

Provide a security interface such that secondary canisters can hold private data from many users but only deliver requests to authorized requesters. Attempt to use as few inter-canister calls as possible.

To see the code structure, see the diagrams at the bottom of this page. Essentially, main.mo is the primary canister, and it creates many instances of NodeCanisters (NodeCanisters.mo). These live in the /src/Main directory.

@skilesare, any updates on this and the other bounties for the hackathon?

Office hours tomorrow. Hopefully an update today.

@skilesare

Technical features of ICSP:

  1. Infinite Capacity ICSP Canister: read and write to the same canister without having to worry about storage space.
  • Explanation:
    • Infinite capacity means storage canisters can be created without limit (given sufficient cycles): when a storage canister is full, a new one is created automatically, data writes are not blocked, and the storage destination switches over smoothly.
  2. Support for CRUD (only read and write are supported in the current version)
  • Explanation:
    • At the business level, it supports the four operations of adding, deleting, updating and querying data, with appropriate memory optimizations for each, including reuse of fragmented memory (next version).
  3. One-Step Store and Two-Step Get
  • Explanation:
    • One-step store: the back-end can store a key and value in the ICSP canister directly, without awaiting the call. When storing (key, value), the caller does not have to wait for a return value, which makes storage convenient.
    • Two-step get: first obtain from the ISP which storage unit (bucket) holds the data, and then obtain the metadata from that bucket.
  4. Automatic cycle monitoring
  • Explanation:
    • The ICSP heartbeat actively monitors the cycle balances of ICSP and of each storage unit and tops them up automatically, so the user only needs to monitor the ISP’s ICP balance.
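Features 1 and 3 can be illustrated with a small in-memory model: writes go to the current bucket, a new bucket is created when the value would not fit, and reads first locate the bucket and then fetch from it. This is a hypothetical sketch, not ICSP’s code. `Icsp`, `Bucket`, `store`, `locate`, and `get_from` are invented names, capacity is counted in bytes, keys are assumed to be written once, and the real “one-step” store is an un-awaited inter-canister call, which plain synchronous Rust cannot show.

```rust
use std::collections::HashMap;

/// Hypothetical storage bucket with a fixed byte capacity.
struct Bucket {
    used: usize,
    capacity: usize,
    data: HashMap<String, Vec<u8>>,
}

impl Bucket {
    fn new(capacity: usize) -> Self {
        Bucket { used: 0, capacity, data: HashMap::new() }
    }
}

/// Hypothetical ICSP-like coordinator: routes writes to the newest bucket,
/// creates a new bucket when it is full, and remembers key -> bucket index.
struct Icsp {
    bucket_capacity: usize,
    buckets: Vec<Bucket>,
    locations: HashMap<String, usize>,
}

impl Icsp {
    fn new(bucket_capacity: usize) -> Self {
        Icsp { bucket_capacity, buckets: vec![Bucket::new(bucket_capacity)], locations: HashMap::new() }
    }

    /// Store: never blocks on a full bucket. Keys are assumed write-once.
    fn store(&mut self, key: &str, value: Vec<u8>) {
        assert!(value.len() <= self.bucket_capacity, "value larger than a bucket");
        let last = self.buckets.len() - 1;
        if self.buckets[last].used + value.len() > self.buckets[last].capacity {
            // Current bucket is full: switch the destination to a fresh bucket.
            self.buckets.push(Bucket::new(self.bucket_capacity));
        }
        let idx = self.buckets.len() - 1;
        let bucket = &mut self.buckets[idx];
        bucket.used += value.len();
        bucket.data.insert(key.to_string(), value);
        self.locations.insert(key.to_string(), idx);
    }

    /// Get, step 1: ask the coordinator which bucket holds `key`.
    fn locate(&self, key: &str) -> Option<usize> {
        self.locations.get(key).copied()
    }

    /// Get, step 2: read the value from that bucket.
    fn get_from(&self, bucket: usize, key: &str) -> Option<&[u8]> {
        self.buckets.get(bucket)?.data.get(key).map(|v| v.as_slice())
    }
}

fn main() {
    let mut icsp = Icsp::new(10);
    icsp.store("a", vec![1; 6]);
    icsp.store("b", vec![2; 6]); // does not fit in bucket 0, so bucket 1 is created
    if let Some(bucket) = icsp.locate("b") {
        println!("b is in bucket {bucket}: {:?}", icsp.get_from(bucket, "b"));
    }
}
```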

Feel free to check it out!

Awesome and pure key/value store! I’d love to see you replace blob with CandyLibrary so you can store all kinds of data!

No, please don’t replace anything with CandyLibrary. The native types exist for a reason.

There’s a huge overhead with variant types, as we’ve discovered today in our project. I couldn’t think of anything worse than having a “variant type of all types”.