ICDevs.org Bounty #20 - QuickStart Dapp - Scaling With Canisters - 200 ICP, 100 ICP, 50 ICP - Multiple winners

ICDevs.org is very excited to announce a new initiative brought to you by the DFINITY Foundation and ICDevs.org. The QuickStart Dapp series will create a competition for developers to produce a number of sample dapps. Each Bounty will be open for at least two weeks and we will award prizes to submissions as they come in and are evaluated. There may be multiple awards based on the size/number of submissions.

The goal of this series is to produce a sizable set of sample content for the upcoming SuperNova hackathon.

QuickStart Dapp - Scaling With Canisters - #20

Current Status: Discussion

  • Open for submission - (03/28/2022)

  • Closed

Official Link

Bounty Details

  • Bounty Amount: 200 ICP First Prize, 100 ICP Second Prize, 50 ICP Third Prize

  • Project Type: Single Contributor/Team

  • Opened: 03/28/2022

  • Time Commitment: Weeks

  • Project Type: Sample App

  • Experience Type: Intermediate - Motoko; Intermediate - Rust; Intermediate - Web

Description

This bounty gives the opportunity to

  • learn Motoko

  • learn Rust

  • learn how scaling works

  • learn how to use canisters to create canisters

  • learn about indexing

  • learn how clients access the Internet Computer

The goal of this bounty is to produce a sample application on the Internet Computer.

Goal: Demonstrate scalability by using inter-canister calls

Create a practical dapp that has a common endpoint (primary canister) that can scale its application by creating secondary canisters and distributing requests across those canisters.

Reach Goal 1: The primary canister provides indexing information such that a client can distribute parallel calls across secondary canisters directly.

Reach Goal 2: Provide a security interface such that secondary canisters can hold private data from many users but only deliver requests to authorized requesters. Attempt to use as few inter-canister calls as possible.
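To make the main goal and Reach Goal 1 concrete, here is a minimal in-memory sketch, not a reference solution. `Primary`, `Secondary`, the per-secondary `capacity`, and the use of a `Vec` position as a canister ID are all hypothetical stand-ins. A real dapp would create canisters through the IC management canister and reach them with inter-canister calls rather than method calls on local structs.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a secondary (storage) canister.
struct Secondary {
    entries: HashMap<String, String>,
}

/// Hypothetical stand-in for the primary canister, the common endpoint.
struct Primary {
    capacity: usize,               // assumed max entries per secondary (> 0)
    secondaries: Vec<Secondary>,   // the Vec position plays the role of a canister ID
    index: HashMap<String, usize>, // key -> ID of the secondary that holds it
}

impl Primary {
    fn new(capacity: usize) -> Self {
        Primary { capacity, secondaries: Vec::new(), index: HashMap::new() }
    }

    /// Store a value, "spawning" a new secondary when every existing one is full.
    /// Returns the ID of the secondary that now holds the key.
    fn put(&mut self, key: &str, value: &str) -> usize {
        let capacity = self.capacity;
        let id = match self.index.get(key).copied() {
            // Overwrites go to the secondary that already owns the key.
            Some(id) => id,
            None => match self.secondaries.iter().position(|s| s.entries.len() < capacity) {
                Some(id) => id,
                None => {
                    self.secondaries.push(Secondary { entries: HashMap::new() });
                    self.secondaries.len() - 1
                }
            },
        };
        self.secondaries[id].entries.insert(key.to_string(), value.to_string());
        self.index.insert(key.to_string(), id);
        id
    }

    /// Reach Goal 1: tell the client which secondary to call directly.
    fn lookup(&self, key: &str) -> Option<usize> {
        self.index.get(key).copied()
    }

    /// Read through the primary (the slower path the indexing goal tries to avoid).
    fn get(&self, key: &str) -> Option<&str> {
        let id = self.lookup(key)?;
        self.secondaries[id].entries.get(key).map(|v| v.as_str())
    }

    fn secondary_count(&self) -> usize {
        self.secondaries.len()
    }
}
```

With a capacity of 2, the third distinct key lands on a newly created secondary, and `lookup` gives the client the ID it needs to query that secondary without going through the primary again.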

Your application can be written in either Motoko or Rust. Further, a Motoko version and a Rust version can be submitted as separate entries by the same person/team.

The code must be open-sourced under the MIT License.

To submit for this bounty you should:

Create a GitHub repo with your sample application and post the link to either the dev forum post or the ICDevs.org DSCVR portal.

We will start selecting prize winners by April 12th, 2022. Submission will stay open until we believe we have a sufficient number of sample applications. Multiple prizes may be awarded for submissions that reach a sufficient level of completeness.

Bounty Completion

Once your app is complete and submitted, it will be judged on the following criteria:

  • How relevant is this sample dapp for the community?

  • How well is the sample dapp’s functionality presented?

  • Does this sample dapp teach me enough to start building? Can I use the sample dapp for a real project?

  • How well was the sample dapp written?

  • How many goals were reached?

Bonus considerations:

  • Are there tests?

  • Is the documentation provided (readme file on github) sufficient?

  • A user interface of some kind is highly encouraged so that users of your sample application can get a visual view of how your application works.

Funding

The bounty was generously funded by the DFINITY Foundation. Additional donations that fund the administration of these bounties can be sent to ICDevs.org. All donations will be tax deductible for US Citizens and Corporations. If you send a donation and need a donation receipt, please email the hash of your donation transaction, physical address, and name to [email protected]. More information about how you can contribute can be found at our donations page.

Other ICDevs.org Bounties

15 Likes

This is awesome and about time.

This might be helpful to whoever does this: GitHub - open-ic/open-storage

1 Like

Hmm, that’s an awesome repo, but it seems to be missing a license?

Perhaps @hpeebles can help with that?

Also, I salute the first Rust bounty from ICDevs! May there be many more in the future :slight_smile:

1 Like

Feel free to check it out: https://dft.delandlabs.com/AutoScalingStorage

1 Like

The Developer Grant Program is currently open to any and all participants who meet the eligibility requirements (which can be found in the Submittable application) and pass KYC.

4 Likes

Sorry, just to clarify: are you asking for a solution similar to the currently defunct bigmap?

That would certainly qualify! We are targeting apps as opposed to utilities, but an app that used a generalized solution would certainly be welcome!

1 Like

Hello,
I have created an application for the Bounty and want to apply.
Here is the link to GitHub: GitHub - hoosan/auto-scaling-notes
I would be happy to receive your feedback.

2 Likes

Awesome. I will review it! Do you have a demo deployed anywhere?

Yes, the app is deployed here: https://yflxa-iaaaa-aaaai-acfja-cai.ic0.app

1 Like

Hi, when’s the deadline? Still working on my DHT-style solution.

1 Like

We will likely keep this open for a while, but the sooner you get us a solution the better. Will likely shut things down after we have six or so submissions.

Hey @skilesare, great timing with this bounty. I had “research canister scaling” as a to-do for my grant application, so I took a stab at it. Here’s a brief description of the project. There’s also a screen capture of the front-end, demoing the main concepts of the app.

Architecture: I went for the full scaling solution, where clients upload content to many buckets. The buckets create indexes based on the content they received and periodically (5s for the demo - but configurable) send the index to an Index canister. The Index canister instructs the front-end to upload new content to particular canisters based on the indexing strategy (more below). The Index canister also serves an index of canister IDs to the front-end, based on the indexes received from the Buckets. Thus the front-end first requests an “index” for #dogs, then queries all the canisters where #dogs content was recorded, and displays the list.
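The index flow described above can be sketched roughly as follows. This is a plain-Rust model under stated assumptions, not the code from the repo: `TagIndex`, `merge_report`, `buckets_for`, and the string canister IDs are hypothetical names, and in the real project the report would arrive as an inter-canister call from each Bucket.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical canister ID type; a real canister would use a principal.
type CanisterId = String;

/// Hypothetical model of the tag index held by the Index canister.
#[derive(Default)]
struct TagIndex {
    // tag -> set of Buckets that reported content for it
    by_tag: HashMap<String, BTreeSet<CanisterId>>,
}

impl TagIndex {
    /// Merge the tags a Bucket sent on its periodic update (every 5s in the demo).
    /// Merging the same report twice is harmless because the values are sets.
    fn merge_report(&mut self, bucket: &str, tags: &[&str]) {
        for tag in tags {
            self.by_tag
                .entry(tag.to_string())
                .or_default()
                .insert(bucket.to_string());
        }
    }

    /// First round trip from the front-end: which Buckets hold this tag?
    /// The front-end then queries each returned Bucket directly.
    fn buckets_for(&self, tag: &str) -> Vec<CanisterId> {
        self.by_tag
            .get(tag)
            .map(|set| set.iter().cloned().collect())
            .unwrap_or_default()
    }
}
```

Using a set per tag keeps repeated reports from the same Bucket from growing the index, which matters when reports are resent on every tick.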

To demo “user access” I chose a trivial top-to-bottom approach. “Moderators” are added to the Index canister (based on a principal) and the Index sends the “moderator list” to every Bucket. By default the Buckets only serve content created by the requester, or by Anonymous. If a moderator is added, however, they can view every piece of content uploaded to that Bucket.
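A minimal sketch of that visibility rule, assuming a simplified `Principal` enum and hypothetical `Bucket`, `Entry`, `set_moderators`, and `visible_to` names (none of these are taken from the repo):

```rust
use std::collections::HashSet;

/// Simplified, hypothetical principal; a real canister would use the IC principal type.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum Principal {
    Anonymous,
    User(String),
}

struct Entry {
    author: Principal,
    text: String,
}

/// Hypothetical model of a Bucket with the moderator list pushed down by the Index.
struct Bucket {
    moderators: HashSet<Principal>,
    entries: Vec<Entry>,
}

impl Bucket {
    fn new(entries: Vec<Entry>) -> Self {
        Bucket { moderators: HashSet::new(), entries }
    }

    /// Called when the Index canister sends its current moderator list.
    /// Replacing the whole set also handles moderator removal.
    fn set_moderators(&mut self, list: &[Principal]) {
        self.moderators = list.iter().cloned().collect();
    }

    /// Default rule: own content or anonymous content; moderators see everything.
    fn can_view(&self, requester: &Principal, entry: &Entry) -> bool {
        entry.author == Principal::Anonymous
            || entry.author == *requester
            || self.moderators.contains(requester)
    }

    fn visible_to(&self, requester: &Principal) -> Vec<&str> {
        self.entries
            .iter()
            .filter(|e| self.can_view(requester, e))
            .map(|e| e.text.as_str())
            .collect()
    }
}
```

Because the check runs entirely inside the Bucket against a locally cached list, serving a read needs no inter-canister call; the cost is that the list can be stale until the next update from the Index.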

I’ve used “entries” here, to denote pieces of content. This could be anything that we can track and quantize on a bucket. It could be megabytes for file storage, live sessions for proxy canisters, users served for game realms, etc. The 20 entry limit was chosen just so we can see the spawning in a live environment.

Implementation: The study is written in Rust, with a React mock-up for the front-end.

The canisters rely on heartbeat() functionality, as I wanted to also test this at scale. There are a ton of ways to optimize the flows, and the architecture could probably support having heartbeat() functionality just on the Index canister. Or, better yet, a dedicated heartbeat-enabled canister for the entire project.

Code organization: I tried to be as non-opinionated as possible. There are a lot of great production-ready projects out there where one could choose to get inspiration on file organization. I chose to keep it as simple as possible, so that people reading the code focus on the IC stuff and not on implementation details. A lot of things could obviously be optimized and better organized.

Both the Index and Bucket canisters have 4 main source files:

  • lib.rs deals with canister settings and IC-related calls (queries and updates);

  • businesslogic.rs deals with … business logic. Here lies the main impl for most of the functionality;

  • env.rs is an adaptation from @hpeebles’ starting-project and deals with helpers for the cdk API;

  • lifetime.rs deals with pre and post upgrades and the heartbeat function.

The front-end is simply thrown together to demo the canister workflows, nothing to write home about.

Key things to note when playing with the demo (some of the things can be hopefully seen in the video below):

  1. There are two indexing strategies implemented - FillFirst and BalancedLoad. The default one is Balanced. The front-end first requests a list of can_id where to upload content, and then calls the first canister in the list. We can imagine the Index canister using a multitude of indexing strategies, based on business needs. (an interesting one would be grouping content to optimize for querying as few canisters as possible on content display)

  2. The Index canister maintains a number of metrics. The main thing to notice is the relation between Free Slots, Desired Free Slots and Planned Slots. Free Slots are computed every 5 seconds. When Free Slots becomes lower than the Desired number, a new bucket is planned for and added to the spawn queue. We count the planned slots as well, so that we don’t add too many canisters if the spawning process takes ~4-5 seconds and heartbeat() gets called multiple times in the meantime.

  3. When sending multiple tags in a short time, notice that even though the Indexing Strategy wants to add content to the least-loaded canister, the re-indexing happens only once every 5 seconds. So multiple entries will be sent to the same canister (this turns out not to be a problem in larger deployments).

  4. A moderator added to the Index canister gets propagated to all the Buckets on the next heartbeat update. This probably needs to be as close to synchronous as possible: a real-life implementation would also require deleting moderators, and a near-synchronous update would prevent weird edge cases where removed moderators could still edit content on slow-to-update Buckets.
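Points 1 and 2 above can be sketched as two small functions. This is one plausible reading rather than the repo's implementation: `BucketInfo`, `upload_targets`, `buckets_to_plan`, and the exact tie-breaking are assumptions, and FillFirst is interpreted here as "prefer the fullest Bucket that still has room".

```rust
/// Hypothetical view of a Bucket as seen by the Index canister.
#[derive(Clone, Debug, PartialEq)]
struct BucketInfo {
    id: u32,
    used: usize,
    capacity: usize,
}

#[derive(Clone, Copy)]
enum Strategy {
    FillFirst,
    BalancedLoad,
}

/// Order the Buckets that still have room; the front-end uploads to the first one.
fn upload_targets(buckets: &[BucketInfo], strategy: Strategy) -> Vec<u32> {
    let mut open: Vec<&BucketInfo> = buckets.iter().filter(|b| b.used < b.capacity).collect();
    match strategy {
        // Fullest Bucket with room first, ties broken by ID.
        Strategy::FillFirst => open.sort_by(|a, b| b.used.cmp(&a.used).then(a.id.cmp(&b.id))),
        // Emptiest Bucket first, ties broken by ID.
        Strategy::BalancedLoad => open.sort_by(|a, b| a.used.cmp(&b.used).then(a.id.cmp(&b.id))),
    }
    open.iter().map(|b| b.id).collect()
}

/// How many new Buckets to queue on this tick.
/// Planned slots count as if they already existed, so repeated ticks during a
/// slow spawn do not queue the same Bucket again.
fn buckets_to_plan(free: usize, planned: usize, desired: usize, capacity: usize) -> usize {
    let expected = free + planned;
    if capacity == 0 || expected >= desired {
        return 0;
    }
    let shortfall = desired - expected;
    // Round up: a partial shortfall still needs a whole Bucket.
    (shortfall + capacity - 1) / capacity
}
```

With 22 free slots, 40 desired, and 20 entries per Bucket, the first tick plans one Bucket; on the next tick those 20 planned slots bring the expected total to 42, so nothing more is queued while the spawn is still in flight.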

Play around with the demo, and let me know if there are any questions.

Note: please do not use this code as is in production!!! This was approached as a scalability study, and it is not optimized, and not tested well enough to be production ready. Feel free to use the code as you want, but please make sure it’s tested and stable before using this with real stakes!

Repo:

Video Demo:

7 Likes

Great job! Thanks for the research, explanation, great video, and well organized project and code. If it was up to me I’d give you the grand prize :rofl: (pokes @skilesare)

This gives me a bunch of new ideas I’d like to test out :slight_smile:

A few follow up questions:

1. I see you’re storing a HashMap<hashtag, List<canisterId>> in the index canister, and that the frontend is making two rounds of calls - one to query the index canister to retrieve the list of storage canisters that contain a particular hashtag, and then one to query all of the storage canisters containing that specific hashtag. Did you explore any ideas of how you might scale out the index canister if the HashMap fills up?

I could see potentially having that “HeartBeat” canister you mentioned earlier turn into a management canister that holds your slot metrics and can spin up new index canisters as a second level of indirection, but now you’re scaling by putting additional levels of indirection in front of the user. If you take this route you now have 3 rounds of front-end calls - one to the management canister holding the index canisters, one to all the index canisters asking for canisters with your hashtag, and then finally to the storage canisters. For now, this multiple query approach makes a lot of sense since inter-canister queries are blocked and inter-canister updates are very slow.

Wonder if you were able to ponder on this point of scaling out the index canister itself, potentially without adding additional levels of indirection.

2. The tradeoff made to achieve scalable key value storage sacrifices sortability and some query depth (say getting the latest entries - sorted by timestamp, or getting all cat hashtags geotagged in a particular region). This also comes into play when I want to update a specific post that I’ve given the #cat hashtag, or someone else tries to like my post - with no unique identifier, which cat post gets updated?

Did you give any thought to how one might auto scale while still providing sortability and additional query depth?

1 Like

Thank you for the kind words :blush:

For the first question, that’s literally on my whiteboard right now :slight_smile: I have two diagrams, one with an index of indexes and one with simply two index canisters. There are a couple of benefits of simply having two Index canisters, maybe sorted by “frequently used stuff”. I believe that 8gb ought to be enough (famous last words), and worst case scenario sometimes the clients will query both index canisters (instead of always making two queries). I guess we’ll have to grow and see…
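As a rough illustration of the "two Index canisters" idea, here is a sketch in which the client asks a frequently-used ("hot") index first and only falls back to the second one on a miss. `TieredIndex`, `insert_hot`, `insert_cold`, and the returned query count are hypothetical; how tags would be assigned to either index is left open, as it is in the post.

```rust
use std::collections::HashMap;

/// Hypothetical model of two Index canisters: one for frequently used tags,
/// one for everything else.
#[derive(Default)]
struct TieredIndex {
    hot: HashMap<String, Vec<String>>,
    cold: HashMap<String, Vec<String>>,
}

impl TieredIndex {
    fn insert_hot(&mut self, tag: &str, buckets: &[&str]) {
        self.hot
            .insert(tag.to_string(), buckets.iter().map(|b| b.to_string()).collect());
    }

    fn insert_cold(&mut self, tag: &str, buckets: &[&str]) {
        self.cold
            .insert(tag.to_string(), buckets.iter().map(|b| b.to_string()).collect());
    }

    /// Returns the Bucket list plus how many index queries the client needed.
    /// A hot hit costs one query; only a miss costs the second one.
    fn lookup(&self, tag: &str) -> (Vec<String>, u32) {
        if let Some(buckets) = self.hot.get(tag) {
            return (buckets.clone(), 1);
        }
        match self.cold.get(tag) {
            Some(buckets) => (buckets.clone(), 2),
            None => (Vec::new(), 2),
        }
    }
}
```

Compared with an index of indexes, this keeps the common case at a single index query and pays for the second query only on rarely used tags.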

Your second question is a bit more complicated, and I don’t have a good answer at this point. There are a lot of factors that go into how you decide to index your data, and obviously there are some tradeoffs, one way or the other. My hope for my project is that it’s not a critically real-time system, so I have the benefit of being able to schedule tasks, and as long as they “eventually” complete, the system should still work. In other words, I might be able to have many indexing tasks work on each Bucket, and the main Index would just point towards them. It’s a kind of both distributed storage and distributed computing. The tradeoff is of course that the front-end needs to do more queries. Looking at existing projects, that seems to be a good tradeoff to make, as the IC seems uniquely positioned to serve this use case with ease. When in doubt, we’ll “simply” add another layer of caching, trading space for compute time.

1 Like

Hey, as promised, I updated my submission with docs, some diagrams, cleaner code, and an instructional tutorial. I will remove the old reply. Thanks! I hope it’s not too late.

1 Like

The summary is from Github
The purpose of this project is to create canisters that have shared ownership. However, this is different from giving multiple controllers control of the canister in the default sense. The problem is that some default canister functions give full access to anyone with control (i.e. there is only one level of control). A quick analogy is a joint bank account: a married couple can each have their name on it, but each party is able to withdraw all the funds. In a divorce, things can become messy if one party decides to withdraw everything quickly, since both parties have full control. Instead, it would be nice to create a joint bank account where people can vote on money spent, split the amounts evenly, divvy up the percentage owned, etc. In essence, if the default settings for multiple users with control over a canister are like a legacy joint bank account, this program is meant to create extra sharing functionality for canisters!

I achieve this by creating a primary canister that creates canisters. This canister retains default control of the new canisters, and any customized ownership is tracked in a list in the secondary canisters. Users interface with the secondary canisters through method calls from the primary one. Essentially, the primary canister has an assoclist (dictionary) that uses each user’s principal ID as the key and holds a buffer of secondary canister IDs per principal ID. This achieves project goal 1.

Primary canister provides indexing information such that a client can distribute parallel calls across secondary canisters directly.

Principal IDs/users can join and unjoin secondary canisters as they please. (However, custom membership, limits on the number of principal IDs/users, etc. can be added.) For now it is just join and unjoin. However, the actual calls to the secondary canisters don’t come from the principal IDs/users themselves but from the primary canister. If we remove all control from the primary canister after enough development, the system can be quite trustless, as default control would be essentially “black holed”. This achieves project goal 2.
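The submission itself is in Motoko; purely to illustrate the join/unjoin bookkeeping described above, here is a small Rust model. `Registry` and its methods are hypothetical names, a `HashMap` with a `Vec` stands in for the assoclist and buffer, and principals and canister IDs are simplified to strings and integers.

```rust
use std::collections::HashMap;

/// Hypothetical model of the primary canister's membership table.
#[derive(Default)]
struct Registry {
    // principal ID (as text) -> IDs of the secondary canisters it has joined
    memberships: HashMap<String, Vec<u32>>,
}

impl Registry {
    /// Returns false if the principal had already joined this canister.
    fn join(&mut self, principal: &str, canister: u32) -> bool {
        let list = self.memberships.entry(principal.to_string()).or_default();
        if list.contains(&canister) {
            return false;
        }
        list.push(canister);
        true
    }

    /// Returns false if there was no such membership to remove.
    fn unjoin(&mut self, principal: &str, canister: u32) -> bool {
        match self.memberships.get_mut(principal) {
            Some(list) => {
                let before = list.len();
                list.retain(|&id| id != canister);
                list.len() != before
            }
            None => false,
        }
    }

    /// Indexing information a client could use to fan out calls in parallel.
    fn canisters_of(&self, principal: &str) -> Vec<u32> {
        self.memberships.get(principal).cloned().unwrap_or_default()
    }

    /// Authorization check before a request is forwarded to a secondary canister.
    fn is_member(&self, principal: &str, canister: u32) -> bool {
        self.memberships
            .get(principal)
            .map_or(false, |list| list.contains(&canister))
    }
}
```

Keeping the membership check in one table means a request can be authorized before any call to a secondary canister is made.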

Provide a security interface such that secondary canisters can hold private data from many users but only deliver requests to authorized requesters. Attempt to use as few inter-canister calls as possible.

TO SEE THE CODE STRUCTURE, SEE THE DIAGRAMS AT THE BOTTOM OF THIS PAGE. Essentially, main.mo is the primary canister, and it creates many instances of NodeCanisters (NodeCanisters.mo). These live in the /src/Main directory.

3 Likes

@skilesare, any updates on this and the other bounties for the hackathon?

Office hours tomorrow. Hopefully an update today.

1 Like