Complete: ICDevs.org Bounty #20 - QuickStart Dapp - Scaling With Canisters - 200 ICP, 100 ICP, 50 ICP - Multiple winners

No, please don’t replace anything with CandyLibrary. The native types exist for a reason.

There’s huge overhead with variant types, as we’ve discovered today in our project. I couldn’t think of anything worse than having a “variant type of all types”

What did you figure out? I thought it was like two bytes as long as you kept it under 256 entries?

And how else are you doing hierarchical data? (If you are.)

I can’t tell you exactly because it’s a black box. We replaced variants with text fields and reduced our wasm function count by 600, from 5600 down to 5000.

It wouldn’t matter if there wasn’t a 6000 function cap on the wasm binary.

wasm-objdump and wasm-opt have helped us a lot

Good to know. Thanks. We’re doing a bunch of JSON and whatnot, and nested documents that seem to require variants of some sort.

Okay, I will try it!


It’s with great pleasure I present the fruits of weeks of labor: Scaled Storage

Introduction

Before starting this bounty, I knew I wanted to build a general solution that could be easily added into any rust project.

While working on the bounty, the following goals drove most decisions I made:

  1. Simplicity
  2. Ease of use
  3. Storage efficiency
  4. It should work as an imported library.

Scaled Storage is a generic distributed hash table tailored for the Internet Computer. It can scale to a practically unlimited number of canisters, with a worst case of one inter-canister call and usually a best case of zero (I’ll go into this later). I’ve currently tested it with 10 canisters. The client side never needs prior knowledge of all the canisters holding data, just the canister id of any one of them.

Features

  1. Imported as a Rust library
  2. Scales up or down depending on developer-defined behaviour.
  3. “Quine”-style canister replication. All canisters are functionally alike and use the same code.
  4. Housekeeping operations (migrations, scaling up and down) are abstracted away.
  5. There isn’t a “primary”, “index” or “secondary” canister; any request can be handled by any canister.
  6. Tries to reduce inter-canister calls.

Developer UX

  1. Developer imports the scaled_storage library
  2. Copy-pastes a few predefined scaled_storage operations.
  3. Uploads the canister’s WASM using ss_uploader.

How It Works
Scaled Storage uses a consistent hashing algorithm to map keys to their appropriate canisters.

When a key is accessed through the exposed functions, the result is either the value or the id of the canister where it is located; in the latter case the request is simply forwarded to that canister. Here’s a code excerpt:

match canister.with_data_mut(key.clone(), |data| data.clone()) {
    NodeResult::NodeId(canister_id) => {
        // forward the request to canister_id or return it to the client side
    }
    NodeResult::Result(result) => {
        // return result to the client side
    }
}

Scaled Storage provides the following aptly named functions:

  1. with_data_mut
  2. with_upsert_data_mut

Upgrades & Migrations
When nodes are added or deleted, the changes are broadcast to all other nodes.

pub enum CanisterManagerEvent {
    NodeCreated(Principal),
    NodeDeleted(Principal),
    Migrate(MigrateArgs),
}

These nodes then update their node list and hash function. The affected values that need to be migrated are converted to a generic byte array and sent to the relevant node.

#[derive(CandidType, Deserialize, Debug, Clone)]
pub struct MigrateArgs {
    #[serde(with = "serde_bytes")]
    data: Vec<u8>,
}

///candid file
type migrate_args = record {
    data: blob;
};
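
For illustration, here is one plausible way the migrated entries could be packed into that generic data blob: Candid-encode the key/value pairs on the sending node and decode them on the receiving one. This is only a sketch under that assumption; the library’s actual wire format may differ, and pack_entries/unpack_entries are hypothetical helpers.

use candid::{Decode, Encode};

// Hypothetical helpers (not necessarily the library's actual encoding) showing
// one way the migrated key/value pairs could become the `data: Vec<u8>` field
// of MigrateArgs.
fn pack_entries(entries: Vec<(String, String)>) -> Vec<u8> {
    Encode!(&entries).expect("failed to encode migrated entries")
}

fn unpack_entries(bytes: &[u8]) -> Vec<(String, String)> {
    Decode!(bytes, Vec<(String, String)>).expect("failed to decode migrated entries")
}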

The consistent hashing algorithm in question is Anchor Hash, which has “high key lookup rates, and a low memory footprint” according to the authors of its paper.

This particular algorithm guarantees that, given N nodes and K keys, the hash function distributes K/N keys to each node, achieving uniformity (see the sketch after the list below). It also reduces the number of inter-canister calls necessary by:

  1. Only needing to migrate values to a newly created canister (instead of having to migrate to other nodes).
  2. Only needing to migrate values from a deleted canister, without disrupting the location of other keys.
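
To make the K/N uniformity claim concrete, here is a tiny, self-contained sketch that counts how a plain hash-mod-N assignment spreads keys across nodes. Note that it uses std’s DefaultHasher modulo N purely as a stand-in for illustration; it is not the AnchorHash algorithm the library actually uses (AnchorHash additionally minimises how many keys move when nodes are added or removed).

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in key -> node mapping (NOT AnchorHash): hash the key and reduce it
// modulo the node count, just to visualise the ~K/N keys-per-node property.
fn node_for(key: &str, nodes: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() % nodes
}

fn main() {
    let nodes = 4;
    let keys = 100_000;
    let mut counts = vec![0u64; nodes as usize];
    for i in 0..keys {
        counts[node_for(&format!("key-{i}"), nodes) as usize] += 1;
    }
    // With K = 100_000 and N = 4, each node ends up with roughly 25_000 keys.
    println!("{counts:?}");
}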

Architecture

Scaled Storage can be described as a linked list of nodes, with each node holding the following (a rough sketch of this state follows the list):

  1. The previous node id
  2. The next node id
  3. A consistent-hashing function mapping keys to a canister
  4. A LIFO list of all nodes.
  5. The underlying data
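
As a rough mental model, pieced together from the list above, each node’s state might look something like the sketch below. Field names and types are illustrative only, not the crate’s actual definitions.

use candid::Principal;
use std::collections::HashMap;

// Illustrative sketch of a node's state; the real scaled_storage node state
// is defined by the library and will differ in detail.
struct NodeState<V> {
    previous: Option<Principal>, // previous node in the linked list
    next: Option<Principal>,     // next node; this node is its sole controller
    all_nodes: Vec<Principal>,   // LIFO list of every node in the cluster
    // plus the consistent-hashing function (AnchorHash) mapping keys to one
    // of `all_nodes`; omitted here to avoid guessing at its API
    data: HashMap<String, V>,    // the slice of the data owned by this node
}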

I chose this linked list structure over an “index node” → “secondary node” structure for the following reasons:

  1. Canisters/Nodes have exactly the same code and behaviour.
  2. Minimal responsibilities - Each node is only responsible for its subsequent node and serves as its sole controller (with the developer being the controller of the first node). This reduces the surface area for accidents and gives each node a sole scapegoat.
  3. It reduces memory fragmentation - new canisters are only created after prior ones are full. Canisters cannot be removed in an arbitrary order; all subsequent nodes must be removed first.
  4. It reduces inter-canister calls, since there’s no need to check the memory usage of all canisters; when the last canister wants to upgrade, it can simply assume all other canisters are full.

Issues and Solutions

  1. Large binary from in-lining wasm via include_bytes!:
    Wasm code for creating new canisters is added to heap memory at runtime via an init_wasm operation. I’ve created wasm_uploader for this.
  2. Payload size too large for the init_wasm operation:
    The wasm binary is chunked and sent one chunk at a time (a sketch of how the chunks might be accumulated follows the candid snippet below).
"init_wasm":(wasm_init_args)->(bool);

type wasm_init_args = record {
    position: nat8;
    wasm_chunk: blob;
};
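
Here is a minimal sketch of how an init_wasm endpoint could accumulate those chunks in heap memory, assuming the candid signature above. It illustrates the idea only and is not the actual scaled_storage/wasm_uploader implementation.

use std::cell::RefCell;

use candid::CandidType;
use serde::Deserialize;

thread_local! {
    // Wasm for spawning new canisters, assembled chunk by chunk in heap memory.
    static WASM_BUFFER: RefCell<Vec<u8>> = RefCell::new(Vec::new());
}

#[derive(CandidType, Deserialize)]
struct WasmInitArgs {
    position: u8,        // chunk index (nat8)
    wasm_chunk: Vec<u8>, // blob
}

// Sketch of the endpoint described by "init_wasm": (wasm_init_args) -> (bool).
#[ic_cdk_macros::update]
fn init_wasm(args: WasmInitArgs) -> bool {
    WASM_BUFFER.with(|buf| {
        let mut buf = buf.borrow_mut();
        if args.position == 0 {
            buf.clear(); // the first chunk starts a fresh upload
        }
        buf.extend_from_slice(&args.wasm_chunk);
        true
    })
}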

Performance

  1. Update operations take at most two calls; one made by the end user and a second potential inter-canister call if the hash function points to another canister.
  2. Query operations take at most three calls; the additional call is made by the end user.
  3. Since operations can be taken from any canister, operation results could contain the id of the canister and be stored/cached on the front end. Subsequent operations could then be requested directly using the stored canister id.

Tests and examples

  1. Scaled Storage Simple Scaling Test
    This demonstrates the scaling capabilities of scaled storage. The default test demonstrates 10 canisters storing simple string types.
  2. IC Snippets
    This is a more complex example, demonstrating storing more advanced data structures, and authentication to prevent unauthorised access:
        ...
        .with_upsert_data_mut(snippet_key.page_id(), |page| {
            match page.get_snippet(&snippet_key) {
                Some(snippet) => {
                    let owner = snippet.owner;
                    if owner != ic::caller() {
                        return Err("Auth Error");
                    }
                    ...
                }
                ...
            }
        });

Future updates and issues:

  1. Consistent hashing algorithm change

Although Anchor hash prevents unnecessary data migrations, it distributes data equally amongst nodes. Typically this is a desirable effect, but for Scaled Storage it means “filled” nodes still have data inserted into them (although the probability of that reduces as the number of nodes increases).
A better hash function would have the following properties:

a. Given N nodes, it only distributes keys to node N.
b. When the number of nodes is increased from N to N+1, only keys in node N have to be migrated to node N+1.
c. When the number of nodes is reduced from N to N-1, only keys in node N are migrated to node N-1.

This would limit the migration to a single canister: either the prior node (downgrades) or the next node (upgrades).

  2. A rewrite implemented as a finite state machine: easier to read, less bug-prone code since state transitions can be validated at compile time, and easier rollback operations when a state change errors.
  3. Macros to reduce the amount of boilerplate to write.
  4. Downscaling hasn’t been implemented.
  5. More operations on the underlying data apart from inserting and getting data, for example map-reduce operations that require data from every node.
  6. Authenticate house-keeping requests between nodes. Currently it is trivial for a bad actor to pose as a newly created node and receive migrated data.
  7. Query calls cannot currently be forwarded; the first demo’s query forwarding only works locally, pending inter-canister query calls. For the other demos, the canister id is returned to the client.
  8. Cycles sent to one canister should be automatically distributed amongst all canisters.
  9. Scaled Storage uses heartbeat; it may be possible to avoid this and instead “hitchhike” on other requests, reducing cycle consumption.

Summary
There’s still significant work to be done, but the foundation for a generic, easy-to-use scaling solution has been created. Thanks for reading.

  1. Scaled Storage repo: Github
  2. Scaled Storage library crate: Crates
  3. Scaled Storage uploader crate: Crates
  4. Canister for the simple scaling test: qjikg-riaaa-aaaag-aaf7a-cai

Curious, was this wasm function count decrease with respect to variants observed strictly when writing your code directly in Motoko, or when writing your code in Go and then using that tool to compile it to wasm?

First off, I want to say that I absolutely love this scaling idea and what you’ve done with data partitioning, especially in the context of canister reliability. Very cool work, and I can tell you put a lot of hard work into designing and building this!

If one can distribute their data enough that one canister has the same management capacity as all canisters, then it becomes much harder to DDOS an entire application (you may just take one or more canisters offline).

Although I love the creativity of this idea, I see several potential issues with the scalability of the design in terms of the inter-canister update and/or query calls required for all the nodes to remain in sync. If multiple canisters are constantly rebalancing with each other (using inter-canister update calls), then under significant load the entire application would most likely slow down and get “behind” on rebalancing.

In this situation, imagine Canister A trying to update and re-balance with Canister C, Canister C simultaneously trying to re-balance with Canister B, and Canister B trying to re-balance with Canister A. We end up with 2+ second inter-canister update call “locks” that hold up the rest of the re-balancing efforts.

In a large enough application with enough of these requests coming in, I could see this eventually leading to application failure.

A few questions.

  1. When the number of nodes is increased from N to N+1, how long does the complete re-partitioning take? How does this change as we go from N=3 to N=50?

  2. If you have N=10 canister nodes and you hit each canister with the exact same ~500 requests in a second, how long does it take to re-balance? What about ~1000 requests/sec?

  3. Does Anchor Hashing allow me any control over knowing the exact canister node/partition that my data was sent to? For example, can I easily ensure that all data for a specific user is in a specific canister, or do I need to spin up a new scaled storage actor for each user in order to do this?

  4. How is the client expected to know of and randomly pick which canister to interact with at any one time given that the management canister generates random canister ids?


We were at about 5800 wasm functions (not using any optimisers at that point). We had about 17 entities that used variants, ie…

#common; #uncommon; #rare

I changed the Go generator to skip variants and just reference them by ID (like a normal database relation), then recompiled, and we were at 5200.

Going through the wasm-objdump logs and grepping for #common returned way more lines than I would have expected. It was at about 40 per variant, which makes sense if we dropped about 600 funcs.


Hi, thanks a lot for the praise.
About re-balancing: only the last canister has responsibility for upgrades or downgrades, so it’s impossible for an upgrade and downgrade to occur simultaneously. Also, migrations (or re-balancing, if I understood you correctly) only occur in two directions, to the last node and from the last node; no other canister is affected by the re-balancing operation.

Answers

  1. This needs benchmarks; I can’t really say, as it depends on the size of the values stored: a simple string contains far fewer bytes than a struct or a hashmap. Also, optimisations can be made to ensure the migrated data is chunked as little as possible, reducing the time it takes to complete a migration.
  2. Rebalancing should be very rare; it happens only when a canister is filled up or is empty. Also, since it’s developer-defined behaviour, you can only benchmark this with an actual concrete implementation.
    For example, it’s possible to define that the canister should be considered filled up when it has only 2GB of memory remaining and empty when it has 1.9GB; this is terrible logic and nobody in their right mind would do that, but alas, with great power comes… Anyway, you can imagine that if there are a lot of key creations and deletions, canisters would be created and deleted wildly, but again it is the responsibility of the developer using the library to use good upgrade/downgrade logic.
  3. Yes, you simply call a function with the data’s key and you get the exact canister id where it is located (this happens in nanoseconds).
  4. It’s not up to the client to know anything. Ideally the system looks like one canister to the client.
    To explain further, a client just needs to know one canister id, and update operations work seamlessly since the canister that the client knows would automatically forward the request to the appropriate canister.
    For query requests though, due to inter-canister query calls not being supported on the IC yet, requests can’t be forwarded; instead the appropriate canister id is returned to the client for another query request.

A good practice would be to return the canister id with the results of both query and update requests (check out the second demo to see how I did it). This canister id can be cached in localStorage and used later:

  1. If the cached canister id points to a now deleted canister, the client can simply call the first canister (Since it is guaranteed to always exist).
  2. Even if the data has since been migrated to another canister, its new canister id is returned.

This looks fantastic. Amazing job!


Update for the description of the second example (ic_snippets).
The goal was to be able to serve paginated data, which is somewhat difficult given we’re storing data on what is basically a key-value store. Here’s how snippets are stored:

Each page holds a maximum number of snippets before a new page is created, so the client can request a page and then request prior or subsequent pages on demand.
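
A rough sketch of that paging scheme is below. The names are hypothetical, not the ic_snippets example’s actual types; the point is that the page id, rather than the individual snippet id, is the key handed to Scaled Storage, so all snippets on a page stay in the same canister.

// Hypothetical paging types, for illustration only.
const MAX_SNIPPETS_PER_PAGE: usize = 100;

struct Page {
    index: u64,
    snippet_ids: Vec<String>,
}

impl Page {
    fn is_full(&self) -> bool {
        self.snippet_ids.len() >= MAX_SNIPPETS_PER_PAGE
    }

    // Once a page fills up, new snippets go under the next page id; the client
    // can then request prior or subsequent pages on demand.
    fn next_page_id(&self, owner: &str) -> String {
        format!("{owner}/page-{}", self.index + 1)
    }
}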


@simdi.jinkins Thanks for correcting my misunderstanding of how and when re-balancing happens (when node “full” threshold is hit) and taking the time to answer my questions.

I hope you don’t mind me asking a few follow-ups :sweat_smile:

(Follow up Q1)

I’m still not quite clear on the details of how, when a new canister is spun up (we increase the # of nodes from N → N + 1 → N + 2), the client becomes aware not just of how many canister nodes exist and which canister node partition to send the request to, but of the specific canisterId corresponding to that node.

When the management canister creates a new canister partition node, it generates a completely random canisterId along with that node, one that the application developer cannot fix, control, or predict (I wish we could do this!).

How would a client then be able to guess this random canisterId when making requests if it has no control over the canisterId generation? There must be some “fixed” initial canisterId representing Canister A in the client application that the client can first use to make contact with the data storage solution, correct?

I see this as your solution to the issue I mentioned above. The client is told a single canister that they will interact with, and they use this initial canister to then get all information about the current array of canister ids and the current hash ring, which they can then use with the AnchorHash algorithm on the client side. If a re-balancing happens in the middle of a client session, the client needs to re-fetch the new array of canister ids and the new hash ring.

However, this approach means that all clients are still originally directed to the same initial canister for fetching this cluster data, and that each application client will have a hardcoded canisterId that they will use, since they can’t predict the canisterIds of dynamically created canister nodes.

Is this a correct assumption?



(Follow-up Q2)

About a month ago, I had these exact same questions about data transfer speeds (inter-canister, ingress, egress, etc.), and the question got pretty much ghosted and went unanswered :smile:

Whether because no one has been brave enough to test this out or because no one has made their findings public, there are multiple anecdotal comments in this thread from developers who have mentioned that downloading 1-2 MB of data from the IC can take a few seconds.

By the way, consistent hashing is pretty cool - I didn’t know much about it before reading your solution → this article has a pretty neat breakdown with visualizations on how one might naively implement consistent hashing and how it works in general.

Looking into consistent hashing, it looks like for each new canister node being spun up the rebalancing cost is O(K/N), where K = the number of keys and N = the number of nodes.

This would mean that as you scale up, for each new node added the performance cost of data transfer redistribution remains constant.



One additional question (new)

  1. Thinking about consistent hashing and the way that it further subdivides the hash ring as it scales, how does consistent hashing handle unbalanced data coming in, or more specifically how does the AnchorHash algorithm handle it? Let’s say I’m seeding my scalable storage database and all of my keys are inserted in monotonically increasing order. How would that affect this architecture?

I see this as your solution to the issue I mentioned above. The client is told a single canister that they will interact with, and they use this initial canister to then get all information about the current array of canister ids and the current hash ring, which they can then use with the AnchorHash algorithm on the client side.

Not at all; the client doesn’t have to know anything about the number of canisters, or anything else apart from the initial canister id. The hash ring is only implemented on the canisters.

If a re-balancing happens in the middle of a client session, the client needs to re-fetch the new array of canister ids and the new hash ring.

Hmm there are two cases for a re-balancing occurring:

  1. A re-balance occurs before the client sends a request: it doesn’t matter; the request would be forwarded to the appropriate canister (for update calls) or the appropriate canister’s id is returned (for query calls). The whole re-balancing operation is completely transparent to the client.
  2. A re-balancing is occurring while the client is sending a request: it shouldn’t matter, but I need to figure out a way to test this. For example, it’s possible a client might send a request, during a downgrade operation, to a soon-to-be-removed node in the very brief period between the migration completing and the hash ring being updated. But this can be made a non-issue by simply refusing requests and asking the client to wait during a node migration/rebalancing.

Thanks for bringing this up.

However, this approach means that all clients are still originally directed to the same initial canister for fetching this cluster data, and that each application client will have a hardcoded canisterId that they will use, since they can’t predict the canister Ids of dynamically created canister nodes.

I think you may be forgetting that all the nodes get their hash ring updated after any re-balancing. So any request from the client to any node still gets sent to the appropriate canister id.

  1. Thinking about consistent hashing and the way that it further subdivides the hash ring as it scales, how does consistent hashing handle unbalanced data coming in, or more specifically how does the AnchorHash algorithm handle it? Let’s say I’m seeding my scalable storage database and all of my keys are inserted in monotonically increasing order. How would that affect this architecture?

Hmm, I’m not sure if I interpreted your question correctly, but if you’re asking how the keys are distributed, then the answer is evenly: with one node, all the keys map to that node; with two, 50% map to each node; with three, roughly 33%…

Did I get that correct?

I think the question is: if it’s 50/50 on two canisters and all keys are increasing ints (say a timestamp), do they all end up going to the same canister?

Most likely not; key distribution is unpredictable. If you want to group keys together in the same canister, you would need to use a higher-level data structure that stores multiple keys, instead of using the keys themselves.

Does the library support certification? It would be interesting to see how one would go about splitting and distributing Merkle trees.

I’ve got a lot to learn about how certified data works, but there is some literature on hash table alternatives to Merkle trees.

This Bounty is now Closed! Results soon!


Hi, is anyone aware of which one(s) of these five winning projects made it into production as is?