DSocial is growing quite fast, I’m getting a little concerned with my architecture. Need your support.
So DSocial is currently one canister with everything inside it, videos (title/desc/url to video), comments, likes, user history, users, everything.
With Motoko, how do we build for large datasets? I feel like this should really be abstracted from the smart contract, but maybe I’m wrong. Do I need to manually create an infinite number of canisters to scale? What is the recommended approach by DFINITY?
In TradTech world the popular most scalable and maintanable way is the microservices architecture (compared to one huge monolithic spaghetti). There’s plenty of material about that on the Internet. I’m pretty sure the same principle could and should be considered here.
That is, separate functionality into logical canisters. There is no “one way” to do this. It is subjective. But for example, you could separate video data, user data, threads/ comments separately. Not just data but backend logic as well. It would be cleaner to develop, test, maintain and be more scalable.
Think of it this way. A canister is an async ACTOR (actor model framework - really nice way to capture logic and data, and solve concurrency). It is meant to do one thing and do it well. It should have one type of data and logic, not contain all of the ins and outs of a complicated application.
ALSO. Because we are developing Open Internet, it would allow to make some canisters into general purpose libraries that others could use. That’s the promise of composability within the Internet Computer.
Depends on the architecture. If latency is important, then we’re currently forced to have a flat architecture where the access to many canisters has to be mediated by the client/agent.
But you’re right also subsequent “low latency” query calls add up, and you can’t use query calls in every case…
But that’s also the “problem” with standard cloud, calls between microservices add up. Usually, a pipeline (like Kafka) is used that the microservices consume and produce to. That is basically the same as actor model (which has an internal message queue). So basically, async actors (ICP) is the same architecture that most of the standard Web2 apps are built on.
One thing that helps a bit (I think InfinitySwap wrote about this or someone) is optimistic UI, that assumes everything is nicely working behind the scenes and dealing with any erroneous case when it happens. This can not obviously be used in all the places.
So it will be a balancing act. One solution doesn’t work for all. Some logic might need to happen on the “core” canister, others can be separated. The hope is that the capabilities of the IC will get better. A single canister can already hold 300Gb of data. Maybe it will be “infinite” on the protocol layer eventually. But like everything, evolution tends to happen in layers. On the IC, we don’t need rollup chains, but functional canisters that have logic to deal with different needs. There will be a need for helper libraries on top of the underlying tech. That includes different types of scalability solutions. Instead of reinventing the wheel everywhere, we should try to build reusable components.
Another thing is caching. That’s something that Web2 APIs couldn’t live without. On the IC, I am not sure how does that look with the consensus and everything.
Think of canisters on the Internet Computer as simple storage units with hard limits on capacity, throughput, and the slow performing ability to call other canisters. Single canister applications are limited, but through canister composition one can create powerful and resilient applications.
For this reason it’s important to think about how you can divide you application logic and data model into distinct microservices. For example, keeping all of the data for each of your users in a separate canister actor, and keeping that separate from an canister that holds image assets. This helps not just with scaling, but separation of concerns and both the composibility and flexibility of your application. If you have too much logic in a single canister or too much nested inter-canister call communication, you’ll struggle when migrating/adding features as you’ll need to slowly change the candid interfaces of communication without breaking something. As a social application, I would also look into the OpenChat v1 design GitHub - open-ic/open-chat-v1, but moreso what they have spoken about in terms of how they have scaled out for v2 (look into the Internet Computer Weekly podcast where they talk to OpenChat).
One of the drawbacks of this for many developers is this architecture design is meant for NoSQL storage, which means denormalization and duplication of your data, and a totally different type of data schema modeling. If you’re new to NoSQL, I highly recommend watching this video AWS re:Invent 2020: Data modeling with Amazon DynamoDB – Part 1 - YouTube to start to understand what a NoSQL data schema is, and how to think about modeling your data in NoSQL.
My grant project, CanDB is aiming to solve exactly this problem of scaling out your application and data model on the IC in a flexible and generic enough way. CanDB aims not just to replicate, but to expand upon the functionality and use cases of what BigMap was meant to be for IC developers.
@Ashley I’d be happy to talk with you in a bit more detail about your application design, and how CanDB might fit into your solution. You can DM me here (@icme or @canscale), or reach out to me on the developer discord @canscale if you’re interested.
This goes for all developers who are thinking about or running into scaling solutions. Now is a great time to get in contact and to better understand your use cases so that you can also help shape what CanDB becomes for the IC developer community.