Hi @domwoe, the Cache canister is basically for performance. The idea is to maintain a static representation of what we call Discourse State, so that our Data canister doesn’t have to process the deluge of queries we’ll be getting as we scale up.
Thanks, and sorry for the late reply.
The pre-computation of static views that can be fetched with queries is a good idea. However, I wonder if it makes sense to have designated cache canisters for this. Queries are in general executed in a separate thread on the replica, so they shouldn’t limit the progress of your data canister.
Is your concern about the rate limits of Boundary Nodes?
Hi @domwoe, after going through this thread for a bit, yes, I’m wondering what @dsarlis and @free think about cache canisters too. In fact, I wouldn’t want to implement a solution without their input. Thanks for putting me onto the current state of things! We will definitely be reaching out for help here on the Forum as we move into the cache canister feature in the coming weeks.
I am not sure I can say something very specific without understanding more about your thinking here.
Caching some expensive computation so you can serve queries faster is definitely a good idea but whether you need to push that out to separate canisters or you can do it in your main data canister is another story. Dominic asked a good question. Are you worried about getting rate limited by boundary nodes if only the data canister can serve queries? Or in other words, how much load would you expect to handle in terms of queries?
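As a rough illustration of caching inside the main data canister itself, here is a minimal Python sketch. It is only a toy model: `DataStore`, `add_vote`, and `average` are hypothetical names, the vote data is made up, and update and query calls are modeled as plain methods rather than real canister endpoints.

```python
from typing import Optional


class DataStore:
    """Toy stand-in for a data canister that keeps a precomputed view."""

    def __init__(self) -> None:
        self._votes: dict = {}  # raw data: topic -> list of votes
        self._view: dict = {}   # precomputed view: topic -> average vote

    def add_vote(self, topic: str, vote: int) -> None:
        """Update path: store the vote, then refresh the view for that topic."""
        votes = self._votes.setdefault(topic, [])
        votes.append(vote)
        self._view[topic] = sum(votes) / len(votes)

    def average(self, topic: str) -> Optional[float]:
        """Query path: a plain lookup, with no work over the raw data."""
        return self._view.get(topic)


store = DataStore()
store.add_vote("topic-1", 1)
store.add_vote("topic-1", 3)
print(store.average("topic-1"))
```

The point of the sketch is that the expensive work happens once per write, so the query path stays cheap whether or not separate Cache canisters are added later.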
A Starting Point is a very meaningful reference and a very visual expression for voters. I like it.
I hope Civol can provide a similar experience. Looking forward to seeing your progress ❤️
Thanks, @MillionMiles, yes, Civol does have a similar experience in the form of our Debate feature, which allows panelists to challenge other panelists to debate on a specific question. Key differences are that Civol Debate is spontaneous and allows the community to vote on everything the debaters say, point by point. We’ll be rolling it out for the beta, so stay tuned!
Thank you, @dsarlis, I so appreciate your time and perspective! Yes, I am worried about rate limiting, in the context of maximizing performance for the user.
The thing about Civol is, it does appear capable of scaling massively, as in millions of active users for the largest instances. For example, there might be an instance for a nation of say 100m, and they could be discoursing on multiple subjects (10-20?) of importance to them all, simultaneously… So possibly 10m+ users engaging the dapp during peak moments. Let’s assume 10% of those users could be querying in those moments. That would mean 1m queries hitting the Data canister simultaneously. To reliably service a flow like that I imagine multiple Cache canisters being needed, scaling up and down in number with demand.
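The estimate above can be written out as a back-of-the-envelope calculation. In this sketch the population and percentages come from the figures in the post, while `per_canister_capacity` is a purely hypothetical placeholder and not a measured Internet Computer limit.

```python
import math


def peak_queries(population: int, engaged_pct: int, querying_pct: int) -> int:
    """Estimate simultaneous queries at a peak moment (integer percentages)."""
    engaged = population * engaged_pct // 100
    return engaged * querying_pct // 100


def caches_needed(queries: int, per_canister_capacity: int) -> int:
    """Cache canisters needed if each absorbs `per_canister_capacity`
    simultaneous queries (a hypothetical figure, chosen for illustration)."""
    return math.ceil(queries / per_canister_capacity)


# Figures from the post: a nation of 100m, 10% engaged at peak, 10% of those querying.
queries = peak_queries(100_000_000, 10, 10)

# ASSUMPTION: 50_000 is a placeholder, not a real per-canister limit.
print(queries, caches_needed(queries, per_canister_capacity=50_000))
```

The number of Cache canisters scales linearly with whatever the real per-canister figure turns out to be, so measuring that figure early would matter more than the exact user counts.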
It’s obviously going to be many months before Civol actually faces this problem (assuming people really like the experience), but possibly not that many, and it illustrates the need for a maximally scalable solution. So to me this means we need to spec out that solution and build a solid foundation v1 for it here at the outset, to the extent possible. Does that sound reasonable, and what do you think would be the best way to go about it?
Yeah, it sounds reasonable, and I think it’s very good you’re already thinking through this. I agree that if you expect on the order of millions of concurrent queries, you can’t hope to serve this kind of traffic from a single canister alone (even if you do caching/precomputing of results).
Having an extra layer of Cache canisters is the right direction to be looking at. I would also suggest going about this in steps, as you alluded to. E.g. you can think about the implications of such a design and possible solutions; you don’t need to go all out initially, but you should start with the right foundation. I would imagine that you can even start with a single Cache canister, just to get things in place and ensure the architecture works, and then scale out to more when you need to. This can also give you early feedback on what works well and what doesn’t.
Some things to consider when you go to a multi-canister architecture (not necessarily for your case only):
- How would Cache canisters retrieve the data from the data canister? Could that also be a bottleneck (if all cache canisters hit the data canister)?
- How will you manage the canisters’ lifecycle? What about the upgrade story? Can they be upgraded independently or will there be dependencies? (Likely you want to avoid dependencies as much as possible, or in other words, be able to upgrade the data canister without necessarily having to touch the cache canisters.)
And there will probably be more questions once someone starts an actual design doc and thinks through alternatives.
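One way to think about both questions in the list above is a versioned pull, sketched below in Python. Everything here is an assumption for illustration: `DataCanister`, `CacheCanister`, `Snapshot`, and `SCHEMA_VERSION` are hypothetical names, and inter-canister calls are modeled as ordinary method calls.

```python
from dataclasses import dataclass
from typing import Optional

SCHEMA_VERSION = 1  # hypothetical wire-format version known to both sides


@dataclass
class Snapshot:
    schema_version: int
    data_version: int
    views: dict


class DataCanister:
    """Toy model of the data side: bumps data_version on every change."""

    def __init__(self) -> None:
        self.data_version = 0
        self.views: dict = {}

    def write(self, key: str, value: str) -> None:
        self.views[key] = value
        self.data_version += 1

    def pull(self, since_version: int) -> Optional[Snapshot]:
        """Return a snapshot only if something changed since `since_version`."""
        if since_version == self.data_version:
            return None  # cheap answer for caches that are already up to date
        return Snapshot(SCHEMA_VERSION, self.data_version, dict(self.views))


class CacheCanister:
    """Toy model of a cache: it keeps serving its old copy when it does not
    recognize the snapshot's schema, so each side can be upgraded separately."""

    def __init__(self) -> None:
        self.data_version = 0
        self.views: dict = {}

    def refresh(self, source: DataCanister) -> bool:
        snap = source.pull(self.data_version)
        if snap is None or snap.schema_version != SCHEMA_VERSION:
            return False  # nothing new, or a format this cache cannot read
        self.views, self.data_version = snap.views, snap.data_version
        return True
```

With this shape, an up-to-date cache costs the data canister one version comparison per pull, and a schema mismatch degrades to serving stale data rather than failing outright. Whether that trade-off is acceptable is a design decision the sketch does not settle.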
Hi @dsarlis, thanks for confirming we’re on the right track!
Yes, how does the Cache canister pull from Data? After executing all the queries and filling the Cache, we would then only re-run the queries where the underlying data has changed. And I was thinking possibly only one Cache canister would actually update periodically from Data, and then we might be able to clone it as needed? And yes, lifecycle and update (no dependencies), good points…
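The "only re-run what changed" idea could look roughly like the sketch below. It is a toy model under stated assumptions: `PrimaryCache`, `depends_on`, and `clone_results` are hypothetical names, the data is made up, and the copy step is just a dictionary copy. Nothing here implies that cloning a canister is available as a platform feature.

```python
from typing import Callable, Dict, Set


class PrimaryCache:
    """Toy primary cache: re-runs only the queries whose inputs changed."""

    def __init__(self, queries: Dict[str, Callable[[dict], object]],
                 depends_on: Dict[str, Set[str]]) -> None:
        self.queries = queries        # query name -> function over the raw data
        self.depends_on = depends_on  # query name -> data keys that query reads
        self.results: Dict[str, object] = {}
        self.runs = 0                 # counts query executions, for illustration

    def _run(self, name: str, data: dict) -> None:
        self.results[name] = self.queries[name](data)
        self.runs += 1

    def fill(self, data: dict) -> None:
        """Initial fill: execute every query once."""
        for name in self.queries:
            self._run(name, data)

    def refresh(self, data: dict, changed_keys: Set[str]) -> None:
        """Periodic update: re-run only queries that read a changed key."""
        for name, deps in self.depends_on.items():
            if deps & changed_keys:
                self._run(name, data)

    def clone_results(self) -> dict:
        """Copy of the results that could be handed to another cache."""
        return dict(self.results)


data = {"topic_a": [1, 2], "topic_b": [5]}
cache = PrimaryCache(
    queries={"sum_a": lambda d: sum(d["topic_a"]),
             "sum_b": lambda d: sum(d["topic_b"])},
    depends_on={"sum_a": {"topic_a"}, "sum_b": {"topic_b"}},
)
cache.fill(data)                    # runs both queries
data["topic_a"].append(3)
cache.refresh(data, {"topic_a"})    # re-runs only sum_a
print(cache.results, cache.runs)
```

The sketch assumes the data side can report which keys changed since the last refresh; how that change set is produced and transferred between canisters is left open.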
I would love to have you take a look at our solution and advise if you’re available. Please dm me if that might be possible sometime in the next few weeks. Thanks again!
Sounds like it’s a good idea to talk, at least to give you some pointers on what’s even possible or not. E.g. cloning the data is not something you just get for free somehow. Let me know, I’m happy to help any way I can (either reviews or simply explaining what’s possible and figuring out feasibility of solutions).
Thanks so much, @dsarlis, I will be in touch. We’ve already created a Backup canister for Data so we’ll probably start there. Will undoubtedly have many questions!