"No databases" vs BigMap vs orthogonal persistence

Continuing the discussion from Abstract away the 4GB canister memory limit:

One of the relatively unusual features of the IC is orthogonal persistence of canister state, and Dominic talks about this as a big advantage in that there are no databases to manage. But it looks like people building more-realistic apps do fall back to having a separate database layer, in part because of the 4GB limit on native storage. It looks a lot more like a traditional web app architecture that has stateless front-end servers that talk to some nosql stores…

Is this showing that maybe orthogonal persistence is not a good approach? Or do you imagine apps will have some combination of orthogonal persistence plus a database layer.

3 Likes

The need for the developer of the app to have to write, from scratch, solutions to manage persistent state using only in-memory data structures, with no direct access to storage, is disappointing. Orthagonal persistence doesn’t seem to make solving any of the standard issues any easier.

One seems to be faced with adding dump query methods to retrieve memory data in bulk and then bulk delete methods, etc. If I were developing a serious actor that needs to manage records forever or even for a limited period of time, like 30 days, those methods would be the first to write and of course, must be maintained as I make changes to the global in-memory data structures. Bulk update is of course also a must if one wants to deploy large sets of data quickly. The IC provides nothing here.

Source code is a bad place to define data structures that must persist their data forever, - databases and database schema were invented to fix this glaring problem. Source code is also a bad place to store schema definitions. However, that’s the only place the IC provides for developers to perform those crucial functions, putting an additional semantic and implementation burden on the developer that is greater than alternatives.

For canisters, and of course for any such system today, when schema changes, upgrade code must be written and tested. Mokoto, I guess solely because it is defined by Dfinity for the IC, has included in its language definition the hook points to perform one kind of upgrade, so you don’t have to do invent that hook yourself even more painfully. Having written such upgrade code many times, I can assure it is painful and buggy code to write, and is impossible to automate. How testing this upgrade logic will work in this environment then becomes a concern, bringing up questions such as: how long it takes an populate a canister with a GB of data from scratch, locally (if possible) or on the mainnet? Note, for one, that a complete backup after an upgrade is usually a matter of course. The IC doesn’t provide this at any time, as far as I can tell.

The problem is that there seems to be no abstractions available to developers to externally manage data separately from the canister code itself. Update calls are the only mechanism available. This means paying for the consensus protocol for every call, even if you simply don’t want or need it - i.e, initialization.

That sqlite just happens to be almost runnable in a canister shows how hard the problem is. As lovely as having an SQL API is, as a developer you are now faced with the prospect of treating all the scripts and such you might need or like to us as strings passed to update functions, with commensurate loss of semantics in the method calls itself, or updating the canister’s code itself constantly. At least one could update table structures and data without having to update the canister itself but the clumsiness of this approach makes using such a package in an actor problematic to me without any examples to review.

7 Likes

I think the decision to do away with databases was an attempt to avoid things like “eventual consistency” between clients. By having the data exist in the canister directly maybe consistency issues go away.

There is reference to a “Data Subnet” that canisters can be deployed to here. That subnet is not available in Mercury so maybe those would alleviate some issues.

However, I agree that it is very confusing to hear “No Databases” talked about as a feature. I have done some development using multiple databases and have recently landed on one that feels feature rich and solves most of my issues. Moving away from a database that has distributed ACID compliance and scales for me automatically and bills me based only on usage feels like step backwards.

To be honest, I would expect a response to be that we are thinking about this in the wrong way. As a developer, I am much more interested in data management as opposed to data storage. I don’t really care if my data is stored inside the canister or in a stand alone database somewhere else. What I care about are the tools surrounding managing and viewing that data. I think this is why they have BigMap and most likely soon have other similar services that provide functionality to more than just key value pairs.

Looks like someone is trying to port sqlite.

4 Likes

Is this showing that maybe orthogonal persistence is not a good approach? Or do you imagine apps will have some combination of orthogonal persistence plus a database layer.

I thought the answer is an orthogonally-persisted database built on top of IC, which is something @lastmjs is trying to build. The saying of “no databases needed” perhaps refers to the traditional notion of separately-provisioned and centrally-hosted database, which uses different paradigm than the app development paradigm.

1 Like