Do we still need BigMap / a NoSQL database?

I have not read every post in this thread, but I can speak from a DSCVR perspective.

I believe better control over our stable storage would help us quite a bit. Right now, writing the entire DSCVR state to stable storage fails because we hit the maximum cycles per request; this happens in pre_upgrade. Even when we could get through pre_upgrade and load the state in post_upgrade, we could not copy all the objects out of the state and rebuild our indexes in one go. Instead, we had update functions that would load the state and then progressively copy chunks of the data we needed from the stable-storage state into the desired objects (i.e., restoring indexes and content).
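To make that chunked-restore pattern concrete, here is a minimal Rust sketch, assuming a recent `ic_cdk` with the 64-bit stable API (`stable_read`); the chunk size, cursor, and rebuild step are hypothetical stand-ins, not DSCVR's actual code.

```rust
use ic_cdk::api::stable::stable_read;
use std::cell::RefCell;

const CHUNK_BYTES: usize = 2 * 1024 * 1024; // 2 MB restored per update call (hypothetical)

thread_local! {
    // Cursor tracking how far into stable memory the restore has progressed.
    static RESTORE_OFFSET: RefCell<u64> = RefCell::new(0);
}

/// Restore the next chunk of saved state; returns true while chunks remain.
#[ic_cdk::update]
fn restore_next_chunk(total_len: u64) -> bool {
    RESTORE_OFFSET.with(|off| {
        let mut off = off.borrow_mut();
        if *off >= total_len {
            return false; // restore finished
        }
        let len = CHUNK_BYTES.min((total_len - *off) as usize);
        let mut buf = vec![0u8; len];
        stable_read(*off, &mut buf);
        // Hypothetical: feed this chunk into index/content rebuilding.
        // rebuild_indexes(&buf);
        *off += len as u64;
        true // more chunks remain
    })
}
```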

Currently our state is too big to write to stable storage in one pre_upgrade, so we can’t even get to post_upgrade. I know we could solve this problem if we had libraries that worked like traditional file systems and allocated regions of stable storage that were written to progressively, e.g., storing stale data in chunks in a specific region of stable storage, with an object that keeps track of which regions that data was stored in. For reference, about 500 MB of data written and stored into stable storage is the limit we are seeing. I also understand that we could potentially improve our data models, but I believe we will eventually hit a ceiling on how much optimization can buy us.
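For the write side, a minimal sketch of progressive region writes on top of the raw API might look like the following, assuming `stable_grow`, `stable_size`, and `stable_write` from a recent `ic_cdk`; the chunk size and helper name are hypothetical, and the real benefit comes from spreading these writes across many update calls instead of one pre_upgrade.

```rust
use ic_cdk::api::stable::{stable_grow, stable_size, stable_write};

const WASM_PAGE_BYTES: u64 = 64 * 1024;     // stable memory grows in 64 KB pages
const CHUNK_BYTES: usize = 2 * 1024 * 1024; // 2 MB per write (hypothetical)

/// Append `bytes` to stable memory at `offset`, growing it as needed,
/// and return the offset just past the written data.
fn write_chunked(mut offset: u64, bytes: &[u8]) -> u64 {
    let needed_end = offset + bytes.len() as u64;
    let available = stable_size() * WASM_PAGE_BYTES;
    if needed_end > available {
        let extra_pages = (needed_end - available + WASM_PAGE_BYTES - 1) / WASM_PAGE_BYTES;
        stable_grow(extra_pages).expect("out of stable memory");
    }
    for chunk in bytes.chunks(CHUNK_BYTES) {
        stable_write(offset, chunk);
        offset += chunk.len() as u64;
    }
    offset
}
```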

Regardless of whether we need BigMap, NoSQL, or SQL, having something like ICFS would be a precursor to those kinds of tools. Data still needs to be marshalled, and building simple abstractions on top of the current APIs will make that process easier. We get 8 GB of storage now, and eventually we will get 300 GB? I could imagine using a concept like drive volumes: dedicated (and resizable) regions of stable storage that hold data for us and are progressively updated, while pre_upgrade/post_upgrade is repurposed for our highly transactional data, which makes up about 10% of DSCVR’s state.

```
stable_save(object, region, index)
```

- `object`: the object being serialized for storage
- `region`: the region the object is being stored in
- `index`: the index within the region we are writing to (not sure about this one)
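As a rough illustration, a `stable_save` like this could sit directly on top of `stable_write`, assuming fixed-size regions laid out back to back in stable memory; the region and slot sizes below are hypothetical, and a real design would also persist object lengths and grow memory before writing.

```rust
use ic_cdk::api::stable::stable_write;

const REGION_BYTES: u64 = 50 * 1024 * 1024; // one region ~50 MB (hypothetical)
const SLOT_BYTES: u64 = 1024 * 1024;        // one index slot = 1 MB (hypothetical)

/// Write a serialized object into slot `index` of `region`.
/// Assumes the target region has already been grown/allocated.
fn stable_save(object: &[u8], region: u64, index: u64) {
    assert!(object.len() as u64 <= SLOT_BYTES, "object exceeds slot size");
    let offset = region * REGION_BYTES + index * SLOT_BYTES;
    stable_write(offset, object);
}
```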

I could imagine having a master file table that tells me that object IDs 1 → 1,000 are stored in Region A at Index 5. These blocks of similar data would be ~50 MB in size and could be quickly retrieved, mutated, and stored again.
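A sketch of what that master file table could look like, using an extent-style mapping from contiguous ID ranges to a (region, index) location; all names here are hypothetical.

```rust
/// One entry of the master file table: a contiguous ID range and its location.
struct Extent {
    first_id: u64, // inclusive
    last_id: u64,  // inclusive
    region: u64,
    index: u64,
}

struct MasterFileTable {
    extents: Vec<Extent>, // kept sorted by first_id
}

impl MasterFileTable {
    /// Find where an object ID lives, e.g. IDs 1..=1_000 -> (Region A, Index 5).
    fn locate(&self, id: u64) -> Option<(u64, u64)> {
        self.extents
            .iter()
            .find(|e| e.first_id <= id && id <= e.last_id)
            .map(|e| (e.region, e.index))
    }
}
```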

This would also enable MapReduce: through a series of update calls, we could process the stored data and modify the canister’s state according to rules created in future updates.
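A minimal sketch of that pattern: each update call processes one stored block and advances a cursor, so no single message exceeds the cycle limit. The block count and the load/apply/store helpers are hypothetical.

```rust
use std::cell::RefCell;

const TOTAL_BLOCKS: u64 = 100; // hypothetical number of stored blocks

thread_local! {
    static CURSOR: RefCell<u64> = RefCell::new(0);
}

/// Process one block per call; the caller keeps invoking until this returns false.
#[ic_cdk::update]
fn process_next_block() -> bool {
    let block = CURSOR.with(|c| {
        let mut c = c.borrow_mut();
        let b = *c;
        if b < TOTAL_BLOCKS {
            *c += 1;
        }
        b
    });
    if block >= TOTAL_BLOCKS {
        return false; // nothing left to process
    }
    // Hypothetical: load the block, apply the newly deployed rules, store it back.
    // let mut data = stable_load_block(block);
    // apply_rules(&mut data);
    // stable_save_block(block, &data);
    true
}
```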


Deterministic Time Slicing (DTS) should be able to help here without additional application-level logic.


DTS is indeed my number one most needed feature right now. It will allow a lot of interesting tooling and solutions to emerge in the IC ecosystem.
Let’s hope it comes soon :crossed_fingers: :crossed_fingers:

I think progressively using stable storage is the way to go, even when DTS comes out. DTS will be great for MapReduce functions and a bunch of other things.

We talk about "Big" in this thread, but can someone define Big? 10 GB, 50 GB, 100 GB, 1 TB, 1 ZB? File storage, NoSQL with a single primary key, and SQL all have massively different implications when dealing with scale.

With file storage (images, videos, blobs), horizontal scaling is easy to conceptualize with a multi-canister approach. With NoSQL and a single primary key it becomes harder, and fully relational databases are extremely complex to scale.

I think we should master the single canister before trying multi-canister setups with complex data systems.
