[union-db] Let's build an infinite database together!

Hello everyone!

Intro

I’m building an infinite database for the Internet Computer, and I need your expertise to make it a reality.

The idea is to enable developers to focus solely on their business flow and let the database handle all the scaling-related things automatically, while keeping interaction with it simple. The goal is to create an efficient and robust solution for storing and accessing infinitely large amounts of data.

I want this project to be different: open and transparent. I’ve prepared detailed documentation that explains how the database works, from its architecture to how data is stored and retrieved. Everyone is welcome to read and comment on it.

I value your insights and want you to be part of the process. Read the documents, ask questions, and share your ideas. The documentation is open for commenting. Together, we can shape this database into something amazing.

What’s next

This documentation, among other things, defines the scope for the PoC release. Once this scope is understood and agreed upon by all participants, I’m going to implement it and prepare a public demonstration of what union-db can do.

@domwoe
@cryptoschindler
@berestovskyy
@skilesare

20 Likes

Nice work. I’ve scanned through it and have a few comments & suggestions (correct me if I’ve got things wrong).
The rebalancing allows you to have canisters with ordered items and bounds


This will help the client scan it and get items ordered by key with a query like >14 and <24.
This sends the query directly to the second canister and provides ordered results, which is more powerful than what we use now in Anvil. For our ledger, we’ve got 32 canisters that don’t rebalance - it’s a hashmap of canisters with hashmaps inside. Union-db is like a BTree of canisters with BTrees inside.
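To illustrate the routing idea (a hypothetical Rust sketch, not union-db’s actual API - the Shard type and its bounds are my assumptions): if each canister holds an ordered, bounded key range, serving a range query is just an overlap check against the bounds.

```rust
struct Shard {
    id: u32,
    min_key: u64, // inclusive lower bound of the keys this canister holds
    max_key: u64, // inclusive upper bound
}

/// Return the ids of the shards whose ranges overlap the queried range.
fn route_range_query(shards: &[Shard], from: u64, to: u64) -> Vec<u32> {
    shards
        .iter()
        .filter(|s| s.min_key <= to && s.max_key >= from)
        .map(|s| s.id)
        .collect()
}

fn main() {
    let shards = vec![
        Shard { id: 1, min_key: 0, max_key: 14 },
        Shard { id: 2, min_key: 15, max_key: 30 },
    ];
    // The query `> 14 && < 24` (i.e. keys 15..=23) touches only shard 2.
    assert_eq!(route_range_query(&shards, 15, 23), vec![2]);
}
```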
In a ledger, you will probably never need to get ordered accounts, so it’s fine for our use case.
But in your case, in my opinion, your Demo project doesn’t need the DB you are making and you should probably change it with something that will showcase its potential better. Perhaps a DB with a lot of documents & indexes.

As far as I understand, the demo batch functions use transactions and place things in different canisters by passing the transaction ahead. That locks documents until the transaction is complete, and then they get unlocked. Before the transaction completes, I suppose queries return the old values.


But then, between these two notifications, a query fetching from both canisters can slip in and get the new data from shard 4 and the old data from shard 3.

I am curious how the DB handles transactions like these. How many cross-canister calls will they take? How much time will they take?
  • Find all friends of friends and send them a message.
  • Find all users with score > 100 and < 200 and increase their level by 1.
  • Get the top 200 users with their last 5 posts and the top 10 comments on each post.
Sounds like these transactions should work like MapReduce
Then it will probably be better not to fill the canisters until they reach the memory limit, but rather to fill them with just enough documents that they can process inside one call.
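A toy Rust sketch of that MapReduce shape (all names here are illustrative stand-ins; in the real system the “map” step would be a cross-canister call into each shard):

```rust
struct UserShard {
    scores_and_levels: Vec<(u64, u64)>, // (score, level) pairs
}

/// "Map": each shard bumps the level of its own matching users
/// and reports how many it touched.
fn map_on_shard(shard: &mut UserShard) -> usize {
    let mut touched = 0;
    for (score, level) in shard.scores_and_levels.iter_mut() {
        if *score > 100 && *score < 200 {
            *level += 1;
            touched += 1;
        }
    }
    touched
}

/// "Reduce": combine the per-shard counts into one result for the client.
fn reduce(counts: impl IntoIterator<Item = usize>) -> usize {
    counts.into_iter().sum()
}

fn main() {
    let mut shards = vec![
        UserShard { scores_and_levels: vec![(50, 1), (150, 1)] },
        UserShard { scores_and_levels: vec![(120, 2), (300, 2)] },
    ];
    let total = reduce(shards.iter_mut().map(map_on_shard));
    assert_eq!(total, 2); // one matching user per shard got a level bump
}
```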

2 Likes

Hey!
There are a lot of questions. Let me know if I forgot to answer any.

First of all, thanks for the reply. As far as I can judge from your comment, you have a pretty complex use-case in mind, and this is good - we need to look at this project from the perspective of real-world scenarios, not just simple use-cases.

That’s right, and it is like a multi-dimensional btree of btrees - each collection is stored in its own dimension. This makes it possible to take indexing capabilities beyond a single canister and make the system dynamic - so it can scale from 1 canister to an infinite number of canisters. This is very different from static sharding techniques, e.g. a hash ring.
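For intuition, here is a minimal single-canister mental model of that structure in Rust (illustrative only - in union-db the outer level is spread dynamically across canisters, which this sketch ignores):

```rust
use std::collections::BTreeMap;

type CollectionId = u64;
type Key = u64;
type Document = String;

fn main() {
    // A b-tree of b-trees: each collection lives in its own "dimension".
    let mut db: BTreeMap<CollectionId, BTreeMap<Key, Document>> = BTreeMap::new();

    db.entry(0).or_default().insert(7, "user #7".to_string());
    db.entry(0).or_default().insert(42, "user #42".to_string());
    db.entry(1).or_default().insert(1, "post #1".to_string());

    // Each collection is independently ordered by key.
    for (key, doc) in &db[&0] {
        println!("users: {key} => {doc}");
    }
}
```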

That’s great input. I didn’t put a lot of thought into the definition of the Demo project. I just wanted to showcase a really simple and popular dapp and how it would work in a high-scale scenario. Because, if the performance is bad even for such a simple flow, then it may not be worth the time to compose a bigger demonstration example.

The plan was to limit the stable memory of each canister to something like 1MB (this is a feature of ic-stable-memory) and to populate it with a lot of accounts, so the database would scale to something like 100 canisters. And then to measure the performance of heavy transactions (batch transfers) which, because of the scale, would trigger the majority of canisters to perform some work.

It would be really helpful if you could suggest an alternative Demo use-case.

That’s correct, if you’re referring to query canister methods, and not queries like in SQL.

The documentation says explicitly that the state in union-db is “eventually consistent”. This means that sometimes queries may return stale values, even if the transaction that updated the value has just reported success.
Other transactions are immune to these inconsistencies because of document locking. Refer to the Transactions section of the documentation for more info.

I expect the performance to be good compared to implementing the same kind of functionality by hand, mainly because of background transactions. But the truth is, I’m doing this PoC because I don’t know for sure if it will work fast enough. In theory it should.
And in theory there shouldn’t be much difference in performance between a database with 3 shards and a database with 300 shards, if you write your transactions the way I described in the docs.

The documentation does not contain a lot of info about locating and mutating multiple documents at once, which is essential functionality for every database. Despite that, it is possible to efficiently implement the flows you’ve mentioned with union-db.

The main reason for not documenting that yet is that there was already a lot to cover. I already presented this database to people in the Scalability and Performance WG, and their main concern was the internal implementation. So this documentation is targeted more towards that kind of discussion.

But I understand that there should be info like “how to translate this SQL query to a union-db transaction”. I hope we’ll get there eventually.


In the meantime, I would suggest diving deeper into the documentation. There is a lot of info to digest. Some of it (like why transactions can’t fail once they execute their last step) is not trivial and requires a little bit of time to process.

Take your time to make it through the docs. Ask me anything and propose changes, if you have ideas yourself.

Hope this helps.

2 Likes

By the way. One tiny little rule that I want people to follow to make things easier for everyone.

If you have a general comment about union-db or other related things - it is fine to leave it here, on the forum.

But if you want to ask a question about a particular paragraph of the documentation - please use Notion’s functionality to do that. This will make the discussion a lot cleaner and will enable us to refer to specific parts of the docs if needed.

Thanks!

I think we need something like the “ToDo” app https://todomvc.com/ that every frontend framework showcased at some point.
I’ve made something in that direction, but it needs more use cases. https://github.com/infu/internetbase-sql-demo
It would be great if we could show how the same project gets implemented, fully or partially, with the different DB solutions the IC can offer, and what the pros/cons, limits, speed, cost, etc. are.

1 Like

Started a dedicated discussion thread in Notion.

1 Like

I am deeply impressed with your development. It is incredible! Recently, I have become more and more interested in using databases together with Motoko on ICP. I have seen many different developments based on BTree, but yours is brilliant. I’ve looked at your documentation, and the steps for further growth are also visible. I am proud of your genius, keep growing. I will look forward to your release to try and put your database techniques into practice!

2 Likes

Thanks a lot Sasa for bringing in much more details. A few high-level questions.

In the current design, the transaction state to rebase across shards includes all the arguments and some intermediate state, right? Considering the examples mentioned in the thread above (top N users/posts), the intermediate state might get quite large, and hence rebasing might affect the overall performance and costs.

I’m also not sure the lock + transaction queue is enough for distributed systems. Seems like this might deadlock quite easily on parallel updates…

Also, desugaring looks challenging without compiler support. Maybe we could have an example macro with similar functionality, to be sure it’s possible?

2 Likes

Thank you for taking your time and diving deep into the documentation.

I understand your concerns. I’m going to reiterate and try to find solutions for all of them.

2 Likes

This looks interesting. Keep at it. Let me know if you need someone to be your first user. I can run tests on it as well.

3 Likes

@berestovskyy, @domwoe, @senior.joinu, and I had a meeting and discussed some questions there. Posting my initial written feedback here for posterity.

TL;DR: Would it make sense to reduce the scope of the project to “Scalable Distributed Transaction” and evaluate how feasible it is across multiple subnets?

Hi @senior.joinu,

First of all, thank you very much for the initiative and the hard work on this problem. I went through the documentation and slides and see you have put a lot of effort and thought into it.

This is a very ambitious project and the scope is huge. The current plan appears to be a breadth-first approach that covers many different aspects:

  1. How to implement distributed transactions.
  2. How to implement automatic scaling and rebalancing.
  3. How to represent transactions in Rust as serializable coroutines.
  4. How to handle potential failures.
  5. etc.

Each of these items is a non-trivial project on its own. Tackling all of them at the same time might be too overwhelming. I wonder if a more reliable strategy would be to focus on one aspect at a time.

In my opinion, implementing scalable distributed transactions is the most difficult and critical item here. Would it make sense to focus on this aspect first and make sure it is feasible?

I tried to understand the proposed algorithm for distributed transactions based on the following documents. Please let me know if I missed another relevant document:

The goal is to support standard ACID transactions, right? Or do you have some other kind of transaction in mind?

Since the IC guarantees durability and consistency is application-specific, the most important properties for us are atomicity and isolation. The classical solution for atomicity is two-phase commit (2PC), and for isolation, two-phase locking (2PL).

The documentation doesn’t mention 2PL+2PC explicitly, but it does mention locking during the transaction and committing after the transaction completes. Did I understand correctly that the proposed algorithm is a variation of 2PL+2PC (sketched in code right after this list):

  • while the transaction is in progress, it acquires locks on accessed entries both for read and write operations.
  • the dispatcher service remembers all entries that are locked by the transaction.
  • if the transaction succeeds, the dispatcher service goes through all locked entries and commits new values (for writes) and releases the locks.
  • if the transaction fails, the dispatcher service goes through all locked entries and releases the locks without committing new values.
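To make sure we are talking about the same algorithm, here is a minimal Rust sketch of my reading of it. All names (Dispatcher, TxId, Key) are mine for illustration, not taken from the union-db docs:

```rust
use std::collections::{HashMap, HashSet};

type TxId = u64;
type Key = u64;
type Value = Vec<u8>;

#[derive(Default)]
struct Dispatcher {
    locks: HashMap<Key, TxId>,                 // which tx holds each entry
    locked_by_tx: HashMap<TxId, HashSet<Key>>, // all entries locked per tx
    pending_writes: HashMap<TxId, HashMap<Key, Value>>, // staged until commit
}

impl Dispatcher {
    /// Growing phase of 2PL: lock an entry for reading or writing.
    fn acquire(&mut self, tx: TxId, key: Key) -> Result<(), ()> {
        match self.locks.get(&key) {
            Some(&owner) if owner != tx => Err(()), // contended: queue or abort
            _ => {
                self.locks.insert(key, tx);
                self.locked_by_tx.entry(tx).or_default().insert(key);
                Ok(())
            }
        }
    }

    /// Writes are staged at the dispatcher, not applied to the store yet.
    fn stage_write(&mut self, tx: TxId, key: Key, value: Value) {
        debug_assert_eq!(self.locks.get(&key), Some(&tx));
        self.pending_writes.entry(tx).or_default().insert(key, value);
    }

    /// Success: commit staged writes, then release all locks (shrinking phase).
    fn commit(&mut self, tx: TxId, store: &mut HashMap<Key, Value>) {
        if let Some(writes) = self.pending_writes.remove(&tx) {
            store.extend(writes);
        }
        self.release_all(tx);
    }

    /// Failure: release all locks without committing anything.
    fn abort(&mut self, tx: TxId) {
        self.pending_writes.remove(&tx);
        self.release_all(tx);
    }

    fn release_all(&mut self, tx: TxId) {
        for key in self.locked_by_tx.remove(&tx).unwrap_or_default() {
            self.locks.remove(&key);
        }
    }
}
```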

If this understanding is correct and the proposed algorithm is indeed 2PL+2PC, then the crucial question is how well it is going to scale across multiple subnets. The known weak point of 2PL+2PC is exactly its poor scalability due to locking and its reduced throughput due to contention.

The latency of messages from one subnet to another could be in the order of seconds. IIUC, a transaction may travel between subnets multiple times, so a transaction may take multiple seconds to complete. I wonder if such a latency would be acceptable to the clients of the database. It might be worthwhile to build the simplest possible prototype of the algorithm and test it on the mainnet to estimate the latencies.

The latency of messages within a single subnet is quite low (order of milliseconds), but I would argue that sharding within a single subnet is not very useful because the stable memory of a canister will grow in the future.

Scaling distributed transactions is a difficult research problem. If you would like to read up more about it, I would recommend starting with this paper: Transactions for Distributed Actors in the Cloud. It is about speeding up 2PL+2PC by optimistically releasing locks earlier and aborting dependent transactions if the optimistic assumption doesn’t hold. A cool thing about it is that it’s been implemented in an actual actor-based production system. The related work section contains references to other interesting papers.

Thanks again for the hard work here!

7 Likes

Hey everyone!
Sorry for the long delay.

The new draft of the technical design is ready and available here:

(this is a new link; the old one leads to the previous draft of the documentation, kept for history-preserving purposes)

This new design is much simpler than the previous one. I got rid of all the additional stuff and focused solely on the core functionality, making sure it can be extended later. Now we only have 4 documents. Everything else that was covered earlier but didn’t make it into the current draft is considered a subject for future development.

The design itself was heavily reworked in an attempt to enable all the features we were discussing here, in this thread.

First of all, the state itself is now implemented with a different data structure, which greatly simplifies the design. It is still a b-tree of b-trees, but arranged in a slightly different way, with only numbers (currently, Nats) as possible keys. This makes the whole database globally ordered, which, together with the new CompositeKey feature, enables a lot of interesting things.
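To give a feel for why global ordering plus CompositeKey is powerful, here is a hypothetical sketch (the exact key layout in the docs may differ): comparing Nat components lexicographically keeps each collection scannable as one contiguous range of the global keyspace.

```rust
use std::collections::BTreeMap;

// Illustrative composite key over Nat components, e.g. (collection_id, document_id).
// Vec's derived Ord is lexicographic, which is exactly the ordering we want.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct CompositeKey(Vec<u128>);

fn main() {
    let mut db: BTreeMap<CompositeKey, String> = BTreeMap::new();

    // Two "collections" live in one globally ordered keyspace.
    db.insert(CompositeKey(vec![1, 42]), "user #42".into());
    db.insert(CompositeKey(vec![1, 7]), "user #7".into());
    db.insert(CompositeKey(vec![2, 1]), "post #1".into());

    // Range scan over collection 1 only: everything between (1,) and (2,).
    let from = CompositeKey(vec![1]);
    let to = CompositeKey(vec![2]);
    for (k, v) in db.range(from..to) {
        println!("{:?} => {}", k, v); // yields user #7, then user #42, in order
    }
}
```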

Second, we now have a separate document describing Queries - functions that allow clients to fetch a lot of data very fast and cheaply while traveling between shards. This document was added in response to @infu’s and @berestovskyy’s comments about not being able to return a lot of data to clients with the previous design. Now it is possible, and it is very efficient.

@infu, I’ve put a lot of thought into your idea of making transactions in union-db similar to map-reduce. What I realised is that with the current design it is not possible, since each shard only knows about its immediate left and right neighbors, and not about all the shards. But the good news is that the new transaction design allows us to implement this parallelism pattern later, when we implement Caching.

By the way, about the transactions - they are now completely different. They no longer rely on custom syntax or coroutines (@berestovskyy). Instead, the whole distributed transaction engine is now just a minimalistic implementation of the Saga pattern (more on that in the docs). This makes it very much decoupled from the rest of the library (@ulan) and makes it possible to use it in any other distributed context, even without the rest of union-db.

Sagas are inherently ACD, not ACID. But I put a lot of effort into showing through the examples that this is not a big issue and that there are many ways of adding an isolation layer on top, if you really need it. Moreover, there is an example that implements a fully functional 2PC protocol using only the proposed Saga implementation.
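For those who haven’t opened the docs yet, here is roughly what such a minimalistic Saga engine looks like. This is a simplified sketch, not the actual union-db implementation: each step carries a compensating action, and on failure the already completed steps are compensated in reverse order.

```rust
struct SagaStep {
    name: &'static str,
    action: fn() -> Result<(), String>,
    compensate: fn(),
}

fn run_saga(steps: &[SagaStep]) -> Result<(), String> {
    let mut completed: Vec<&SagaStep> = Vec::new();
    for step in steps {
        match (step.action)() {
            Ok(()) => completed.push(step),
            Err(e) => {
                // Roll back: compensate completed steps in reverse order.
                for done in completed.iter().rev() {
                    (done.compensate)();
                }
                return Err(format!("saga aborted at '{}': {}", step.name, e));
            }
        }
    }
    Ok(())
}

fn main() {
    let steps = [
        SagaStep { name: "reserve items", action: || Ok(()), compensate: || println!("release items") },
        SagaStep { name: "charge card", action: || Err("card declined".into()), compensate: || println!("refund") },
    ];
    // "charge card" fails, so "reserve items" gets compensated.
    assert!(run_saga(&steps).is_err());
}
```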

Another exciting thing that I want to tell you is that this great redesign started with @berestovskyy’s comment about deadlocks. Now, with the new database structure, composite keys, iterators, and a little bit of savvy, you can implement a 2PC protocol completely free of any possible deadlocks, by breaking the circular-wait condition and acquiring all your locks in the same order.

There is a whole example about this in the documentation - make sure you read it if that seems interesting to you.
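The core trick is small enough to show inline. A toy sketch (the real mechanism in the docs uses the ordered composite keys and iterators, not a plain sort): if every transaction acquires its locks in one shared global order, a circular wait can never form.

```rust
/// Lock keys in a single global order shared by all transactions, so no two
/// transactions can ever hold locks in conflicting orders.
fn lock_in_global_order(mut keys: Vec<u64>, mut try_lock: impl FnMut(u64) -> bool) -> bool {
    keys.sort_unstable(); // the shared global order
    for key in keys {
        if !try_lock(key) {
            return false; // contended: wait or abort, but no deadlock is possible
        }
    }
    true
}

fn main() {
    use std::collections::HashSet;
    let mut held: HashSet<u64> = HashSet::new();
    // A transaction wanting keys {3, 1} locks 1 first, then 3 -
    // the same order every other transaction would use.
    assert!(lock_in_global_order(vec![3, 1], |k| held.insert(k)));
}
```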

I did my best to bring my original idea as close as possible to what you guys envision. As @ulan proposed, the scope of the PoC has also changed. We’ll first try to implement this new transaction engine, which should work with any other use-case. This will allow us to decide whether it is even reasonable to implement cross-subnet transactions. The demo project is also different now (@infu) - as a demonstration of the transaction engine, I will implement an online shopping app, with several microservice canisters and an external invoice-canister-based payment system.

Thank you all so much for your help! Also, big thanks to @domwoe for helping me drive this project in the right direction - without your help it would not have happened.


Please, ask me anything. The docs are open for commenting, as usual. One note: the Transactions and Queries documents are very heavy on code, so my apologies to anyone who is less technical. I thought this was the best way to realistically show the various flows that can now be implemented with union-db.

9 Likes

Thanks @senior.joinu! I’ll try to understand and review the transactions section. It’ll probably take me a while.

1 Like

Hi @senior.joinu,

I looked into the design of Transactions. Thanks for writing up!

IIUC, developers would need to implement the compensation actions of the Saga pattern manually, and by default there would be no isolation. I have concerns that this might be too difficult to use and error-prone for developers:

  • IMO it is difficult to write correct “undo” code because it is easy to miss subtle changes/corner-cases that are not fully undone. In my experience, almost all mechanisms that rely on manual undoing had one or more subtle bugs.
  • I discussed the ACD vs ACID topic with experts in distributed algorithms at DFINITY and they agreed that isolation is important for usability.

In other words, I am not sure that Saga is a developer-friendly approach. As we discussed earlier, it would be great to start with a dapp that needs scalability first and try the ideas there without generalizing into a library. That would give a good signal of what works and what doesn’t. By dapp here I mean an actual product that will be used, not a demo project. Maybe @domwoe and you could think in that direction? Otherwise, I am not sure how successful we will be if we build a general library without actual clients.

1 Like

Hey @ulan,

Thanks for taking the time to look into the new draft of the design!

That’s correct.

I agree that it is more difficult to write the compensating logic yourself. But I can’t agree with the following words:

Saga is a well-established design pattern within the microservices community. For the last several years, Sagas have been the mainstream approach to implementing distributed transactions in regular web 2.0 microservice-based systems. I would suggest everyone watch this great talk from 2015 on what this pattern is.

My point is that if people are using it, then they either don’t find it difficult or they’ve invented good error-proof abstractions within this pattern - I mean good libraries and other primitives (which we can also build).

I could agree that the exact implementation of Sagas that I propose may not be developer-friendly - this is absolutely possible. But if that is what you mean, then I could just improve it over time, in the way people need it to be. No need to switch the whole idea completely again.

But it is that way for a reason. It is very minimalistic, as I already noted, but this makes it extremely versatile in good hands. If one knows what they’re doing (obviously, at the beginning this one will be me), they can implement anything they want on top of this implementation. As a separate library. Then they can distribute this library to everyone else. We can have 2PC+2PL, parallel execution or caching. We can even have some of our data stored in canisters and some of it in regular web servers (we just need an IC-mimicking runtime and some adapters to run the database in it).

Unfortunately, we can’t have both fast execution and isolation - we have to choose one. The previous draft did isolation by default, with no ability to opt out for better performance. The current draft does performance by default, with the ability to opt in to isolation. I see this as an improvement, since the current design allows for more options.

This is what I also did. This experiment is not over yet, but from what I observe at the moment, it is not successful. Most people don’t respond at all. Those who respond say that there is no reason for them to switch wheels in the middle of the ride, since their in-house solution works well enough for now, which I totally understand.

I don’t know the reason for the lack of responses. Maybe it is just summertime - everyone’s on vacation. Maybe people don’t want to take the risk first - they are already building a project in a very risky setting, so they need someone else to take it and succeed before they would. Maybe something else.

I understand that such a use-case is important for every project. But I don’t see any other option except building the use-case myself. Do you see this as an option? Let me know what you think.

Thanks again for the reply and have a great day!

5 Likes

So, is this good to test now? I would like to take it for a spin. I’ll probably add some logging and dig more into how it scales.

1 Like

No, unfortunately it is not ready yet. I can’t give you any estimates.

Would you mind telling us what you want to use it for, please?

Since the compensating actions are domain-specific, I don’t see how general-purpose libraries could help avoid business logic bugs. What I am trying to say is that writing the “undo” action is much more difficult than writing the “do” action [this I know for sure from experience].

The fact that people use the Saga pattern with microservices doesn’t necessarily mean that it is easy to use. It could also mean that there are no better alternatives.

Instead of diving into the details of Saga, let’s take a step back and focus on the main question: “How do we ensure that this project is going to have a return on investment?”

The worst case scenario would be that you spend a lot of time building this and then no one uses it.

Thanks for trying. I think it is critical to have an actual use-case. That would also help to quickly resolve design questions like whether to use Saga or 2PC/2PL.

I am not sure it is a good idea to start a product for the sake of using a specific technology. If it is difficult to find actual clients, then maybe that’s a signal that the problem is not that impactful at the moment.

I would recommend chatting with @domwoe about the next steps here. Perhaps there are more impactful problems that would provide immediate benefits to users or developers.

3 Likes

Hey there everyone!

Quick news: after more talks with the DFINITY team, we couldn’t agree on funding for union-db. This doesn’t mean that the project is now abandoned - I’m going to continue building it (maybe with a slightly different set of priorities), but unfortunately I will only be able to do this in my spare time.

I’m going to keep you updated the same way as before. And the documentation will remain open for commenting.

Have a great day.

13 Likes

Awesome project! Any new updates?