@skilesare
TLDR Tweet: Red-Black trees are self balancing, whereas a Trie is not balanced and does not re-balance.
Going deeper if you care to…
This means that for Tries, depending on the order in which records are being inserted, one could end up with a lopsided Trie from the root node, resulting in an asymptotic complexity for get/put/delete of O(n)
. Tries are typically used for word dictionaries, as they conveniently store paths from the beginning of a words to the end, and can label points along the trie graph that symbolize the end of a word.
Red-Black Trees are self balancing binary trees, which follow several invariants upon each insertion in order to maintain the balanced state of the tree. This means that both subtrees from each root tree are balanced, resulting in an asymptotic complexity for get/put/delete of O(log2(n))
.
To get a better mental picture of how Tries work, I recommend inserting items in the following order “a”, “ab”, “abc”, and “abcd” into this Trie visualizer tool Trie Visualization
To get a better mental picture of how Red-Black Trees work, I recommend inserting the same “a”, “ab”, “abc”, and “abcd” into this Red-Black tree visualizer tool Red/Black Tree Visualization and comparing the two results.
While traditionally Tries are used for “word dictionaries” and the size of each leaf is the size of the alphabet being used (26 letters for english), the current implementation of Trie in motoko-base is slightly modified and is not geared around this same use “word dictionary” use case. It sets an arbitrary limit of 8 elements per leaf, meaning that at each leaf of the Trie there room for 8 elements to be inserted. Elements are ordered and compared at each leaf of the Trie by their Hashed result, and collisions are handled in a similar fashion to a HashMap. Once this limit of 8 element in a leaf is surpassed, the Trie creates a branch leading to another deeper level of the Trie with another leaf (and 8 more element slots).
If the data that you are inserting into your TrieMap is inserted in a perfectly random ordering, you’ll most likely not see too big of a performance difference - but if it is inserted in a monotomically increasing order (like with a timestamp), you’ll end up with an incredibly lopsided Trie, which makes for bad query/update performance. Even a normal random insertion order is bound to have some elements in the Trie that are much deeper down than other elements, leading to inconsistent performance.
In most cases where key value lookups and operations are required, I would therefore say that in terms of Motoko libraries currently Red-Black Trees provide the best combination of performance and space efficiency out of any of the existing data structures.
I have plans to build out a B-Tree data structure in Motoko, which is another self balancing tree data structure that has slight performance benefits over Red-Black Trees as the size of the Tree increases past several GB, as well as a few slight additional benefits over the current functionally programmed implementation (in specific deletion).