Should I use HashMap or RBTree?

Hi everyone,
The Motoko language provides HashMap and RBTree. Is there a best practice on which one to use? I am just studying. I am not familiar with algorithms.

docs - HashMap

Runtime: O(1)
Space: O(1)

docs - RBTree

Runtime: O(log(n)) worst case cost per insertion, removal, and retrieval operation.
Space: O(n) for storing the entire tree. n denotes the number of key-value entries (i.e. nodes) stored in the tree.

2 Likes

Probably this one: GitHub - ZhenyaUsenko/motoko-hash-map: Stable hash maps for Motoko

In terms of performance, we have some benchmark numbers for different map structures: Collection libraries | canister-profiling

2 Likes

HashMap:

  • provides most efficient O(1) insert, lookup, & deletion
  • space taken is O(n)
  • Depending on implementation every so often the hashtable underneath the map needs to double, which can result in an O(n) operation.

RBTree/BTree:

  • provides O(log(n)) insert, lookup, & deletion.
  • space taken is O(n)
  • Sorts the data by key within the tree. This allows for operations like scanning between two bounds, which is powerful if one wants to paginate through entries in the data structure that depend on a sorted ordering.
2 Likes

Byron is being a bit modest here…he also wrote GitHub - canscale/StableHeapBTreeMap: A functional BTreeMap that can be used with a stable variable in heap memory which I think is a pretty solid improvement on the BTree that maintains sort order.

Byron…you need to pimp your ride. :slight_smile:

@chenyan…what is the mechanism that makes the rust maps so much more performant?

2 Likes

Rust’s hash map uses the standard Swiss Table. Zhenya’s hash map uses V8’s hash map, which is better suited for storing large objects. Data structure-wise, there shouldn’t be much difference for normal use cases. The performance diff really comes from being written in different languages. Rust is much more low level than Motoko, so it can generate code that is much more efficient.

Would you expect that Motoko would get more performant? Is there a particular mechanism that makes it not as efficient? Would adding a map at a lower level as a ‘native’ structure bring more performance in motoko?

It’s always getting more performant. For example, see the perf improvements from 0.8.4 to 0.8.5: bump to moc 0.8.5 by chenyan-dfinity · Pull Request #43 · dfinity/canister-profiling · GitHub.

Being a high level language means there is a performance overhead in exchange for a native and convenient experience. You cannot expect Python to be as efficient as C. But there are still many Python developers. FFI can be an option to bridge the perf gap, but it’s a long way to support FFI.

@skilesare
@chenyan
@icme

Thanks for your reply!

a.
Collection libraries | canister-profiling
The value of generate 50k in hashmap is 2_387_017_574. Is the unit of measure Cycle?

b.
Collection libraries | canister-profiling

We have a limit on the maximal cycles per round.

I understood that the less Cylce consumed by Map, the more processing can be done per round. Am I correct in understanding that faster Map processing does not mean faster response time?

c.
docs - HashMap

Runtime: O(1)
Space: O(1)

Maybe O(1) is wrong and O(n) is correct. I do not know.

d.
I had never heard of StableHeapBTreeMap before. Thanks.

e.
If I want to handle Huge data in a Map, I thought Rust should be adopted.
For general systems that are not Internet Computer, efficiency is related to processing speed and response time, and inefficient efficiency can be solved by replacing CPUs or adding servers, so there is no problem with a slow language. However, I thought that efficiency is important when processing on an Internet Computer and on a single subnet.

1 Like