Hi everyone!
It’s been a long time since v8.0.0-alpha.1 came out and v8.0.0 release version is finally here
Here is the description of changes v8.0.0 provides
Completely new algorithm
Since its first version, Map
has been using the deterministic hash map algorithm proposed by Tyler Close. It uses a data array to store entries in an order of insertion. Another array (hash table) is used to map a hash (key) to an index in a data array (value). Although having some modifications, the main paradigms of a deterministic hash map agree with the Map
behavior, meaning it will need to rehash (extend its data and hash arrays) when they fill up. Deleted items leave an empty space both in a data array and in a bucket. Despite having great performance characteristics, the above traits can be considered algorithm weaknesses in some scenarios.
V8 Map
completely overhauls this behavior and moves to an in-house built hash map algorithm that eliminates these weaknesses, the main advantage being that it completely eliminates the need of rehashing by using a tree-like structure to store its entries. As a supplement to the new hash map algorithm, as well as to preserve the feature of saving insertion order, V8 Map
also supports DEQ links to the next/prev inserted item. Together these new features open up a wide variety of possibilities. Lets go through all improvements it provides.
V7 Map
algorithm leaves an empty space in a bucket on entry deletion. This leads to a problem where an entry deletion slows operations with other entries in the same bucket. More so, in the worst case scenario, subsequent delete/insert operations on a single key could lead to a half-linear complexity of operations on entries in the same bucket.
V8 Map
uses a tree-like structure to store entries using 4 links that bond an entry with 4 branches on an upper level. Each such layer uses 2 hash bits to determine a branch, meaning all branches forking from an ancestor will have their lower n-hash-bits equal to the ancestor’s. This leads us to a simple way of deleting entries - we just replace them with any descendant leaf. This prevents any possible holes in a map, eliminates the need to rebalance the structure (opposed to conventional tree algorithms) while the tree structure itself eliminates the need to rehash (opposed to most hash map algorithms).
V7 Map
algorithm leaves an empty space in a data array on entry deletion. This leads us to 2 inconvenient consequences. Successive delete/insert operations will inevitably lead to rehashes even without size overflow (as deleted space still adds up). We can not easily implement DEQ methods (like pop
and peek
) as they could have up to a half-linear complexity in the worst case scenario.
V8 Map
algorithm uses DEQ links to preserve insertion order instead of an array. This opens up a lot of possibilities, such as full DEQ methods support, and the ability to insert entries mid-map, while eliminating the need to leave holes in a structure.
Conventional hash map algorithms can only make use of all hash bits when they are max size. Although upper hash bits are not needed for such algorithms in small maps, it is in fact possible to choose keys that will have conflicting hashes and appear in one bucket. V8 Map
eliminates this possibility as it will use up to all hash bits with any-size maps if an operation leads to a collision.
To determine key equality, V8 Map
now compares hash equality first to discard false-cases earlier, as hash comparison will be a lot faster for most key types.
New methods
As the new algorithm supports DEQ links, there are many possibilities for new methods:
-
A lot of methods received their Front/Desc counterparts
-
Most DEQ-originating methods were added to V8
Map
-
V8
Map
extends the default iterator logic and adds many new ways to traverse maps -
Ability to insert entries mid-map (mid-deq), move entries, update values without a prior
get
call -
Support for map mutation during iteration, maintaining iteration order predictable
-
Ability to easily convert
Map
to anArray
, new ways to createMap
from iterator
Set methods
V7 Map
version had 2 methods to add new entries: put
and set
where the only difference was that the latter ignores the result.
V8 Map
adds putFront
and setFront
counterparts to prepend new entries to a map (deq) instead of putting them at the end.
New replace
method was added to skip insertion and make changes only when the specified key already exists. Similarly, add
and addFront
methods allow you to insert a new entry and skip modification when the specified key already exists.
Move methods (putMove
, putMoveFront
, replaceMove
, replaceMoveFront
) were added as a convenience for cases where you want to move a value to the back/front if it already exists.
New methods (putBefore
, putAfter
) were added allowing you to put a new key/value pair before/after a specified key. These methods do not have their Move twins as they move existing entries after/before a specified key by default.
Update methods
A set of update methods was added (update
, updateFront
, updateMove
, updateMoveFront
). They are similar to the put
method but instead of a newValue they expect you to pass a function that receives an optional old value and returns an optional new value.
As in the case of put
, these support Front/Move variants which allow you to prepend new entries and move existing values to the back/front.
DEQ methods
DEQ links existence allows you to use full DEQ functionality alongside map methods.
New pop
and popFront
methods remove the last/first entry and return it.
To get the last/first entry without removing it, you can use peek
and peekFront
.
There are also cycle
and cycleFront
methods which take the last/first entry, move it to the front/back and return it.
Iterators
Iterators were completely overhauled. Map
extends the default iterator object that has a single next
method and uses the next structure for its iterators (prev
, next
, peekPrev
, peekNext
, current
, started
, finished
, reset
, movePrev
, moveNext
) where prev
iterates backward, current
returns the current value without moving an iterator, reset
moves an iterator to its initial position, started
and finished
return a boolean indicating an iterator state, peekPrev
and peekNext
return prev/next value without moving an iterator, movePrev
and moveNext
work similar to prev
and next
but return the iterator itself, instead of a value.
Of course, a for…in loop will not be able to use these methods automatically, but manual calls will be very helpful in some situations.
For the convenience of iterating backward with a for…in loop, there are new methods keysDesc
, valsDesc
, entriesDesc
. These will iterate forward if you use prev
, peekPrev
and movePrev
methods on them.
Sometimes you don’t want to iterate your map from the beginning and have a specific entry to start iterating from. Instead of skipping a number of iterations you can now use a new group of From iterators (keysFrom
, keysFromDesc
, valsFrom
, valsFromDesc
, entriesFrom
, entriesFromDesc
) which expect you to pass a key to start iterating from. By default this key will not be included in iteration (meaning you will start iterating from the next/prev item) but you can easily override this behavior by chaining movePrev
or moveNext
methods.
V8 Map
runs additional checks to make sure iteration order remains predictable when you remove/add an entry either before/after a currently iterating one. This behavior also extends to other iterating methods (like map
, filter
, find
, toArrayMap
and others).
On top of that, all Map
iterators are reusable, that is, after returning null once, the next next
or prev
call will start a new iteration cycle.
Conversions
Before, you had fromIter
method to create new maps from iterators which expected you to pass an iterator that returns a key/value tuple on every iteration. V8 Map
adds a fromIterMap
method, which allows you to map and filter iterator items. This is helpful if your iterator does not have the expected key/value structure by default. The corresponding fromIterDesc
and fromIterMapDesc
were also added to prepend iterator items to a map instead of appending them.
For the convenience of converting your maps to arrays there are 4 new methods toArray
, toArrayDesc
, toArrayMap
, toArrayMapDesc
. If you just want an array with a full set of your map’s key/value pairs, you can use toArray
. If you want to map/filter entries before putting them to an array, use toArrayMap
. To traverse your map backward while creating a resulting array, use toArrayDesc
and toArrayMapDesc
.
New hash functions
V7 Map
supported numeric hash functions for Nat
(nhash
) and Int
(ihash
) keys natively. With V8 Map
support for all combinations on numeric keys were added (nhash
, n8hash
, n16hash
, n32hash
, n64hash
, ihash
, i8hash
, i16hash
, i32hash
, i64hash
). Also, nhash
and ihash
now use 64 lower bits for hash calculation instead of 32.
Misc methods
Continuing the trend with Desc twins, forEach
, every
and some
are also receiving the ability to move through a map backward using corresponding forEachDesc
, everyDesc
and someDesc
methods. To keep the naming scheme aligned, the findLast
method was renamed to findDesc
.
The new empty
method is a shorthand for size == 0
check.
For the ease of cloning a map, a clone
method was added, which removes the need to use filter
together with a function that returns true, or similar methods.
Optimized files
Some code changes lead to worse readability while improving performance by a small margin. It is better to avoid such changes. That is what I am doing for the Map
. But the perfectionist part inside me still eagerly wants to leave these optimizations on. That’s why I’m creating a script to automatically parse the source files and generate fully optimized code (down to removing return keywords which scratches off 1 cycle). The parser is a very simple regexp search, which is only suitable for my cases and does not has full syntax understanding whatsoever.
Performance metrics
As a consequence of using a tree-like structure, which on average needs more iterations to find a key (has worse algorithm complexity than a common hash map algorithm), V8 Map
has a decrease in performance when comparing best case scenarios for both maps. Due to high code optimizations it was possible to decrease this performance deviation to 30% on average. V8 Map
will be faster in a general case scenario as it’s algorithm eliminates a lot of (if not all) performance flaws of the previous one and has low-to-none performance deviation in a best vs worst case.
Here is a performance comparison between some of the data structures using Nat/Nat key/value types. The number of entries is 100_000.
Delete Desc is a cycle cost of deleting all items starting from the last. It has deviations from a regular Delete for some algorithms.
Heap Space is an rts_heap_size call at the end of the first Delete stage. It includes garbage for Insert, Get, Update and Delete operations.
Space is an rts_heap_size call in the next message (when all garbage collection is done), which is the amount of space needed to store 100_000 entries.
Structure | Insert | Get | Update | Delete | Delete Desc | Heap Space | Space |
---|---|---|---|---|---|---|---|
V8 Map |
91_496_534 | 59_378_554 | 83_895_094 | 124_591_501 | 105_893_982 | 8_000_000 | 5_600_000 |
V7 Map |
91_238_554 | 30_270_177 | 41_784_888 | 80_128_837 | 77_557_928 | 20_665_288 | 3_448_660 |
Stable Hash Map |
2_707_678_519 | 1_144_832_218 | 1_200_854_954 | 1_171_944_898 | 1_175_349_479 | 398_126_032 | 3_724_452 |
Stable RB Tree |
2_374_238_947 | 264_585_170 | 1_582_756_699 | 613_056_039 | 612_856_037 | 487_921_556 | 7_600_028 |
Trie |
1_274_569_564 | 432_541_994 | 1_172_783_140 | 1_015_169_296 | 1_115_608_817 | 351_920_140 | 6_894_652 |
Percentage table, comparing all structures to V8 Map
Structure | Insert | Get | Update | Delete | Delete Desc | Heap Space | Space |
---|---|---|---|---|---|---|---|
V8 Map |
1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
V7 Map |
0.997 | 0.510 | 0.498 | 0.643 | 0.732 | 2.583 | 0.616 |
Stable Hash Map |
29.593 | 19.280 | 14.314 | 9.406 | 11.099 | 49.766 | 0.655 |
Stable RB Tree |
25.949 | 4.456 | 18.866 | 4.921 | 5.787 | 60.990 | 1.357 |
Trie |
13.930 | 7.284 | 13.979 | 8.148 | 10.535 | 43.990 | 1.231 |
Upgrade strategy
-
Rename all
findLast
methods tofindDesc
-
Add HashUtils as a second parameter to
map
,filter
andmapFilter
calls -
The new method now expects you to pass HashUtils as a first param
-
Migrate all your stable maps using
let newMap = Map_8.fromIter(Map_7.entries(oldMap), Map_8.x_hash)
Custom hash upgrade strategy
-
IMPORTANT! Your custom hash function must exclude 0xffffffff hash value as this one is reserved
-
Nat32 should be used as a hash type instead of Nat
-
New HashUtils has a third compound
getNullKey
(a function that should return any key with corresponding type, the value does not matter as it will never be compared to any other key or be passed to a hash function)