It’s been a long time since v8.0.0-alpha.1 came out and v8.0.0 release version is finally here
Here is the description of changes v8.0.0 provides
Completely new algorithm
Since its first version,
Map has been using the deterministic hash map algorithm proposed by Tyler Close. It uses a data array to store entries in an order of insertion. Another array (hash table) is used to map a hash (key) to an index in a data array (value). Although having some modifications, the main paradigms of a deterministic hash map agree with the
Map behavior, meaning it will need to rehash (extend its data and hash arrays) when they fill up. Deleted items leave an empty space both in a data array and in a bucket. Despite having great performance characteristics, the above traits can be considered algorithm weaknesses in some scenarios.
Map completely overhauls this behavior and moves to an in-house built hash map algorithm that eliminates these weaknesses, the main advantage being that it completely eliminates the need of rehashing by using a tree-like structure to store its entries. As a supplement to the new hash map algorithm, as well as to preserve the feature of saving insertion order, V8
Map also supports DEQ links to the next/prev inserted item. Together these new features open up a wide variety of possibilities. Lets go through all improvements it provides.
Map algorithm leaves an empty space in a bucket on entry deletion. This leads to a problem where an entry deletion slows operations with other entries in the same bucket. More so, in the worst case scenario, subsequent delete/insert operations on a single key could lead to a half-linear complexity of operations on entries in the same bucket.
Map uses a tree-like structure to store entries using 4 links that bond an entry with 4 branches on an upper level. Each such layer uses 2 hash bits to determine a branch, meaning all branches forking from an ancestor will have their lower n-hash-bits equal to the ancestor’s. This leads us to a simple way of deleting entries - we just replace them with any descendant leaf. This prevents any possible holes in a map, eliminates the need to rebalance the structure (opposed to conventional tree algorithms) while the tree structure itself eliminates the need to rehash (opposed to most hash map algorithms).
Map algorithm leaves an empty space in a data array on entry deletion. This leads us to 2 inconvenient consequences. Successive delete/insert operations will inevitably lead to rehashes even without size overflow (as deleted space still adds up). We can not easily implement DEQ methods (like
peek) as they could have up to a half-linear complexity in the worst case scenario.
Map algorithm uses DEQ links to preserve insertion order instead of an array. This opens up a lot of possibilities, such as full DEQ methods support, and the ability to insert entries mid-map, while eliminating the need to leave holes in a structure.
Conventional hash map algorithms can only make use of all hash bits when they are max size. Although upper hash bits are not needed for such algorithms in small maps, it is in fact possible to choose keys that will have conflicting hashes and appear in one bucket. V8
Map eliminates this possibility as it will use up to all hash bits with any-size maps if an operation leads to a collision.
To determine key equality, V8
Map now compares hash equality first to discard false-cases earlier, as hash comparison will be a lot faster for most key types.
As the new algorithm supports DEQ links, there are many possibilities for new methods:
A lot of methods received their Front/Desc counterparts
Most DEQ-originating methods were added to V8
Mapextends the default iterator logic and adds many new ways to traverse maps
Ability to insert entries mid-map (mid-deq), move entries, update values without a prior
Support for map mutation during iteration, maintaining iteration order predictable
Ability to easily convert
Array, new ways to create
Map version had 2 methods to add new entries:
set where the only difference was that the latter ignores the result.
setFront counterparts to prepend new entries to a map (deq) instead of putting them at the end.
replace method was added to skip insertion and make changes only when the specified key already exists. Similarly,
addFront methods allow you to insert a new entry and skip modification when the specified key already exists.
Move methods (
replaceMoveFront) were added as a convenience for cases where you want to move a value to the back/front if it already exists.
New methods (
putAfter) were added allowing you to put a new key/value pair before/after a specified key. These methods do not have their Move twins as they move existing entries after/before a specified key by default.
A set of update methods was added (
updateMoveFront). They are similar to the
put method but instead of a newValue they expect you to pass a function that receives an optional old value and returns an optional new value.
As in the case of
put, these support Front/Move variants which allow you to prepend new entries and move existing values to the back/front.
DEQ links existence allows you to use full DEQ functionality alongside map methods.
popFront methods remove the last/first entry and return it.
To get the last/first entry without removing it, you can use
There are also
cycleFront methods which take the last/first entry, move it to the front/back and return it.
Iterators were completely overhauled.
Map extends the default iterator object that has a single
next method and uses the next structure for its iterators (
prev iterates backward,
current returns the current value without moving an iterator,
reset moves an iterator to its initial position,
finished return a boolean indicating an iterator state,
peekNext return prev/next value without moving an iterator,
moveNext work similar to
next but return the iterator itself, instead of a value.
Of course, a for…in loop will not be able to use these methods automatically, but manual calls will be very helpful in some situations.
For the convenience of iterating backward with a for…in loop, there are new methods
entriesDesc. These will iterate forward if you use
movePrev methods on them.
Sometimes you don’t want to iterate your map from the beginning and have a specific entry to start iterating from. Instead of skipping a number of iterations you can now use a new group of From iterators (
entriesFromDesc) which expect you to pass a key to start iterating from. By default this key will not be included in iteration (meaning you will start iterating from the next/prev item) but you can easily override this behavior by chaining
Map runs additional checks to make sure iteration order remains predictable when you remove/add an entry either before/after a currently iterating one. This behavior also extends to other iterating methods (like
toArrayMap and others).
On top of that, all
Map iterators are reusable, that is, after returning null once, the next
prev call will start a new iteration cycle.
Before, you had
fromIter method to create new maps from iterators which expected you to pass an iterator that returns a key/value tuple on every iteration. V8
Map adds a
fromIterMap method, which allows you to map and filter iterator items. This is helpful if your iterator does not have the expected key/value structure by default. The corresponding
fromIterMapDesc were also added to prepend iterator items to a map instead of appending them.
For the convenience of converting your maps to arrays there are 4 new methods
toArrayMapDesc. If you just want an array with a full set of your map’s key/value pairs, you can use
toArray. If you want to map/filter entries before putting them to an array, use
toArrayMap. To traverse your map backward while creating a resulting array, use
New hash functions
Map supported numeric hash functions for
ihash) keys natively. With V8
Map support for all combinations on numeric keys were added (
ihash now use 64 lower bits for hash calculation instead of 32.
Continuing the trend with Desc twins,
some are also receiving the ability to move through a map backward using corresponding
someDesc methods. To keep the naming scheme aligned, the
findLast method was renamed to
empty method is a shorthand for
size == 0 check.
For the ease of cloning a map, a
clone method was added, which removes the need to use
filter together with a function that returns true, or similar methods.
Some code changes lead to worse readability while improving performance by a small margin. It is better to avoid such changes. That is what I am doing for the
Map. But the perfectionist part inside me still eagerly wants to leave these optimizations on. That’s why I’m creating a script to automatically parse the source files and generate fully optimized code (down to removing return keywords which scratches off 1 cycle). The parser is a very simple regexp search, which is only suitable for my cases and does not has full syntax understanding whatsoever.
As a consequence of using a tree-like structure, which on average needs more iterations to find a key (has worse algorithm complexity than a common hash map algorithm), V8
Map has a decrease in performance when comparing best case scenarios for both maps. Due to high code optimizations it was possible to decrease this performance deviation to 30% on average. V8
Map will be faster in a general case scenario as it’s algorithm eliminates a lot of (if not all) performance flaws of the previous one and has low-to-none performance deviation in a best vs worst case.
Here is a performance comparison between some of the data structures using Nat/Nat key/value types. The number of entries is 100_000.
Delete Desc is a cycle cost of deleting all items starting from the last. It has deviations from a regular Delete for some algorithms.
Heap Space is an rts_heap_size call at the end of the first Delete stage. It includes garbage for Insert, Get, Update and Delete operations.
Space is an rts_heap_size call in the next message (when all garbage collection is done), which is the amount of space needed to store 100_000 entries.
|Structure||Insert||Get||Update||Delete||Delete Desc||Heap Space||Space|
Percentage table, comparing all structures to V8
|Structure||Insert||Get||Update||Delete||Delete Desc||Heap Space||Space|
Add HashUtils as a second parameter to
The new method now expects you to pass HashUtils as a first param
Migrate all your stable maps using
let newMap = Map_8.fromIter(Map_7.entries(oldMap), Map_8.x_hash)
Custom hash upgrade strategy
IMPORTANT! Your custom hash function must exclude 0xffffffff hash value as this one is reserved
Nat32 should be used as a hash type instead of Nat
New HashUtils has a third compound
getNullKey(a function that should return any key with corresponding type, the value does not matter as it will never be compared to any other key or be passed to a hash function)