Absolutely.
Coolio. https://github.com/dfinity/stable-structures/pull/83
I struggle with the MAX_SIZE variable. On one hand, my lack of passion for and confidence in backend code makes me worry about calculating the wrong value. On the other hand, as mentioned earlier, developers using my tool may upload unpredictable sizes due to the use of blobs. Additionally, I'm uncertain about the consequences of setting a value now and realizing, after two months, that it should be increased.
In summary, the MAX_SIZE option is not necessarily a deal-breaker, but it's definitely a big challenge for me, and one I'd be happy to be spared.
So, is it acceptable to set an arbitrarily large value for everything, like 64 GB, or would this cause any issues?
Dumb question: what do you mean by "composite key approach"? Like a key that contains multiple pieces of information?
I'm interested in the question of handling vector datasets in StableBTreeMap because I would like to convert e.g. a HashMap of BTreeMaps.
// Basically:
use std::collections::{BTreeMap, HashMap};

pub type Collection = BTreeMap<MyEntityKey, MyEntity>;
pub type State = HashMap<CollectionKey, Collection>;
I use such a pattern because developers can define their own keys; the canisters I provide are generic. Therefore, as you pointed out, serializing/deserializing the entire HashMap won't be performant.
So if I get it right, your advice would be to flatten the above to a single StableBTreeMap in which the key basically contains both CollectionKey and MyEntityKey, correct?
Side note: or is it actually possible to create StableBTreeMaps on the fly (at runtime)?
I totally understand. Setting a reasonable MAX_SIZE is quite a significant design decision, and it's not very obvious what value to set in many cases.
No. The BTree always allocates the maximum size of the value, so if you hypothetically put 64 GB, you'll run out of memory very quickly. The trade-off here is between memory usage and flexibility in the future.
Yes, exactly, and I'd still recommend this pattern even when we remove the MAX_SIZE requirement from the BTreeMap in the near future.
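To make the composite-key idea concrete, here is a minimal sketch that uses a plain std::collections::BTreeMap as a stand-in for StableBTreeMap (the exact StableBTreeMap constructor and trait bounds depend on the stable-structures version you're on); the key types and helper functions are hypothetical placeholders:

use std::collections::BTreeMap;

// Hypothetical stand-ins for the developer-defined key and entity types.
type CollectionKey = String;
type MyEntityKey = String;
type MyEntity = Vec<u8>;

// One flat, ordered map instead of HashMap<CollectionKey, BTreeMap<MyEntityKey, MyEntity>>.
type FlatState = BTreeMap<(CollectionKey, MyEntityKey), MyEntity>;

fn insert_entity(state: &mut FlatState, col: CollectionKey, key: MyEntityKey, entity: MyEntity) {
    state.insert((col, key), entity);
}

// Iterating over one collection becomes a range scan on the composite key:
// all keys starting with `col` are contiguous because tuples sort lexicographically.
fn collection_entries<'a>(
    state: &'a FlatState,
    col: &'a CollectionKey,
) -> impl Iterator<Item = (&'a (CollectionKey, MyEntityKey), &'a MyEntity)> {
    state
        .range((col.clone(), String::new())..)
        .take_while(move |((c, _), _)| c == col)
}

Because the entries of one collection occupy a contiguous key range, listing a collection is a range scan rather than a full iteration over the map.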
Given the current MAX_SIZE limitation though, there's a trick where you can split your unbounded data into chunks. For example, you can have the following BTree:
StableBTreeMap<(CollectionKey, MyEntityKey, ChunkIndex), Blob<CHUNK_SIZE>>
In the above BTree, we split MyEntity (the unbounded value) into chunks of size CHUNK_SIZE, where CHUNK_SIZE is some reasonable value of your choice.
For illustrative purposes, let's say you have a CHUNK_SIZE of 2, and you'd like to store the entities (key_1, [1,2,3]) and (key_2, [4,5,6,7,8]).
In this case, the stable BTreeMap above will look like the following:
StableBTreeMap => {
(collection, key_1, 0) => [1, 2],
(collection, key_1, 1) => [3],
(collection, key_2, 0) => [4, 5],
(collection, key_2, 1) => [6, 7],
(collection, key_2, 2) => [8],
}
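To show what the chunking bookkeeping could look like in code, here is a minimal sketch, again using a plain std::collections::BTreeMap as a stand-in for the StableBTreeMap above (it skips the Blob<CHUNK_SIZE> wrapper and the Storable plumbing, which depend on the stable-structures version); the key types and helper names are hypothetical:

use std::collections::BTreeMap;

// Hypothetical stand-ins for the key types used above.
type CollectionKey = String;
type MyEntityKey = String;
type ChunkIndex = u32;

// Matches the illustration above.
const CHUNK_SIZE: usize = 2;

// Stand-in for StableBTreeMap<(CollectionKey, MyEntityKey, ChunkIndex), Blob<CHUNK_SIZE>>.
type ChunkedState = BTreeMap<(CollectionKey, MyEntityKey, ChunkIndex), Vec<u8>>;

// Split an entity's bytes into CHUNK_SIZE-sized pieces and store each under its chunk index.
fn insert_entity(state: &mut ChunkedState, col: &str, key: &str, bytes: &[u8]) {
    for (i, chunk) in bytes.chunks(CHUNK_SIZE).enumerate() {
        state.insert((col.to_string(), key.to_string(), i as ChunkIndex), chunk.to_vec());
    }
}

// Reassemble an entity by scanning its contiguous chunk range in key order.
fn get_entity(state: &ChunkedState, col: &str, key: &str) -> Vec<u8> {
    state
        .range((col.to_string(), key.to_string(), 0)..)
        .take_while(|((c, k, _), _)| c == col && k == key)
        .flat_map(|(_, chunk)| chunk.iter().copied())
        .collect()
}

fn main() {
    let mut state = ChunkedState::new();
    insert_entity(&mut state, "collection", "key_1", &[1, 2, 3]);
    insert_entity(&mut state, "collection", "key_2", &[4, 5, 6, 7, 8]);
    assert_eq!(get_entity(&state, "collection", "key_2"), vec![4, 5, 6, 7, 8]);
}

The read path works because all chunks of a given entity are adjacent in key order, so a range scan starting at chunk index 0 and stopping at the first key with a different (collection, entity key) prefix recovers the full value.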
In theory, yes, but in practice, not currently: each StableBTreeMap requires its own virtual address space, so creating them on the fly incurs an overhead, and I'd recommend against this approach.
Got it! Thank you for the detailed explanation, everything is clear now. Although I'm confident that someone smarter than I am could certainly meet my requirements using chunking as you suggested, I don't have any time constraints, so I'll wait for the MAX_SIZE removal in the near future.
I've suggested this before, but I want to bring it up again. It would be nice if we could agree on a schedule for increasing the stable memory size, for example increasing it x GiB per quarter or per year based on some agreed-upon criteria. Right now it seems to just increase when DFINITY has a need for it to increase. Having this schedule would help us plan out future capabilities for our users of Azle, Kybra, and Sudograph.
We also received requests from the community to increase the stable memory. We need to do thorough testing and make sure there are no technical issues before committing to a fixed schedule. Since it is a non-trivial amount of work, I think we can allocate time for it in July/August. Would that work for you or is it more urgent?
It's not urgent for us; we just want to push to get on a schedule for our future selves.
An update on that front: DFINITY plans to propose, in an upcoming replica version, a further increase of stable memory from 64 GiB to 96 GiB. So far our biggest canister on the IC, the Bitcoin canister, has been pushing close to 64 GiB, and the growth in Bitcoin's UTXO set has been accelerating. With further testing we have seen that we can support a larger stable memory without issues and can continue increasing that storage limit.