Absolutely.
Coolio. https://github.com/dfinity/stable-structures/pull/83
I struggle with the MAX_SIZE variable. On one hand, my lack of passion for backend code and my limited confidence make me worry about calculating the wrong value. On the other hand, as mentioned earlier, developers using my tool may upload data of unpredictable sizes due to the use of blobs. Additionally, I'm uncertain about the consequences of setting a value now and realizing, after two months, that it should be increased.
In summary, the MAX_SIZE option is not necessarily a deal-breaker, but it's definitely a big challenge for me, one I would be happy to be spared.
So, is it acceptable to set an arbitrarily large value for everything, like 64 GB, or would this cause any issues?
Dumb question: what do you mean by "composite key approach"? A key that contains multiple pieces of information?
I'm interested in the question of handling vector datasets in StableBTreeMap because I would like to convert, e.g., a HashMap of BTreeMaps.
// Basically
pub type Collection = BTreeMap<MyEntityKey, MyEntity>;
pub type State = HashMap<CollectionKey, Collection>;
I use this pattern because developers can define their own keys; the canisters I provide are generic. Therefore, as you pointed out, serializing/deserializing the entire HashMap won't be performant.
So if I get it right, your advice would be to flatten the above into a single StableBTreeMap in which the key basically contains both CollectionKey and MyEntityKey, correct?
Side note: or is it actually possible to create a StableBTreeMap on the fly (at runtime)?
I totally understand. Setting a reasonable MAX_SIZE is quite a significant design decision, and it's not very obvious what value to set in many cases.
No. The BTree always allocates the maximum size of the value, so if you hypothetically set it to 64 GB, you'll run out of memory very quickly. The trade-off here is between memory usage and flexibility in the future.
Yes, exactly, and I'd still recommend this pattern even when we remove the MAX_SIZE requirement from the BTreeMap in the near future.
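For illustration only, here is a rough sketch of that flattening using an ordinary std::collections::BTreeMap as a stand-in for the StableBTreeMap; the concrete CollectionKey/MyEntityKey/MyEntity types are assumptions, and the Storable bounds a real StableBTreeMap key would need are omitted.
use std::collections::BTreeMap;

// Hypothetical stand-ins for the generic key/value types discussed above.
type CollectionKey = String;
type MyEntityKey = String;
type MyEntity = Vec<u8>;

// Flattened layout: a single map keyed by the composite (CollectionKey, MyEntityKey).
type FlatState = BTreeMap<(CollectionKey, MyEntityKey), MyEntity>;

fn main() {
    let mut state = FlatState::new();
    state.insert(("users".into(), "alice".into()), vec![1, 2, 3]);
    state.insert(("users".into(), "bob".into()), vec![4, 5]);
    state.insert(("posts".into(), "p1".into()), vec![6]);

    // Exact lookups use the full composite key.
    let key = ("users".to_string(), "alice".to_string());
    println!("users/alice => {:?}", state.get(&key));

    // Because the composite key sorts by CollectionKey first, all entities of
    // one collection are contiguous and can be scanned with a range query.
    for ((_, entity_key), entity) in state
        .range(("users".to_string(), String::new())..)
        .take_while(|(key, _)| key.0 == "users")
    {
        println!("users/{entity_key} => {entity:?}");
    }
}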
Given the current MAX_SIZE limitation though, there's a trick where you can split your unbounded data into chunks. For example, you can have the following BTree:
StableBTreeMap<(CollectionKey, MyEntityKey, ChunkIndex), Blob<CHUNK_SIZE>>
In the above BTree, we split MyEntity (the unbounded value) into chunks of size CHUNK_SIZE, where CHUNK_SIZE is some reasonable value of your choice.
For illustrative purposes, let's say you have a CHUNK_SIZE of 2, and you'd like to store the entities (key_1, [1,2,3]) and (key_2, [4,5,6,7,8]).
In this case, the stable BTreeMap above will look like the following:
StableBTreeMap => {
(collection, key_1, 0) => [1, 2],
(collection, key_1, 1) => [3],
(collection, key_2, 0) => [4, 5],
(collection, key_2, 1) => [6, 7],
(collection, key_2, 2) => [8],
}
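A minimal sketch of that chunking scheme, again with a std::collections::BTreeMap standing in for the StableBTreeMap; CHUNK_SIZE, the helper names, and the plain string key types are assumptions for illustration, not the library's API.
use std::collections::BTreeMap;

// Hypothetical composite key: (collection, entity key, chunk index).
type ChunkKey = (String, String, u32);
type ChunkedState = BTreeMap<ChunkKey, Vec<u8>>;

const CHUNK_SIZE: usize = 2;

// Split an entity's serialized bytes into CHUNK_SIZE-sized pieces and store
// each piece under its own chunk index.
fn store_entity(map: &mut ChunkedState, collection: &str, key: &str, bytes: &[u8]) {
    for (i, chunk) in bytes.chunks(CHUNK_SIZE).enumerate() {
        map.insert((collection.to_string(), key.to_string(), i as u32), chunk.to_vec());
    }
}

// Reassemble an entity by concatenating its chunks in index order.
fn load_entity(map: &ChunkedState, collection: &str, key: &str) -> Vec<u8> {
    let start = (collection.to_string(), key.to_string(), 0u32);
    let end = (collection.to_string(), key.to_string(), u32::MAX);
    map.range(start..=end)
        .flat_map(|(_, chunk)| chunk.iter().copied())
        .collect()
}

fn main() {
    let mut map = ChunkedState::new();
    store_entity(&mut map, "collection", "key_1", &[1, 2, 3]);
    store_entity(&mut map, "collection", "key_2", &[4, 5, 6, 7, 8]);
    assert_eq!(load_entity(&map, "collection", "key_2"), vec![4, 5, 6, 7, 8]);
}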
In theory, yes, but in practice, not currently: each StableBTreeMap requires its own virtual address space, so creating them incurs an overhead, and I'd recommend against this approach.
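For context, here is a rough sketch of the usual pattern, where each map is declared up front and bound to a fixed MemoryId via the crate's MemoryManager; the exact trait bounds and signatures vary between ic-stable-structures versions, so treat this as an approximation rather than the definitive API.
use ic_stable_structures::memory_manager::{MemoryId, MemoryManager, VirtualMemory};
use ic_stable_structures::{DefaultMemoryImpl, StableBTreeMap};
use std::cell::RefCell;

type Memory = VirtualMemory<DefaultMemoryImpl>;

thread_local! {
    // A single MemoryManager splits stable memory into virtual memories.
    static MEMORY_MANAGER: RefCell<MemoryManager<DefaultMemoryImpl>> =
        RefCell::new(MemoryManager::init(DefaultMemoryImpl::default()));

    // Each StableBTreeMap is tied to its own MemoryId at declaration time,
    // which is why spinning up new maps dynamically at runtime is discouraged.
    static MAP: RefCell<StableBTreeMap<u64, u64, Memory>> = RefCell::new(
        StableBTreeMap::init(MEMORY_MANAGER.with(|m| m.borrow().get(MemoryId::new(0)))),
    );
}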
Got it! Thank you for the detailed explanation, everything is clear now. Although I'm confident that someone smarter than me could certainly solve my requirements using chunking as you suggested, as I don't have any time constraints I'll wait for the MAX_SIZE removal in the near future.
I've suggested this before, but I want to bring it up again. It would be nice if we could agree on a schedule for increasing the stable memory size, for example increasing it by X GiB per quarter or per year based on some agreed-upon criteria. Right now it seems to just increase when DFINITY has a need for it to increase. Having this schedule would help us plan out future capabilities for our users of Azle, Kybra, and Sudograph.
We also received requests from the community to increase the stable memory. We need to do thorough testing and make sure there are no technical issues before committing to a fixed schedule. Since it is a non-trivial amount of work, I think we can allocate time for it in July/August. Would that work for you or is it more urgent?
It's not urgent for us; we just want to push to get on a schedule for our future selves.
An update on that front: DFINITY plans to propose in an upcoming replica version to further increase stable memory from 64 GiB to 96 GiB. So far our biggest canister on the IC, the Bitcoin canister, has been pushing close to 64 GiB, and the growth in Bitcoin's UTXO set has been accelerating. With further testing we can see that we can support a larger stable memory without issues and can continue increasing that storage limit.
Another update: we did more testing with 400 GiB of stable memory and the tests were successful. DFINITY plans to propose in an upcoming replica version to increase from 96 GiB to 400 GiB. More increases are planned later this year.
Really awesome! Thanks for pushing this one, DFINITY.
When can we also expect storage-specific canisters so I can get my data off of Google Drive? At $5/GB per year, it's just too costly.
I get 200 GB per year at $30 with Google Drive.
Would this be applicable to Motoko canisters as well? Usable?
I can only comment from the engineering side, since that's where I work: we are working on increasing storage capacity more. I am not aware of any plans to decrease the cost.
I think it should be accessible from Motoko using the low-level API that works directly with stable memory. Here is the Motoko-specific discussion: Motoko stable memory in 2022
DFINITY plans to propose in an upcoming replica version to increase from 96GiB to 400GiB
The replica proposal with the change: Proposal: 127031 - ICP Dashboard
I wonder what the plan is for increasing heap memory. With DeAI, which I've been eyeing, LLM uploads seem to hang due to lack of heap memory.
Increasing heap memory beyond 4 GB would require a change to 64-bit Wasm, which would be a daunting task, but a plan should be in place.
Thank you for bringing this up! Within the DeAI group we've indeed identified the heap memory size as a key limitation. Are there any plans for upgrading to 64-bit Wasm and increasing the heap memory with it that you (or someone else on the team) could share at this point, @ulan?
I don't think this is a fair comparison with Google Drive. First, Google Drive is not replicating your data 13 times. Second, Google undoubtedly profits from your data in other ways. An E2EE on-chain alternative to Google Drive will likely never be able to compete on price because it can't mine your data and profit from using that data to drive advertising revenue. If you want a cheaper way to store your data without giving it all to Google, then you need to host your own solution.
I mean, it's not just me who wants to replace legacy cloud. DFINITY and Dom urge everyone to ditch the traditional clouds of the world and start building on ICP. If we don't compete on cost, why would bottom-line-focused enterprises, the truly large-scale operations, move to the IC? They would need to keep their costs to a minimum.
Unless we're building software which may be more expensive than normal but has different capabilities.
It's just that we might not get the same huge adoption as AWS/GCP, since they compete on costs, which matters to 95% of businesses.
which may be more expensive than normal but has different capabilities.
Is this not what we're doing already? An equivalent to Google Drive that leverages the IC fully would be decentralized under an SNS and private via vetkeys. I'm personally willing to pay more for privacy and decentralization, but I can't speak for everyone.
You're right that there will be people and businesses that just want to spend as little money as possible, but why would they want to build on a blockchain? Money aside, it's quite a bit more complicated to build on the IC compared to traditional platforms, so you need to have benefits that will outweigh that increase in complexity.