Your understanding is largely correct.
Regarding your points:
- Because the stable variable data ordinarily resides in the heap, it can't use more than 4GB anyway and should (in most cases, but, unfortunately, not all) deserialize back into less than 4GB. The case where this can still go wrong is that the stable storage format does not preserve all sharing, only sharing of mutable data. So if you have a large immutable graph in stable variables, with lots of internal node sharing, it can get serialized into a larger tree and then, on deserialization, might not actually fit in 4GB anymore.
- Canisters can also access stable memory directly using the very low-level ExperimentalStableMemory library. This gives you almost direct access to stable memory but requires manual memory management of a global, shared resource. This can be used in conjunction with stable variables (see the first sketch after this list). We are in the process of extending and replacing this functionality to support carving stable memory into isolated, independently growable regions that different libraries can use without risk of accidentally interfering with one another.
- At the moment, when you declare variables stable, it means their data is stored in the 4GB main heap until upgrade, when it is serialized to stable memory for deserialization after the upgrade. We'd like to move to a world where the serialization step is not necessary, by storing stable variables in stable memory throughout, not just during upgrades, but it's early days. For large amounts of stable data, the serialization step limits scalability.
The current design does let one use a hybrid approach, using a moderate amount of stable variable data to store metadata about larger amounts of raw data stored in (Experimental) stable memory. So one could imagine implementing, say, a simple file system, using stable variables to store file descriptors and block allocation tables, and stable memory to store the actual blocks (a rough sketch of this pattern follows below). I haven't seen anyone use this approach in anger though.
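For concreteness, here is a minimal sketch of the kind of low-level access ExperimentalStableMemory gives you. Only size, grow, storeBlob and loadBlob come from the library; the actor and the writeGreeting/readGreeting functions are made up for the example:

```motoko
import StableMemory "mo:base/ExperimentalStableMemory";
import Text "mo:base/Text";

actor {
  // Write a small greeting at offset 0 of stable memory.
  public func writeGreeting() : async () {
    // Make sure at least one 64 KiB page has been allocated.
    if (StableMemory.size() == 0) {
      // grow returns the previous size in pages (or the sentinel value
      // 0xFFFF_FFFF_FFFF_FFFF on failure); real code should check the result.
      ignore StableMemory.grow(1);
    };
    StableMemory.storeBlob(0, Text.encodeUtf8("hello, stable memory"));
  };

  // Read len bytes back from offset 0 and try to decode them as UTF-8 text.
  public func readGreeting(len : Nat) : async ?Text {
    Text.decodeUtf8(StableMemory.loadBlob(0, len))
  };
}
```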
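And here is a rough sketch of the hybrid approach described above: stable variables hold the descriptors (offsets and sizes), while the raw bytes live in ExperimentalStableMemory. Everything in it (BlockStore, Desc, descs, nextOffset, ensureCapacity, append, read) is hypothetical illustration, not an existing library:

```motoko
import StableMemory "mo:base/ExperimentalStableMemory";
import Array "mo:base/Array";
import Nat64 "mo:base/Nat64";

actor BlockStore {
  // Descriptor for one stored blob: where it starts and how many bytes it has.
  type Desc = { offset : Nat64; size : Nat };

  // Metadata kept in stable variables; it is (de)serialized across upgrades.
  stable var descs : [Desc] = [];
  stable var nextOffset : Nat64 = 0;

  let pageSize : Nat64 = 65536; // stable memory grows in 64 KiB pages

  // Grow stable memory, if needed, to make room for `needed` more bytes.
  func ensureCapacity(needed : Nat64) {
    let required = nextOffset + needed;
    let available = StableMemory.size() * pageSize;
    if (required > available) {
      let morePages = (required - available + pageSize - 1) / pageSize;
      ignore StableMemory.grow(morePages); // real code should check for failure
    };
  };

  // Store a blob in stable memory and return the index of its descriptor.
  public func append(b : Blob) : async Nat {
    let size = b.size();
    ensureCapacity(Nat64.fromNat(size));
    StableMemory.storeBlob(nextOffset, b);
    descs := Array.append(descs, [{ offset = nextOffset; size = size }]);
    nextOffset += Nat64.fromNat(size);
    descs.size() - 1
  };

  // Read the blob recorded at descriptor index i.
  public func read(i : Nat) : async Blob {
    let d = descs[i];
    StableMemory.loadBlob(d.offset, d.size)
  };
}
```

On upgrade, only descs and nextOffset would go through the serialize/deserialize step; the raw blocks stay put in stable memory, which is what keeps that step small.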