After an extended beta testing phase and some fine-tuning, we now “bless” the incremental garbage collector, declaring it production-ready for the upcoming dfx versions 0.15.3 and higher.
This is the next step before finally making the incremental GC the default GC in Motoko.
DFX Version
Using the incremental GC for production purposes is only recommended with the upcoming dfx versions 0.15.3 or higher. Please also consider the usage note and recommendations below.
To activate the incremental GC, the following command-line argument needs to be specified in dfx.json:
"type" : "motoko"
...
"args" : "--incremental-gc"
Usage Note
The incremental GC is designed to scale to large program heap sizes. While it resolves the scalability issues around the instruction limit of the GC work, it is now possible to hit other scalability limits:
Out of memory: A program can run out of memory if it fills the entire memory space with live objects.
Upgrade limits: When using stable variables, the current mechanism of serialization and deserialization to and from stable memory can exceed the instruction limit or run out of memory.
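To make the upgrade limit more concrete, the following is a minimal sketch of the common pattern where it matters; the names `map` and `entries` are purely illustrative. The data is copied into a stable variable in `preupgrade` and serialized to stable memory, then deserialized and rebuilt in `postupgrade`, and it is exactly this step that can exceed the instruction limit or run out of memory for large data volumes.

```motoko
import TrieMap "mo:base/TrieMap";
import Text "mo:base/Text";
import Iter "mo:base/Iter";

actor {
  // Non-stable in-memory map holding the application data (illustrative).
  var map = TrieMap.TrieMap<Text, Text>(Text.equal, Text.hash);

  // Stable variable carrying the data across upgrades; serializing and
  // deserializing it is where the upgrade limits can be hit for large data.
  stable var entries : [(Text, Text)] = [];

  system func preupgrade() {
    entries := Iter.toArray(map.entries());
  };

  system func postupgrade() {
    map := TrieMap.fromEntries<Text, Text>(entries.vals(), Text.equal, Text.hash);
    entries := [];
  };
}
```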
Recommendations
Test the upgrade: Thoroughly test the upgrade mechanism for different data volumes and heap sizes and conservatively determine the amount of stable data that is supported when upgrading the program.
Monitor the heap size: Monitor the memory and heap size of the application in production (see the sketch after this list).
Limit the heap size: Implement a custom limit in the application to keep the heap size and data volume below the scalability limit that has been determined during testing, in particular for the upgrade mechanism.
Avoid large allocations per message: Avoid allocating 100 MB or more in a single message; rather, distribute larger allocations across multiple messages. Large allocations per message extend the duration of the GC increment. Moreover, memory pressure may occur because the GC has a higher reclamation latency than a classical stop-the-world collector.
Consider a backup query function: Depending on the application, it can be beneficial to offer a privileged query function that extracts the critical canister state in several chunks (also illustrated in the sketch after this list). The runtime system maintains an extra memory reserve for query functions. Of course, such a function has to include a check that restricts it to authorized callers only, and it should be tested well.
Last resort if memory is full: If memory fills up with objects that became garbage only shortly before the memory space was exhausted, the canister owner or controllers can call the system-level function __motoko_gc_trigger() multiple times to run extra GC increments and complete a GC run, collecting the latest garbage in the full heap. Up to 100 calls of this function may be needed to complete a GC run in a 4 GB memory space with fine-grained object structures. The GC keeps a specific memory reserve so that it can perform its work even if the application has exhausted the memory. This functionality is usually not needed in practice and is only useful in such exceptional cases.
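To illustrate the monitoring, heap-limit, and backup-query recommendations above, here is a minimal hypothetical sketch; `Record`, `store`, `addRecord`, `backupChunk`, the `backupAdmin` principal, and the 2 GB limit are all placeholders that should be replaced by structures and values determined by your own application and testing:

```motoko
import Prim "mo:⛔";
import Principal "mo:base/Principal";
import Array "mo:base/Array";
import Nat "mo:base/Nat";

actor {
  // Illustrative record type; replace with the application's own data.
  type Record = { id : Nat; payload : Text };

  // Conservative heap limit determined through testing (placeholder: 2 GB).
  let maxHeapBytes : Nat = 2 * 1024 * 1024 * 1024;

  // Principal allowed to extract backups (placeholder value).
  stable var backupAdmin : Principal = Principal.fromText("aaaaa-aa");

  stable var store : [Record] = [];

  // Monitoring: expose heap and memory size for observation in production.
  public query func memoryStats() : async { heap : Nat; memory : Nat } {
    { heap = Prim.rts_heap_size(); memory = Prim.rts_memory_size() }
  };

  // Heap limit: reject new data once the custom limit is reached.
  // (A real application would use a more efficient data structure and
  // return an error instead of trapping.)
  public func addRecord(r : Record) : async () {
    assert Prim.rts_heap_size() < maxHeapBytes;
    store := Array.append(store, [r]);
  };

  // Backup: privileged query returning the state in chunks.
  public shared query ({ caller }) func backupChunk(offset : Nat, size : Nat) : async [Record] {
    assert caller == backupAdmin;
    let start = Nat.min(offset, store.size());
    let end = Nat.min(start + size, store.size());
    Array.tabulate<Record>(end - start, func i { store[start + i] })
  };
}
```

The heap limit is deliberately enforced before allocating, so that the application stops growing well below the point where upgrades or the GC itself would run into trouble.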
My motivation was that this is a rather low-level “last-resort” administrative function that would be called externally by the canister controller/owner in the rare case when memory is full but garbage was created shortly before memory was exhausted.
The incremental GC uses a partitioned heap with incremental snapshot-at-the-beginning marking and incremental evacuation-compaction based on Brooks forwarding pointers, prioritizing the partitions with the most garbage space above a threshold (currently 15%). It is very similar to one of Java’s most advanced GCs, the Shenandoah GC.
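For intuition only, here is a toy sketch of the prioritization described above; this is not the actual runtime-system code, and `Partition` with its fields is assumed bookkeeping:

```motoko
import Array "mo:base/Array";
import Float "mo:base/Float";
import Nat "mo:base/Nat";

module {
  // Assumed bookkeeping for a heap partition (illustrative only).
  public type Partition = { id : Nat; size : Nat; garbage : Nat };

  // Threshold mentioned above: only partitions with more than 15% garbage
  // are considered for evacuation.
  let threshold : Float = 0.15;

  // Pick candidate partitions, those with the most garbage first.
  public func evacuationCandidates(partitions : [Partition]) : [Partition] {
    let candidates = Array.filter<Partition>(partitions, func p {
      p.size > 0 and Float.fromInt(p.garbage) / Float.fromInt(p.size) > threshold
    });
    Array.sort<Partition>(candidates, func(a, b) { Nat.compare(b.garbage, a.garbage) })
  };
}
```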
The administrative function __motoko_gc_trigger() is only intended for the rare case where a canister has run out of memory and the program created garbage shortly before memory was exhausted. Apart from this case, the default GC scheduling heuristics should be sufficient. It is therefore not necessary to call this function proactively; doing so may even disturb the GC scheduling (i.e. starting new GC runs too eagerly and thus increasing the cycle costs).
We were using a trie to store small, fine-grained data, and after storing 700,000 records the memory footprint was 1.5 GB. Then the issue arose: we continued to write data to the canister (no delete operations), about 100,000 more records, and the memory growth rate increased significantly, reaching the 4 GB limit within a few hours. During this process, it seems that the GC cannot catch up with the speed of data writing, resulting in a large amount of garbage buildup. Am I understanding this correctly?
If I use the compacting GC, 800,000 records only take up 1.7 GB of memory.
I’m testing this with moc 0.9.7; has it improved in version 0.10.3?
What is the best way to deal with this? Would it help if I write an empty update method and keep calling it?
I’m surprised that you didn’t hit issues with the compacting GC before 1.5 GB.
Is your heap size actually 4 GB when you call Prim.rts_heap_size()? Or are you referring to your canister’s memory size, Prim.rts_memory_size()?
If you’ve gotten to the point where the GC is blocked (too much garbage to collect over a few rounds of DTS consensus), then you’re in danger of your canister becoming unresponsive to additional insertions. This is where the incremental GC should shine (allowing you to use the full 4GB heap).
One additional improvement I might recommend: the TrieMap in motoko base isn’t nearly as memory efficient as some of the other libraries out there. Collection libraries | canister-profiling has statistics on a few of the currently available map libraries. Maybe look into a data migration that can save you space while you’re at it!
Indeed, the beta-version GC used to have quite long scheduling pauses in the memory range of 1 to 3 GB. This could explain the observed behavior.
We actually tuned the GC scheduling to be more conservative in the newer, production-ready version. It is therefore recommended to use moc version >= 0.10.3 or the upcoming dfx >= 0.15.3. We also added a memory reserve so that the GC is able to do its work (evacuation compaction) even if memory is short.
As mentioned by @icme, it is good to observe the heap size, not only the memory size, as the latter may be larger. As usual, extensive testing of the application code with large data sizes is very much recommended, to see up to which point the application still scales and upgrades still work.