Canister burning through cycles

Yeah, I looked at the code for [Nat8] and found just one use, probably a harmless one.

I strongly suspect that the aggressive heartbeat, creating lots of call contexts, is a big culprit, and that it is also preventing upgrades by always opening new call contexts and causing callbacks to be stored on the heap, as suggested above.

Could you just do nothing on most heartbeats, bar every nth one, and provide a method to stop the heartbeat before an upgrade (if there isn't one already)?
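Something like this sketch, say, reusing your _runHeartbeat flag (names like _heartbeatCount, setRunHeartbeat and the _owner check are hypothetical):

stable var _heartbeatCount : Nat = 0;

system func heartbeat() : async () {
  _heartbeatCount += 1;
  // Only do real work on every 8th invocation (pick your own n).
  if (_runHeartbeat and _heartbeatCount % 8 == 0) {
    await cronDisbursements();
    await cronSettlements();
    await cronCapEvents();
  };
};

// Call with `false` before an upgrade so no new call contexts are opened.
public shared(msg) func setRunHeartbeat(enabled : Bool) : async () {
  assert (msg.caller == _owner); // assuming an `_owner : Principal` stable var
  _runHeartbeat := enabled;
};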

Some of the array appends could be replaced by a more efficient Array.map over a source array, avoiding the quadratic behaviour. Others can be made more efficient by using a Buffer.
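For instance (a minimal sketch; `xs` stands in for whatever source array you have):

import Array "mo:base/Array";
import Buffer "mo:base/Buffer";

// Quadratic: Array.append copies the whole accumulator on every iteration.
var slow : [Nat] = [];
for (x in xs.vals()) { slow := Array.append(slow, [x]) };

// Linear: map directly over the source array ...
let mapped = Array.map<Nat, Nat>(xs, func (x) { x * 2 });

// ... or accumulate into a Buffer and convert once at the end.
let buf = Buffer.Buffer<Nat>(xs.size());
for (x in xs.vals()) { buf.add(x * 2) };
let fast = buf.toArray();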

I believe TrieMaps are an almost drop-in replacement for HashMaps that scale better and don't have the rehashing cost of HashMaps. Others have reported big improvements after switching from HashMaps to TrieMaps.
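The swap is mostly mechanical, e.g. for Text keys (a sketch, not your actual types):

import TrieMap "mo:base/TrieMap";
import Text "mo:base/Text";

// Before: HashMap.HashMap<Text, Nat>(0, Text.equal, Text.hash)
// After: same equal/hash arguments (no initial capacity), same get/put API.
let registry = TrieMap.TrieMap<Text, Nat>(Text.equal, Text.hash);
registry.put("alice", 1);
assert (registry.get("alice") == ?1);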

dfx 0.10.0 shipped with Motoko 0.6.26, which doesn’t have the streaming implementation of stable variable serialization.

dfx 0.10.1 shipped with Motoko 0.6.28, which does have it (and fixes a bug in TrieMap that might affect you if you swap HashMap for TrieMap).

It might be worth upgrading to dfx 0.10.1 if possible.
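If you want to check which compiler your dfx actually ships, the dfx cache holds the moc binary (assuming a standard install):

$(dfx cache show)/moc --version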


Ok. So it would just be the type that is defined in the code that matters?

Yes, the choice of Blob or [Nat8] in the Motoko code determines how the Candid vec nat8 is imported. But I guess that isn’t actually the issue here.
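For example, these two signatures import the same Candid type, but the in-memory representations differ (method names are made up):

// Candid: (vec nat8) -> ()
public func storeAsBlob(b : Blob) : async () {};    // compact byte string
public func storeAsArray(a : [Nat8]) : async () {}; // one heap word per element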


Yes, that’s exactly what I had hoped to test out this evening. I will also follow your other suggestions.

Again, thank you @claudio @domwoe and @PaulLiu. You've all been very helpful and I've learned a lot. I will follow up with the results of my changes soon.

One more thing:

Your heartbeat issues some awaits:

system func heartbeat() : async () {
  if (_pushLog == true) {
    canistergeekLogger.logMessage("heartbeat");
  };
  if (_runHeartbeat == true) {
    await cronDisbursements();
    await cronSettlements();
    await cronCapEvents();
  };
};

Depending on your semantics, you might be able to just issue the calls and not await the results:

system func heartbeat() : async () {
  if (_pushLog == true) {
    canistergeekLogger.logMessage("heartbeat");
  };
  if (_runHeartbeat == true) {
    // Fire the calls without awaiting their results:
    ignore cronDisbursements();
    ignore cronSettlements();
    ignore cronCapEvents();
  };
};

(Or even try declaring the cronXXX functions as one-way/fire-and-forget functions that return (), not async (), so you can drop the ignores.)
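For example (a sketch; note that a one-way call reports no result or error back to the caller):

// A shared function with return type () is one-way: calling it sends the
// message without creating a future, so there is nothing to await or ignore.
public shared func cronSettlements() : () {
  // ... settlement logic ...
};

system func heartbeat() : async () {
  if (_runHeartbeat) {
    cronSettlements(); // fire-and-forget
  };
};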


+1

It may be worth trying TrieMap in place of HashMap; TrieMap does not seem to suffer from the same issues. It's (essentially) the underlying data structure used to save projects on Motoko Playground.

In the future, HashMap will avoid this rehashing (as Trie and TrieMap already do), but not in the very short term. Even when that happens, it will still have worst-case linear insertion time (when the underlying array has to grow). So, depending on the use case, it may be better to avoid HashMap in the long term, too.


Any updates on this?


We are still running into issues with high cycle consumption; LightningLad is going to post some information and charts later.


Thanks @jonit, I was about to say the same. I'm at work currently and don't have access to my laptop.

@domwoe thank you for following up. I will provide a detailed response this evening (it’s early morning for me now)


@domwoe apologies for the delay. I wanted to try one more thing before posting. Here is a summary of the changes that were made.

Updated code here: PetBots/main.mo at main · lightninglad91/PetBots · GitHub

Changes Made:
Note: Each change (except for #3) was followed by a series of dfx commands to:

stop the canister (always timed out) > freeze the canister (raising the freezing_threshold) > thaw the canister > deploy changes > start canister
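In rough dfx terms (canister name and threshold values are placeholders):

dfx canister --network ic stop petbots                                           # always timed out
dfx canister --network ic update-settings petbots --freezing-threshold 99999999  # freeze
dfx canister --network ic update-settings petbots --freezing-threshold 2592000   # thaw (back to the default)
dfx deploy --network ic                                                          # deploy changes
dfx canister --network ic start petbots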

Numbered changes correspond to the numbers in the “Heap Memory” chart below.

  1. Heartbeat functions now execute on every 8th round

  2. Replaced all instances of HashMap with TrieMap

  3. No code change / no deployment - just a freeze and restart to see how heap memory reacted.

  4. Reduced the size of each JPEG asset by ~50%.

Pending Changes:

  • Replacing Array types with Buffer

Here is a chart that shows the number of update calls that were being made before and after each change.

And finally, here is a chart that shows the cycle burn rate during the past week. I've tried to circle a few spots that align with periods when the heap memory was showing steady linear growth rather than modulating between 1.2GB and 1.8GB. During these steady periods the burn rate was almost negligible.

Edit: The chart below reflects the total cycles balance of our canister over a week’s time. The spikes in the balance are our attempts to keep the canister from running out of cycles.

I did notice that, based on my logged events, the heap memory would start modulating right around the time that a settle call was made.


Doing a lot of people a lot of favors by sharing this data, thank you! ♾️


Thanks a lot @LightningLad91 !

What’s shown in the Cycles graph exactly?
There seems to be no strong correlation with either heap size or the number of update calls?!

The only correlation I’ve found is that after I go through the steps to deploy the canister the heap memory would go back down to 1.2GB and then steadily increase for a period of time.

During that period the canister was burning cycles at the expected rate of 0.5T per day, which is why the chart shows a flat line at those times.

Perhaps the chart is not useful. I was just trying to show that the canister is not burning 2T cycles/hour right after being deployed. It takes a period of time before it starts to burn that fast.

FWIW, I opened a PR to fix this issue here, now:


So what is the unit in the cycles graph? cycles/hour? cycles/day?

The graph depicts the total cycles balance of the canister over the past week (6/15 - 6/21), not the burn rate. Apologies for not making that clear. The spikes you see are our attempts to keep the canister alive by topping it up. I will update my post to clarify this.

The CanisterGeek method appears to collect the data every 5 minutes.


The diff is https://patch-diff.githubusercontent.com/raw/dfinity/motoko-base/pull/394.diff

Ah, got it 🙂

Do you know how big the transactions array is?


Looking forward to the graphs as soon as you do that 🙂


@domwoe i will work on collecting this data and making that change. Will probably have a response around the same time tomorrow. Thanks!


+1

Related: New naming convention to avoid performance pitfalls. · Issue #395 · dfinity/motoko-base · GitHub

I did not link back to this forum conversation. But I had it in mind when I wrote that issue just now. Unfortunately, it’s not the only example out in the wild. I think the naming proposal will help future devs.

But what do others think? Please comment, either here or (ideally) on GitHub.
