Canister burning through cycles

Yeah, I looked at the code for [Nat8] and found just one use, probably a harmless one.

I strongly suspect that the aggressive heartbeat, creating lots of call contexts, is a big culprit, and that it is also preventing upgrades by always opening new call contexts and causing callbacks to be stored on the heap, as suggested above.

Could you just do nothing on most heartbeats, bar every nth one, and provide a method to stop the heartbeat before an upgrade (if there isn't one already)?
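Something like this sketch, say, reusing your _runHeartbeat flag (names like _heartbeatCount, setRunHeartbeat and the _owner check are hypothetical):

stable var _heartbeatCount : Nat = 0;

system func heartbeat() : async () {
  _heartbeatCount += 1;
  // Only do real work on every 8th invocation (pick your own n).
  if (_runHeartbeat and _heartbeatCount % 8 == 0) {
    await cronDisbursements();
    await cronSettlements();
    await cronCapEvents();
  };
};

// Call with `false` before an upgrade so no new call contexts are opened.
public shared(msg) func setRunHeartbeat(enabled : Bool) : async () {
  assert (msg.caller == _owner); // assuming an `_owner : Principal` stable var
  _runHeartbeat := enabled;
};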

Some of the array appends could be replaced by a more efficient Array.map over a source array, avoiding the quadratic behaviour. Others can be made more efficient by using a Buffer.
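For instance (a minimal sketch; `xs` stands in for whatever source array you have):

import Array "mo:base/Array";
import Buffer "mo:base/Buffer";

// Quadratic: Array.append copies the whole accumulator on every iteration.
var slow : [Nat] = [];
for (x in xs.vals()) { slow := Array.append(slow, [x]) };

// Linear: map directly over the source array ...
let mapped = Array.map<Nat, Nat>(xs, func (x) { x * 2 });

// ... or accumulate into a Buffer and convert once at the end.
let buf = Buffer.Buffer<Nat>(xs.size());
for (x in xs.vals()) { buf.add(x * 2) };
let fast = buf.toArray();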

I believe TrieMaps are an almost drop-in replacement for HashMaps that scale better and don't have the rehashing cost of HashMaps. Others have reported big improvements after switching from HashMaps to TrieMaps.
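The swap is mostly mechanical, e.g. for Text keys (a sketch, not your actual types):

import TrieMap "mo:base/TrieMap";
import Text "mo:base/Text";

// Before: HashMap.HashMap<Text, Nat>(0, Text.equal, Text.hash)
// After: same equal/hash arguments (no initial capacity), same get/put API.
let registry = TrieMap.TrieMap<Text, Nat>(Text.equal, Text.hash);
registry.put("alice", 1);
assert (registry.get("alice") == ?1);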

dfx 0.10.0 shipped with Motoko 0.6.26, which doesn’t have the streaming implementation of stable variable serialization.

dfx 0.10.1 shipped with Motoko 0.6.28, which does have it (and fixes a bug in TrieMap that might affect you if you swap HashMap for TrieMap).

It might be worth upgrading to dfx 0.10.1 if possible.
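If you want to check which compiler your dfx actually ships, the dfx cache holds the moc binary (assuming a standard install):

$(dfx cache show)/moc --version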


Ok. So it would just be the type that is defined in the code that matters?

Yes, the choice of Blob or [Nat8] in the Motoko code determines how the Candid vec nat8 is imported. But I guess that isn’t actually the issue here.
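For example, these two signatures import the same Candid type, but the in-memory representations differ (method names are made up):

// Candid: (vec nat8) -> ()
public func storeAsBlob(b : Blob) : async () {};    // compact byte string
public func storeAsArray(a : [Nat8]) : async () {}; // one heap word per element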


Yes, that’s exactly what I had hoped to test out this evening. I will also follow your other suggestions.

Again, thank you @claudio @domwoe and @PaulLiu. You've all been very helpful and I've learned a lot. I will follow up with the results of my changes soon.

One more thing:

Your heartbeat issues some awaits:

system func heartbeat() : async () {
  if (_pushLog == true) {
    canistergeekLogger.logMessage("heartbeat");
  };
  if (_runHeartbeat == true) {
    await cronDisbursements();
    await cronSettlements();
    await cronCapEvents();
  };
};

Depending on your semantics, you might be able to just issue the calls and not await the results:

system func heartbeat() : async () {
  if (_pushLog == true) {
    canistergeekLogger.logMessage("heartbeat");
  };
  if (_runHeartbeat == true) {
    // Fire the calls without awaiting their results:
    ignore cronDisbursements();
    ignore cronSettlements();
    ignore cronCapEvents();
  };
};

(Or even try declaring the cronXXX functions as one-way/fire-and-forget functions that return (), not async (), so you can drop the ignores.)
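For example (a sketch; note that a one-way call reports no result or error back to the caller):

// A shared function with return type () is one-way: calling it sends the
// message without creating a future, so there is nothing to await or ignore.
public shared func cronSettlements() : () {
  // ... settlement logic ...
};

system func heartbeat() : async () {
  if (_runHeartbeat) {
    cronSettlements(); // fire-and-forget
  };
};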


+1

It may be worth trying TrieMap in place of HashMap; TrieMap does not seem to suffer from the same issues. It's (essentially) the underlying data structure used to save projects on Motoko Playground.

In the future, HashMap will avoid this rehashing (as Trie and TrieMap already do), but not in the very short term. Even when that happens, it will still have worst-case linear insertion time (when the underlying array has to grow). So, depending on the use case, it may be better to avoid HashMap in the long term, too.


Any updates on this?


We are still running into issues with high cycle consumption; LightningLad is going to post some information and charts later.


Thanks @jonit, I was about to say the same. I'm at work currently and don't have access to my laptop.

@domwoe thank you for following up. I will provide a detailed response this evening (it’s early morning for me now)


@domwoe apologies for the delay. I wanted to try one more thing before posting. Here is a summary of the changes that were made.

Updated code here: PetBots/main.mo at main · lightninglad91/PetBots · GitHub

Changes Made:
Note: Each change (except for #3) was followed by a series of dfx commands to:

stop the canister (always timed out) > freeze the canister (raising the freezing_threshold) > thaw the canister > deploy changes > start canister
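In rough dfx terms (canister name and threshold values are placeholders):

dfx canister --network ic stop petbots                                           # always timed out
dfx canister --network ic update-settings petbots --freezing-threshold 99999999  # freeze
dfx canister --network ic update-settings petbots --freezing-threshold 2592000   # thaw (back to the default)
dfx deploy --network ic                                                          # deploy changes
dfx canister --network ic start petbots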

Numbered changes correspond to the numbers in the “Heap Memory” chart below.

  1. Heartbeat functions now execute on every 8th round

  2. Replaced all instances of HashMap with TrieMap

  3. No code change / no deployment - just a freeze and restart to see how heap memory reacted.

  4. Reduced the size of each JPEG asset by ~50%.

Pending Changes:

  • Replacing Array types with Buffer

Here is a chart that shows the number of update calls that were being made before and after each change.

And finally, here is a chart that shows the cycle burn rate during the past week. I've tried to circle a few spots that align with periods when the heap memory was showing steady linear growth rather than modulating between 1.2GB and 1.8GB. During these steady periods the burn rate was almost negligible.

Edit: The chart below reflects the total cycles balance of our canister over a week’s time. The spikes in the balance are our attempts to keep the canister from running out of cycles.

I did notice that, based on my logged events, the heap memory would start modulating right around the time that a settle call was made.


Doing a lot of people a lot of favors by sharing this data, thank you! ♾️


Thanks a lot @LightningLad91 !

What’s shown in the Cycles graph exactly?
There seems to be no strong correlation with either heap size or the number of update calls?!

The only correlation I’ve found is that after I go through the steps to deploy the canister the heap memory would go back down to 1.2GB and then steadily increase for a period of time.

During that period the canister was burning cycles at the expected rate of 0.5T per day, which is why the chart shows a flat line at those times.

Perhaps the chart is not useful. I was just trying to show that the canister is not burning 2T cycles/hour right after being deployed. It takes a period of time before it starts to burn that fast.

FWIW, I opened a PR to fix this issue here, now:


So what is the unit in the cycles graph? cycles/hour? cycles/day?

The graph depicts the total cycles balance of the canister over the past week (6/15 - 6/21), not the burn rate. Apologies for not making that clear. The spikes you see are our attempts to keep the canister alive by topping it up. I will update my post to clarify this.

The CanisterGeek method appears to collect the data every 5 minutes.


The diff is https://patch-diff.githubusercontent.com/raw/dfinity/motoko-base/pull/394.diff

Ah, got it 🙂

Do you know how big the transactions array is?


Looking forward to the graphs as soon as you do that 🙂


@domwoe i will work on collecting this data and making that change. Will probably have a response around the same time tomorrow. Thanks!


+1

Related: New naming convention to avoid performance pitfalls. · Issue #395 · dfinity/motoko-base · GitHub

I did not link back to this forum conversation. But I had it in mind when I wrote that issue just now. Unfortunately, it’s not the only example out in the wild. I think the naming proposal will help future devs.

But what do others think? Please comment, either here or (ideally) on GitHub.
