How can I debug on mainnet? I have an issue with heartbeat

I’m running into an issue with heartbeat that does not occur locally. Is it possible by now to log values on mainnet?

The issue with heartbeat is that a function is executed in full several times, which, from what I understand, should not happen in this particular case.

In that function, I call several other canisters and await all of these calls. At the end of the function, I call my, let’s say, db canister to modify its state. That particular state change should cause the function to return right at the beginning if it is called again through heartbeat.

From what I understood, awaiting all these external calls should make it impossible for the function to be called again until all of them have returned a value. If that’s correct, then the state change should prevent the rest of the function from being executed again (triggering all the calls again). Locally this works fine, which I assume is because the heartbeat intervals are slower, correct?

In case I’m obviously misunderstanding something about async/await, let me know. And it would be really helpful to know how to debug on mainnet in general.

Thanks in advance.

No, still not possible. But when I need some quick-and-dirty logging on mainnet myself, I like to append messages to a list of strings that I can later query from the canister.
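A minimal Motoko sketch of that pattern (the names are mine, and this buffer isn’t stable, so the log is lost on upgrade):

```motoko
import Buffer "mo:base/Buffer";

actor {
  // Quick-and-dirty in-memory log.
  let log = Buffer.Buffer<Text>(0);

  func debugLog(msg : Text) {
    log.add(msg);
  };

  // Query the collected messages later, e.g. via `dfx canister call`.
  public query func getLog() : async [Text] {
    Buffer.toArray(log)
  };
};
```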

No; while your canister is awaiting, it is free to run code associated with a different update/heartbeat call.

I would expect the local environment to behave about the same, maybe a minuscule bit faster because there can be no delay from consensus. BUT since you’re running everything on the same replica, it’s likely that your entire heartbeat function can complete before the next heartbeat is triggered. If you await e.g. a call to a canister on a different subnet, the next heartbeat will get triggered before you get a response.

If you don’t want heartbeats to interleave, you can add a flag that you set to true while you’re processing a heartbeat and back to false when you’re done (careful: it can get stuck on true if you don’t handle errors properly). If the first thing your heartbeat does is return when the flag is true, you won’t get overlapping runs of the function.
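A minimal sketch of that guard in Motoko, where doWork stands in for whatever your heartbeat actually does:

```motoko
actor {
  var busy = false;

  system func heartbeat() : async () {
    if (busy) { return };   // a previous run is still in flight: bail out
    busy := true;
    try {
      await doWork();       // the real work, including inter-canister calls
    } catch (_) {
      // without this, a failed await would skip the reset below
      // and the flag would be stuck on true
    };
    busy := false;
  };

  func doWork() : async () {
    // ... calls to the ledger, db canister, etc. ...
  };
};
```

The check-and-set at the top is safe because everything before the first await executes atomically; no other message can interleave between reading and setting the flag.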

Thank you very much, that is extremely helpful.

It makes a lot of sense, as I’m interacting with the ledger canister, which of course is on a different subnet on mainnet and on the same subnet locally. I’ve already started to use a flag in the meantime, which seems to work, thanks.

What confused me is that I thought canisters generally can’t do any other work while they await calls to other canisters, until those calls are processed. At least the functions I call on my canister are always processed after the previous calls finish; I think that holds even if I don’t await the calls, correct?

Is heartbeat fundamentally different in that regard or am I misunderstanding the entire async await/actor concept? Do functions of a canister ever execute while some other function on the same canister has an outstanding call to an external canister?

In this regard, heartbeat is an async function just like every other one. I think this is a misunderstanding of await semantics. Your actor/canister/wasm execution environment does not do parallel processing (at least for update calls; queries may run in parallel on separate nodes), but calls can interleave through the use of awaits.

Yes, I actually use this in the cycles faucet. I have a function that creates a new wallet to hand out, and I create those in advance. Because of these await semantics, I can perform dozens of transfers on the ledger at the same time.
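For illustration, the pattern looks roughly like this in Motoko. The ledger interface here is simplified and made up; the real ledger’s API differs:

```motoko
actor {
  // Simplified, hypothetical ledger interface.
  type Ledger = actor {
    transfer : (to : Principal, amount : Nat) -> async Nat;
  };

  // The ICP ledger’s canister id on mainnet.
  let ledger : Ledger = actor ("ryjl3-tyaaa-aaaaa-aaaba-cai");

  public func transferBoth(a : Principal, b : Principal, amount : Nat) : async () {
    // Issue both calls before awaiting either: the messages are sent
    // right away, so both transfers are in flight concurrently.
    let f1 = ledger.transfer(a, amount);
    let f2 = ledger.transfer(b, amount);
    // Only now suspend and collect the responses.
    ignore (await f1);
    ignore (await f2);
  };
};
```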

If you use await, you surrender control, and the rest of the function will execute at some future point in time. While you are waiting, other calls may get processed. If you stay in synchronous execution (no awaits), this cannot happen. If you await a function on the same canister, the awaited function usually happens to be processed immediately, but that is not guaranteed.
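A small sketch of what that means in practice; even a self-send like `await async {}` opens a window for interleaving:

```motoko
actor {
  var counter = 0;

  public func bump() : async () {
    counter += 1;
  };

  public func slowRead() : async (Nat, Nat) {
    let before = counter;
    // Suspension point: control is surrendered, and other messages
    // (e.g. a concurrent call to bump) may be processed here.
    await async {};
    let after = counter;
    (before, after)   // these can differ if bump ran during the await
  };
};
```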

Ahh, I understand. That is more or less how I thought it worked initially, but then I “progressed” in the wrong direction, it seems ;). Thank you very much, this fully answers my question.

I have to finally write that down in a proper way…
