Reason/Possibility for timer being canceled by replica?

skilesare · September 20, 2023, 5:56pm

I have a canister at nmiv5-haaaa-aaaam-abgaa-cai that runs the code at https://github.com/icdevs/eventually_reject/blob/main/src/main.mo.

I think I have it set up so that the timer will always be instantiated per round of governance checks OR a log will be added that says something failed.

You can view the log here: https://nmiv5-haaaa-aaaam-abgaa-cai.raw.ic0.app/

The canister has plenty of cycles, but on or around 1694134837678781686 (Friday, September 8, 2023 1:00:37.678 AM UTC), or within 8 hours after that, my timers stopped being called. No error was logged, even though a failed async call should have produced a log entry to alert me.
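For anyone who wants to cross-check that timestamp, here is a minimal sketch in Python (not part of the canister code) that converts the nanosecond value to a UTC date. The variable names are just for illustration.

```python
from datetime import datetime, timezone

# Nanosecond timestamp quoted above.
ts_ns = 1694134837678781686

# Split into whole seconds and the leftover nanoseconds.
seconds, nanos = divmod(ts_ns, 1_000_000_000)

# Interpret the seconds as a UTC instant.
dt = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(dt.isoformat())  # 2023-09-08T01:00:37+00:00
print(nanos)           # 678781686
```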

I’m curious if there may have been a replica event or if there may be some other reason why a timer would be canceled.

I’ve kicked the server and I’ll be able to find out soon if it will pick up processing as normal again. I should see a new log in about 8 hours or so.

cc @berestovskyy

berestovskyy · September 20, 2023, 9:37pm

Hey Austin,
As far as I can see from the code, checkForNewVotes() does the following:

1. We cancel the timer.
2. Put the "Running checkForNewVotes" log entry.
3. We await gov.list_proposals.
4. Then we run the process for loop.
5. Put the "Setting next run" log entry.
6. And finally set the next timer.

The logs show "Running checkForNewVotes" but no "Setting next run", which means we successfully executed steps 1-3 (up to the await) but then trapped somewhere in steps 4-6 (after the await).

While it could be that the setTimer traps, I’d bet that we trap somewhere in step 4, the process for loop…

Does it make sense?

skilesare · September 20, 2023, 9:49pm

Yes…that is what I expected, but the whole thing is wrapped in a try/catch so I would expect to see a log message.

berestovskyy · September 20, 2023, 10:24pm

I’m nowhere near a Motoko expert, but a quick search shows that Motoko is only able to catch errors from async blocks.

So if the process for loop traps, we end up with a cancelled timer…
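To make that concrete, here is a toy model in Python (not Motoko, and not the canister's actual code) of how this could play out. It assumes that state changes made before an await are committed, that a trap rolls back only the message in which it happens, and that a try/catch sees catchable errors but not traps. ToyCanister, Reject, and Trap are hypothetical names invented for this sketch.

```python
import copy


class Reject(Exception):
    """Toy stand-in for an error that a try/catch can handle."""


class Trap(Exception):
    """Toy stand-in for a trap, which user code cannot catch."""


class ToyCanister:
    def __init__(self):
        self.state = {"timer_set": True, "log": []}

    def run_message(self, handler):
        """Run one message: commit on success, roll back on a trap."""
        snapshot = copy.deepcopy(self.state)
        try:
            handler(self.state)
            return True
        except Trap:
            self.state = snapshot  # discard this message's changes only
            return False


def before_await(state):
    state["timer_set"] = False                       # 1. cancel the timer
    state["log"].append("Running checkForNewVotes")  # 2. log the start
    # 3. the await ends this message, so both changes above are committed


def after_await(state):
    try:
        state["log"].append("processing proposals")  # 4. start of the loop
        raise Trap("simulated trap while processing")
        # 5 and 6 (log "Setting next run", set the timer) are never reached
    except Reject as err:
        # The user's catch block: it never runs for a Trap.
        state["log"].append(f"error: {err}")


canister = ToyCanister()
canister.run_message(before_await)
canister.run_message(after_await)
print(canister.state)
# {'timer_set': False, 'log': ['Running checkForNewVotes']}
```

In this model the final state matches what the logs show: the timer stays cancelled, "Running checkForNewVotes" is recorded, and nothing from after the await survives, including any error entry the catch block would have written.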

skilesare · September 29, 2023, 3:51pm

I restarted it and it failed again. I suspect the .did was updated for the new SNS single vote sale. When did that go live? (Looks like it was passed on Aug 28th…was it about ten days later that the first proposal with that type went out?)

@lara

skilesare · September 29, 2023, 8:56pm

After some investigation, it looks like the following was the root of my problem:

1. The #CreateServiceNervousSystem : CreateServiceNervousSystem; variant was added to the Action type.
2. The ListProposalInfo type was changed: the omit_large_fields and include_all_manage_neuron_proposals fields were added.

I’ve updated the canister and redeployed, but this seems like a set of changes that could have caused much wider breakage, especially if we had broader DAO adoption. I’ll write up a separate post about it and link it here.
