Reason/Possibility for timer being canceled by replica?

I have a canister at nmiv5-haaaa-aaaam-abgaa-cai that runs the code at https://github.com/icdevs/eventually_reject/blob/main/src/main.mo.

I think I have it set up so that the timer will always be instantiated per round of governance checks OR a log will be added that says something failed.

You can view the log here: https://nmiv5-haaaa-aaaam-abgaa-cai.raw.ic0.app/

The canister has plenty of cycles but on or around 1694134837678781686 (Friday, September 8, 2023 1:00:37.678 AM), or within 8 hours after that, my timers stopped being called and no error was logged that would have otherwise alerted me to the fact that one of my async calls failed.

I’m curious if there may have been a replica event or if there may be some other reason why a timer would be canceled.

I’ve kicked the server and I’ll be able to find out soon if it will pick up processing as normal again. I should see a new log in about 8 hours or so.

cc @berestovskyy

1 Like

Hey Austin,
As far as I can see from the code, in the checkForNewVotes():

  1. We cancel the timer.
  2. Put the Running checkForNewVotes.
  3. We await gov.list_proposals
  4. Then we run the process for loop.
  5. Put the Setting next run.
  6. And finally set the next timer.

As we see in the logs Running checkForNewVotes, but there is no Setting next run, it means that we successfully execute 1-3 (before the await), but then we trap somewhere in 4-6 (after the await).

While it could be that the setTimer traps, I’d bet that we trap somewhere in 4 process for loop…

Does it make sense?

Yes…that is what I expected, but the whole thing is wrapped in a try/catch so I would expect to see a log message.

I’m not even near a Motoko expert, but a quick googling shows shat Motoko is able to catch errors just from the async blocks.

So if the process for loop traps, we end up with a cancelled timer…

I restarted it and it failed again. I’m suspicious that it may be that the .did was updated with the new SNS single vote sale. When did that go live? (Looks like it was passed on Aug 28th…was it about ten days later that the first proposal with that type went out?

@lara

After some investigation, it looks like the following was the root of my problem:

  1. The #CreateServiceNervousSystem : CreateServiceNervousSystem; variant was added to the Action type.
  2. The ListProposalInfo type was changed and omit_large_fields and include_all)managed_neruon_proposals fields were added.

I’ve updated the canister and redeployed, but this seems like a set of changes that could have had much wider breaking changes, especially if we had broader DAO adoption. I’ll write up a separate post about it and link it here.

3 Likes