Heartbeat improvements / Timers [Community Consideration]

Background

Heartbeat is a mechanism that allows canisters to perform periodic tasks without relying on incoming messages to trigger execution. If a canister exports a Wasm method named canister_heartbeat, the canister is considered active even if it has no incoming messages, and the heartbeat method is invoked in every round in which the canister is scheduled to run (the schedule depends on how busy the subnet is).

The problem

One drawback of the current design of heartbeat is that it is impossible to configure the frequency of heartbeat invocation. There is also no way to disable heartbeat without upgrading the canister. This “all or nothing” design wastes resources and cycles. See for example: Cycle burn rate heartbeat

This post summarizes solutions to this problem proposed by different people: @Alexandra, @berestovskyy, @christian, @Manu, @roman-kashitsyn, and others. The proposed function names are not final and are up to discussion/bikeshedding. The engineering cost is roughly similar for all options.

Proposal 1: Timers

A timer is “an automatic mechanism for activating a device at a preset time.” All major operating systems support some timer API. For example, Linux has getitimer()/setitimer(). The idea of this proposal is to support a minimal timer API that allows building arbitrarily complex and efficient timers on top of it.
The proposal:

  • Add a new System API ic0.set_global_timer(interval_in_seconds) function.
  • The function schedules a call to the exported canister_global_timer() Wasm method in some round after ic0.time() + interval_in_seconds.
  • The function returns the previous value of the timer relative to the current time. The getter can be implemented in a library by calling the setter twice because the time doesn’t change within the same execution.
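To make the return-value semantics concrete, here is a minimal sketch that models the proposed call in plain Rust. It is a toy model, not the System API: the `GlobalTimerModel` type, its `now` field (a stand-in for `ic0.time()` in seconds), and the choice that an unset or expired timer reports 0 are all assumptions made for illustration.

```rust
/// Toy model of the proposed global timer (hypothetical, not the real System API).
/// All times are whole seconds.
pub struct GlobalTimerModel {
    /// Stand-in for `ic0.time()`.
    pub now: u64,
    /// Absolute time at which `canister_global_timer` becomes due, if set.
    deadline: Option<u64>,
}

impl GlobalTimerModel {
    pub fn new(now: u64) -> Self {
        Self { now, deadline: None }
    }

    /// Models `ic0.set_global_timer(interval_in_seconds)`.
    /// Returns the previous timer value relative to the current time.
    /// Assumption: an unset or already expired timer reports 0.
    pub fn set_global_timer(&mut self, interval_in_seconds: u64) -> u64 {
        let previous = self.deadline.map_or(0, |d| d.saturating_sub(self.now));
        self.deadline = Some(self.now.saturating_add(interval_in_seconds));
        previous
    }

    /// Getter built from two setter calls. This works because `now` does not
    /// change between the two calls within one execution.
    pub fn get_global_timer(&mut self) -> u64 {
        let remaining = self.set_global_timer(0);
        self.set_global_timer(remaining);
        remaining
    }
}
```

In this model, setting a 100 second timer, advancing `now` by 30, and calling the getter yields 70 and leaves the deadline unchanged. What the real API would report for a timer that was never set is left open by the proposal.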

Pros

  • Intuitive and known from other platforms.
  • A library can support multiple timers and heartbeats of arbitrary complexity by maintaining a queue.

Cons

  • A completely new API. If this proposal is accepted, then it might make sense to deprecate the existing heartbeat API.
  • Periodic timers and heartbeats should be implemented in a library. Direct usage of the API for such cases requires careful programming if the execution traps after scheduling the next timer. In such a case the timer would not be scheduled. The library can easily work around this by calling the user code in a separate message.
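The trap problem can be illustrated with a small simulation. The sketch below assumes that a trap discards all state changes made by the message and that the global timer is one-shot (it is consumed when it fires). `State`, `run_message`, `naive_handler`, and `reschedule_only` are hypothetical names, and the 60 second period is arbitrary.

```rust
/// Toy canister state: only the pending global timer deadline (seconds).
#[derive(Clone, Debug, PartialEq)]
pub struct State {
    pub deadline: Option<u64>,
}

/// Runs one message against a copy of the state and commits the copy only if
/// the body succeeds. An `Err` stands in for a trap, which discards all changes.
pub fn run_message<F>(state: &mut State, body: F) -> bool
where
    F: FnOnce(&mut State) -> Result<(), &'static str>,
{
    let mut scratch = state.clone();
    match body(&mut scratch) {
        Ok(()) => {
            *state = scratch;
            true
        }
        Err(_) => false,
    }
}

/// Naive periodic handler: reschedules, then runs user code in the same message.
pub fn naive_handler(state: &mut State, now: u64, user_traps: bool) -> Result<(), &'static str> {
    state.deadline = Some(now + 60);
    if user_traps {
        Err("user code trapped")
    } else {
        Ok(())
    }
}

/// Library-style handler: only reschedules. User code runs in a separate message.
pub fn reschedule_only(state: &mut State, now: u64) -> Result<(), &'static str> {
    state.deadline = Some(now + 60);
    Ok(())
}
```

With the naive handler, a trap in user code rolls back the rescheduling as well, so the periodic timer silently stops. With the library-style split, the rescheduling message commits first, and a later trap in the user-code message cannot undo it.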

Proposal 2: Pause heartbeat using absolute time

  • Add a new System API ic0.pause_heartbeat_until(absolute_time) function.
  • The function prevents heartbeat invocations until ic0.time() >= absolute_time.
  • The function returns the previously set time, so there is no need for a separate getter.
  • The function can be called in any context (not only in heartbeats).
  • If the heartbeat doesn’t call the function again to postpone the next heartbeat, then it falls back to the current behavior (calling the heartbeat every round).
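A rough model of the semantics listed above, as a sketch only: `HeartbeatPause` and `heartbeat_due` are hypothetical names, times are plain integers, and an initial value of 0 (never paused) is an assumption.

```rust
/// Toy model of Proposal 2 (hypothetical names, not a real API).
pub struct HeartbeatPause {
    /// Heartbeats are suppressed while the current time is below this value.
    paused_until: u64,
}

impl HeartbeatPause {
    /// Assumption: a fresh canister has never paused its heartbeat.
    pub fn new() -> Self {
        Self { paused_until: 0 }
    }

    /// Models `ic0.pause_heartbeat_until(absolute_time)`.
    /// Returns the previously set time.
    pub fn pause_heartbeat_until(&mut self, absolute_time: u64) -> u64 {
        std::mem::replace(&mut self.paused_until, absolute_time)
    }

    /// Scheduler-side check: the heartbeat may run once `now >= paused_until`.
    pub fn heartbeat_due(&self, now: u64) -> bool {
        now >= self.paused_until
    }
}
```

Once the pause expires, every subsequent round is due again until the canister calls the function once more, which matches the fallback behavior described in the last bullet.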

Pros

  • A library can support multiple timers and heartbeats of arbitrary complexity by maintaining a queue.

Cons

  • Requires a call to ic0.time() to compute the absolute time.
  • Requires careful programming to properly handle code that traps. Ideally that is wrapped in a library similar to Proposal 1.

Proposal 3: Set heartbeat interval

  • Add a new System API ic0.set_heartbeat_interval(duration_in_seconds) function.
  • The function sets the minimum interval between two heartbeat invocations.
  • The function returns the previously set value, so there is no need for a separate getter.
  • The function can be called in any context (not only in heartbeats).
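For comparison with Proposal 2, the interval variant could be modeled as below. This is a sketch under assumptions: `HeartbeatInterval` and `on_round` are hypothetical names, times are in seconds, and the default interval of 0 stands for the current behavior of running every round.

```rust
/// Toy model of Proposal 3 (hypothetical names, not a real API).
pub struct HeartbeatInterval {
    /// Minimum number of seconds between two heartbeat invocations.
    interval: u64,
    /// Time of the last heartbeat invocation, if any.
    last_run: Option<u64>,
}

impl HeartbeatInterval {
    /// Assumption: interval 0 means "every round", as today.
    pub fn new() -> Self {
        Self { interval: 0, last_run: None }
    }

    /// Models `ic0.set_heartbeat_interval(duration_in_seconds)`.
    /// Returns the previously set value.
    pub fn set_heartbeat_interval(&mut self, duration_in_seconds: u64) -> u64 {
        std::mem::replace(&mut self.interval, duration_in_seconds)
    }

    /// Called once per round. Returns true and records the time if the
    /// heartbeat is invoked in this round.
    pub fn on_round(&mut self, now: u64) -> bool {
        let due = match self.last_run {
            None => true,
            Some(last) => now.saturating_sub(last) >= self.interval,
        };
        if due {
            self.last_run = Some(now);
        }
        due
    }
}
```

Unlike the pause-based model, the interval here persists across invocations, so the heartbeat itself does not need to call anything to keep it.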

Pros

  • The interval needs to be set once and no action is required from the heartbeat to keep it.

Cons

  • A bit more difficult to implement a generic library for arbitrary timers because the API is tailored for heartbeats.

Proposal 4: Set heartbeat interval in canister settings

  • Add a new heartbeat_interval_in_seconds field in the canister settings.
  • The controller of the canister can change the field by calling the update_settings method of the IC management canister.

Pros

  • No new System API
  • The controller can adjust the interval without any code changes

Cons

  • Difficult to support general programmatic timers because only controllers can update the field and an update requires an inter-canister call. This might be okay if only coarse-grained timing is required (e.g. on the order of 5-10 seconds).

A similar but less flexible idea would be to specify the heartbeat interval in a custom section of the Wasm module.


Thanks for the nice writeup @ulan!

@lastmjs and @bob11, I know you are interested in this topic, so I’m curious to hear your thoughts!


Adding more folks who might be interested in this topic: @bjoern, @claudio, @domwoe, @ggreif, @hpeebles


I think proposal 1 makes most sense. Thanks!


Yessssssss! We could really do with this within OpenChat!

We have tens of thousands of user canisters, so we can’t use heartbeat in them: queueing all of those canisters on every round would be insanely expensive and would take execution time away from other canisters. But we have lots of cases where we need to be able to trigger tasks to run in the future, for example retrying an inter-canister update call if it failed the first time, or marking a poll as ended.

To solve these issues we have built a ‘callback canister’ which we use to trigger callbacks at chosen times in the future. With this functionality we could simplify things massively and remove this canister entirely.

In terms of implementation, I like option 1. It seems like it would fit all use cases, whereas the other options would be an improvement but would still not handle many use cases.

Edit: Actually option 2 does provide all of the functionality provided by option 1 but I prefer the API of option 1.


Did you consider the simplest option, namely examining the type of canister_heartbeat and, if it returns a (u32), interpreting that as the number of rounds that should pass before it is called again? Once the function’s type is being inspected anyway, it should be trivial to also interpret a returned (f32) as the number of seconds to pass (a relative timespan). This solution is still backwards compatible: just check for a () return.


I definitely vote for Proposal 1, as it is the cleanest and most powerful way of dealing with schedules. I’d rather go through the pain of deprecation than build upon or extend a problematic solution.

Btw, don’t fully understand this paragraph:

The function returns the previous value of the timer relative to the current time. The getter can be implemented in a library by calling the setter twice because the time doesn’t change within the same execution.


Rounds have no relation with the wall time, their frequency can change any time and AFAIK they aren’t visible in the programming model (and I think this is great).

Yes, we considered this idea of heartbeat specifying the time of the next invocation. The main blocker was that it is impossible to change the scheduled time outside of the heartbeat. The use case: the heartbeat schedules the next call in 1 day, then some message arrives and the canister needs to do work in 10 minutes.

The idea was that CDK can implement a getter function like this:

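use std::time::Duration;

// Sketch: `ic0::set_global_timer` is the System API proposed above, not an existing binding.
fn get_global_timer() -> Duration {
    // Setting a new interval returns the previously set one.
    let seconds = ic0::set_global_timer(0);
    // Restore the original interval; time does not advance within one execution.
    ic0::set_global_timer(seconds);
    Duration::from_secs(seconds)
}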

So there is no need for the System API to support a getter. But that’s a minor point and we might want to add the getter for symmetry.


I want to see:
1) Resetting the system instruction counter in long-running operations (in the function body).
2) System notification of a new round. It is the system time, since the time in nanoseconds is always different anyway.

It is even possible to leave a simple heartbeat and add a new (similar) instruction.
Thanks!

That sounds like a rather theoretical obstacle. Why not do the cleanup immediately (or with a small delay of 1 round for a self send)? Did you actually encounter a real use case for a message needing (scheduling) a heartbeat whose delay depends on an incoming message?

Yes, I know, but that u32 is just a hint. 1 would mean I want to be called as soon as possible, 2 means twice that, etc.

I think the use case is real rather than theoretical because:

  1. A canister might want to disable heartbeat (to save cycles) until some event occurs that activates it again. Disabling can be done by setting a large delay like 1 month.
  2. A canister might have multiple different jobs with different intervals and activations.
  3. In general, we want to enable support for arbitrarily many timers and heartbeats as a library.

Thank you ulan!

In general I like the first proposal because it’s intuitive and expressive, but I wonder about the following:

Would deprecating mean eventually removing the heartbeat API, or just telling people that this functionality is deprecated? There might be non-upgradable canisters relying on this API…


If you go the timer route you have to think about the payment model. Each timer becomes a replica-side resource and thus needs bookkeeping (a queue) there. Canisters might DoS that facility.

Deterministic time slicing will help to support longer-running operations without resetting the instruction counter: Deterministic Time Slicing

Implementing heartbeat/timer-like functionality based on long-running operations is not feasible for two reasons: 1) the operation runs in the context of the original round, so ic0.time() will return the same value (we cannot change that); 2) the operation uses CPU cycles while it is running, so it is going to be much more expensive compared to timers/heartbeats.

The idea is that we have one global timer per canister. A library within the canister then supports arbitrarily many timers built on top of that global timer with queues in the Wasm memory.
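As a rough sketch of what such a library could look like, the queue below keeps any number of deadlines in a min-heap and reports the single interval that would be passed to the global timer. Everything here is hypothetical: `TimerQueue`, its methods, and the use of `u32` task ids are illustrative, and the actual calls to the System API are left out.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical in-canister timer library built on one global timer.
pub struct TimerQueue {
    /// Min-heap of (deadline in seconds, task id).
    queue: BinaryHeap<Reverse<(u64, u32)>>,
}

impl TimerQueue {
    pub fn new() -> Self {
        Self { queue: BinaryHeap::new() }
    }

    /// Registers a task and returns the interval the library would pass to
    /// the global timer: the time until the earliest deadline.
    pub fn schedule(&mut self, now: u64, delay: u64, task: u32) -> u64 {
        self.queue.push(Reverse((now.saturating_add(delay), task)));
        // The queue is non-empty here, so the fallback is never used.
        self.next_interval(now).unwrap_or(0)
    }

    /// Seconds until the earliest deadline, or `None` if nothing is queued.
    pub fn next_interval(&self, now: u64) -> Option<u64> {
        self.queue.peek().map(|entry| (entry.0).0.saturating_sub(now))
    }

    /// What the global timer callback would do: remove and return all tasks
    /// whose deadline has passed, earliest first.
    pub fn pop_due(&mut self, now: u64) -> Vec<u32> {
        let mut due = Vec::new();
        while let Some(Reverse((deadline, task))) = self.queue.peek().copied() {
            if deadline > now {
                break;
            }
            self.queue.pop();
            due.push(task);
        }
        due
    }
}
```

After `pop_due`, the library would call the global timer setter again with `next_interval(now)` if it returns a value, so the replica only ever tracks one deadline per canister regardless of how many tasks are queued.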

On the replica side, we actually wouldn’t have any real timers. The implementation would be very similar to the existing heartbeat implementation, with an additional check of the current block time against the scheduled time of the canister. (The implementation can be optimized further if we see that it becomes an issue.)


How exactly we deprecate APIs is an open question. Different people have different opinions here. So far we’ve been following the approach of telling but not removing, but that might change in the future.

I generally feel similarly to most others: option 1 seems like the best option.

I do have a concern though, and that is the fact that the current heartbeat functionality and option 1 only allow 1 exported canister method to be called on an interval, in the case of option 1 just canister_global_timer(). I would like to be able to hook up arbitrarily many callback functions and arbitrarily many timers, similar to the http streaming capabilities in the asset canister.

One use case I have in mind for this is cycle sharing for open source libraries. The idea is that when a user installs an OS library into their canister, that OS library could hook up a new “heartbeat” method that is designed to send a few cycles back to the author of the OS library on a regular basis, perhaps once per day, week, or month. The problem with the current heartbeat implementation is 1. that it is very expensive and 2. that you can only export one heartbeat function, making it difficult for other libraries to generically add functionality to a canister. The developer would have to explicitly/manually update the one exported heartbeat function to allow the cycle sharing. We want it to be transparent to the developer’s source code entirely, and only configured at the package.json, dfx.json etc level.
