Heartbeat improvements / Timers [Community Consideration]

Am I reading this right that each canister would be able to set one single heartbeat at a time?


One follow-up thought, that might be a terrible idea ¯\_(ツ)_/¯

Many of the current cloud platforms have a variety of different services that are well suited to a specific use case or set of use cases. On AWS, scheduling Lambda triggers revolves around CloudWatch Events. Many applications will want to set up not just one-off triggers, but eventually use cron for monitoring and canaries.

Why does DFINITY think that the canister should be the one-size-fits-all solution for every type of computation on the Internet Computer? Why can’t specific types of canisters be built for specific purposes, especially since I’m of the opinion that larger applications on the IC will eventually need to be multi-canister and composed of horizontally scalable microservices?

What if DFINITY built a new type of canister that would have one purpose - to schedule and execute inter-canister triggers/calls to other canisters? This canister could be of type “cron”, and would be much more “time-aware”, allowing any application to store all of their “cron-job” type of calls on a variety of different timers. This canister would be incompatible with what we currently think of as a canister (replica), but would be able to communicate and call the public APIs of other canisters via inter-canister calls.

This way, any deprecation fears, for example where a canister may have been blackholed, are no longer a worry.

The old heartbeat API remains, but now we have a specific type of canister that is always wearing a watch and can act as a conductor for an orchestra of canisters in a multi-canister application.

This also makes it much easier for DFINITY to develop new features without backwards-compatibility concerns, by deciding which group of features goes into what we currently think of canisters as being used for (APIs & storage) vs. different types of canister “services” like event scheduling & monitoring. One can also imagine canisters built with the capability to ingest higher ingress load and be used solely as queues, which the cron canisters could use to have our current canister type process and reduce data from the queues.

Thoughts?


I’d like a set_timeout(interval, metadata:nat64, (nat64) → async ()) → nat64;

This lets me schedule multiple activities in a single round and track a piece of metadata that I can use to look up what I wanted to do during that particular call. Also, I get async functionality.

It would also be nice to get back a nat64 that I can later pass to cancel_timeout if the timeout is no longer needed.
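A minimal sketch of the bookkeeping such an API might need, written as plain Rust so it runs outside a canister. Everything here is an assumption for illustration: the `Timers` type, the `take_due` helper, and passing the current time in as a `u64` of nanoseconds rather than reading it from the System API. The async callback from the signature above is left out; the sketch only hands back the stored metadata for each timeout that has come due.

```rust
use std::collections::BTreeMap;

/// Hypothetical timer registry; the names are illustrative, not a real CDK API.
#[derive(Default)]
struct Timers {
    next_id: u64,
    /// Keyed by (deadline_ns, id) so entries stay ordered by due time.
    /// The value is the caller-supplied metadata.
    pending: BTreeMap<(u64, u64), u64>,
}

impl Timers {
    /// Schedule a timeout `delay_ns` after `now_ns`; returns an id usable for cancellation.
    fn set_timeout(&mut self, now_ns: u64, delay_ns: u64, metadata: u64) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.pending.insert((now_ns + delay_ns, id), metadata);
        id
    }

    /// Cancel a pending timeout; returns true if it was still pending.
    fn cancel_timeout(&mut self, id: u64) -> bool {
        // Linear scan for the key with this id; fine for a sketch.
        let key = self.pending.keys().find(|(_, i)| *i == id).copied();
        match key {
            Some(k) => self.pending.remove(&k).is_some(),
            None => false,
        }
    }

    /// Remove every timeout due at or before `now_ns` and return its metadata, earliest first.
    fn take_due(&mut self, now_ns: u64) -> Vec<u64> {
        // Everything with deadline > now_ns stays pending.
        let later = self.pending.split_off(&(now_ns + 1, 0));
        let due = std::mem::replace(&mut self.pending, later);
        due.into_values().collect()
    }
}
```

A caller would keep the id returned by `set_timeout` and pass it to `cancel_timeout` if the work is no longer needed. `take_due` returns metadata in deadline order, so several activities scheduled for the same round come back together.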


The reason option 1 only exposes a single time is to make the check for which canisters need to be scheduled extremely fast.

But by building libraries on top of this single timer you can have as many different functions execute at as many different times / intervals as you’d like.

The only thing that needs to be exposed to the replica is when the next job needs to run. Once that job has finished, the time can be set to when the next job should run, and so on.
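As a rough sketch of that layering (not ic-cron's or the CDK's actual code), a library could keep its jobs in a min-heap and only ever report the earliest due time. `Scheduler`, `add`, `on_timer`, and the nanosecond `u64` timestamps are hypothetical names chosen for illustration, and the current time is passed in as a parameter instead of being read from the replica.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical library-level scheduler layered on a single system timer.
struct Scheduler {
    /// Min-heap of (due_ns, job_id, optional repeat interval in ns).
    jobs: BinaryHeap<Reverse<(u64, u32, Option<u64>)>>,
}

impl Scheduler {
    fn new() -> Self {
        Scheduler { jobs: BinaryHeap::new() }
    }

    /// Register a one-off job (`every_ns = None`) or a periodic one.
    fn add(&mut self, due_ns: u64, job_id: u32, every_ns: Option<u64>) {
        self.jobs.push(Reverse((due_ns, job_id, every_ns)));
    }

    /// The single value that would be exposed to the replica: the next wake-up time.
    fn next_deadline(&self) -> Option<u64> {
        self.jobs.peek().map(|Reverse((due, _, _))| *due)
    }

    /// Called when the single timer fires: run every due job, re-queue
    /// periodic ones, and return the deadline to arm the timer with next.
    fn on_timer(&mut self, now_ns: u64, mut run: impl FnMut(u32)) -> Option<u64> {
        while let Some(&Reverse((due, id, every))) = self.jobs.peek() {
            if due > now_ns {
                break;
            }
            self.jobs.pop();
            run(id);
            if let Some(interval) = every {
                // max(1) keeps a zero interval from looping forever in this call.
                self.jobs.push(Reverse((now_ns + interval.max(1), id, Some(interval))));
            }
        }
        self.next_deadline()
    }
}
```

The design choice here is that the heap lives entirely in library code: however many jobs are registered, the replica-facing state is one optional timestamp.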


I’d like to tag a few more people for their opinions:

@senior.joinu, because of his experience implementing ic-cron.
@raydeck, because of his experience implementing DeTi.


I’m assuming this is the case as long as separate invocations don’t occur at the same time, or interfere with one another in terms of throwing off another “scheduled” time.

Since a canister is single threaded and “time” within a single message execution is constant, if the heartbeat trigger results in any significant amount of computation or awaits an asynchronous call, would this push back canister A’s idea of what time it is?

For example, at t=20 heartbeat on canister A triggers function F that makes:

  • an asynchronous call to canister B, which takes 2 seconds to complete
  • and then an additional 0.5 seconds is required to perform the remaining computation in canister A using the result returned from canister B.

What time does canister A think it is exactly after the heartbeat trigger (invoking function F) finishes?

For ic-cron there is actually no problem in the current design. This library already abstracts away timers and does almost everything it can in order to keep the cost as low as possible.

The only thing I’ve noticed is that some people want to have more control over the “precision/cost” trade-off. So, from this point of view, proposals #3 and #4 seem like the most logical solution for ic-cron. If that gets implemented, then no change is required for the users of ic-cron (or for the library itself), but those who want more control can have it.

If choosing between proposals #3 and #4, I believe proposal #3 is more flexible (and, btw, does not require you to implement new dfx commands).

This library already abstracts away timers and does almost everything it can in order to keep the cost as low as possible.

My understanding is that with the replica’s current implementation of heartbeat, a canister’s exported heartbeat method is called in every block, and that cost cannot be avoided.


This does not depend on the library, so I wasn’t referring to this with the quote you’ve mentioned.

As far as I understand, heartbeat improvements proposed in this thread are addressing this issue.

Ideally, support for arbitrarily many timers is implemented in the CDK, allowing the System API to remain as simple as possible. As long as the different libraries use the same CDK, they shouldn’t interfere with each other. Perhaps we shouldn’t expose the low-level canister_global_timer() to user code at all, so that only the CDK can call it and user code must go through the CDK.

After the async call to B, canister A cannot predict when function F finishes. That’s because scheduling of both canisters depends on how busy the subnet is.

In all proposed APIs the canister specifies the minimum time/duration before the time/heartbeat function invocation, but there is no way to specify the maximum because execution of a canister may be delayed if the subnet is busy. In other words, when the function is invoked, the canister can query the current time using ic0.time() and it is guaranteed that ic0.time() >= initially_scheduled_time.
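One consequence for periodic jobs, sketched under assumptions: since the timer can fire late but never early, a handler can measure how late it ran and pick the next deadline on the original grid instead of drifting. `reschedule` and its parameters are hypothetical, and timestamps are plain `u64` nanoseconds supplied by the caller in place of `ic0.time()`.

```rust
/// Hypothetical helper: given the originally scheduled time, the repeat
/// interval, and the time observed when the timer actually fired, return
/// `(lateness_ns, next_deadline_ns)`.
fn reschedule(scheduled_ns: u64, interval_ns: u64, now_ns: u64) -> (u64, u64) {
    // The lower bound discussed above: the timer never fires early.
    assert!(now_ns >= scheduled_ns, "timer fired before its scheduled time");
    assert!(interval_ns > 0, "interval must be positive");

    // There is no upper bound, so lateness can be arbitrarily large.
    let lateness = now_ns - scheduled_ns;

    // Skip whole intervals that were missed, so the next deadline is strictly
    // in the future while staying aligned to the original schedule.
    let missed = lateness / interval_ns;
    let next = scheduled_ns + (missed + 1) * interval_ns;
    (lateness, next)
}
```

For example, a job scheduled for t=100 with an interval of 10 that actually runs at t=125 is 25 late and would next be scheduled for t=130, not t=135.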


Got it, makes perfect sense - so canisters still have no sense of absolute time, they just are aware of relative time since the previous call finished.

So in this example, canister A would still think it is t=20.

I guess HTTP requests might help with this, allowing a canister to update/sync their absolute sense of time with the outside world on a regular cadence (maybe also defined by heartbeat :wink:), combined with a relative sense of time within the canister.

I like #3 and #4: set the heartbeat interval within the canister or in the canister settings.

Canister A can always get the time of current round/block by calling ic0.time(). To be concrete, in the following example t1 and t2 may be different because the part of F after the await block will be a separate message (a response message) which may execute in a subsequent round/block:

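fn F(..) {
  // First message: t1 is the time of the round/block in which F starts.
  t1 = ic0.time();
  await call_canister_B();
  // Separate response message, possibly executed in a later round/block.
  t2 = ic0.time();
}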

Got it, makes sense. This is very helpful!

Is calling ic0.time() an expensive operation in terms of cycles/latency? I’m not as familiar with this API.

You’re welcome! ic0.time() is one of the cheapest System APIs. I think it should cost less than 10 cycles currently.


Thanks all for the feedback! It was very helpful. I went through the comments and counted:

  • 5 votes for proposal #1,
  • 2 votes for proposals #3 and #4.

Many engineers I talked with offline also expressed preference for proposal #1.

I think we can declare proposal #1 a clear winner. The next step would be to get a prototype implementation in both the replica and the CDK with the goal of confirming our expectations: it should be easy to use, efficient, and should support arbitrarily many timers.


Are there any updates on this?

We are currently busy finishing deterministic time slicing and are planning to start working on this in 2-3 weeks.


We’ve started working on the timers! At the moment, we’re considering making the timers awaitable just like calls, i.e. being able to do something like:

#[update]
async fn handle_message(req: Request) -> Response {
  // ic_cdk::sleep is the awaitable timer under consideration, not an existing API.
  ic_cdk::sleep(Duration::from_secs(1)).await;
  Response::ok()
}

Putting the final design pieces together. Stay tuned!


Please loop in the Motoko team. If we could await a timer, it would be amazing (but maybe a memory hog).


I’m thinking this might introduce the possibility of awaiting across upgrade boundaries.

My concern is with the following scenario:

#[update]
async fn handle_message(req: Request) -> Response {
    let response = …;
    // While waiting, an upgrade happens
    ic_cdk::sleep(Duration::from_secs(…)).await;
    // response is now invalid due to a change to Response as part of the upgrade
    response
}