Canister Lifecycle Hooks

The motion proposal has been accepted. The feature is in the roadmap. We will start working on it after Proposal: Configurable Wasm Heap Limit.

1 Like

I think a hook that is triggered upon unfreezing a canister would be really helpful as well.

Imagine a canister that upon deployment starts calling itself indefinitely. At some point the cycle balance falls below the freezing threshold and the self call loop will be interrupted. The canister is topped up again and unfrozen, but now the self call loop has to be triggered manually. Ideally I can just specify a lifecycle hook for the event that a canister is unfrozen :slight_smile:

Do you have an example of this behavior? I believe it was fixed a while back, but if it hasn’t, let’s investigate further…

The last time I used this to stop a canister that was trapped in a self-call loop so I could upgrade it was in July last year. Maybe things have changed since then?
To clarify, I’m not suggesting we change the behaviour; I’d just be happy to have a lifecycle hook for a canister that becomes unfrozen.

I could reproduce the behaviour locally with this Motoko canister:

actor {
  var counter = 0;

  public query func getCounter() : async Nat {
    counter;
  };

  public func incrementCounter() : async () {
    counter += 1;
  };

  // Calls incrementCounter on this actor in an endless loop; each `await`
  // is a separate self-call message.
  public func selfCallLoop() : async () {
    while true {
      await incrementCounter();
    };
  };
};
  1. deploy the canister
  2. call selfCallLoop
  3. call getCounter repeatedly to verify messages are being executed continuously
  4. increase the freezing threshold such that the canister is frozen
  5. decrease the freezing threshold such that the canister is unfrozen
  6. repeatedly call getCounter to verify the counter is no longer being incremented and therefore the execution of selfCallLoop came to a halt when the canister was frozen

Ideally I would be able to specify a lifecycle hook that is triggered as soon as the canister is unfrozen so I can make another call to selfCallLoop and continue my work of incrementing the counter.
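To make the request concrete, here is a purely hypothetical Rust sketch of what such a hook might look like. Neither the on_unfrozen attribute nor any corresponding system export exists today; the name is invented only to illustrate the idea.

// Hypothetical only: `on_unfrozen` is NOT a real ic-cdk attribute or system
// hook; it sketches the requested "canister became unfrozen" lifecycle event.
#[ic_cdk::on_unfrozen]
fn on_unfrozen() {
    // Restart whatever background work was interrupted while the canister
    // was frozen, e.g. kick off the self-call loop again.
}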

1 Like

I’m not a Motoko expert, but looking at the example from the IC specification perspective, it seems to be missing error handling.

Following the Motoko documentation on Asynchronous errors, here is how the error handling in this loop could look:

while true {
    try {
        await incrementCounter();
    } catch (e) {
        // Requires `import Error "mo:base/Error"` at the top of the actor.
        switch (Error.code(e)) {
            case (#system_transient) {
                // Here goes the lifecycle logic.
            };
            // The switch over error codes must be exhaustive.
            case _ {};
        };
    };
};

There might be other await errors beyond those an “unfrozen” hook would cover. Here’s a scenario illustrating this:

  1. The await incrementCounter(); traps because the canister queue is full.
  2. The “unfrozen” lifecycle hook will never be called because the canister is full of cycles.
  3. The selfCallLoop must still be manually triggered…
1 Like

That’s true, it’s missing error handling. I was just trying to give a quick example to better illustrate the use case for this hook :innocent:
But even with the error handling, once the canister broke out of the loop I have to manually trigger it. In Motoko the code above won’t fill up the canister queue as every self call is a separate message.

once the canister broke out of the loop I have to manually trigger it

Maybe setting a timer in the error handler could be used to “try again later”?

the code above won’t fill up the canister queue as every self call is a separate message

I agree, but in general there are 9 system transient errors. And these should all be handled the same way - by retrying them later…
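For instance, here is a minimal Rust sketch of that retry-later idea (Rust examples follow later in this thread), assuming the ic-cdk and ic-cdk-timers crates; the inc_counter endpoint, the 10-second delay, and the function name are illustrative choices:

use std::time::Duration;

// Sketch: run the self-call loop; on a failed self-call, schedule a one-shot
// timer that restarts the loop later instead of giving up for good.
fn start_self_call_loop() {
    ic_cdk::spawn(async {
        loop {
            let res: ic_cdk::api::call::CallResult<()> =
                ic_cdk::call(ic_cdk::id(), "inc_counter", ()).await;
            if let Err((code, msg)) = res {
                ic_cdk::println!("self-call failed ({:?}): {}", code, msg);
                // "Try again later": restart the loop after a delay.
                ic_cdk_timers::set_timer(Duration::from_secs(10), start_self_call_loop);
                break;
            }
        }
    });
}

The same shape works for any transient reject code: log it, schedule a retry, and break out of the current loop.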

What is a “transient” error?

Maybe setting a timer in the error handler could be used to “try again later”?

I don’t think an error is thrown for a frozen canister :thinking:
This code does not log anything when calling dfx canister logs after repeating the steps mentioned in the previous message. Also, timers stop executing and don’t resume while a canister is below the freezing threshold.

import Error "mo:base/Error";
import Debug "mo:base/Debug";
import Timer "mo:base/Timer";

actor {
  var counter = 0;

  func printCounter() : async () {
    Debug.print(debug_show (counter));
  };

  // Log the counter every 2 seconds so progress shows up in the canister logs.
  ignore Timer.recurringTimer<system>(#seconds 2, printCounter);

  public query func getCounter() : async Nat {
    counter;
  };

  public func incrementCounter() : async () {
    counter += 1;
  };

  public func selfCallLoop() : async () {
    while true {
      try {
        await incrementCounter();
      } catch (e) {
        // Log any error raised by the self-call; nothing is printed here when
        // the canister gets frozen.
        Debug.print("this was caught: " # debug_show (Error.message(e)));
      };
    };
  };
};

So without the hook, how would that work?

What is a “transient” error?

Here is the spec definition, and here is how it’s defined in the code.

For the rest, I’m not an expert in Motoko, maybe @ggreif or @claudio could have a look?

A similar Rust example would look like this:

thread_local! {
    static COUNTER: std::cell::RefCell<u64> = 0.into();
}

#[ic_cdk::query]
fn get_counter() -> u64 {
    COUNTER.with_borrow(|c| c.clone())
}

#[ic_cdk::update]
fn inc_counter() {
    COUNTER.with_borrow_mut(|c| *c += 1)
}

#[ic_cdk::update]
fn self_call_loop() {
    // Instead of awaiting self-calls in a loop, let a recurring timer drive
    // the increments.
    ic_cdk_timers::set_timer_interval(std::time::Duration::from_secs(1), inc_counter);
}

And it survives the freezing/unfreezing just fine without going into the reject code details, as those are handled by the timers library.

1 Like

Very cool, I didn’t know Rust and Motoko behaved that differently regarding timers. This is what I wanted.

use ic_cdk::{api::call, println};

thread_local! {
    static COUNTER: std::cell::RefCell<u64> = 0.into();
    static IS_CALLING_ITSELF: std::cell::RefCell<bool> = false.into();
}

#[ic_cdk::query]
fn get_counter() -> u64 {
    COUNTER.with_borrow(|c| *c)
}

fn print_counter() {
    println!("Counter: {}", get_counter());
}

#[ic_cdk::init]
fn init() {
    ic_cdk_timers::set_timer_interval(std::time::Duration::from_secs(2), print_counter);
    ic_cdk_timers::set_timer_interval(std::time::Duration::from_secs(5), || {
        ic_cdk::spawn(async { self_call_loop().await })
    });
}

#[ic_cdk::update]
fn inc_counter() {
    COUNTER.with_borrow_mut(|c| *c += 1)
}

#[ic_cdk::update]
async fn self_call_loop() {
    if !IS_CALLING_ITSELF.with_borrow(|c| *c) {
        println!("Calling itself");
        IS_CALLING_ITSELF.with_borrow_mut(|c| *c = true);
        loop {
            match call::call::<(), ()>(ic_cdk::id(), "inc_counter", ()).await {
                Ok(_) => {}
                Err(e) => {
                    println!("This was caught: {:?}", e);
                    IS_CALLING_ITSELF.with_borrow_mut(|c| *c = false);
                    break;
                }
            }
        }
    } else {
        println!("Already calling itself");
    }
}

I assume the break statement in the loop is important because otherwise the canister will still try to call inc_counter indefinitely, depleting its cycles?

Also, does the failed canister_global_timer still cost cycles?

I found this in the interface specification; how does the Rust timers implementation work around that? :thinking:

The global timer is also deactivated upon changes to the canister’s Wasm module (calling install_code, install_chunked_code, uninstall_code methods of the management canister or if the canister runs out of cycles).

I assume the break statement in the loop is important because otherwise the canister will still try to call inc_counter indefinitely, depleting its cycles?

I have a feeling that it’s an XY problem. What do you think about starting a new topic describing the problem we’re trying to solve?

Also, does the failed canister_global_timer still cost cycles?

Any message execution costs cycles.

how does the Rust timers implementation work around that

There is no workaround in the Rust timers library, and the timer will be deactivated in the listed cases. Some workarounds could be implemented at the app level, though.
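One such app-level workaround, as a sketch that reuses the print_counter helper from the example above: since timers do not survive code changes, simply re-arm them in the post_upgrade hook.

// Sketch: ic-cdk-timers keeps its timers in the canister's (non-stable) heap
// and the global timer is deactivated on code changes, so re-arm the timers
// whenever the new Wasm module starts up.
#[ic_cdk::post_upgrade]
fn post_upgrade() {
    ic_cdk_timers::set_timer_interval(std::time::Duration::from_secs(2), print_counter);
}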

Clarifying question about the current implementation of Canister Lifecycle Hooks.

Are they one-time events, or can they be fired multiple times?

Take this example of the heap memory lifecycle hook:

  1. My canister crosses the heap memory threshold → hook is fired
  2. Garbage collection runs (passive) freeing up heap memory → canister is back under the heap memory threshold
  3. Canister writes more data and crosses the heap threshold again → is the hook fired again?

Similarly, this same question applies to the low cycles balance lifecycle hook.

Say I top up a canister with a micro cycles payment every time it falls below the cycles threshold. Is there a maximum # of times that a lifecycle hook can be fired within a given time period (checkpoint)?

In the example you provide, yes, the hooks will be fired multiple times as the condition was hit multiple times. However, if the condition is hit and nothing changes (i.e. your canister remains at high memory usage or low in cycles), the hook will not trigger again.

1 Like

Wanted to circle back here to see if the upcoming canister_on_error lifecycle hook will expose an API for viewing the canister backtrace associated with a trap.

1 Like

We can consider exposing this. The canister_on_error hook is anyway not under development at the moment so we will most likely have canister backtraces before we start working on canister_on_error (and thus before we need to decide what the canister_on_error has access to).

1 Like

The low_wasm_memory hook is now available.

This lifecycle hook lets you define a function that is executed when a canister’s available wasm memory drops below a user-defined threshold. This can be used for proactive memory management to help prevent out-of-memory traps.
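For reference, here is a minimal Rust sketch of the pattern, assuming a recent ic-cdk that exposes the on_low_wasm_memory attribute (the underlying export is canister_on_low_wasm_memory, and the threshold is the wasm_memory_threshold canister setting):

// Sketch: executed by the system once the canister's remaining Wasm memory
// drops below its `wasm_memory_threshold` setting.
#[ic_cdk::on_low_wasm_memory]
fn on_low_wasm_memory() {
    // Proactive memory management goes here, e.g. pruning caches or old
    // entries before an out-of-memory trap can occur.
    ic_cdk::println!("Low Wasm memory: starting cleanup");
}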

Here is a video demo showing how to use the hook in Rust:

YouTube demo

The source code is available in the dfinity/examples repository:

Hope you find it useful.

4 Likes

I guess I don’t have to worry about the acceptability of bumping this old topic.

Awesome new feature and very helpful!

While we are here, there are a number of ‘other’ hooks that would be very helpful, and one topic worth discussing here in regard to the work I’ve been doing on ICRC-105: ICRC-105 - Wasm History Management and Transaction Blocks · Issue #105 · dfinity/ICRC · GitHub

The ‘spirit’ of ICRC-105 is to give public utility canisters a spot in the ICRC-3 log to record any configuration, upgrade, or installation changes that may be relevant to the execution of the canister. The ideal ‘best-case’ scenario would be that, given an ICRC-105 + ICRC-3 canister, the set of Wasms used, and properly recorded ICRC-3 blocks (proper meaning that they record all information relevant to the deterministic execution of program/state changes), one could reconstruct the state of the canister from the ICRC-3 log. This means we need to record any change to the canister that would affect how the code runs. Since you can write any code, there may be a number of unknown unknowns that a dev would be responsible for, but it would be nice to take care of the known possible changes via canister lifecycle events.

The obvious events would be:

  • Change of controller (give me the new list)
  • Change of canister settings, like compute_allocation, memory_allocation, freezing_threshold, reserved_cycles_limit, wasm_memory_limit, log_visibility (give me the new setting)
  • A canister freeze (let this run once if my canister is frozen)
  • A change of the environment variables (I understand this is under development) (let me enumerate the items or give them to me)
  • A subnet change, when subnet migration is possible (give me the new subnet id)
  • Edit: Canister installed/upgraded (give me the new hash) - right now I have to do a timer + async call from my initializer, which delays things, and theoretically something else could get called before I log the new hash… thus I have two blocks, one to record the install/upgrade and then a follow-on to record the new hash.

One that is not so obvious would be something that fires if the version of the replica changes. I’m not sure what the process or workflow is there, but I would imagine that if a replica version is updated it might include code that could change the way a canister behaves (something like the cycle cost of certain operations or even system API level information). It may not be a great feature to have every canister on the subnet firing an event each time a new replica version starts up, but perhaps this could fire the first time that canister is called in the future?

1 Like

A general hook into the canister history & settings updates would be really nice :purple_heart:

Maybe this gets integrated into the canister history?

1 Like