Introducing Performance Counter on The Internet Computer

It used to be mo:prim; the name was changed to discourage people from using it directly (it’s an internal module and its contents are subject to change).

2 Likes

I’d like to add some feedback on the performance counter API. We’ve been using it extensively to benchmark Azle, Motoko, and Rust canisters. Here’s some feedback:

  1. Motoko seems to outperform beyond reasonable expectations; I’m not exactly sure why (GC/prelude/postlude differences?)
  2. It’s hard to use the performance counter on code that already exists. We either have to change the canister method signatures to return the results of the performance counter, or we have to create some kind of global variable and set the performance results during update call execution (which doesn’t work for query calls). It would be much nicer to just call a query/update function and have ALL of the performance information returned as part of the response data.
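To illustrate point 2, the global-variable workaround looks roughly like this (a pure-Rust sketch with a stubbed counter; my_update and get_last_perf are hypothetical names, and in a real canister the stub would be a call to ic_cdk::api::performance_counter):

```rust
use std::cell::Cell;

// Stub standing in for the real instruction counter
// (ic_cdk::api::performance_counter in a Rust canister).
fn stub_performance_counter() -> u64 {
    42
}

thread_local! {
    // The "global variable" workaround: stash the reading during an update call...
    static LAST_PERF: Cell<u64> = Cell::new(0);
}

// Hypothetical update method: after doing its work, it records the counter.
fn my_update() {
    // ... application logic ...
    LAST_PERF.with(|c| c.set(stub_performance_counter()));
}

// ...and read it back later from a separate method. This is the extra
// plumbing (and the query-call limitation) the feedback above describes.
fn get_last_perf() -> u64 {
    LAST_PERF.with(|c| c.get())
}
```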
4 Likes

Motoko seems to outperform beyond reasonable expectations

Unexpected, but very cool! :sunglasses:

It would be much nicer to just call a query/update function and have ALL of the performance information returned as part of the response data.

Sounds like Prometheus library, no? We can collect performance counters in different parts of the program and map them to Prometheus counters. Then a query might scrape the results in one go.

Not sure if there are any Prometheus libraries for Motoko, but even the basic approach described in Effective Rust Canisters > Observability might do the job.
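That basic approach needs no library at all: keep named counters in a global map, update them from different parts of the program, and render everything in Prometheus text format from one endpoint. A minimal pure-Rust sketch (observe and scrape are hypothetical names):

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

thread_local! {
    // Named monotonic counters, updated from different parts of the program.
    static COUNTERS: RefCell<BTreeMap<&'static str, u64>> =
        RefCell::new(BTreeMap::new());
}

// Record a measurement (e.g. instructions from the performance counter)
// under a metric name, accumulating across calls.
fn observe(name: &'static str, value: u64) {
    COUNTERS.with(|c| *c.borrow_mut().entry(name).or_insert(0) += value);
}

// A query-style endpoint that scrapes all counters in one go, rendered
// in Prometheus' plain "name value" text exposition format.
fn scrape() -> String {
    COUNTERS.with(|c| {
        c.borrow()
            .iter()
            .map(|(name, value)| format!("{} {}\n", name, value))
            .collect()
    })
}
```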

2 Likes

Haha, I meant that I think there’s a problem with the performance counter; I suspect the Motoko function prelude instructions aren’t being accounted for in the same way that Rust’s are.

1 Like

I think it would be best if the developer didn’t have to change anything in their source code, so that any query or update call can be profiled by a simple call. It really is a lot of work to add all of this performance counting code, especially retrofitting it into existing applications.

1 Like

you mean like call("my_query") would also return some meta information about how many instructions (or cycles) the whole query took?

3 Likes

Exactly. I believe this is how Ethereum works: it returns the exact amount of gas used.

1 Like

It’s not on our roadmap, but we’ll discuss it internally this week and I’ll let you know @lastmjs

4 Likes

I appreciate that, thank you

1 Like

We’ve just discussed it internally, @lastmjs

The team has acknowledged that the current performance counter implementation might not be the easiest to use for quick performance assessments of existing canisters: some effort must be put into instrumenting the code, making the information available, etc.

If we implement this at the IC level, there are a few open questions: should this information be public, how do we pass it to users and caller canisters, how do we represent the results, etc. It seems like we’re looking at distributed tracing here, and we’re not quite sure how much of that we want implemented on the IC. So it might take some time to clarify those questions…

On the other hand, we should definitely improve profiling at the SDK/CDK level. It might be as easy as:

#[query]
#[enable_profiler]
fn my_query() {

}

This #[enable_profiler] macro could generate my_query and my_query_profiler. Calling my_query_profiler could internally execute my_query, gather and return the profiler info…
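Conceptually, the generated pair might look like this (a hand-written sketch of what such a macro could expand to; counter() is a deterministic stub standing in for the real instruction counter, and all names here are hypothetical):

```rust
use std::cell::Cell;

thread_local! {
    // Deterministic stub standing in for ic_cdk::api::performance_counter(0):
    // advances by 100 "instructions" every time it is read.
    static TICKS: Cell<u64> = Cell::new(0);
}

fn counter() -> u64 {
    TICKS.with(|t| {
        t.set(t.get() + 100);
        t.get()
    })
}

// The original method body, unchanged.
fn my_query() -> u32 {
    2 + 2
}

// What a #[enable_profiler]-style macro might generate: execute the original
// method and return its result together with the instructions consumed.
fn my_query_profiler() -> (u32, u64) {
    let start = counter();
    let result = my_query();
    let used = counter() - start;
    (result, used)
}
```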

Another point we discussed is the process of getting structured feedback from the community. While this forum is a great tool for casual conversations and announcements, it’s still not that easy for you to file a formal feature request, show that it’s valuable to the community, and have that request prioritized accordingly.

I hope we will soon improve here as well…

5 Likes

That all sounds great, I look forward to the discussions and improvements.

2 Likes

Love the ideas and the attitude @berestovskyy ! Thank you !

2 Likes

I’m just going through this thread trying to understand how this performance counter really works, and I thought I should ask something I have been wondering all along.

Why does this Prim module have to be imported from this emoji :no_entry:? What’s happening there?

2 Likes

IMO it’s just a name, and I guess it means that you should know what you’re doing when using those functions, but probably @claudio or @ggreif could explain better.

1 Like

It is an internal module, specific to the compiler version. The emoji tries to convey the message that it is off-limits, so don’t expect stuff in there to be in any way stable. The base library is the stable user-facing interface.

That said, accessing the primitives can be justified with various arguments, but you should know what you are doing :slight_smile: (and be prepared for breakage). Test well!

1 Like

Hey folks,
There is a new async-friendly performance counter available (type 1):

Rust:       ic_cdk::api::performance_counter(1);
Motoko:     import IC "mo:base/ExperimentalInternetComputer";
            IC.performanceCounter(1);
TypeScript: import ic from 'azle';
            ic.performanceCounter(1);
Python:     from kybra import ic
            ic.performance_counter(1)

The new counter is tied to the call context and increases monotonically across await points, until the original call is either replied to or rejected.
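One way to picture the difference between the two counter types (a conceptual model only, not the real implementation): counter type 0 resets on every message, while counter type 1 keeps accumulating across all the messages that make up one call context.

```rust
// Conceptual model of the two counter types. Each "message" is one slice of
// execution between await points within a single call context.
struct CallContext {
    current_message: u64, // type 0: instructions in the current message only
    call_context: u64,    // type 1: instructions since the original call
}

impl CallContext {
    fn new() -> Self {
        Self { current_message: 0, call_context: 0 }
    }

    // Execute one message consuming `instructions`.
    fn run_message(&mut self, instructions: u64) {
        self.current_message = instructions; // type 0 resets every message
        self.call_context += instructions;   // type 1 is monotonic until reply/reject
    }
}
```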

For more details, please see the spec and the examples.

2 Likes

Will it count Motoko GC if I make an async call to the canister itself?

Motoko compiled with the --force-gc flag.

The perf counter does not include any nested executions, even if it’s a call to self. So IMO it won’t, but we can double check with Motoko experts cc @claudio @chenyan @luc-blaeser @ggreif

The way the Motoko GC works is that it runs after the user code in any IC message, including those user-code calls to performanceCounter(). So I think performanceCounter(1) will include the cost of all the GC runs executed so far in the call context, but not the final GC itself before returning or before the next await().

There are low-level primitives “rts_mutator_instructions” and “rts_collector_instructions” (IIRC) that give you the costs of the last message executed, but they use performanceCounter(0) to do that, so you won’t get a running tally of all the messages that make up the current call context, just the last one.

Also, the incremental GC spreads a full GC across several messages, so rts_collector_instructions just counts the amount of GC work actually done in the message, not the cost of a full GC of the entire heap.

3 Likes

Is this the right way to count instructions used by GC?

import Prim "mo:⛔"; // internal module: the rts_* primitives are compiler-specific

public func gcIns() : async Nat {
  doSomeWork(); // add/remove heap data
  await noop(); // self-call: GC runs at the end of the previous message
  Prim.rts_mutator_instructions() + Prim.rts_collector_instructions(); // costs of the last message, i.e. the one that ran doSomeWork()
};