Programmatically measure cycles consumption

I want to know the operating costs of a canister that I am developing. Initially, I tried to estimate the costs using this table, but apart from the obvious ones like canister creation and storage, this turned out to be very difficult because I do not know how many Wasm instructions each function executes. I am now investigating whether I can instead measure the cycles consumption of the functions directly. However, I immediately started to question whether that is even possible.

The idea is to use the ExperimentalCycles API to check the balance before and after the execution of a function. However, if the storage cost “ticks” every second (a rough assumption based on the fee described in “GB Storage Per Second”), checking the balance before and after a function would likely be unreliable. Does anyone know how often the storage and compute allocation costs are deducted from the cycles balance?
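A minimal sketch of the before/after idea, written in Rust for illustration. The balance source is left abstract as a hypothetical `read_balance` closure: it could be backed by Motoko's `ExperimentalCycles.balance()` or by a status query made from outside the canister, and which of these yields meaningful numbers is exactly what is being asked here. What the sketch makes concrete is that anything deducted between the two reads, including a periodic storage charge, ends up in the same delta.

```rust
use std::cell::Cell;

/// Runs `f` between two reads of a cycles balance and returns `f`'s result
/// together with the drop in balance observed between the reads.
///
/// `read_balance` is a hypothetical stand-in for the real balance source;
/// it is a closure here so the sketch runs outside a canister.
fn measure_balance_delta<B, F, R>(mut read_balance: B, f: F) -> (R, u128)
where
    B: FnMut() -> u128,
    F: FnOnce() -> R,
{
    let before = read_balance();
    let result = f();
    let after = read_balance();
    // Anything deducted between the two reads lands in this delta, including
    // charges unrelated to `f` (e.g. a periodic storage fee). `saturating_sub`
    // reports 0 if the balance grew, e.g. because cycles were deposited.
    (result, before.saturating_sub(after))
}

fn main() {
    // Simulated balance; `Cell` lets both closures share it.
    let balance = Cell::new(1_000_000u128);
    let (_, delta) = measure_balance_delta(
        || balance.get(),
        || balance.set(balance.get() - 2_388), // pretend execution charge
    );
    println!("observed delta: {delta} cycles");
}
```

The helper cannot tell which part of the delta belongs to `f`, which is why a per-message instruction count is the more reliable signal.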

I guess another way would be to check the balance in the function that I want to measure, but that would presumably not include every cost related to that function call such as Ingress Message Reception.

Is it possible to programmatically measure cycles consumption function-wise or am I out of luck?

I’ll tag @chenyan. He has written a design doc on canister profiling. Maybe he can comment on your approach and provide some info about the roadmap for including profiling in dfx.

If the latter is far out, we could think of providing a bounty, because I think that would be useful for a lot of developers.

Thank you. Canister profiling would definitely be helpful and I cannot wait for it.

That being said, I need cost estimates quite soon, since I am designing a protocol where costs are crucial to making it sustainable. In the meantime, I would appreciate any workarounds for measuring or closely approximating cycles consumption.

I have a question regarding the dynamic cycles cost for update call execution. In the cost table it says:

Ten Update Instructions Execution
For every 10 instructions executed when executing update type messages

Do “instructions” refer to the number of Wasm instructions for the virtual machine? Or is it machine instructions after JIT compilation for some real CPU?

I assume it is the former but I would like to have it confirmed.

Cycles are simply a weighted sum of Wasm instruction counts. Some system APIs are more expensive than others, e.g. bulk memory access, stable memory. If the Wasm instruction doesn’t access the stored data, there is no charge. So all cost can be computed from the instruction counts, and it’s safe to measure cycle consumption before and after the function call.

If you use Motoko Playground and enable profiling when deploying, we can draw a flame graph for each update call, so that you can see which function costs the most.

The design doc is here: IC-822 Canister profiling - Google Docs

I did not know that Motoko Playground had started to support profiling. This is huge for me, great work!

I have a few questions regarding the interpretation of the profiling data after reading the design document and experimenting with it a bit in the playground:

  1. Immediately when I deploy a canister I get an output as shown below. Does this represent instructions for installing the canister code? If so, is that a cycles cost on top of the Canister Created cost here?

Wasm instructions executed 4246 instrs.

  2. When looking at the flame graph below of the insert() function from the Phone Book example, does the number of instructions (3891) represent the number of “pure” Wasm instructions, or a weighted number of instructions where some system API calls are counted as multiple instructions (as discussed here)?
    I’m asking because I want to know whether it is correct to divide the number of instructions by 10 and multiply by 4 to get the cycles cost of the execution, or whether there is more to it.

[Image: flame graph of the insert() call]

  3. Again using the Phone Book example in the playground, I measured three operations, inserting three records one after another into the same canister. See the results below. What causes the instruction count of the same function to vary with different inputs?

insert("aaa", record {desc="aaa"; phone="111"})
insert: 5970 instrs

insert("bbb", record {desc="bbb"; phone="222"})
insert: 7345 instrs

insert("ccc", record {desc="ccc"; phone="333"})
insert: 5807 instrs

  4. When I run the lookup() function, which is a query call, the profiler also reports a number of instructions. Can these be ignored when calculating the cycles costs of a canister, since (if I’m not mistaken) query calls are currently free?

[Image: profiling output for the lookup() query call]

  5. Does the profiling support inter-canister calls, i.e. can you see the combined number of instructions?

Hey @gabe,
Thanks for bringing this up. The document you are referencing will be updated to clarify this point.

Each System API call is a function call from the WebAssembly standpoint. The number of instructions each call takes depends on the work done.

Answering your questions:

  1. Yes, it’s the number of “update instructions”, which will be charged on top of the Canister Created cost, i.e. 4246 / 10 * 4 = 1696 Cycles

  2. As the system API calls are normal functions, just implemented on the IC, the number of instructions is always “pure”, if I understand your question correctly.

  3. The instructions count varies based on the work done, for example if we need to allocate more Wasm memory, it will take more instructions. Or if the key we’re looking for is deeper in the tree structure, it will take more instructions, etc…

  4. Yes, query calls are free for now.

  5. Yes, the instruction counter normally takes into account any work the Canister performs, including the inter-canister calls.
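The conversion described in answer 1 (divide by 10, multiply by 4) can be sketched as below. It assumes the fee of 4 cycles per 10 update instructions quoted in this thread and rounds down to whole blocks of 10 instructions, which reproduces the 4246 instructions to 1696 cycles figure. Fees are revised over time, and this covers only the per-instruction fee, not per-message fees from the cost table such as ingress message reception.

```rust
/// Cycles charged per block of 10 executed update instructions, as quoted in
/// this thread. Check the current cost table before relying on this value.
const CYCLES_PER_TEN_INSTRUCTIONS: u64 = 4;

/// Converts an update-call instruction count into an execution cost in cycles.
/// Integer division rounds down to whole blocks of 10 instructions; that the
/// replica rounds the same way is an assumption of this sketch.
fn instructions_to_cycles(instructions: u64) -> u64 {
    instructions / 10 * CYCLES_PER_TEN_INSTRUCTIONS
}

fn main() {
    // Instruction counts reported in this thread: install, then three insert() calls.
    let samples = [
        ("install", 4246u64),
        ("insert aaa", 5970),
        ("insert bbb", 7345),
        ("insert ccc", 5807),
    ];
    for (label, instrs) in samples {
        println!("{label}: {instrs} instrs -> {} cycles", instructions_to_cycles(instrs));
    }
}
```

Applied to the three insert() measurements from the question, this gives 2388, 2936 and 2320 cycles, so the per-call cost varies with the work done, as described in answer 3.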

This is so great. Huge thank you for answering my questions!

In addition to @berestovskyy’s answer, some clarifications on your questions:
1: It’s the cost of the Wasm start function and canister_init. For Motoko, it’s mostly loading the RTS and initializing the init args, if they are provided.
2: It’s the pure Wasm instruction count.
4: We cannot profile query calls at the moment. You can ignore the graph output from query calls.
5: We didn’t count the cycle transfer amount here, so an inter-canister call counts as just 1 Wasm instruction. Once the replica exposes the real instruction counter, we can track the number more precisely.

Ah, sorry, I thought we were already using the instruction counter… We need to prioritize the real counter then, otherwise it might be really confusing…

Is ic0.performance_counter implemented already?

I don’t think so yet, but it should be quite easy to implement, as it’s already available internally…

Could you elaborate on this? Does the entire operation of the function invoked by an inter-canister call appear as a single Wasm instruction to the function that invoked it, or is it the call instruction itself that is counted inaccurately? (Sorry if my question is unclear. This is a bit more low-level than what I usually work with.)

To be precise, we only count Wasm instructions on the caller side. It takes a few system API calls to make an inter-canister call. We are not tracking cycles consumed by the callee, which is paid by the caller.

We are not tracking cycles consumed by the callee, which is paid by the caller.

Wait, so are you saying that during inter-canister calls, the caller pays for the full execution of the called method on the called canister’s behalf?

Or are you saying the caller pays cycles just to “hit play” and transfer whatever data the called canister’s function needs, while the called canister pays to execute that function itself?

Hi chenyan, do you know whether it is currently possible to enable canister profiling locally in Candid UI with the new (0.10.x) dfx release?

Hey all,
Two small updates:

  1. The documentation now has a note saying:
    Note: System API calls are just like normal function calls from the WebAssembly standpoint. The number of instructions each call takes depends on the work done.
  2. The first iteration of the performance counter specification was just merged, so it should be available on mainnet once the new release is rolled out (~1-2 weeks).

Note that this does not mean the flame graph will use the new performance counter right away; it will probably take some time to adopt and test it. @chenyan is probably the better person to comment on this.
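Once the performance counter discussed here is available, per-function instruction counts could be collected by reading it before and after the code of interest. The sketch below is a hypothetical Rust helper: `InstructionCounter`, `profile` and `MockCounter` are illustrative names, not part of any SDK. On a real canister the counter would be read through `ic0.performance_counter`; recent versions of the Rust CDK expose this as `ic_cdk::api::performance_counter(0)`, but check the CDK version you use. As noted earlier in the thread, such a counter only sees instructions executed on the caller side, not work done by callees.

```rust
use std::cell::Cell;

/// Hypothetical abstraction over the per-message instruction counter.
trait InstructionCounter {
    /// Instructions executed so far in the current message.
    fn instructions(&self) -> u64;
}

/// Runs `f` and returns its result plus how far the counter advanced meanwhile.
/// The counter is assumed to be monotonic within a message.
fn profile<C, F, R>(counter: &C, f: F) -> (R, u64)
where
    C: InstructionCounter,
    F: FnOnce() -> R,
{
    let start = counter.instructions();
    let result = f();
    let end = counter.instructions();
    (result, end - start)
}

/// Mock counter for local testing, advanced manually by the code under test.
struct MockCounter(Cell<u64>);

impl MockCounter {
    /// Simulates executing `n` instructions.
    fn burn(&self, n: u64) {
        self.0.set(self.0.get() + n);
    }
}

impl InstructionCounter for MockCounter {
    fn instructions(&self) -> u64 {
        self.0.get()
    }
}

fn main() {
    let counter = MockCounter(Cell::new(0));
    let (sum, used) = profile(&counter, || {
        counter.burn(1_234); // pretend this work costs 1,234 instructions
        (1..=10u64).sum::<u64>()
    });
    println!("result = {sum}, instructions = {used}");
}
```

Keeping the counter behind a trait lets the same profiling code run in unit tests with a mock and in a canister with the real counter.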

Not yet. We will wait for the replica change to land on dfx. I also need to adjust the Candid UI code to interpret the new format.

Interesting! Is that a link to internal docs? I don’t seem to be able to access it but would love to read up on it.

Got it, thanks for letting me know.