Deterministic Time Slicing

How’s it going? Any chance of another 2x increase of the limit anytime soon?

The community seems to be in favor of the proposal: Proposal: Configurable Wasm Heap Limit

I will submit an NNS motion proposal after confirming with the DFINITY stakeholders. We have started working on the implementation. The current estimate is about 2 weeks. After that we should be able to double the limit.

Another update: the implementation of DTS for heartbeats and timers is in review and will be merged soon.

Is it 20B in local deployment, too?

I have some calls that run for 40s before they fail for exceeding the cycles limit per call. Either the execution is very slow or the limit is higher on the local replica.

Would be nice if these limits were configurable or in lock step on the local replica side.

Local configuration would allow developers to stay in lock step with the mainnet config, and modifying the limit could help devs identify the performance of inefficient, long-running update calls.

Depends on the version you’re using, and the subnet type. The latest dfx betas have DTS enabled (anything starting from 0.13.0-beta.0 AFAIR).

Any chance you’re running your local replica in system subnet mode? dfx start -vv starts the replica with debug messages enabled; it should show a detailed configuration of your settings.

20B instructions should take about 10s on average. If you have many active canisters, then execution may take longer. Execution may also take longer if the call uses heavier instructions, such as those that access memory.

Unfortunately, the limits are baked into the replica binary. Making them configurable externally is possible, but it is a chunk of work. I wonder if debug-printing ic0.performance_counter() solves the problem you mentioned?
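
For illustration, here is a minimal Motoko sketch of that idea. It assumes a moc/base version where the prim module exposes ic0.performance_counter as Prim.performanceCounter; the work function is just a placeholder for whatever call you want to measure:

import Prim "mo:⛔";
import Debug "mo:base/Debug";

actor {
    public func work() : async () {
        // ... the expensive part of the update call goes here ...

        // Counter 0 reports the number of Wasm instructions executed
        // by the current message so far, as a Nat64.
        Debug.print("instructions used: " # debug_show (Prim.performanceCounter(0)));
    };
};

Printing the counter before and after the hot loop narrows down where the instructions are actually spent.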

I have dfx 0.12.2-beta.0. Judging by the time it can take (40s), DTS must be on. I’ll try 0.13.0-beta too.

It says “subnet type: Application”.

I have only one canister. But what is running is the Motoko GC, so I guess that qualifies as “heavy instructions”. I have seen 40s. Based on that I am worried about the user experience: if that happened in production, it would shut out all users for 40s.

@luc-blaeser is working on incremental GC to improve the latency in such cases.

Is there a way for us to reproduce your test to look deeper into it?

The code is

import Prim "mo:⛔";
import Array "mo:base/Array";

actor {
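    // Up to 2^15 slots, each optionally holding a mutable array of 2^15 Nat32s;
    // `end` counts how many slots have been filled so far.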
    stable let d1 = Array.init<?[var Nat32]>(2**15,null);
    stable var end = 0;

    public query func size() : async Nat = async end;
    public query func name() : async Text = async "v0";

    type Mem = {
        version: Text;
        memory_size: Nat;
        heap_size: Nat;
        total_allocation: Nat;
        reclaimed: Nat;
        max_live_size: Nat;
    };
    public query func mem() : async Mem {
      {
        version = Prim.rts_version();
        memory_size = Prim.rts_memory_size();
        heap_size = Prim.rts_heap_size();
        total_allocation = Prim.rts_total_allocation();
        reclaimed = Prim.rts_reclaimed();
        max_live_size = Prim.rts_max_live_size(); 
      } 
    };

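    // Fill the next n slots of d1 with freshly allocated 2^15-element arrays,
    // growing the heap on every call until GC work (or the instruction limit) dominates.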
    public func benchN(n : Nat) : () {
        var i = 0;
        while (i < n) {
            d1[end] := ?Array.init<Nat32>(2**15,0);
            end += 1;
            i += 1;
        };
    };
};

If you run this with the copying GC (the default) and a number large enough to hit the cycle limit, then you can observe the time. For me,

time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(10000)'

runs for >40s before it fails with “Canister rrkah-fqaaa-aaaaa-aaaaq-cai exceeded the instruction limit for single message execution.”

If you run this sequence

time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'

then you can see the increasing time the garbage collector takes. A call that doesn’t trigger GC takes 3s. The ones that trigger GC take 10s (2nd call), 20s (4th call) and 36s (7th call) for me. At this point there are ~230m objects in the heap. The GC cannot handle much more within the cycle limit.

The compacting GC seems to burn through the cycles much faster. If I run

time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'

6 times, then the last call fails with out of cycles after 12s. I guess this means that growing the memory is what costs significant time (the copying GC doubles it) and that accessing the memory is what burns cycles (both GCs produce the same accesses).

Does this work on local? I am using dfx 0.13.0-beta.0 and still see errors.

What errors are you referring to? I was suggesting using -vv to check the replica settings, which works with 0.13.0-beta.0, at least on my machine.

I see this: “exceeded the instruction limit for single message execution”. So even after upgrading dfx to that version, I am not sure that it is working.

In that case you’re likely also exceeding the limit with DTS enabled. If you access performance_counter and debug-print the values, you should get up to ~20B, which is the instruction limit with DTS enabled. If your function needs more instructions than that, you’ll hit the limit and have to either wait until it gets increased or rewrite your function.
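
For the “rewrite your function” route, a common pattern is to process the work in bounded chunks and let the canister message itself for the next chunk, so each execution starts with a fresh instruction budget. A rough sketch only; total, chunkSize and processOne are made-up placeholders you would replace with your own logic:

import Debug "mo:base/Debug";

actor {
    let total = 1_000_000;   // hypothetical total amount of work
    let chunkSize = 10_000;  // tune so one chunk stays well below the limit
    var done = 0;

    func processOne(_i : Nat) {
        // placeholder for one unit of real work
    };

    public func processChunk() : async () {
        var i = 0;
        while (i < chunkSize and done < total) {
            processOne(done);
            done += 1;
            i += 1;
        };
        if (done < total) {
            // Fire-and-forget self-call: the rest of the work runs in a new
            // message with its own instruction budget. Note the original
            // caller is not told when the whole job finishes.
            ignore processChunk();
        } else {
            Debug.print("all work done");
        };
    };
};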

DTS was already activated sometime before 0.13, at least in 0.12.2-beta.0, possibly earlier. So I wouldn’t expect any differences when upgrading to 0.13.0-beta.0.

Are you running a query or an update call? Queries currently have a lower instruction limit of 5B. In order to increase the limit, we first need to introduce some way of charging for queries.

Seems like this function will need to wait a bit! It is a PLONK, but we have a few other schemes that work a bit better :wink:

Hello. Is there any documentation that explains how to work with DTS? I tried looking on internetcomputer.org but I didn’t see any results for “DTS” or “Time Slicing”. Thanks.

DTS kicks in automatically for message types that support it and allows for a larger instruction limit. No action is required from the developer. Here is the table with the current instruction limits: https://internetcomputer.org/docs/current/developer-docs/production/instruction-limits

It would be nice to have a section on DTS and how it works at some point, though. Here would probably be a good place: Internet Computer Content Validation Bootstrap ^^

@domwoe: Thanks! I agree and will add a DTS explanation there. So far we have: Internet Computer Content Validation Bootstrap, but it is very high-level.
