I will submit an NNS motion proposal after confirming with the DFINITY stakeholders. We have started working on the implementation. The current estimate is about 2 weeks. After that we should be able to double the limit.
Another update: the implementation of DTS for heartbeats and timers is in review and will be merged soon.
I have some calls that run for 40s before they fail by exceeding the cycles-per-call limit. Either execution is very slow, or the limit is higher on the local replica.
It would be nice if these limits were configurable on the local replica, or at least kept in lockstep with mainnet.
Local configuration would let developers stay in lockstep with the mainnet config, and being able to modify the limit could help devs spot inefficient, long-running update calls.
Depends on the version you’re using, and the subnet type. The latest dfx betas have DTS enabled (anything starting from 0.13.0-beta.0 AFAIR).
Any chance you’re running your local replica in system subnet mode? dfx start -vv starts the replica with debug messages enabled; it should show a detailed configuration of your settings.
20B instructions should take about 10s on average. If you have many active canisters, then execution may take longer. Execution may also take longer if the call uses heavier instructions, such as memory accesses.
Unfortunately, the limits are baked into the replica binary. Making them configurable externally is possible, but it’s a chunk of work. I wonder if debug-printing ic0.performance_counter() would solve the problem you mentioned?
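Something along these lines should show how far a call gets (an untested sketch, assuming the performanceCounter primitive that recent moc versions expose via the prim module; the exact name may depend on your compiler version):
import Prim "mo:⛔";
import Debug "mo:base/Debug";

actor {
  public func work() : async () {
    // ... the expensive part of the call goes here ...
    // counter 0 reports the Wasm instructions executed so far in this message
    Debug.print("instructions: " # debug_show (Prim.performanceCounter(0)));
  };
};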
I have dfx 0.12.2-beta.0. Judging by the time a call can take (40s), DTS must be on. I’ll try 0.13.0-beta too.
It says “subnet type: Application”.
I have only one canister, but what is running is the Motoko GC, so I guess that qualifies as “heavy instructions”. I have seen 40s, and based on that I am worried about the user experience: if that happened in production it would shut out all users for 40s.
@luc-blaeser is working on incremental GC to improve the latency in such cases.
Is there a way for us to reproduce your test to look deeper into it?
import Prim "mo:⛔";
import Array "mo:base/Array";

actor {
  // up to 2^15 slots, each holding an optional 2^15-element mutable array
  stable let d1 = Array.init<?[var Nat32]>(2**15, null);
  stable var end = 0;

  public query func size() : async Nat = async end;
  public query func name() : async Text = async "v0";

  // runtime-system statistics reported by the Motoko prim module
  type Mem = {
    version : Text;
    memory_size : Nat;
    heap_size : Nat;
    total_allocation : Nat;
    reclaimed : Nat;
    max_live_size : Nat;
  };

  public query func mem() : async Mem {
    {
      version = Prim.rts_version();
      memory_size = Prim.rts_memory_size();
      heap_size = Prim.rts_heap_size();
      total_allocation = Prim.rts_total_allocation();
      reclaimed = Prim.rts_reclaimed();
      max_live_size = Prim.rts_max_live_size();
    }
  };

  // fills the next n slots of d1 with fresh 2^15-element arrays,
  // growing the heap on every call
  public func benchN(n : Nat) : () {
    var i = 0;
    while (i < n) {
      d1[end] := ?Array.init<Nat32>(2**15, 0);
      end += 1;
      i += 1;
    };
  };
};
If you run this with the copying GC (the default) and a number large enough to hit the cycle limit, you can observe the time. For me,
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(10000)'
runs for >40s before it fails with “Canister rrkah-fqaaa-aaaaa-aaaaq-cai exceeded the instruction limit for single message execution.”
If you run this sequence
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
then you can see the increasing time the garbage collector takes. A call that doesn’t trigger GC takes 3s. The ones that trigger GC take 10s (2nd call), 20s (4th call) and 36s (7th call) for me. At this point there are ~230m objects in the heap. The GC cannot handle much more within the cycle limit.
The compacting GC seems to burn through the cycles much faster. If I run
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
6 times, then the last call fails with “out of cycles” after 12s. I guess this means that growing the memory is what costs significant time (the copying GC doubles it), while accessing the memory is what burns cycles (and both GCs produce the same accesses).
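In case someone wants to reproduce the compacting-GC run: if I remember correctly, the collector is selected by an extra moc flag, which dfx can pass through via the canister’s “args” entry in dfx.json, e.g.
"args" : "--compacting-gc"
Omitting the flag gives the default copying GC used in the runs above.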
What errors are you referring to? I was suggesting using -vv to check the replica settings, which works with 0.13.0-beta.0, at least on my machine.
In that case you’re likely also exceeding the limit with DTS enabled. If you access performance_counter and debug-print the values, you should see up to ~20B, which is the instruction limit with DTS enabled. If your function needs more than that, you’ll hit the limit and have to either wait until it gets increased or rewrite your function.
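To make “rewrite your function” a bit more concrete, here is a rough, untested sketch (the names are mine, not an official pattern) that splits the allocation into several self-calls, so that each message, including the GC run at the end of it, gets its own instruction budget. Note that this does not make a single GC pass over an already large heap any cheaper; that is what the incremental GC mentioned above is meant to address.
import Array "mo:base/Array";

actor {
  stable let d1 = Array.init<?[var Nat32]>(2**15, null);
  stable var end = 0;

  // one small batch of the original benchN work
  public func benchChunk(k : Nat) : async () {
    var i = 0;
    while (i < k) {
      d1[end] := ?Array.init<Nat32>(2**15, 0);
      end += 1;
      i += 1;
    };
  };

  // driver: each await is a separate self-call, so every chunk runs
  // in its own message execution with a fresh instruction budget
  public func benchN(n : Nat, chunk : Nat) : async () {
    var remaining = n;
    while (remaining > 0) {
      let step = if (remaining < chunk) remaining else chunk;
      await benchChunk(step);
      remaining -= step;
    };
  };
};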
DTS was already activated sometime before 0.13, at least in 0.12.2-beta.0, possibly earlier. So I wouldn’t expect any differences when upgrading to 0.13.0-beta.0.
Are you running a query or an update call? Queries currently have a lower instruction limit of 5B. In order to increase the limit, we first need to introduce some way of charging for queries.
Hello. Is there any documentation that explains how to work with DTS? I tried looking on internetcomputer.org but I didn’t see any results for “DTS” or “Time Slicing”. Thanks.