Deterministic Time Slicing

It is a bit odd if we don’t immediately allow long queries, because then some messages would work in replicated mode but not in query mode, right?

No, it is consistent. A query method has the same non-DTS instruction limit of 5B instructions in both replicated and non-replicated mode. In other words, running a query as an update doesn’t activate DTS for it.

Update methods run with DTS.

1 Like

Maybe I misunderstood you. I guess you meant that if we take the same function foo() and put it inside a query method, then it will run out of instructions, but if we put it inside an update method, then it might succeed? If so, then yes, there is an inconsistency.

Ah, makes sense. No, your first comment answered my question.

1 Like

Hey :wave: , just checking in on this feature - are we currently at 2X increase on DTS?

What does the timeline look like for getting to 6-10X on this feature?

(asking on behalf of a few eager Motoko devs) :sweat_smile:

4 Likes

Hi @icme!

Yes, we are currently at the 2x limit. DTS looks good so far in production, so I think we could go to 6x relatively quickly: in a couple of replica versions.

There is one non-technical issue that we discovered with Motoko that needs to be resolved before we go to 6x.

The issue is out-of-memory handling in Motoko. Currently the low instruction limit for updates acts as a safeguard against Motoko canisters hitting the hard 4GB limit. When the memory usage of a Motoko canister increases and reaches 1-2GB, then update messages start failing with out-of-instructions errors. At that point upgrading the canister is still possible (because upgrade messages have higher instruction limit), so the owner of the canister can salvage the canister and its data by upgrading it to a new version that uses less memory.

With the 6x DTS, the canister will be able to grow to 4GB with update messages. Once the canister reaches 4GB and updates start failing due to out-of-memory, then upgrades will also fail. This means that the canister becomes stuck without any fix.

I have an idea to solve this problem by introducing a “freezing threshold” for memory. It would be a canister settings parameter with the default value of 3GB. When the canister reaches that limit, then updates start failing, but upgrades continue to work. The owner of the canister would be able to increase or decrease the parameter.

3 Likes

This is awesome @ulan :tada: thanks for the update - can’t wait to test it out!

Just to be clear, this memory limit is because of overflowing the heap/main memory, and is different from upgrades failing due to upgrade cycles limitations, correct? I believe streaming serialization was implemented (see @claudio’s comment in link) that allows very close to (slightly less than) 4GB of heap memory to be serialized to stable memory during upgrades.

Also, with respect to the “freezing threshold” idea:

I think it would be great if this would be a system func “hook” that a canister developer could tie into and trigger an action once this threshold is hit.

With CanDB I’m doing something similar (but not implemented at the language/system level obviously). I currently have two fixed limits that are lower than the 4GB heap limits. These limits are:

  1. An INSERT_THRESHOLD, after which no new items can be inserted
  2. An UPDATE_THRESHOLD, after which no new items can be modified (i.e. prevents a user from appending to the attribute metadata associated with a record)

I use these thresholds to both trigger auto-scaling actions as well as to permit or reject additional inserts/updates to the CanDB data store in a canister.
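For readers curious what such a guard looks like in practice, here is a minimal Motoko sketch. The threshold values, method signature, and body are illustrative only, not CanDB’s actual implementation:

```motoko
import Prim "mo:⛔";
import Error "mo:base/Error";

actor {
    // Illustrative thresholds, well below the 4GB hard limit.
    let INSERT_THRESHOLD = 2_000_000_000; // ~2GB: reject new items

    public func insert(key : Nat, value : Nat) : async () {
        if (Prim.rts_heap_size() > INSERT_THRESHOLD) {
            // This is also where an auto-scaling action could be triggered.
            throw Error.reject("heap above INSERT_THRESHOLD, insert refused");
        };
        // ... perform the actual insert ...
    };
};
```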

3 Likes

Based on some initial tests, I found that 2X DTS was able to push heap memory to roughly 2X its previous limits before the GC hit the instruction limit. Big improvement - and looking forward to 6X DTS.

This was pre-DTS:
[screenshot: heap memory before DTS]

This is at 2X DTS:
[screenshot: heap memory at 2X DTS]

Reference: for these tests, I’m just inserting into an RBTree<Nat, Nat>.


@ulan I think with large blob data and 2X DTS we still might be able to push canisters to grow to 4GB anyways (for example, inserting 1.5-1.9MB chunks), so this 4GB update issue will still exist.

3 Likes

Yes, exactly. It is about the memory limit. Streaming serialization allocates a small buffer, which may also fail if the update calls use all the available 4GB memory.

I think it would be great if this would be a system func “hook” that a canister developer could tie into and trigger an action once this threshold is hit.

That would be some kind of memory-pressure callback? I.e. a user-defined function that is called by the system when the canister’s Wasm memory reaches a user-defined threshold. I like the idea. Perhaps it could be generalized to canister lifecycle callbacks/notifications: low-cycles notification, low-memory notification, execution-failure notification, etc.
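As a sketch, such a callback could surface in Motoko as a system function, analogous to heartbeat. The `lowmemory` name and signature here are purely hypothetical; no such hook exists at the time of writing:

```motoko
actor {
    // Hypothetical: invoked by the system when the canister's Wasm
    // memory crosses a canister-configured threshold.
    system func lowmemory() : async () {
        // e.g. drop caches, stop accepting inserts,
        // or notify an auto-scaling controller.
    };
};
```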

With CanDB I’m doing something similar (but not implemented at the language/system level obviously). I currently have two fixed limits that are lower than the 4GB heap limits.

You’re way ahead of many developers that don’t think about potential out-of-memory.

Based on some initial tests, I found that 2X DTS was able to push heap memory to roughly 2X its previous limits before hitting the GC. Big improvement - and looking forwards to 6X DTS.

Thanks for running the test! It’s great to see DTS helping.

I think with large blob data and 2X DTS we still might be able to push canisters to grow to 4GB anyways (for example, inserting 1.5-1.9MB chunks), so this 4GB update issue will still exist.

I agree.

2 Likes

Small update: we increased the limit to 20B instructions. It will be rolled out to all subnets next week. Further increases are blocked by the memory freezing threshold feature.

9 Likes

@icme: I learned today that Motoko can sometimes perform GC in a heartbeat. I thought previously that Motoko always calls user function as a separate update message that does GC if needed, but my understanding was incorrect. If your canister has a heartbeat, then it might fail with out-of-instructions.

I’ll try to implement DTS for heartbeat as soon as possible.

3 Likes

In the meantime, the Motoko team will try to prepare a release that avoids the GC during heartbeat that users can elect to use by installing from the GitHub release page (without waiting for a release of dfx).

UPDATE: a release, for manual installation, is here:

3 Likes

How’s it going? Any chance of 2x again anytime soon?

3 Likes

How’s it going? Any chance of 2x again anytime soon?

The community seems to be in favor of the proposal: Proposal: Configurable Wasm Heap Limit

I will submit an NNS motion proposal after confirming with the DFINITY stakeholders. We have started working on the implementation. The current estimate is about 2 weeks. After that we should be able to double the limit.

Another update: the implementation of DTS for heartbeats and timers is in review and will be merged soon.

6 Likes

Is it 20B in local deployment, too?

I have some calls that run for 40s before they fail with exceeding the instruction limit per call. Either the execution is very slow or the limit is higher on the local replica.

1 Like

Would be nice if these limits were configurable or in lock step on the local replica side.

Local configuration would allow developers to stay in lock step with the mainnet config, and modifying the limit could help devs identify the performance of inefficient long running update calls.

Depends on the version you’re using, and the subnet type. The latest dfx betas have DTS enabled (anything starting from 0.13.0-beta.0 AFAIR).

Any chance you’re running your local replica in system subnet mode? dfx start -vv starts the replica with debug messages enabled; it should show a detailed configuration of your settings.

2 Likes

20B instructions should take about 10s on average. If you have many active canisters, then execution may take longer. Execution may also take longer if the call uses heavier instructions, such as memory accesses.

Local configuration would allow developers to stay in lock step with the mainnet config, and modifying the limit could help devs identify the performance of inefficient long running update calls.

Unfortunately, the limits are baked into the replica binary. Making them configurable externally is possible, but it is a chunk of work. I wonder if debug-printing ic0.performance_counter() solves the problem you mentioned?
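In Motoko that counter is exposed through the prim module; a small sketch of what such debug printing might look like (the `work` method and its body are just stand-ins):

```motoko
import Prim "mo:⛔";

actor {
    public func work() : async () {
        // ... the expensive part of the update call ...
        // Counter 0 is the number of instructions used so far by this message.
        Prim.debugPrint("instructions used: " # debug_show (Prim.performanceCounter(0)));
    };
};
```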

I have dfx 0.12.2-beta.0. Judging by the time it can take (40s) DTS must be on. I’ll try 0.13.0-beta too.

It says “subnet type: Application”.

I have only 1 canister. But what is running is the Motoko GC so I guess that qualifies as “heavy instructions”. I have seen 40s. Based on that I am worried about user experience. If that happened in production it would shut out all users for 40s.

1 Like

But what is running is the Motoko GC so I guess that qualifies as “heavy instructions”. I have seen 40s. Based on that I am worried about user experience. If that happened in production it would shut out all users for 40s.

@luc-blaeser is working on incremental GC to improve the latency in such cases.

Is there a way for us to reproduce your test to look deeper into it?

The code is

import Prim "mo:⛔";
import Array "mo:base/Array";

actor {
    // 2**15 slots, each holding an optional array of 2**15 Nat32s (~128KB).
    stable let d1 = Array.init<?[var Nat32]>(2**15, null);
    stable var end = 0;

    public query func size() : async Nat = async end;
    public query func name() : async Text = async "v0";

    type Mem = {
        version : Text;
        memory_size : Nat;
        heap_size : Nat;
        total_allocation : Nat;
        reclaimed : Nat;
        max_live_size : Nat;
    };
    public query func mem() : async Mem {
      {
        version = Prim.rts_version();
        memory_size = Prim.rts_memory_size();
        heap_size = Prim.rts_heap_size();
        total_allocation = Prim.rts_total_allocation();
        reclaimed = Prim.rts_reclaimed();
        max_live_size = Prim.rts_max_live_size();
      }
    };

    // Allocates n arrays of ~128KB each, keeping them all live.
    public func benchN(n : Nat) : () {
        var i = 0;
        while (i < n) {
            d1[end] := ?Array.init<Nat32>(2**15, 0);
            end += 1;
            i += 1;
        };
    };
};

If you run this with the copying GC (default) and a large enough number that it will hit the cycle limit then you can observe the time. For me,

time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(10000)'

runs for >40s before it fails with “Canister rrkah-fqaaa-aaaaa-aaaaq-cai exceeded the instruction limit for single message execution.”

If you run this sequence

time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'
time dfx canister call rrkah-fqaaa-aaaaa-aaaaq-cai benchN '(1000)'

then you can see the increasing time the garbage collector takes. A call that doesn’t trigger GC takes 3s. The ones that trigger GC take 10s (2nd call), 20s (4th call) and 36s (7th call) for me. At this point there are ~230m objects in the heap. The GC cannot handle much more within the cycle limit.

1 Like