Deterministic Time Slicing

Good catch, Ulan! Yes, this is why @skilesare is seeing the problem he reported above. We should really try to get the local replica to emulate the mainnet as closely as possible.

6 Likes

That feature would alleviate a massive pain point in the developer experience, so I'm super enthusiastic to see it implemented.

Is this still in the pipeline? Any news or updates on this?

2 Likes

Thanks, it is great to hear that the feature will be useful! We are implementing it and hoping to have an initial prototype ready around the end of May/June.

5 Likes

Just wanted to check back in regarding the progress on the Deterministic Time Slicing feature.

I've been load testing insertion into several data structures recently (Buffer, HashMap, RBTree), all of which seem to start hitting the message instruction limit on canisters much earlier than their heap capacity, regardless of their insertion runtimes (i.e. O(1) vs. O(log(n))).

I'd be interested in further testing how these data structures perform as they scale within canisters, and DTS would be key to doing so.

Just curious @ulan or @dieter.sommer - is some variant of Deterministic Time Slicing enabled for the Bitcoin integration project, or are all update-type transactions/operations able to complete within a single round of consensus?

5 Likes

The implementation of DTS is close to completion. We hope to merge the main remaining changes in 1-2 weeks. After that we need to do stress testing to ensure that we didn't miss anything. Once that's done I'll update this thread and give a more concrete timeline for shipping.

(Buffer, HashMap, RBTree) all of which seem to start hitting the message instruction limit on canisters much earlier than their heap capacity, regardless of their insertion runtimes

That is expected for HashMap because it occasionally copies the entire backing store in O(n) time, but the result for RBTree is surprising because it should have a worst-case runtime of O(m * log(n)), where m is the number of operations and n is the number of elements. As long as m is not large for each message (e.g. doesn't exceed 10^6), I would expect it not to hit the instruction limit.
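For anyone who wants to reproduce this, here is a minimal Motoko sketch (the method name and batching are illustrative) that uses ExperimentalInternetComputer.countInstructions from base to measure how many instructions a batch of m RBTree inserts consumes:

```motoko
import RBTree "mo:base/RBTree";
import Nat "mo:base/Nat";
import Iter "mo:base/Iter";
import IC "mo:base/ExperimentalInternetComputer";

actor {
  // Count the instructions consumed by m RBTree inserts, to see how
  // close a batch gets to the per-message instruction limit.
  public func benchInserts(m : Nat) : async Nat64 {
    let t = RBTree.RBTree<Nat, Nat>(Nat.compare);
    IC.countInstructions(func () {
      for (i in Iter.range(1, m)) {
        t.put(i, i)
      }
    })
  };
};
```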

is some variant of Deterministic Time Slicing enabled for the Bitcoin integration project, or are all update-type transactions/operations able to complete within a single round of consensus?

AFAIK @ielashi has carefully designed the operations to not exceed a single round.

8 Likes

At the moment, the Bitcoin API is implemented in the replica, which means we're not subject to the same cycles limitation as Wasm canisters. We've added some measures so we don't spend too much time on execution within the round; however, these measures are rather crude.

We're now in the process of migrating the Bitcoin API into a Wasm canister, where we'll be subject to the same cycles limitation as all other canisters. Some computations, like block ingestion, require a lot of cycles, even more than what DTS will initially support, so we'll be implementing a simple slicing logic within the canister to stay within the cycles limit.

8 Likes

Great to hear, thanks for the update!

How many rounds of consensus and cycles do you see some of these operations taking?

Also, would love to hear a bit more about the strategies/techniques behind what this simple slicing logic looks like - is it implemented on the replica level or directly in the API (in a way that is currently accessible to IC developers)?

2 Likes

I have seen some large blocks take up to 385B instructions to be ingested. With a 5B instruction limit, that translates to ~77 execution rounds. A large block can have tens of thousands of inputs/outputs, which translates to tens of thousands of insert/remove operations on our StableBTreeMap data structures. These numbers are using the code that we have in the replica as-is. I expect there would be some low-hanging optimizations that would help reduce the required number of instructions.

Also, would love to hear a bit more about the strategies/techniques behind what this simple slicing logic looks like - is it implemented on the replica level or directly in the API (in a way that is currently accessible to IC developers)?

We'll be adding the slicing logic in the canister itself. It hasn't been written yet, but it'll be highly specific to the case of block ingestion, so it's unfortunately not something that can be packaged up and used elsewhere. I'll share pointers to the slicing code as soon as it's available.
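To make the general pattern concrete, here is a Motoko sketch of self-slicing via self-calls. This is not the Bitcoin canister's code (the names, chunk size, and work queue are made up); it only shows the shape of doing a bounded chunk of work per message and continuing in the next one:

```motoko
actor Ingester {
  // Hypothetical pending-work counter; in the real canister this would
  // be the remaining inputs/outputs of the block being ingested.
  stable var remaining : Nat = 0;
  let chunkSize : Nat = 10_000;

  // One insert/remove on the underlying data structure.
  func ingestOne() {};

  public func startIngest(n : Nat) : async () {
    remaining := n;
    ignore continueIngest()
  };

  // Do at most one chunk of work per message, then continue via a
  // self-call so each message stays under the instruction limit.
  public func continueIngest() : async () {
    var done = 0;
    while (done < chunkSize and remaining > 0) {
      ingestOne();
      remaining -= 1;
      done += 1
    };
    if (remaining > 0) {
      ignore continueIngest() // the next slice runs as a new message
    }
  };
};
```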

On the API level, you may have noticed we added pagination to our get_utxos endpoint, which was one way we were able to stay within the instruction limit for these requests.
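For illustration, a minimal Motoko sketch of the same idea on the API side (this is not the actual get_utxos interface; the endpoint name, page size, and record type are placeholders): the caller passes an offset and each call returns at most one page, so no single message has to scan the whole collection:

```motoko
import Array "mo:base/Array";
import Nat "mo:base/Nat";

actor {
  let pageSize : Nat = 100;
  stable var utxos : [Nat] = []; // stand-in for real UTXO records

  // Return one page plus the offset of the next page (if any).
  public query func getPage(offset : Nat) : async ([Nat], ?Nat) {
    assert (offset <= utxos.size());
    let end = Nat.min(offset + pageSize, utxos.size());
    let page = Array.tabulate<Nat>(end - offset, func (i : Nat) : Nat { utxos[offset + i] });
    let next = if (end < utxos.size()) { ?end } else { null };
    (page, next)
  };
};
```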

5 Likes

Quick update: we are aiming to enable deterministic time slicing on master in the week of September 19th and to roll it out to the mainnet in the subsequent week.

24 Likes

Do you mind sharing the GitHub PR and/or the "Bless Replica Version" NNS proposal when they become available? This is quite an important feature. Thanks!

4 Likes

Do you mind sharing the GitHub PR and/or the "Bless Replica Version" NNS proposal when they become available?

Absolutely! I'll share the PR and the proposal. We are currently fixing two blockers that may delay the launch a bit.

8 Likes

We are currently fixing two blockers that may delay the launch a bit.

One of these blockers turned out to be trickier than we thought. We are still working on it.

3 Likes

Are any of the blockers something that the community can give input on or help with in an advisory capacity?

3 Likes

Thanks for the offer! It is more of an implementation issue about how to ensure correctness of the DTS state in certain cases. We have an idea for how to solve it and are currently implementing it.

4 Likes

Never give up, never surrender! You're doing such excellent work

11 Likes

We fixed the blocker and will be incrementally rolling out DTS in the following weeks.

The version that enables DTS for install_code messages will be deployed on k44fs [proposal to elect a replica binary, proposal to update k44fs].

14 Likes

Awesome news!

@ulan Is it possible to set limits on DTS (i.e. the rounds of consensus it takes to process a message) for specific APIs as a cycle-drain/DDoS prevention mechanism? For example, I'd like to say that this API can span a maximum of 5 rounds of consensus before failing/trapping (instead of using up all 50 rounds' worth of cycles).

@Severin any ETA on when DTS might be included in dfx (for local testing purposes)? Also, will we be able to profile how many rounds of consensus it takes to process a specific update call?

5 Likes

We're planning a new 0.12.0 beta release next week. I'll try to get it in there. EDIT: Managed to update the replica without any problems (PR). The DTS-enabled replica will be included in the next beta release!

No special tooling that I'm aware of. @ulan any chance you know about something?

2 Likes

@icme: We are rolling out DTS slowly to make sure we don't introduce regressions.

The plan is:

  • Roll out DTS for install_code messages, keeping the instruction limit the same but slicing execution over multiple rounds (install_code messages already have large limits).
  • Roll out DTS for update calls and responses, increasing the instruction limit by 2x.

If that goes well, we will increase the limits more. The final values of the limits are to be defined.

Note that we are not planning to support DTS for queries and heartbeats because they are expected to be short-running. If a heartbeat needs to do a long-running computation, it can make a self-call; the self-call will be an update message and will run with DTS.
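A rough Motoko sketch of that self-call pattern (the names are illustrative):

```motoko
actor {
  // The actual long-running work; as an update message it runs with DTS.
  public func longJob() : async () {
    // ... expensive computation ...
  };

  // Heartbeats are not sliced, so keep them short and hand heavy work
  // off to an update message via a self-call.
  system func heartbeat() : async () {
    ignore longJob()
  };
};
```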

Is it possible to set limits on DTS (i.e. the rounds of consensus it takes to process a message) for specific APIs as a cycle-drain/DDoS prevention mechanism?

Right now it is not possible, but we are discussing a canister settings parameter that would allow the controller to set a lower limit that would apply to all methods of the canister. Would that work for your use case?

Supporting individual limits per Wasm method seems difficult due to the performance and memory overhead of parsing and storing this information for each canister.

No special tooling that I'm aware of. @ulan any chance you know about something?

Calling ic0.performance_counter() would be one way to infer that. Dividing that number by the slice size in instructions gives the number of rounds. The slice size is defined in the replica and is currently set to 2B instructions (though it may change if we find that other values work better).
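In Motoko, for example, base exposes this counter as ExperimentalInternetComputer.performanceCounter, so a rough rounds-so-far estimate could look like the following sketch (the 2B slice size is hard-coded from the number above and may go stale):

```motoko
import IC "mo:base/ExperimentalInternetComputer";

actor {
  // Instructions executed so far in this message, divided by the
  // replica-defined slice size (currently 2B; not queryable, so this
  // constant can fall out of date if the replica changes it).
  func roundsSoFar() : Nat64 {
    IC.performanceCounter(0) / 2_000_000_000
  };

  public func heavyCall() : async Nat64 {
    // ... expensive computation ...
    roundsSoFar()
  };
};
```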

7 Likes

For the most part, a global canister setting to set a lower limit should be fine.

A couple of use cases I can think of off the top of my head as to why I'd want something like this:

  1. To run a daily sync or job that performs some sort of MapReduce across much of the data in the canister. I'd like for this job to be able to span multiple rounds of consensus, but maybe I don't want just any update call to be able to span multiple rounds of consensus.

  2. Based on the size of the data structure and/or any GC required (e.g. in Motoko), some calls that access or modify large data structures (e.g. insert/delete in a BTree) will periodically need to span multiple rounds of consensus if the operation is inefficient (e.g. scan + filter) or the GC is invoked. For these APIs, I'd like to give them permission to span multiple rounds.

    However, for a simple API that sets a more primitive parameter or performs a simple computation, there's no reason for it to span multiple rounds. Maybe this suggests I'm failing to check the size limits of my input parameters, or that I have an expensive operation in my code :person_shrugging:

I'm not convinced that per-API limits can't just be handled by developers testing and putting in these checks themselves, but the lower overall canister limit would definitely be useful.
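A Motoko sketch of that developer-side check (the method and limit are hypothetical; the value would come from load testing against the instruction limit):

```motoko
actor {
  // Hypothetical cap discovered via load testing.
  let maxBatch : Nat = 1_000;

  public func insertMany(items : [Nat]) : async () {
    // Developer-side guard: reject oversized inputs so this API never
    // runs long enough to span multiple rounds.
    assert (items.size() <= maxBatch);
    for (x in items.vals()) {
      // ... insert x ...
    }
  };
};
```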

3 Likes