Good catch Ulan! Yes, this is why @skilesare is seeing the problem he reported above. We should really try to get the local replica to emulate the mainnet as closely as possible.
That feature would alleviate a massive pain in the developer experience, so I'm super enthusiastic to see it implemented.
Is this still in the pipeline? Any news or updates on this?
Thanks, it is great to hear that the feature will be useful! We are implementing it and hoping to have an initial prototype around the end of May or June.
Just wanted to check back in regarding the progress on the Deterministic Time Slicing feature.
I've been load testing insertion into several data structures recently (Buffer, HashMap, RBTree), all of which seem to start hitting the message instruction limit on canisters much earlier than their heap capacity, regardless of their insertion runtimes (i.e. O(1) vs. O(log(n))).
I'd be interested in further testing how these data structures perform as they scale within canisters, and DTS would be key in doing so.
Just curious @ulan or @dieter.sommer - is some variant of Deterministic Time Slicing enabled for the bitcoin integration project, or are all update types of transactions/operations able to be completed over a single round of consensus?
The implementation of DTS is close to completion. We hope to merge the main remaining changes in 1-2 weeks. After that we need to do stress testing to ensure that we didn't miss anything. Once that's done, I'll update this thread and give a more concrete timeline for shipping.
(Buffer, HashMap, RBTree) all of which seem to start hitting the message instruction limit on canisters much earlier than their heap capacity, regardless of their insertion runtimes
That is expected for `HashMap` because it occasionally copies the entire backing store in O(n) time, but the result for `RBTree` is surprising because it should have a worst-case runtime of O(m * log(n)), where `m` is the number of operations and `n` is the number of elements. As long as `m` is not large for each message (e.g. doesn't exceed 10^6), I would expect it not to hit the instruction limit.
is some variant of Deterministic Time Slicing enabled for the bitcoin integration project, or are all update types of transactions/operations able to be completed over a single round of consensus?
AFAIK @ielashi has carefully designed the operations to not exceed a single round.
At the moment, the Bitcoin API is implemented in the replica, which means we're not subject to the same cycles limitation as Wasm canisters. We've added some measures so we don't spend too much time on execution within the round; however, these measures are rather crude.
We're now in the process of migrating the Bitcoin API into a Wasm canister, where we'll be subject to the same cycles limitation as all other canisters. Some computations, like block ingestion, require a lot of cycles, even more than what DTS will initially support, and for those we'll be implementing a simple slicing logic within the canister to stay within the cycles limit.
Great to hear, thanks for the update!
How many rounds of consensus and cycles do you see some of these operations taking?
Also, would love to hear a bit more about the strategies/techniques behind what this simple slicing logic looks like - is it implemented on the replica level or directly in the API (in a way that is currently accessible to IC developers)?
I have seen some large blocks take up to 385B instructions to be ingested. With a 5B instruction limit, that translates to ~77 execution rounds. A large block can have tens of thousands of inputs/outputs, which translates to tens of thousands of insert/remove operations on our `StableBTreeMap` data structures. These numbers are using the code that we have in the replica as-is. I expect there would be some low-hanging optimizations that would help reduce the required number of instructions.
Also, would love to hear a bit more about the strategies/techniques behind what this simple slicing logic looks like - is it implemented on the replica level or directly in the API (in a way that is currently accessible to IC developers)?
We'll be adding the slicing logic in the canister itself. It hasn't been written yet, but it'll be highly specific to the case of block ingestion, so it's unfortunately not something that can be packaged up and used elsewhere. I'll share pointers to the slicing code as soon as it's available.
On the API level, you may have noticed we added pagination to our `get_utxos` endpoint, which was one way we were able to stay within the instruction limit for these requests.
Quick update: we are aiming to enable deterministic time slicing on master in the week of September 19th and to roll it out to the mainnet in the subsequent week.
Do you mind sharing the GitHub PR and/or the “Bless Replica Version” NNS proposal when they become available? This is quite an important feature. Thanks!
Do you mind sharing the GitHub PR and/or the “Bless Replica Version” NNS proposal when they become available?
Absolutely! I'll share the PR and the proposal. We are currently fixing two blockers that may delay the launch a bit.
We are currently fixing two blockers that may delay the launch a bit.
One of these blockers turned out to be more tricky than we thought. We are still working on it.
Are any of the blockers something that the community can give input on/or help with in an advisory capacity?
Thanks for the offer! It is more of an implementation issue about how to ensure the correctness of DTS state in certain cases. We have an idea for how to solve it and are currently implementing it.
Never give up, never surrender! You're doing such excellent work.
We fixed the blocker and will be incrementally rolling out DTS in the following weeks.
The version that enables DTS for `install_code` messages will be deployed on `k44fs` [proposal to elect a replica binary, proposal to update `k44fs`].
Awesome news!
@ulan Is it possible to set limits on DTS (i.e. the rounds of consensus it takes to process a message) for specific APIs as a cycle-drain/DDoS prevention mechanism? For example, I'd like to say that this API can span a maximum of 5 rounds of consensus before failing/trapping (instead of using up all 50 rounds' worth of cycles).
@Severin any ETA on when DTS might be included in dfx (for local testing purposes)? Also, will we be able to profile how many rounds of consensus it takes to process a specific update call?
We plan a new 0.12.0 beta release next week. I'll try to get it in there. EDIT: Managed to update the replica without any problems (PR). The DTS-enabled replica will be included in the next beta release!
No special tooling that I'm aware of. @ulan any chance you know about something?
@icme: We are rolling out DTS slowly to make sure we don't introduce regressions.
The plan is:
- Roll out DTS for `install_code` messages, keeping the instruction limit the same but slicing execution into multiple rounds (because `install_code` messages already have large limits).
- Roll out DTS for update calls and responses, increasing the instruction limit by 2x.
If that goes well, we will increase the limits more. The final values of the limits are to be defined.
Note that we are not planning to support DTS for queries and heartbeats because they are expected to be short-running. If a heartbeat needs to do a long-running computation, it can make a self-call, which will be an update message and will run with DTS.
Is it possible to set limits on DTS (i.e. the rounds of consensus it takes to process a message) for specific APIs as a cycle-drain/DDOS prevention mechanism?
Right now it is not possible, but we are discussing a canister settings parameter that would allow the controller to set a lower limit that would apply to all methods of the canister. Would that work in your use case?
Supporting individual limits per Wasm methods seems difficult due to the performance and memory overhead because of the need to parse and store this information for each canister.
No special tooling that I'm aware of. @ulan any chance you know about something?
The `ic0.performance_counter()` would be a way to infer that. Dividing that number by the slice size in instructions would give the number of rounds. The slice size is defined in the replica and is currently set to 2B (it may change, though, if we find that other values work better).
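As a back-of-the-envelope helper, that division is just a ceiling divide. A small sketch, reusing the figures mentioned in this thread (the 2B slice size is the current value quoted above and may change):

```rust
// Estimate execution rounds from an instruction count and a slice size
// (ceiling division). The slice size is replica-defined and may change.
fn estimated_rounds(instructions: u64, slice_size: u64) -> u64 {
    (instructions + slice_size - 1) / slice_size
}

fn main() {
    // The 385B-instruction block mentioned earlier, at 5B per round:
    println!("{}", estimated_rounds(385_000_000_000, 5_000_000_000)); // 77
    // A 3B-instruction message with the current 2B-instruction slices:
    println!("{}", estimated_rounds(3_000_000_000, 2_000_000_000)); // 2
}
```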
For the most part, a global canister setting to set a lower limit should be fine:
Some use cases off the top of my head for why I'd want something like this:

- To run a daily sync or job that performs some sort of MapReduce across much of the data in the canister. I'd like for this job to be able to span multiple rounds of consensus, but maybe I don't want just any update call to be able to span multiple rounds of consensus.
- Based on the size of the data structure and/or any GC required (i.e. in Motoko), some calls that access or modify large data structures (i.e. insert/delete in a BTree) will periodically need to span multiple rounds of consensus if the operation is inefficient (i.e. scan + filter) or the GC is invoked. For these APIs, I'd like to give them permission to span multiple rounds.
However, for a simple API that sets a more primitive parameter or performs a simple computation, there's no reason for it to span multiple rounds. Maybe this suggests I'm failing to check the size limits of my input parameters, or that I have an expensive operation in my code.
I'm not convinced that the per-API limits can't just be solved by the developer testing and putting in these checks themselves, but the lower overall canister limit would definitely be useful.
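Those developer-side checks could look like a per-method instruction budget consulted inside hot loops. The sketch below simulates the counter with a plain variable; in a real canister the value would come from `ic0.performance_counter()`, and the budget and costs here are made-up numbers:

```rust
// Sketch of a per-method instruction budget. `spent` stands in for
// ic0.performance_counter(); a real canister would read the counter
// periodically and trap (or return an error) once the budget is exceeded.
fn run_with_budget(work_items: u64, cost_per_item: u64, budget: u64) -> Result<u64, String> {
    let mut spent: u64 = 0; // simulated performance counter
    let mut done: u64 = 0;
    for _ in 0..work_items {
        if spent + cost_per_item > budget {
            return Err(format!("budget exceeded after {} items", done));
        }
        spent += cost_per_item;
        done += 1;
    }
    Ok(done)
}

fn main() {
    // A cheap call fits in the budget; an expensive one is rejected early.
    assert_eq!(run_with_budget(100, 1_000, 1_000_000), Ok(100));
    assert!(run_with_budget(10_000, 1_000, 1_000_000).is_err());
    println!("ok");
}
```

Checking the counter only every N iterations would keep the overhead of the guard itself negligible.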