Scalable Messaging Model

The NNS motion proposal is live; voting is open for the next 4 days.

I agree with this change and endorse it.

I come from a Web2 background doing a lot of integrations, and much of the “pain” came from handling all the possible errors and edge cases. We could never “prepare well enough”, so it was crucial to have proper alerting and logging systems.

For the “best-effort” system to be a success, we definitely need “canister logging on traps” and proper handling of these failures (sleep + retry count, as mentioned).

Please prioritise these before rolling out wide-reaching networking changes. :pray:
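As an illustration of the “sleep + retry count” handling mentioned above, here is a minimal Rust sketch of a bounded retry loop with exponential backoff. Everything in it is hypothetical: `CallError`, `retry_with_backoff` and the transient/permanent split are illustrative and are not the Rust CDK’s actual error types. The delay is injected as a `sleep` closure so the sketch runs outside a canister; a real canister cannot block, so the wait would have to be implemented with timers instead.

```rust
use std::time::Duration;

/// Hypothetical error type for an inter-canister call (not the Rust CDK's type).
#[derive(Debug, Clone, PartialEq)]
enum CallError {
    /// Transient failure, e.g. a best-effort call that timed out: worth retrying.
    Transient(String),
    /// Permanent failure, e.g. the callee rejected the request: do not retry.
    Permanent(String),
}

/// Calls `call` up to `max_attempts` times, waiting `base_delay`, 2 * `base_delay`,
/// 4 * `base_delay`, ... between attempts. Only transient errors are retried.
fn retry_with_backoff<T, F, S>(
    max_attempts: u32,
    base_delay: Duration,
    mut call: F,
    mut sleep: S,
) -> Result<T, CallError>
where
    F: FnMut(u32) -> Result<T, CallError>,
    S: FnMut(Duration),
{
    assert!(max_attempts > 0, "need at least one attempt");
    let mut attempt = 1;
    loop {
        match call(attempt) {
            Ok(value) => return Ok(value),
            Err(CallError::Transient(msg)) if attempt < max_attempts => {
                // Exponential backoff; saturating ops avoid overflow panics.
                let delay = base_delay.saturating_mul(2u32.saturating_pow(attempt - 1));
                eprintln!("attempt {attempt} failed ({msg}), retrying in {delay:?}");
                sleep(delay);
                attempt += 1;
            }
            // Permanent error, or out of attempts: give up.
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    let mut delays = Vec::new();
    // Simulated callee: times out twice, then succeeds on the third attempt.
    let result = retry_with_backoff(
        5,
        Duration::from_millis(100),
        |attempt| {
            if attempt < 3 {
                Err(CallError::Transient("timeout".to_string()))
            } else {
                Ok(attempt)
            }
        },
        |delay| delays.push(delay),
    );
    println!("result = {result:?}, delays = {delays:?}");
}
```

Retrying only on transient errors matters: blindly retrying a rejected or state-changing call can do more harm than good unless the call is idempotent.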

There is already work in progress to preserve and expose logs, with explicit coverage for traps.

For alerting, you can expose standard Prometheus metrics via an HTTP endpoint (e.g. here are the NNS governance canister metrics). The one thing to be careful about is explicitly attaching a timestamp to every sample, so that if a scrape hits a replica that is significantly behind the rest of the subnet, you get a gap instead of an out-of-order sample.
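A minimal sketch of what explicitly timestamped samples look like in the Prometheus text exposition format, where each sample line may end with a timestamp in milliseconds since the Unix epoch. The metric name and the `Sample`/`encode_gauges`/`nanos_to_millis` helpers are hypothetical. In a canister the timestamp would be derived from the system time (the Rust CDK exposes it in nanoseconds via `ic_cdk::api::time()`), so it reflects the state of the replica that served the scrape rather than the moment the scraper received it.

```rust
use std::fmt::Write;

/// One gauge sample. `timestamp_ms` is milliseconds since the Unix epoch,
/// which is what the Prometheus text exposition format expects.
struct Sample {
    name: &'static str,
    help: &'static str,
    value: f64,
    timestamp_ms: u64,
}

/// Converts a nanosecond timestamp (e.g. canister time) to milliseconds.
fn nanos_to_millis(nanos: u64) -> u64 {
    nanos / 1_000_000
}

/// Renders gauges in the Prometheus text format, with an explicit timestamp
/// on every sample line: `<name> <value> <timestamp_ms>`.
fn encode_gauges(samples: &[Sample]) -> String {
    let mut out = String::new();
    for s in samples {
        // Writing to a String cannot fail, so unwrap is safe here.
        writeln!(out, "# HELP {} {}", s.name, s.help).unwrap();
        writeln!(out, "# TYPE {} gauge", s.name).unwrap();
        writeln!(out, "{} {} {}", s.name, s.value, s.timestamp_ms).unwrap();
    }
    out
}

fn main() {
    // Hypothetical canister time in nanoseconds.
    let now_nanos: u64 = 1_700_000_000_123_456_789;
    let body = encode_gauges(&[Sample {
        name: "example_call_latency_seconds",
        help: "Average inter-canister call latency.",
        value: 0.25,
        timestamp_ms: nanos_to_millis(now_nanos),
    }]);
    print!("{body}");
}
```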

@here - if you’re interested in this topic, we’ll have a presentation and discussion about it this Thursday.

Hello everybody,

it has been quite a while since our last update on the new messaging model, so we thought we’d share a quick update on the ongoing implementation of messages with best-effort responses.

  1. The system API changes required to expose the new message type to canisters are done and hidden behind a feature flag.
  2. On top of this, we plan to expose the feature in the CDKs behind a similar feature flag, to gather early developer feedback even before the feature is available on mainnet.
  3. The core changes to support best-effort responses are also progressing well. These include (quite fundamental) changes to canister queues and other related data structures to support the new message types. The strategy here is to develop data structures that are functionally equivalent to the current canister queues but also support the new message types. They exist in parallel to the current ones but remain unused until everything is sufficiently tested; then we will switch from the old ones to the new ones. This way the changes can be merged to master gradually. So if you’re interested in following the progress, watch out for queue-related changes.

Finally, note that 2 is not blocked by 3, so if everything goes according to plan, canister developers will be able to prepare their canisters and provide feedback even before the feature implementation is complete.

Hi folks, we are (or rather @free is) getting close to finishing the implementation of the new messaging model. While it will still take a while to run experiments and tune some parameters before this can go live on mainnet, the main functional changes should be merged into replica master very soon. We’ll give an update in the Scalability & Performance WG this coming Thursday, with:

  • A recap of the model
  • A status update
  • A demo of how you can start experimenting with it

Finally, we want to spend most of the time on a brainstorming session about the new use cases the model enables, beyond lower costs and more scalability. Please join if you’re interested!

Hi all, excited to share the good news: best-effort responses are now available for experimentation in dfx, and we can’t wait to see what you folks build with this feature!

Version 0.24.1-beta.1 of dfx includes a replica with best-effort response support. You can install it using dfxvm:

```shell
dfxvm install 0.24.1-beta.1
```

DFINITY will be adding documentation on this new feature to the Internet Computer website soon. For now, you can check out these demos, written in Rust, since the Rust CDK already supports this feature on its next branch:

We hope to also have Motoko support soon.

The feature is currently still disabled on mainnet. There are a few other pieces that need to be in place before it’s safe to enable this in production, but we wanted to enable the community to experiment with the feature as soon as possible. The two main immediate benefits of using best-effort responses are:

  1. It’s safe to call out to untrusted canisters using best-effort responses. The callee cannot make your canister wait for a response forever, which means that they cannot prevent you from stopping and safely upgrading your canister.

  2. The maximum duration of a canister call is predictable, making it easier to build responsive applications.
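To make the bounded-wait property concrete, here is a simplified, hypothetical model (not the replica’s implementation) of how a best-effort call resolves from the caller’s point of view: either the callee replies before the call’s deadline, or the caller gets a timeout, no matter what the callee does. The `Outcome` type and `resolve` function are illustrative; times are abstract seconds, the timeout is modelled as arriving exactly at the deadline (in practice it arrives shortly after), and the model ignores that the system may also reject a best-effort call early, for example under load.

```rust
/// Outcome of a best-effort call from the caller's point of view (simplified model).
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The callee replied before the deadline.
    Reply { at: u64 },
    /// The deadline passed first. The caller is unblocked, but does not know
    /// whether the callee executed the request.
    TimedOut { at: u64 },
}

/// `callee_reply_at` is `None` if the callee never replies (stuck or malicious).
/// Replies arriving after the deadline are dropped in this model.
fn resolve(sent_at: u64, timeout: u64, callee_reply_at: Option<u64>) -> Outcome {
    let deadline = sent_at + timeout;
    match callee_reply_at {
        Some(t) if t <= deadline => Outcome::Reply { at: t },
        _ => Outcome::TimedOut { at: deadline },
    }
}

fn main() {
    let (sent_at, timeout) = (100, 10);
    // A prompt callee, a slow callee and a callee that never replies.
    for reply_at in [Some(103), Some(250), None] {
        let outcome = resolve(sent_at, timeout, reply_at);
        let done_at = match &outcome {
            Outcome::Reply { at } | Outcome::TimedOut { at } => *at,
        };
        // Whatever the callee does, the caller is unblocked by the deadline.
        assert!(done_at <= sent_at + timeout);
        println!("callee reply at {reply_at:?} -> {outcome:?}");
    }
}
```

The `TimedOut` case is exactly why the caveats about state changes and cycle transfers matter: the caller regains control, but loses certainty about what happened on the other side.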

To finish off, here are a couple of cases where one has to be careful when using best-effort responses:

  1. If the call makes the callee perform state changes (e.g., transfer tokens), then care must be taken to ensure that either the call can be safely retried, or that its results can be obtained. Note that this is similar to what should currently be done for ingress messages; see the documentation on safe retries for more details. Any kind of read can be safely performed with best-effort response calls, and, if necessary, safely retried in case of timeouts, since reads are idempotent.

  2. Cycle transfers. Currently, if a timeout occurs on a call that deposits cycles to another canister, or on any other call with cycles attached, those cycles are lost. We want to improve this later on, given enough interest, but for now cycle-transferring calls should stick with the guaranteed response model, or limit themselves to small amounts where potential losses are acceptable.
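For the first case, one common pattern is to have the caller pick a unique operation ID once and reuse it on every retry, with the callee deduplicating on that ID so a retried state change is applied only once. Below is a minimal, self-contained Rust sketch of that pattern; the toy `Ledger`, its `transfer` method and `call_best_effort` (which simulates a lost response) are hypothetical and not any real ledger’s API. The callee has to support deduplication for this to work, and a real implementation would need to bound how long it remembers past operation IDs.

```rust
use std::collections::HashMap;

/// Hypothetical callee: a toy ledger that deduplicates transfers by a
/// caller-chosen operation ID, so retrying the same request is harmless.
struct Ledger {
    balances: HashMap<&'static str, u64>,
    /// Result of the first execution of each operation ID.
    seen: HashMap<u64, Result<u64, String>>,
}

impl Ledger {
    fn transfer(
        &mut self,
        op_id: u64,
        from: &'static str,
        to: &'static str,
        amount: u64,
    ) -> Result<u64, String> {
        // Duplicate request: return the original result without changing state.
        if let Some(prev) = self.seen.get(&op_id) {
            return prev.clone();
        }
        let result = if self.balances.get(from).copied().unwrap_or(0) < amount {
            Err("insufficient funds".to_string())
        } else {
            *self.balances.entry(from).or_insert(0) -= amount;
            *self.balances.entry(to).or_insert(0) += amount;
            Ok(op_id) // the operation ID doubles as a receipt
        };
        self.seen.insert(op_id, result.clone());
        result
    }
}

/// Simulates a best-effort call from alice to bob: the callee always executes,
/// but if `lose_response` is set the caller only sees `None` (outcome unknown).
fn call_best_effort(
    ledger: &mut Ledger,
    lose_response: bool,
    op_id: u64,
    amount: u64,
) -> Option<Result<u64, String>> {
    let result = ledger.transfer(op_id, "alice", "bob", amount);
    if lose_response { None } else { Some(result) }
}

fn main() {
    let mut ledger = Ledger { balances: HashMap::from([("alice", 100)]), seen: HashMap::new() };
    let op_id = 42; // chosen once by the caller and reused on every retry
    // Attempt 1 executes on the callee but times out; attempt 2 retries with the same ID.
    let outcome = [true, false]
        .iter()
        .find_map(|&lose| call_best_effort(&mut ledger, lose, op_id, 30));
    println!("{outcome:?}, alice = {}, bob = {}", ledger.balances["alice"], ledger.balances["bob"]);
}
```

Despite two attempts, alice is debited only once: the retry returns the recorded result of the first execution.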

If you have concrete plans to use best-effort responses in your app, we’d love to hear about them!

Thanks mainly to @free and Christian, the implementation work on best-effort responses is now complete! We’re still doing some testing and benchmarking, and if everything goes well we plan to submit a proposal to enable the feature on mainnet at the beginning of next year.
@lastmjs, would you consider adding support for this in Kybra/Azle? I’m happy to help answer any questions. In addition to this thread, the two most relevant resources for a CDK are probably the spec change and the new documentation.

Of course! We will add support to Azle with higher priority; for Kybra we might wait a bit.

Does the Rust CDK have support yet?

Yes, it’s on the next branch.

Awesome, great to hear! Please also let us know if questions come up along the way. Always happy to discuss.

Note that the upcoming support in the Rust CDK is still somewhat in flux (in particular, the error codes will soon be overhauled).

We will wait until the functionality is stable in the Rust CDK.

I saw that the latest version of Motoko added support for the API call.

Could we get an example of what it will look like to use?

So far I’ve only created the Rust examples linked above, since Motoko didn’t have support yet. They still need polishing and will be included in the examples repo; I’m working on it. @ggreif has a draft PR for Motoko with some usage examples.