Awarded: ICDevs.org Bounty #39 - Async Flow - One Shot - Motoko - $6,000

AFAIK inspect message only applies to agent → canister calls (i.e. external/ingress calls); it doesn't run on inter-canister calls. We'll need to implement firewalls / access lists at the function level.
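
A minimal sketch of what such a function-level access list could look like in Motoko (the allow-list contents, the helper, and the Blob payload are hypothetical placeholders):

import Array "mo:base/Array";
import Option "mo:base/Option";

actor {
  // Hypothetical allow-list of canister principals permitted to call us.
  let allowed : [Principal] = [];

  func isAllowed(p : Principal) : Bool {
    Option.isSome(Array.find<Principal>(allowed, func(a : Principal) : Bool { a == p }))
  };

  public shared ({ caller }) func com_asyncFlow_newMessage(msg : Blob) : async () {
    if (not isAllowed(caller)) { return }; // drop calls from unknown canisters
    // ... handle the NEW message ...
  };
};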

One good thing, and a possible solution for avoiding a resource-drain DoS, is that Notify supports sending cycles. So we could add support for verifying cycles and only accept NEW messages that include x amount of cycles (calculated by each developer to cover the cost of processing the request and sending an ACK with a set number of retries, averaged over y amount of time).
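
A rough sketch of what cycle-gated acceptance could look like on the receiving side (the threshold and the function body are hypothetical; on newer Motoko versions Cycles.accept needs the <system> capability):

import Cycles "mo:base/ExperimentalCycles";

actor {
  // Hypothetical minimum: tune it to cover processing + ACK with retries.
  let minAcceptCycles : Nat = 1_000_000_000;

  public shared func com_asyncFlow_newMessage(msg : Blob) : async () {
    // Only accept NEW messages that pay their own way.
    if (Cycles.available() < minAcceptCycles) {
      return; // under-funded: drop silently; the sender will retry or give up
    };
    ignore Cycles.accept(minAcceptCycles);
    // ... store the message, schedule processing, send the ACK later ...
  };
};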

Great idea. If they want the function accessible via ingress they can have a traditional endpoint.

Hi,

I’m commenting here since we’ll need to sync anyway. I’ve mapped the flows, and believe I’ve touched on every step involved. Please let me know if I missed anything.

Brief flows overview

Outgoing

  1. Encapsulate a new message, store payload, etc. Return a msg_id

(can fail due to canister memory, etc. Ignored for the purposes of this lib)

  2. Send Notify(NEW{msg_id, payload})

(Failure 1: the notify call can fail if the canister's queue is full)

[Need to re-attempt this call after a set timeout, with a set retry count]

  3. Waiting for ACK

(Failure 2: the ACK does not arrive after a set timeout)

[Need to re-attempt step 2, with a set retry count]

  4. Received ACK, call processing fn

(Failure 3: the processing fn call traps / panics. Ignored, should be handled by the other canister)

  5. Send FIN

(Failure 4: the notify call can fail if the canister's queue is full)

[Need to re-attempt this call after a set timeout, with a set retry count]

  6. Wait for a set timeout while maintaining the state

(we need this step in case the other canister doesn't receive a FIN, and re-sends the ACK. We could simply drop the state at this stage, and reply with FIN to any unknown msg_id. Implementation decision / unsure)
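
To make the retry bookkeeping concrete, here is a minimal sketch (with hypothetical names) of the per-message state the sending side would have to keep across these steps:

// Which step of the outgoing flow a message is in.
type OutgoingState = {
  #PendingNotify;   // step 2: NEW not yet accepted into the output queue
  #AwaitingAck;     // step 3: NEW sent, waiting for the ACK
  #AwaitingFinAck;  // steps 5-6: FIN sent, state kept for a grace period
};

type OutgoingEntry = {
  msg_id : Nat;
  payload : Blob;
  state : OutgoingState;
  retries : Nat;     // attempts made for the current step
  nextRetryAt : Int; // Time.now()-based deadline for the next re-attempt
};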

Incoming

  1. Receive a NEW message. Create an entry, store payload, decide to accept, call processing fn.

(can fail due to canister memory, etc. Ignored for the purposes of this lib)

(Failure 1: the processing fn call traps / panics. Ignored, should be handled by the other canister)

  2. Processed (Result). Send ACK

(Failure 2: the notify call can fail if the canister's queue is full)

[Need to re-attempt this call after a set timeout, with a set retry count]

  3. Waiting for FIN

(Failure 3: the FIN does not arrive after a set timeout)

[Need to re-attempt step 2, with a set retry count]

  4. Received FIN. Mark task as complete.
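
And the mirror image for the receiving side (again with hypothetical names), essentially what the lib has to persist between the ACK and the FIN:

type IncomingState = {
  #Processing;   // step 1: NEW accepted, processing fn called
  #AwaitingFin;  // steps 2-3: ACK sent, waiting for the FIN
  #Complete;     // step 4: FIN received
};

type IncomingEntry = {
  msg_id : Nat;
  payload : Blob;
  state : IncomingState;
  ackRetries : Nat; // ACK re-sends so far
};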

Hi, colleagues

I have a question about the 5-minute timeout in the IC.

You write here:

Where can I find out more about this?

I would like to take my reasoning further. Perhaps I need to set my own five-minute timeout before re-sending to the recipient (in case the confirmation didn't arrive). Is that the meaning of the lines above?

Note that it is technically not correct that messages will time out after 5 minutes on full queues.

It is true that requests will time out if they are sitting in queues for 5 minutes.

However, a request that was sent out does not necessarily sit in the queue for that long. Messages in (canister-to-canister) output queues are routed into (subnet-to-subnet) streams as long as there is space in the stream. Only once backpressure from the respective stream builds up do messages remain in queues. Messages in streams can no longer time out.

Given that there is no way for the canister to know whether or not a message made it out of the queue into the stream, one cannot simply conclude that it timed out after seeing no reply for 5.5 minutes. However, if a message does time out, a "message timed out" response to the request will eventually arrive (potentially much later than 5.5 minutes after sending the request if the system is badly backlogged). This response could be the trigger for trying to resend. When resending earlier, one might run into a situation where a request arrives twice (unless there is some explicit deduplication done by the canister).
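
For plain awaited calls (not notify), that "reply as retry trigger" idea could look roughly like this in Motoko; scheduleRetry is a placeholder for whatever re-send bookkeeping the lib uses:

try {
  await canister_receiver.com_asyncFlow_newMessage(msg); // NEW
} catch (e) {
  // A system-generated reject (e.g. the message timed out in the queue)
  // surfaces here; use it to schedule a resend instead of guessing timeouts.
  scheduleRetry(msg_id);
};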

A scenario matrix may be in order.

Thanks, derlerd-dfinity1

It turns out like this:

The request from the canister goes through 2 states (queue → stream).

The 1st state is that it is in the queue.

The request can sit in the queue for up to 300 seconds.
Two outcomes await it:

  • within <= 300 seconds it gets into the stream
  • 300 seconds pass and it is discarded. It will never get into the stream.
    No more news about it arrives (neither from the IC system nor from the canister being called)

The 2nd state is that it (the request) got into the (subnet-to-subnet) stream.

There is one possible outcome awaiting it:

  • 100% processing by the addressed canister. The execution time is unknown.

It looks something like this.

The protocol doesn't provide any guarantee that a destination canister 100% processes a request. The only guarantee the protocol gives is that every request will get exactly one reply. However, this reply can also be system-generated in case the request cannot be delivered or processed (OOM, queue full on the receiver side, destination canister trapping, etc.). So what you can rely on is that you will eventually receive a reply that tells you what happened to the request.

This is also true for state 1. If a request times out in the output queue, the canister will get a system-generated reply that the message timed out. So the statement above, that no more news arrives after a queue timeout, is also not correct.

To sum up: it doesn't really make sense to distinguish between the two states you sketch above from a canister's perspective, as there are no delivery guarantees in either case. In both states the only thing you can rely on is that you will get a reply, which will tell you what happened to the request. Based on this, one can then decide whether or not to retry.

Since we're working with Notify here, the canister that issues a Notify call doesn't even get that reply, correct? The only thing the sender canister knows is whether the Notify was successfully added to the outgoing queue or not (the Notify call can either succeed or fail, synchronously).

Then there's the question of how often we should retry a call that wasn't answered (via another Notify from the other canister). My intuition here is that our lib should implement something like retry with back-off: first retry at x seconds, then x*2, etc., for a set number of retries, and then just give up on the call.
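
A tiny sketch of that back-off schedule (the function name and the retry cap are made up for illustration):

// Delay before the n-th retry, or null once we give up on the call.
func retryDelay(baseDelayNs : Nat, attempt : Nat) : ?Nat {
  let maxRetries = 10;
  if (attempt >= maxRetries) { return null };
  ?(baseDelayNs * (2 ** attempt)) // x, 2x, 4x, ...
};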

Since we're generating a unique ID for each new flow, we should be OK even if two copies of a message occasionally reach a canister. The lib should cover the case where an identical message_id is received: don't re-process the message, just re-issue the ACK (as described by Austin in the main bounty proposal).
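
De-duplication on the receiving side could be as simple as a lookup keyed by msg_id before processing. A minimal sketch (in practice this would also be keyed by the sender's principal):

import HashMap "mo:base/HashMap";
import Hash "mo:base/Hash";
import Nat "mo:base/Nat";

let seen = HashMap.HashMap<Nat, ()>(16, Nat.equal, Hash.hash);

// Returns true if the message should be processed, false if it is a
// duplicate and only the ACK should be re-sent.
func acceptNew(msg_id : Nat) : Bool {
  switch (seen.get(msg_id)) {
    case (?_) { false };
    case null { seen.put(msg_id, ()); true };
  }
};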

I’m not familiar with notify but I assume that notify will just pass invalid callback IDs when making the calls, right?

If so, keep in mind that notify makes the canister not see the reply, but this doesn't mean there is no reply. The system still makes a reservation for the reply, and the notify consumes a slot in the queue until that reply (which may be system-generated if the notified canister doesn't reply explicitly) arrives. It is just that, when consuming the reply, the invalid callback ID ensures no canister state is changed. So if you retry too aggressively you will end up filling up your own queue, and enqueuing new requests/notifys will eventually fail.

This is what I was thinking. I think the question is, given the 5 minute queue timeout, whether the first retry should happen before or after the 5 minutes (give or take some time).

Hi!

I realized that there is a flaw in the com_asyncFlow_fin (or com_asyncFlow_ackack) function: it is sent only once, without resend attempts, and I think this will need to be fixed. But that's not the main point. I also realized that the resend attempts themselves (when no confirmation arrives) load the network. Say 10 attempts to send a new message and 10 confirmation attempts: that loads the network (10+10)/2 = 10 times as much, where the two is the message count of a normal exchange. The way out of the situation I see so far is this: the 1st attempt is a single message, and perhaps the 2nd as well; further confirmation attempts (say from the 3rd to the 10th) should be sent in bulk (i.e., using a data collection such as List). That is, send multiple confirmations in one request.
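
A batched confirmation endpoint could look something like this (a sketch only; the endpoint name and the per-id handler are hypothetical):

// Inside the receiving actor: acknowledge many msg_ids in one call
// instead of issuing one Notify per confirmation.
public shared func com_asyncFlow_ackBatch(msg_ids : [Nat]) : async () {
  for (id in msg_ids.vals()) {
    markAcked(id); // hypothetical per-message bookkeeping
  };
};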

To my understanding, the purpose of this library is to allow canisters to communicate with 3rd-party canisters without worrying about malicious actors that could render your own canister un-upgradeable. For context, check out this post. It is meant as a stopgap until a permanent solution to the upgrade issue is implemented by Dfinity.

Using this library will be ~3x more expensive, but it will enable untrusted 3rd-party communication today. Some people might find the tradeoff worthwhile.

I understand. I’m also leaning towards the main option. But there is an option to optimize the number of requests. I think after writing the main library, it will be possible to think about improvements.

Having a batch endpoint is an interesting proposition.

Hi, skilesare!

I have a ready-made implementation. It was ready a couple of days ago, but there are packaging problems. Link to github

Now about the problem: the actual asynchronous data exchange in this library avoids the async/await operators, since using them would mean waiting for the result to be returned. My code in the Sender and Receiver actors does not use them, because those actors are created directly as canisters at build time.
Of course, the main task of the library is to hide (encapsulate) the protocol work and simplify its use. Therefore, I wanted to move the main logic, i.e. the code, into a separate class or module. But in that case, the compiler (SDK) requires the use of asynchronous functions.

For example, a section of code in the sender’s canister (not packaged):

canister_receiver.com_asyncFlow_newMessage(msg); // NEW

But if this code is packaged in a class or a separate module, it turns into a construction like this (which is incorrect):

class SourceSender() {
  public shared({caller}) func com_asyncFlow_newMessage(msg : MessageType) : async () {
    //****//
    await canister_receiver.com_asyncFlow_newMessage(msg); // NEW
  };
};

I will think about how to get away from asynchrony in the class. Perhaps callback functions will help here.
Update 1: a callback function also requires asynchrony in its function parameters.
What are your opinions on this?

I think this requirement is expected. You may want to use async* in your library so that if the library cancels the send for some reason it doesn’t actually cause an await.
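
A minimal sketch of what that could look like (module, type, and function names are made up; the point is that the caller's await* only suspends when the library actually performs the send):

// AsyncFlow.mo (hypothetical library module)
module {
  public type MessageType = Blob; // placeholder for the lib's real message type

  public type Receiver = actor {
    com_asyncFlow_newMessage : shared MessageType -> async ();
  };

  // async* lets the library decide whether to send at all without
  // forcing an unconditional await onto the caller.
  public func sendNew(r : Receiver, msg : MessageType, enabled : Bool) : async* () {
    if (not enabled) { return }; // cancelled send: no message, no suspension
    await r.com_asyncFlow_newMessage(msg);
  };
};

The sender canister would then call it as await* AsyncFlow.sendNew(canister_receiver, msg, true), so only calls that actually go out commit to a message send.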

That's right, I'm not using async/await* at the moment. Everything works fine without async/await, but only inside the Sender and Receiver canisters themselves; if the code is packaged in a separate module, the SDK (compiler) forces you to add async/await* code, otherwise calls between canisters do not compile.

I'm still considering ways out of the problem. I thought callback functions would help, but they also eventually require the async/await* construct.

I also realized that a class (in the actor) and even a module (in the actor) will be treated by the scanner and/or compiler as canisters, and so require the asynchronous async/await* construct.

Update 1: I think I've found a solution, but I'm not 100% sure.