Researching The Use and Impact of Delay Tolerant Networking on the Internet Computer

Kiwiki · January 23, 2022, 5:39am

Delay Tolerance for the Internet Computer

Hello, dearest reader!

I have sent a few messages in the ICPMN Telegram about this topic, and was recommended to ask a few questions here. I’ll start with the premise of the idea.

Introduction

Basically, I have access to a datacenter for my courses and we are to pick a project and implement it in the datacenter. I aim to implement a delay tolerant network in the datacenter, and the idea of how the Internet Computer would work over said network arose.

What is Delay Tolerant Networking?

The initial conception of delay tolerant networking arose due to the fact that communication between planets has significant latency and is in certain cases completely impossible. It is used by the likes of NASA for communicating with the Mars Rover, Voyager, and the likes. These are not the only uses for it, and there are Terra to Terra communication use cases such as remote villages using various methods to send and receive data. It relies on opportunistic links and scheduled links to transmit data. If a link is not available, the data will be held until it is and transmitted when possible. Standard protocols like TCP and UDP are unreliable and sometimes completely unusable in these conditions, therefore DTN and Bundle Protocol was concepted.

What does this have to do with the Internet Computer?

The way I see it, the Internet Computer is, down the line, going to be communicating with destinations such as Mars, or even further. The way it works now, as I understand it, does not take this into account. We have decades before this becomes a significant problem, but I do not believe it would hurt anything to begin researching how this will work sooner, rather than later. And once the research is in progress, I have faith that terrestrial uses will pop up as well.

My Questions

First and foremost, does anyone else think this is a viable research opportunity?

Secondly, would it be possible for a node on the network to be flagged as a research node? This datacenter is student ran, and may face downtime. Not only the downtime, but the node will likely be going up and down constantly as the research is carried out. The node as I see it would have to be flagged as a research node that should not be used for essential services. I believe this would require a vote to be ran to add a new node distinction.

Third, could there be some leniency on the node hardware requirements? This applies if the answer to the fourth question is no. I am not entirely sure if the datacenter has a PowerEdge 6525 or 6515, nor a SuperMicro 1023US-TR4. If it does not, would it be possible to run the node software inside of a Virtual Machine or other available hardware for the sake of this research?

Fourth, are there any research grants available for the Internet Computer? If so, this would allow the research into this to be carried out even once I finish my courses and lose access to the datacenter. I am also not sure if this research would be able to be done in the datacenter available to me, and if a grant is available the ability to perform this research in a proper datacenter would be a reality. It would also enable the research to be able to be performed on the standard hardware as required by the IC as is, without needing leniency on the hardware requirements.

Conclusion

The Internet Computer could benefit greatly by researching integration with Delay Tolerant Networking stacks sooner rather than when it is properly required. The opportunity is in our hands to solve the issues that may arise before they come up, and we can lay the foundation for interplanetary blockchains going forward. This research may not be immediately impactful on the Internet Computer, but by laying the foundations for it, we can save headache going forward as the IC becomes more complex and widespread.

👽☢️♻️♋👽

jzxchiang · January 23, 2022, 6:12am

The way I see it, the Internet Computer is, down the line, going to be communicating with destinations such as Mars, or even further.

faraz · January 23, 2022, 7:06am

Kiwiki:

flagged as a research node? This datacenter is student ran, and may face downtime. Not only the downtime, but the node will likely be going up and down constantly as the research is carried out. The node as I see it would have to be flagged as a research node that should not be used for essential services. I believe this would require a vote to be ran to add a new node distinction.

Third, could there be some leniency on the node hardware requirements? This applies if the answer to the fourth question is no. I am not entirely sure if the datacenter has a PowerEdge 6525 or 6515, nor a SuperMicro 1023US-TR4. If it does not, would it be possible to run the node software inside of a Virtual Machine or other available hardware for the sake of this research?

Fourth, are there any research grants available for the Internet Computer? If so, this would allow the research into this to be carried out even once I finish my courses and lose access to the datacenter. I am also not sure if this research would be able to be done in the datacenter available to me, and if a grant is available the ability to perform this research in a proper datacenter would be a reality. It would also enable the research to be able to be perfor

Need a lot more info to offer you any useful suggestions. Just some high-level thoughts on how DTN and IC would interact in the context of the bundle protocol.

First and foremost, does anyone else think this is a viable research opportunity?

Depends on what are you trying to do. IC uses TCP/IP and such has no consideration (good or bad) for networks with long delays and disconnectivity (esp interplanetary networks earth->satellite->Mars). In presence of extremely large and many delays disrupting networking for the majority of the network, IC behavior is to “keep trying” to make progress. As soon as the network heals IC will make progress. The P2P layer is modular and can be swapped out for one that is amenable to DTN.

The following experiment might work theoretically

4 node subnet, 1 on earth 1 on mars, and 2 on intermediate satellite. Assuming static connectivity, the IC has the equivalent of the store and forward mechanism in a way the artifact manager is implemented. The P2P layer implements the push-pull model with adverts, for conserving bandwidth. With bursty connectivity, an unconditional push model may eliminate the RT.

A similar but tangible project is to model running the IC on multiple low orbit satellites like SpaceX’s Starlink.

Secondly, would it be possible for a node on the network to be flagged as a research node?

Getting into the node provider program has a bunch of requirements. Once you have access to a node. you are free to have a field day with your node! IC provides 3f+1 fault tolerance and it runs fine if faults are within bounds. But for this experiment run your own 4 node subnet, the code is open source.

Fourth, are there any research grants available for the Internet Computer?

Yes, https://dfinity.org/grants/ Although this project sounds more open-ended than the projects funded via dfinity grants. (my opinion)

faraz · January 23, 2022, 7:10am

read your original post about Starlink. I think IC should run just fine if hosted on Starlink satellites. Networking and latencies won’t be the first-order of problems

faraz · January 23, 2022, 7:19am

github.com

dfinity/ic/blob/0c19683cf312073168cb1de4f81ef0324212f7e2/rs/transport/src/data_plane.rs#L58

    
      
          // less responsive to queue clearing. A good compromise size might be 32K with a
          // larger queue size.
          /// The number of bytes which will be attempted to dequeue and aggregate before
          /// sending to the network
          const DEQUEUE_BYTES: usize = 100 * 4 * 1490;
          
          
// Payloads are received/collected in units of SOCKET_READ_CHUNK_SIZE
          /// Size of read chunks
          const SOCKET_READ_CHUNK_SIZE: usize = 32 * 1024;
          
          
/// Heartbeat send interval (timeout on sender side)
          const TRANSPORT_HEARTBEAT_SEND_INTERVAL_MS: u64 = 200;
          /// Heartbeat wait interval (timeout on receiver side)
          const TRANSPORT_HEARTBEAT_WAIT_INTERVAL_MS: u64 = 5000;
          
          
/// Error type for read errors
          #[derive(Debug)]
          enum ReadError {
              SocketReadFailed(std::io::Error),
              SocketReadTimeOut,
          }

Kiwiki · January 23, 2022, 4:31pm

One I can conceptualize right now is for things like cannister calls. Say your link is down. When you make a call to a cannister, it will not ever reach the cannister AFAIK. What DTN could do in this case is allow you to issue the cannister call, and then the DTN stack would hold that call until the link comes back online. Once it comes back online the call is transferred to the receiving end, and your end drops the call from the stack once it’s confirmed it’s been received by the receiving end. Then the receiving end sends it further down the line via whatever link it has, holding it until it receives confirmation it has been received by the next stop. Once it reaches the IC subnet, IC processes the call, and issues a response. If the return path has gone down at any point along the line, the response doesn’t just end up in the void, it will slowly make its way back to you once the links come back online. This is more on the clients networking side than nodes.

Ie: Client on Mars sends a call → Mars orbiter receives call and holds until link with Earth orbiter comes online → Mars orbiter eventually establishes link with Earth orbiter → Mars orbiter transfers call to Earth orbiter and holds call until Earth orbiter confirms receipt, resending on next link if Earth orbiter does not → (At this point a few different things begin happening) (a.) Earth orbiter receives call and attempts to send receipt confirmation to Mars orbiter → (b.) Earth orbiter sends call to IC and holds call until IC confirms receipt → (a.) Mars orbiter receives receipt, drops call from stack, and notifies client their call has been received by Earth orbiter. (b.) IC receives call and processes it, as well as notifying Earth orbiter it has received the call so it can be dropped. → (b.) Once processed, IC sends response to Earth orbiter → (b.) If the link to Mars orbiter has gone down, Earth orbiter holds response until next link → (b.) Once link is available, Earth orbiter sends response and waits for Mars orbiter to confirm receipt before dropping, resending on next link if not confirmed → (b.) Mars orbiter receives response, sends receipt confirmation, and sends response to Client. → (b.) Earth orbiter receives receipt confirmation and drops response.

The above assumes only 1 Earth orbiter and 1 Mars orbiter, and the entire subnet being on Earth and the client on Mars. There could also be other systems that receive the call as well, and it becomes a race between which gets to Earth first, and then another race between which returns to Mars with the response first. For example, you could have a shuttle carrying the call as well. If for some ungodly reason the Earth orbiter got destroyed before you sent the call, it will still eventually make its way to Earth via the shuttle. And once another link to Mars becomes available, or the Shuttle returns to mars with the response, the client receives the response.

On the actual node side of things, the way I understand it is code executes on every node in the subnet and then the nodes come together and form a consensus on what the outcome of that code is given their initial pre execution state. Lets assume a 2 node subnet for this example. Node 1 is on Earth, node 2 is on Mars. Lets assume a client issues a call to a canister in the subnet from exactly half way in between the two locations. Both nodes receive the call at the same time, and begin executing. Once execution is finished, the nodes need to establish consensus. Using the 2 orbiter model, lets start with the link between Mars and Earth offline. Node 1 sends its final outcome to the Earth orbiter, and receives confirmation that the orbiter received it. Node 2 sends its final outcome to the Mars orbiter, and receives confirmation that it has been received by the orbiter. When the link between the orbiters comes online, each orbiter sends the other orbiter the final outcome from their respective node, and holds it until the other orbiter confirms receipt. Orbiters receive the other sides outcome, send out a confirmation, and send it to their respective nodes for consensus. Once each orbiter gets confirmation from the other side that the results have been received, they drop them. Lets say the consensus doesn’t happen, Node 1 got a 32 for the result, and Node 2 got a 16. Each node would send a response to their orbiter that consensus was not achieved and that a redo is needed. When the link is available, that response gets sent to the other side and dropped when confirmed. Execution happens again. Repeat until consensus is achieved.

DTN has the capability to work with TCP/IP in a way. I will look into seeing how this would work with what IC does.

The thing is, does this method work, and work efficiently, in the case of +300s one way trip times? The timeouts in the code you sent were 200ms and 5000ms, which is very much a terrestrial delay.

I don’t believe nodes on satellites would be entirely necessary, and possibly overkill. 2 nodes on Earth, 2 nodes on Mars, and 1 Earth orbital communications link and 1 Mars orbital communication link sounds like a more realistic approach. I understand this is a theoretical experiment, but still.

Does this work if links are non-static, occasionally opportunistic, and extremely unreliable?

Very good point. However, low orbit is still vastly quicker in latency than interplanetary.

As in, I could run it in a Virtual Machine? No need for hardware, or connecting to the actual IC?

I agree with your opinion, and those grants appear to be for developers rather than research such as how a new networking stack would play into the IC.

Kiwiki · January 23, 2022, 4:32pm

That actually wasn’t my post It was very convenient that it popped up in the feed and that someone had raised basically this exact scenario in the comments.

Topic		Replies	Views
IC's Starlink subnet Roadmap	3	1028	January 23, 2022
Alright, what's the update on nodes? DFINITY	4	848	May 11, 2022
Internet Computer in Space? DFINITY	3	1406	September 4, 2024
Voting is now live for a new proposal for the scaling of the world-computer NNS Governance	24	681	April 21, 2025
Data Processing Subnet Developers	3	1099	May 8, 2023