Boundary Node Roadmap

Hello everyone,

It’s time for another update from the boundary nodes team. With the new roadmap, boundary nodes have their own milestone: “SOLENOID.” This milestone encompasses the new boundary node architecture and its building blocks. Here’s where we stand today:

Progress:

First API Boundary Node Incoming

Most aspects of the API boundary nodes are complete. ic-boundary has been running in production for over six months, there are enough nodes with IPv4 connectivity, and the integration into the NNS and guestOS is finished. We just have some final touches left and are awaiting the final green light from the security review.

Once approved, we will propose adding firewall rules to allow metric and log scraping, followed by a proposal to deploy the first node. We aim to complete this in June, so keep an eye out for the proposal and be ready to vote!

Discovery Library

We are finalizing the implementation of the discovery library and gradually integrating it into agent-rs. This library is crucial for all IC-native clients that need to communicate directly with the API boundary nodes. For example, the HTTP gateway will be a key user of this library.

ic-gateway

We have made significant progress on the HTTP gateway core service. After some testing, we plan to integrate it into the current boundary nodes to “battle-test” it and gather insights just as we did with ic-boundary. Gradually, we will transition to the new HTTP gateways and retire the existing boundary nodes.

We will continue to keep you updated on our progress. We look forward to your feedback and are here to address any questions you may have!

8 Likes

Crossposting for completeness :slight_smile: :

3 Likes

We are thrilled to announce that the decentralization steps within the SOLENOID milestone are making steady progress. With the adoption of proposal 130337, the first two API boundary nodes (API BNs) came up live under the management of the NNS and became an integral part of the protocol. Now, for the first time ever, IC clients can route their API canister calls directly through these nodes.

While this marks the first big step of the SOLENOID milestone, we are still in the early stages. Below, we explain how to discover the API BNs and send HTTP requests through them.

How can I discover existing API Boundary Nodes?

As API BNs are now part of the IC, they are marked in the registry canister and are also exposed in the system state tree. The following Rust example fetches all existing API boundary nodes:

/ic$ cargo run --bin fetch_api_bns ic0.app

API Boundary Nodes in the State Tree: [
    ApiBoundaryNode {
        domain: "br1-dll01.aviatelabs.co",
        ipv6_address: "2001:920:401a:1710:6801:94ff:fe18:6b54",
        ipv4_address: Some(
            "213.246.205.106",
        ),
    },
    ApiBoundaryNode {
        domain: "bc1-dll02.blockchaindevlabs.com",
        ipv6_address: "2001:438:8000:1d:6801:c1ff:feac:bc9",
        ipv4_address: Some(
            "208.184.66.5",
        ),
    },
]

Internally, a read_state call to https://ic0.app/api/v2/subnet/subnet_id/read_state is made (with subnet_id standing for the ID of the queried subnet), requesting the path /api_boundary_nodes in the state tree. All API BNs with their domain names, IPv6 addresses and optional IPv4 addresses are listed under this path.
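To make the shape of that request a bit more concrete, here is a minimal sketch. It is not the code of fetch_api_bns: the struct simply mirrors the fields printed in the output above, and the helper names read_state_url and api_bns_path are hypothetical. It assumes that a state tree path is encoded as a list of byte-string labels, so /api_boundary_nodes becomes a single label.

```rust
/// Mirrors the fields shown in the example output above.
/// (Illustrative type, not the one used by fetch_api_bns.)
#[derive(Debug, Clone, PartialEq)]
pub struct ApiBoundaryNode {
    pub domain: String,
    pub ipv6_address: String,
    pub ipv4_address: Option<String>,
}

/// Builds the subnet read_state endpoint for a given host and subnet ID.
/// `host` can be ic0.app or the domain of an API BN.
pub fn read_state_url(host: &str, subnet_id: &str) -> String {
    format!("https://{}/api/v2/subnet/{}/read_state", host, subnet_id)
}

/// The requested state tree path, assumed to be a list of byte-string labels.
/// `/api_boundary_nodes` consists of exactly one label.
pub fn api_bns_path() -> Vec<Vec<u8>> {
    vec![b"api_boundary_nodes".to_vec()]
}
```

Swapping the host argument is all that changes when the same request is sent to an API BN instead of ic0.app.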

In the following, we show how one can make use of this information and send requests to an API BN directly.

What can I do with these API boundary nodes?

Now that we’ve discovered the current set of API BNs, we can send API canister calls directly to the IC, thus avoiding the centralized DNS-based routing via ic0.app. Run the previous example with an API BN domain name instead:

/ic$ cargo run --bin fetch_api_bns br1-dll01.aviatelabs.co

The response should, of course, remain identical. However, we now talk to the IC directly via one of the API BNs at https://br1-dll01.aviatelabs.co/api/v2/subnet/subnet_id/read_state.

What’s next?
Soon we’ll extend agent-rs with the ability to automatically discover API BNs and dynamically route traffic to them based on some predefined strategies (e.g., minimizing latency). This would mark another important step towards decentralization.

10 Likes

Excellent news!
There is a lot more to say in the way of praise and encouragement and forward planning but “excellent” will do for now :slightly_smiling_face:

4 Likes

This is exciting!

However, I’m having some issues with communicating with one of the API BNs.

This command works fine, returning the result after ~2 seconds.

This command instead, after ~10 seconds, returns:

Error: An error happened during communication with the replica: error sending request for url (https://br1-dll01.aviatelabs.co/api/v2/status)

However, this command works, returning the result after ~2 seconds:

/ic$ cargo run --bin fetch_api_bns bc1-dll02.blockchaindevlabs.com

So, I believe there’s something that doesn’t work properly on the API BN at br1-dll01.aviatelabs.co.

1 Like

Hey Luca,

Yes, one node is unfortunately having some issues (unrelated to being an API boundary node). You can see the status here: https://dashboard.internetcomputer.org/node/ag4ro-4ihpg-ozp6d-lbwac-5pksl-3wany-aaayw-ibvuw-uh7wj-lbav5-qae

We have already submitted a proposal to remove this node here and another proposal to add two more API boundary nodes (here).

Sorry for the hiccups.

3 Likes

Hey Luca and thanks for your message! In the world of decentralized API boundary nodes, the health of each individual node is indeed the responsibility of the respective node provider. This time, you encountered an unhealthy node by interacting with it directly. In the near future, we’ll extend agent-rs to route traffic through all dynamically discovered API BNs. The agent will also monitor the health of the API BNs and select the node with minimal latency, thus substantially reducing the probability of dispatching requests to unhealthy nodes. If a request to a certain node still fails for some reason, the agent will automatically retry it with another node, ensuring an overall smooth user experience.
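As a rough illustration of that idea (not the planned agent-rs implementation), the sketch below ranks the healthy nodes by measured latency and retries a failed request on the next-best node. NodeStatus, ranked_healthy, and send_with_retry are hypothetical names, and the health flag and latency are assumed to come from some separate health check.

```rust
use std::time::Duration;

/// Result of the most recent (hypothetical) health check for one API BN.
#[derive(Debug, Clone)]
pub struct NodeStatus {
    pub domain: String,
    pub healthy: bool,
    pub latency: Duration,
}

/// Returns the healthy nodes ordered by ascending latency.
pub fn ranked_healthy(nodes: &[NodeStatus]) -> Vec<&NodeStatus> {
    let mut healthy: Vec<&NodeStatus> = nodes.iter().filter(|n| n.healthy).collect();
    healthy.sort_by_key(|n| n.latency);
    healthy
}

/// Tries the request on each healthy node in latency order and returns the
/// first success. On failure it returns the last error seen, or `Err(None)`
/// if there was no healthy node to try at all.
pub fn send_with_retry<T, E>(
    nodes: &[NodeStatus],
    mut send: impl FnMut(&str) -> Result<T, E>,
) -> Result<T, Option<E>> {
    let mut last_err = None;
    for node in ranked_healthy(nodes) {
        match send(&node.domain) {
            Ok(value) => return Ok(value),
            Err(err) => last_err = Some(err),
        }
    }
    Err(last_err)
}
```

The send closure stands in for the actual HTTP call, which keeps the selection and retry logic independent of any particular HTTP client.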

6 Likes

Hello everyone,

Right before the end of the week, we are back with some news from the boundary node team.

Progress:

API Boundary Nodes

Since the beginning of June, we have had several API boundary nodes running on mainnet. So far, they have not received much traffic because everything is still routed through the “old” boundary nodes, but that will change. We have mostly just made sure everything runs smoothly: the API boundary nodes come up healthy, they automatically obtain the necessary certificates from Let’s Encrypt, and they successfully route the traffic. Also, they have already undergone several upgrades.

Discovery Library

The implementation of the discovery library has been wrapped up and we are close to integrating it into agent-rs (here you can find the feature branch). We have already been testing it ourselves and had no issues so far.

HTTP Gateway

Probably the biggest news is the open sourcing of ic-gateway, the core service of the HTTP gateways. You can find the code in the dfinity/ic-gateway repository. This again marks a major milestone in our work towards the new boundary node architecture. As of now, we are not yet working on “stand-alone” HTTP gateways, but we have replaced nginx and icx-proxy with ic-gateway in the old boundary nodes (see the following PR for details). We have had a canary out for testing for almost two weeks now (see the forum post) and we are gradually switching traffic over to the ic-gateway boundary nodes.

Outlook:

As we are getting closer to having all the pieces of the new boundary node architecture in place, we are focusing on getting it ready for production by making sure we have all the monitoring, all the dashboards, all the alerts set up and also have the necessary processes for disaster recovery in place.

We will continue to keep you updated on our progress and look forward to hearing your feedback and addressing any questions you may have!

9 Likes

Hello everyone,

It is time for yet another update from the boundary nodes.

Progress:

HTTP Gateway

After having open-sourced ic-gateway, we switched the “old” boundary nodes to ic-gateway, effectively replacing nginx and icx-proxy. These boundary nodes have now been running in production for more than two weeks without any hiccups.

As we are gaining more experience and confidence, we are working on the “stand-alone” HTTP gateways, which we will start testing soon.

Maintenance

Besides that, we have invested quite some time in performance and reliability improvements of ic-boundary and ic-gateway. As part of that, we extracted the common “features” into a shared library, called ic-bn-lib (here is the code).

Outlook:

Currently, we are in the process of getting the “stand-alone” HTTP gateways ready for production. And we are continuing to wrap up a few things required for production: monitoring, dashboards, alerting, and disaster recovery. More on that soon!

We will keep you updated and look forward to hearing your feedback and addressing any questions you may have!

10 Likes

Hello everyone,

This time, I have just a short update for you:

We are working on the last bits and pieces: making sure we don’t break anything by switching over to the new architecture (e.g., moving the SOCKS proxy from the boundary nodes to the API boundary nodes) and incident handling. We have posted a proposal on that topic in a separate thread: Incident Handling with the New Boundary Node Architecture

2 Likes

It’s obvious that opening up BN can make rate limiting more controllable. Many dApps send a large number of query calls from a single point (e.g., a single server) within a short period, which sometimes gets rate-limited by the BN. After opening up BN, this limitation might be lifted. However, in the new architecture, wouldn’t it still be subject to the rate limiting of the API boundary nodes? Are there any good solutions for this?

Previously, my method to handle a single server sending a large number of requests to the IC was to use a RoundRobin approach to request multiple IP addresses under the ic0.app domain.

1 Like

Hey @C-B-Elite

Indeed, the boundary nodes have different mechanisms in place to limit the incoming traffic. This is mostly meant as a protection for the boundary node itself and to ensure that no single client “hogs” all the resources.

In particular, there are limits at different levels:

  • At the firewall, the number of concurrently open connections per client is limited. If you go above that limit, new connection attempts are rejected.
  • Then, the number of concurrently open HTTP/2 streams per client connection is limited, as well as the total number of requests that can be sent over a client connection.
  • Then, there are some limits on the request rate per client.

There is also a loadshedder that kicks in when the nodes reach their limits. Similarly, the replicas in the subnets also have a loadshedder.

In general, these different protection mechanisms will not change with the introduction of the API boundary nodes as they also need to be protected.
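To give an intuition for what a per-client request-rate limit can look like, here is a minimal token-bucket sketch. This is a generic technique chosen for illustration, not the mechanism actually used on the boundary nodes. All names and parameters (TokenBucket, PerClientLimiter, capacity, refill_per_sec) are hypothetical, and time is passed in as seconds so the behavior is easy to follow.

```rust
use std::collections::HashMap;

/// A simple token bucket: holds up to `capacity` tokens and refills at
/// `refill_per_sec` tokens per second. Each request consumes one token.
pub struct TokenBucket {
    capacity: f64,
    refill_per_sec: f64,
    tokens: f64,
    last: f64, // time of the last update, in seconds
}

impl TokenBucket {
    pub fn new(capacity: f64, refill_per_sec: f64, now: f64) -> Self {
        TokenBucket { capacity, refill_per_sec, tokens: capacity, last: now }
    }

    /// Refills according to the elapsed time, then tries to take one token.
    pub fn try_acquire(&mut self, now: f64) -> bool {
        let elapsed = (now - self.last).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = self.last.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// Keeps one bucket per client (e.g., keyed by IP address), so that a single
/// client cannot use up the budget of the others.
pub struct PerClientLimiter {
    capacity: f64,
    refill_per_sec: f64,
    buckets: HashMap<String, TokenBucket>,
}

impl PerClientLimiter {
    pub fn new(capacity: f64, refill_per_sec: f64) -> Self {
        PerClientLimiter { capacity, refill_per_sec, buckets: HashMap::new() }
    }

    /// Returns true if the request from `client` at time `now` is allowed.
    pub fn allow(&mut self, client: &str, now: f64) -> bool {
        let (capacity, refill) = (self.capacity, self.refill_per_sec);
        self.buckets
            .entry(client.to_string())
            .or_insert_with(|| TokenBucket::new(capacity, refill, now))
            .try_acquire(now)
    }
}
```

The same structure can also be used on the client side to pace outgoing requests so that a single server stays within such limits.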

What are you trying to do when you face the rate-limits? If you can explain your use-case, we could maybe see how you can work within the limits.

1 Like

Hello Everyone,

As we approach the end of the year, it’s time for another update on the boundary nodes.

Progress

The primary components, ic-gateway and ic-boundary, have been running smoothly in production for some time now. We’re now finalizing the remaining tasks:

Discovery Library

The Discovery Library is being integrated into agent-rs and is already available as an experimental feature starting from version v0.38.2. To try it out, simply enable the feature flag _internal_dynamic-routing. This library enables the discovery of all API boundary nodes and facilitates direct request routing to them.

Incident Handling

Transitioning to the new boundary node architecture requires a fresh approach to incident management. To address this, we’ve proposed introducing a rate-limiting canister (refer to the related forum post and proposal for details). The development of this canister is nearing completion, and we’ll soon propose its deployment.

HTTP Gateway Operations

We are also working on preparing DFINITY-operated HTTP gateways. This includes configuring the machines, automating deployments, and setting up the DNS load balancer.

Outlook

We’ll continue to share updates and would love to hear your feedback or answer any questions you may have!

10 Likes

Resurfacing this thread from 1.5 years ago, where I asked if everything would be complete by year end 2024.

Looks like you guys are pretty much on time. Amazing to see multi-year engineering efforts!! If I remember correctly, this architecture was designed following the Nintendo incident in 2022.

While other foundations can only see in front of their noses, what sets DFINITY apart is an ability to execute a long-term vision. Will pay long-term dividends.

Kudos!

8 Likes

Hello everyone,

It has been a while since you last heard from us and quite a lot has happened in the meantime: in short, the edge of the Internet Computer is now fully decentralized :tada: But let me explain it in a bit more detail:

Progress & Current State

API Boundary Nodes
After successfully setting up the rate-limiting canister through proposal #134775, everything was ready for the wider roll-out of the API boundary nodes.

Since the adoption of proposal #134902, there are now 20 API boundary nodes distributed across the globe: 4 in US West, 4 in US East, 6 in Europe, 5 in Asia, and 1 in Africa. These API boundary nodes are operated by different node providers and fully NNS-managed (additions, removals, and version upgrades happen through proposals).

The API boundary nodes are ready and waiting for your requests. This allows IC-native clients to talk directly with the API boundary nodes bypassing any centralized infrastructure. This brings me to the topic of the discovery library, which provides all the necessary tooling.

Discovery Library

We have integrated the discovery library into agent-rs. You can simply instantiate the agent with the with_background_dynamic_routing method and a seed URL to bootstrap the discovery. The agent will fetch the list of available API boundary nodes using that seed URL. Then, in the background, it continuously monitors the health of these nodes and automatically routes the API requests to the closest healthy API boundary node. You can find the documentation here. The HTTP gateways, for example, rely on this feature.

HTTP Gateways
With the wide deployment of API boundary nodes, we could finally set up the first DFINITY-operated HTTP gateways. After internal testing and validation, we started replacing some of the “legacy” boundary nodes with HTTP gateways. Currently, we have one HTTP gateway running in every region (US West, US East, Europe, and Asia) alongside the legacy boundary nodes. We are monitoring the situation and will gradually switch the remaining boundary nodes to HTTP gateways.

ic-gateway, the service powering the HTTP gateways, has been open-sourced (GitHub repository) and can be used by anyone to run their own HTTP gateways.

Resources

Check out the blogpost on the Solenoid milestone, our update in this month’s Global R&D, and our demo on how to use the discovery library within agent-rs.

Conclusion & Outlook
The design and implementation of the new boundary node architecture has been a major undertaking for the boundary node team and has been almost two years in the making. Along the way, we improved a lot of the tooling and processes, the observability (visibility into the state of the IC’s edge and alerting), and of course the reliability, scalability and robustness of the different services. It’s great to see that (so far) everything works as intended.

The completion of this milestone does not mean that the boundary node team is out of work: once we have switched all “legacy” boundary nodes to HTTP gateways, we will clean up some of the remnants (e.g., the firewall rules allowing the “legacy” boundary nodes to directly talk to the replicas) and then take on the next projects.

16 Likes

Really impressed with the ‘Discovery Library’—it looks like it’s bringing Edge capabilities. Will other agent libraries need updates to support these features? @neeboo @rdobrik

Regarding ‘HTTP Gateways — Enabling Direct Browser Access,’ does this mean we can interact with our canisters directly via browsers or HTTP clients, such as calling http_request methods inside our canisters? Specifically asking in the context of caching large assets like videos, where caching happens on boundary nodes before responding to the client.

2 Likes

Just a question: why is the with_background_dynamic_routing method marked as async in the agent-rs’ AgentBuilder?

1 Like

That sounds like a mistake to me. Fix PR

2 Likes

Are there plans to put this same kind of discovery in the agent-js?

1 Like

Hey @skilesare

Currently, there is no such plan, but if there is a lot of interest, we can look into it. I can quickly explain why we decided to not integrate it into agent-js:
From our viewpoint, agent-js is mostly used in the browser by the frontend of a dapp (e.g., to query your balances in a wallet).

  • Too heavy for a frontend in a browser: The discovery library in agent-rs runs background threads to periodically fetch the latest list of API BNs (this is simply a read_state) and to continuously health check all API BNs (including latency measurements). We felt that this is too “heavy” to run in the frontend of a webapp.
  • Need for dynamic content-security policies: In the frontend, you usually want to set CSP headers to restrict connect-src (this encompasses requests made by agent-js). This is simple with the “legacy” BNs and the new HTTP gateways: you just need to allow icp-api.io or one of the other domains. The API boundary nodes, however, all have different domains and the set of API BNs changes. Hence, you would either have to completely open your CSP or adapt it dynamically based on the currently active API BNs (e.g., have a regular job in your canister that gets the latest list from the registry or something else).
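To illustrate the second point, here is a minimal sketch of what adapting the CSP dynamically could look like: given the currently active API BN domains (obtained however you like, e.g., from the registry), it builds the connect-src directive. The function name connect_src_directive is hypothetical, and including 'self' as the first source is just an assumption for this example.

```rust
/// Builds a CSP `connect-src` directive that allows the page's own origin
/// plus every given API BN domain over HTTPS. Domains are sorted and
/// de-duplicated so the resulting header is stable across refreshes.
pub fn connect_src_directive(api_bn_domains: &[&str]) -> String {
    let mut domains: Vec<&str> = api_bn_domains.to_vec();
    domains.sort_unstable();
    domains.dedup();

    let mut sources = vec!["'self'".to_string()];
    sources.extend(domains.iter().map(|d| format!("https://{}", d)));
    format!("connect-src {}", sources.join(" "))
}
```

The directive would have to be regenerated whenever the set of API BNs changes, which is exactly the overhead described above.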

Now, if you use agent-js in a non-browser setting (e.g., node.js), these points don’t apply. Does that make sense? Let me know what you think!

1 Like