Enable canisters to make HTTP(S) requests

We are thinking about what a system API in the Management Canister could look like for a first MVP implementation of the feature. In this first iteration, we are thinking of allowing only HTTP GET calls. If responses may differ between replicas, the canister can provide a method to transform them. This is useful to, e.g., account for timestamps, transaction ids, and the like, as commonly found in API responses. This transformation happens on every replica of the subnet to account for the different responses received by each replica. After the transformation, all responses should be the same and can go into consensus.

This MVP would already be pretty useful for lots of applications.

type HttpHeader = record { 0: text; 1: text };

type HttpResponse = record {
  status: nat;
  headers: vec HttpHeader;
  body: blob;
};

type Error = variant {  
  no_consensus;
  timeout;
  bad_tls;
};

service ic : {
  // A new method to be added to the IC management canister.
  http_request : (record {
    url : text;

    // Support for other methods like post/put would be added here.
    method : variant { get };

    headers: vec HttpHeader;

    body : opt blob;

    transform : opt variant {
      // The name of a wasm method to transform the response.
      // The method signature must be: `(HttpResponse) -> (HttpResponse)`
      // and must be exported by the canister.
      wasm_export: text
    };
  }) -> (variant { Ok : HttpResponse; Err: opt Error });
}

Feedback welcome!

6 Likes

Not sure about the semantics (and/or correctness) of such transformations. Imagine a simple thing like time in replicas.

For example: if the time is stored internally through a system call made within the replica at the time of an update, any transformation that reasons about time (e.g., all records within a time range) may report inconsistent results across replicas. Thoughts?

The answer/pattern here is pretty important even for things outside of this specific topic. For example, in my file-system on top of stable memory there is no concept of time as being local to replica.

1 Like

The transform field might be a great application for Candid’s support to reference functions, so if you put in

transform : opt (func (HttpResponse) -> (HttpResponse) query)

you get type checking there and the guarantee (kinda) that the exported function is a query.

This would allow referencing other canisters’ query functions, which may seem a bit odd, but is actually nice from a decoupling and composition point of view. And performing a possibly remote query shouldn’t be too hard (or you just fail in that case, if you don’t want to deal with that).

If Candid already had Generic data, then something like this would be even better:

service ic : {
  http_request : <T>(record {
    url : text;
    method : variant { get };
    headers: vec HttpHeader;
    body : opt blob;
    transform : func (HttpResponse) -> (T) query;
  }) -> (variant { Ok : T; Err: opt Error });
}

After all, there is no reason why the transform function needs to return a HttpResponse, and not already something that has been parsed into application logic.

@rossberg can probably comment on these composition and function reference issues.

3 Likes

@nomeata : Thanks for your suggestions on the API, Joachim!

My text did not actually provide enough detail here, so let me give some more information now. The idea of the transformation function is that it must result in the same response on each replica in order to allow for consensus. The common case we have in mind is that we know the structure of the response, e.g., a JSON object from an API call, and implement the function in a way that “cuts out” the parts of the JSON that may differ between responses, e.g., timestamps, ids, etc., to obtain the transformed response. Such transformations are simple structural transformations based on a priori knowledge of the structure of the response. In particular, such transformations would not need to reason about time in the replica in any way, as doing so would clearly be a very reliable source of non-determinism.

I don’t know whether I am missing a specific class of use cases you have in mind here, but we think that a key use case for the transformation is to remove fields that are known to differ between different responses of the same call. Think of exchange rate APIs as one example of this.
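A minimal sketch of such a structural transformation, written in Python purely for illustration (a real canister would export the method from its Wasm module). The field names `timestamp` and `request_id` and the sample payloads are assumptions, not part of any real API.

```python
import json

# Hypothetical field names that are assumed to differ between replicas.
VOLATILE_FIELDS = {"timestamp", "request_id"}


def strip_volatile(value):
    """Recursively drop volatile keys from a decoded JSON value."""
    if isinstance(value, dict):
        return {k: strip_volatile(v) for k, v in value.items()
                if k not in VOLATILE_FIELDS}
    if isinstance(value, list):
        return [strip_volatile(v) for v in value]
    return value


def transform_body(body: bytes) -> bytes:
    """Return a canonical encoding of the body with volatile fields removed."""
    cleaned = strip_volatile(json.loads(body))
    # Sorted keys and fixed separators give byte-identical output
    # whenever the remaining content is equal.
    return json.dumps(cleaned, sort_keys=True, separators=(",", ":")).encode()


# Two made-up responses, as two replicas might receive them.
replica_a = b'{"rate": "42.17", "timestamp": 1645500001, "request_id": "a1"}'
replica_b = b'{"request_id": "b7", "timestamp": 1645500002, "rate": "42.17"}'
```

Both sample bodies reduce to the same bytes, so they could be agreed on. Note that the function uses only its input and no clock or other replica-local state.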

If you want all records within a time range, as you mention in your post, and different responses may contain different records, we would need to provide the time bounds as input to the transformation. Would such additional parameters be something we should consider? If so, we would need to somehow provide parameters in a generic manner.
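To make the question concrete, here is a rough Python sketch of a transformation that receives the time bounds as explicit parameters instead of reading a clock. The response layout (`records`, `id`, `time`) and the bounds are invented for the example; `functools.partial` merely stands in for whatever generic parameter-passing mechanism the API might offer.

```python
import json
from functools import partial


def keep_records_in_range(body: bytes, start: int, end: int) -> bytes:
    """Keep only records with start <= time < end, in a canonical order."""
    records = json.loads(body)["records"]
    kept = [r for r in records if start <= r["time"] < end]
    # Sort so that the order in which the server listed records is irrelevant.
    kept.sort(key=lambda r: (r["time"], r["id"]))
    return json.dumps(kept, sort_keys=True, separators=(",", ":")).encode()


# The caller fixes the bounds once; every replica applies the same function.
transform = partial(keep_records_in_range, start=100, end=200)

replica_a = b'{"records": [{"id": 1, "time": 120}, {"id": 2, "time": 180}]}'
replica_b = (b'{"records": [{"id": 2, "time": 180}, {"id": 1, "time": 120},'
             b' {"id": 3, "time": 205}]}')
```

Here the extra record seen by the second replica falls outside the bounds, so both outputs match. If a record inside the bounds were visible to only some replicas, the outputs would still differ.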

3 Likes

This would be transparently and generally possible, with the simple interface that I provided above, if and when Candid (and the platform) supported closures: https://github.com/dfinity/candid/pull/291

It may be tempting to come up with an ad-hoc solution for this instance of the problem of higher-order function composition, but then the next API has to solve it again. It may be better to solve generic issues once and for all at a lower level.

(If only we had been bold and relentless on the level of the programming model back then, then we might have closures on the system level now already - right @PaulLiu?)

3 Likes

@nomeata Thanks for the suggestion. It’s not clear to me how function references would be used in, say, Rust. What would the caller pass in the payload? The function index?

The more difficult problem is when updating state in external services. Either the service can handle the same state-changing call arriving multiple times, which is easily doable when building such a service with blockchain clients in mind, but current standard services do not account for this

Couldn’t consensus elect one replica to send post/update requests and have measures to make sure it can only send content signed by all the replicas?

1 Like

The value representation of func … → … is the principal of the canister to call and the name of the method – see Func in candid::types::reference - Rust

So operationally not much difference, but the interface describes more of the intent, and you get some type checking (how much type checking depends on your CDK)

1 Like

There are options for how you can do state-changing POST calls to external services. A future extension of the feature would be to allow selecting a reduced quorum, e.g., having only a single replica make the call, with the corresponding implications for security. This can be followed by a GET request to validate that the POST has had the desired effect. This is not a 100% generic solution as updates might not be straightforward to validate, but it can solve lots of practical use cases.
Another option is that the external service is aware that requests can be made from a blockchain with its trust assumptions and that there are multiple calls for the same request and only process the request once and only if sufficiently many replicas have sent it. This does not work with standard HTTP services, though.
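As a rough illustration of that second option, the following Python sketch shows a service that executes a request once, and only after enough distinct replicas have sent it. The class name, the request id scheme, and the "more than two thirds" threshold are all assumptions; authenticating the sending replicas is left out entirely.

```python
class QuorumAwareService:
    """Toy external service that deduplicates requests sent by many replicas."""

    def __init__(self, subnet_size: int):
        # Assumption: require more than two thirds of the subnet.
        self.threshold = (2 * subnet_size) // 3 + 1
        self.senders = {}    # request id -> set of replica ids seen so far
        self.executed = []   # request ids that have been applied

    def receive(self, request_id: str, replica_id: str) -> bool:
        """Record one incoming copy; return True if it triggered execution."""
        seen = self.senders.setdefault(request_id, set())
        seen.add(replica_id)  # repeated copies from one replica count once
        if len(seen) >= self.threshold and request_id not in self.executed:
            self.executed.append(request_id)
            return True
        return False


service = QuorumAwareService(subnet_size=13)
results = [service.receive("req-1", f"replica-{i}") for i in range(13)]
```

With 13 replicas the threshold is 9, so only the ninth copy triggers execution and the remaining four are ignored.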
The problem with one replica is that you have no control whatsoever what this replica does, so it seems hard to achieve something that is secure on the one hand and has only one replica sending something on the other hand. A post-update validation seems to be a somewhat clean approach.
Do you have some more concrete idea in mind how the approach you sketched could work? Would be interested to hear about it!

2 Likes

I didn’t have a concrete idea, as I haven’t looked at how the replicas communicate internally, so maybe this is not feasible. But the general idea was something like:

  • Canister in every replica will do the post request. This will be caught by the IC system.
  • Having POST requests from all the replicas, the system will come to a conclusion through consensus on what is the right post request to send and which replica will actually send the post request.
  • The selected replica will send the request. All of this happens at the system level, and I was hoping there would be no way for the replica to change the request at this point. But as you say, we may not have any control over what the replica does, and the replica could change the request right before sending.
  • On receiving the response, the selected replica will send it back to all the replicas.

Basically, the idea was that the subnet decides through consensus what the right POST request is and chooses one replica to send it. There is the overhead of consensus etc., so it’s not efficient compared to the other solution.

1 Like

The overhead we would have would not be the problem. The main problem with this approach would still be that there is no way to know whether the replica designated for sending out the request does the right thing, e.g., it could send a modified request or not send a request at all. A compromised replica could do anything it wishes.

The easiest approach for making a state-changing POST call would be to simply select one replica to make it and then have all replicas make a full-quorum call to validate that the server’s state has been changed accordingly, as outlined further above. This requires that this kind of verification is possible, but, if it is, this is a viable way to do POSTs with standard HTTP servers. Note, though, that the feature of only one replica sending a request is a future extension and will not be readily available. However, it seems to be something we should prioritize, as it can help solve quite a few use cases.
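The POST-then-validate flow can be simulated in a few lines of Python. Everything here is hypothetical: `FakeServer` stands in for an external HTTP service, the `honest_sender` flag models a designated replica that silently drops the request, and the quorum size is an assumption.

```python
class FakeServer:
    """In-memory stand-in for an external HTTP service."""

    def __init__(self):
        self.state = {}

    def post(self, key, value):
        self.state[key] = value

    def get(self, key):
        return self.state.get(key)


def post_then_validate(server, replica_ids, key, value, honest_sender=True):
    """One replica POSTs, then all replicas GET to confirm the effect."""
    # Step 1: the single designated replica makes the state-changing call.
    # A compromised replica might not send it at all.
    if honest_sender:
        server.post(key, value)
    # Step 2: every replica independently reads the state back.
    observed = [server.get(key) for _ in replica_ids]
    # Assumption: more than two thirds must observe the expected value.
    needed = (2 * len(replica_ids)) // 3 + 1
    return sum(1 for v in observed if v == value) >= needed


replicas = [f"replica-{i}" for i in range(13)]
ok = post_then_validate(FakeServer(), replicas, "order-7", "paid")
dropped = post_then_validate(FakeServer(), replicas, "order-7", "paid",
                             honest_sender=False)
```

The validation step detects a dropped request, but, as noted above, it only helps when the effect of the POST can be observed through a later GET.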

2 Likes

Firstly, this feature is a game-changer that I can’t wait to use.

To solve an immediate need I hacked together a proof of concept/workaround that uses a “Web2 Bridge” canister to receive HTTP requests and a node.js server that polls, processes, and returns the results. (I’ve only tested it on the local replica so far). Sharing here in case anyone else is in the same situation.

Also, is there an IC equivalent to thread::sleep? I’m using an iterating counter that goes up to 2 billion as a delay hack and it feels very wrong indeed 🚩🚩🚩

2 Likes

Don’t query calls only read data from 1 replica? How is it verified for query calls that 1 replica returned the correct data?

Correct. If one wants a higher level of security or guarantee, one should use Update calls (since they go through consensus).

A common pattern devs have been using on front-ends is:

  1. Fire off a query call on the frontend for quick answer
  2. Simultaneously fire off an update call for higher security
  3. Use frontend code to reconcile if #1 and #2 are different
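A small sketch of this pattern, in Python with `asyncio` for illustration only (a real frontend would typically do this in JavaScript or TypeScript). The two fake calls and their return values are placeholders for an actual query call and update call.

```python
import asyncio


async def read_with_reconcile(query_call, update_call, on_fast_answer):
    """Start both calls at once, show the fast answer, then reconcile."""
    query_task = asyncio.create_task(query_call())
    update_task = asyncio.create_task(update_call())
    fast = await query_task
    on_fast_answer(fast)           # display the unverified answer immediately
    certified = await update_task  # answer that went through consensus
    return certified, fast != certified


async def fake_query():
    await asyncio.sleep(0)         # stands in for a fast query call
    return "balance=10"


async def fake_update():
    await asyncio.sleep(0.01)      # stands in for a slower update call
    return "balance=12"


shown = []
result = asyncio.run(read_with_reconcile(fake_query, fake_update, shown.append))
```

The second element of `result` tells the caller whether the quick answer had to be corrected.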

3 Likes

… or use certified variables to allow the client to check the response of the query call.

8 Likes

Summary

This is a draft/preview of a motion proposal for “HTTP Requests from Canisters” which we will submit tomorrow (February 23, 2022) for the community to vote on.

Once the proposal is live, I will update the forum.

Background

Canister smart contracts cannot make requests to HTTP/HTTPS services on the Internet by default. Doing so is a challenge on any blockchain caused by the fact that different replicas / nodes can (and will) receive different responses for the same call, be it, for example, for timestamps or ids contained in the response. Such different responses received by different replicas and further processing being based on those different responses on each replica would lead to a divergence of state on the different replicas and thus destroy the determinism property of the computation with the result that no consensus could be achieved. Thus, replicas cannot just make HTTP/HTTPS requests to outside services without creating a major problem for the subnet.

However, it is technically possible to enable canisters on the IC to safely make HTTP/HTTPS (henceforth just “HTTP”) requests with an extension of the Internet Computer protocol. This proposal discusses such an extension. We think that this is a crucial step towards better integrating the Internet Computer with the public Internet and thus breaking down the so-far inherent borders that blockchains face. We think that opening up the Internet Computer to interface with HTTP services on the Internet is a major step towards the future of the Internet Computer as a platform that can run general-purpose workloads, and also towards a more open, integrative blockchain world at large.

Today, obtaining data from the outside world requires, as on any blockchain, the use of blockchain oracles, or oracles. However, oracles lead to a more complicated programming model, charge substantial fees, add complexity and indirections, and require additional trust assumptions to be made. Allowing canister code to directly make HTTP requests would remove the dependence on oracles and their disadvantages. Of course, oracles may still be useful for certain use cases when we have HTTP calling capabilities, but many use cases could be covered with direct HTTP calls.

Allowing canisters to make HTTP calls to services on the public Internet has long been a community-requested feature for the Internet Computer. It will give smart contract canisters the ability to autonomously connect to services on the Internet to retrieve or submit data and will thus enable a large range of additional or enhanced use cases, e.g., obtaining exchange rate data from external servers for DeFi applications, obtaining weather data for decentralized insurance services, or sending notifications to users via traditional communications channels, all without using oracles. HTTP support for canisters is one of the features on our strategic R&D initiative on “General Integration” (see Long Term R&D: General Integration (Proposal) - #4 by dieter.sommer), thus we want to now proceed with launching a motion proposal for this feature and ask for community approval w.r.t. going forward. If accepted, a launch in the Q1 Chromium release is planned, meaning an aggressive timeline for our engineering teams.

Goals and Requirements

This feature should enable canisters to directly make requests to HTTP URLs, using the GET method initially, and receive the corresponding response back into the canister’s state in a deterministic fashion. The functionality should be realized in a direct, trustless, way. Direct, trustless, integration is a common theme in other integrations of the IC, e.g., the Bitcoin integration that is to launch on mainnet in the near future. Direct integration means that we do not need to make any additional trust assumptions or involve any additional parties to realize the functionality.

For the first version, all replicas in the subnet will send out the request and return a response that goes through the IC Consensus mechanism. In the future, we may present an option for canister developers to reduce the size of the required quorum, so that even only one replica may make the request, if desired, but the guarantees on such a request would be accordingly lower. Another envisioned future extension is POST requests. In combination with the reduced quorum, those can be of tremendous utility for many use cases where reliability of the calls is less important than compatibility with existing APIs.

Proposed Design

We next outline the proposed design at a high level.

System API (Management Canister)

We implement a system API in the Management Canister that provides a method for making an HTTP/HTTPS call to an outside service and receiving back a response. See below for the original proposal w.r.t. the API and note that not all community feedback from the discussions in this forum topic has been included here yet.

type http_header = record { 0: text; 1: text };

type http_response = record {
  status: nat;
  headers: vec http_header;
  body: blob;
};


type http_request_error = variant {
  no_consensus;
  timeout;
  bad_tls;
  invalid_url;
  transform_error;
  dns_error;
  unreachable;
  conn_timeout;
};

New method in ic0:

  http_request : (record {
    url : text;
    method : variant { get };
    headers: vec http_header;
    body : opt blob;
    transform : opt variant {
      function: func (http_response) -> (http_response) query
    };
  }) -> (variant { Ok : http_response; Err: opt http_request_error });

See the (draft) PR on the interface specification repository, which is now public, for details regarding the proposed API and related discussions: IC-530:Canister HTTP requests by ielashi · Pull Request #7 · dfinity/interface-spec · GitHub.

High-Level Request and Response Flow

Calling this API will store the request in a specific area of replicated state that is periodically read by a component at the networking / consensus layer. This component, once it sees a new request, provides this request to an HTTP Adapter at the networking layer which performs the actual request and provides a response in return.

Responses are put into a new HTTP Artifact Pool, are signed by the replica to endorse the response, and the signature is gossiped to all replicas in the subnet. Once a response has the support of at least 2/3 of the replicas of the subnet in the view of the current block-making replica, that replica adds the endorsed response to an IC block that goes through Consensus. Because at least 2/3 of the replicas of the subnet have supported the response, it is ensured that the subnet can achieve consensus on it.
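The threshold check described above can be sketched as follows. This is a simplification under stated assumptions: it only counts identical (already transformed) responses by hash, and it omits signing, gossip, and block making. The function name and the data layout are invented for the example.

```python
import hashlib
from collections import Counter


def endorsed_response(responses: dict, subnet_size: int):
    """Return the response supported by at least 2/3 of the subnet, else None.

    `responses` maps a replica id to the transformed response bytes
    that this replica endorsed.
    """
    if not responses:
        return None
    needed = (2 * subnet_size + 2) // 3  # smallest integer >= 2/3 * subnet_size
    by_digest = {hashlib.sha256(body).hexdigest(): body
                 for body in responses.values()}
    counts = Counter(hashlib.sha256(body).hexdigest()
                     for body in responses.values())
    digest, support = counts.most_common(1)[0]
    return by_digest[digest] if support >= needed else None


# 13 replicas: nine agree, four received something else.
agreeing = {f"replica-{i}": b"rate=42.17" for i in range(9)}
deviating = {f"replica-{i}": f"rate=42.1{i}".encode() for i in range(9, 13)}
winner = endorsed_response({**agreeing, **deviating}, subnet_size=13)
```

With 13 replicas, nine matching responses meet the threshold; with only eight, the function returns `None`, which would correspond to the `no_consensus` error case.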

Once the IC block with the HTTP response has made it through the IC consensus layer, it is routed back to the system API and is provided back to the calling canister, which concludes the original API call for making an HTTP request.

We do not go into the details of the error handling. In short, possible error scenarios are that requests time out or that the responses cannot be agreed upon, in which case a corresponding error response is generated and returned in response to the request.

Handling Differences in Responses

Many HTTP-based services like API providers include fine-grained timestamps or unique ids in their responses, implying that it would not be possible to achieve consensus on the responses received by the different replicas in the subnet. This can be addressed by allowing the caller to specify a response processing function to be performed on the responses before they are provided to the consensus layer. This gives the feature a much broader field of application, because consensus can be obtained on a broader class of responses.

The canister may specify a response processing method that, when a response is received by each replica, is applied on the response on each replica to transform it accordingly into a response that is intended to be the same on each replica and thereby will be accepted by the IC consensus mechanism.
The transformation may, for example, only keep specific fields from the responses, while removing other values that might differ across responses, such as timestamps or unique identifiers. The transformation can also just retain a single value of interest from the whole response, e.g., an exchange rate value, which would substantially reduce the required IC “block bandwidth”.
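As an illustration of retaining a single value, the sketch below applies a transformation to a whole response record shaped like `http_response` above (status, headers, body), modelled here as a Python dictionary. The `rate` field, the header values, and the sample responses are assumptions made up for the example.

```python
import json


def transform(response: dict) -> dict:
    """Keep the status and a single value of interest; drop everything else."""
    rate = json.loads(response["body"])["rate"]  # hypothetical field name
    return {
        "status": response["status"],
        "headers": [],                 # headers such as Date differ per replica
        "body": str(rate).encode(),    # only the value goes into the block
    }


# Two made-up responses that differ in a header and in a timestamp field.
response_a = {
    "status": 200,
    "headers": [("Date", "Tue, 22 Feb 2022 10:00:01 GMT")],
    "body": b'{"rate": "42.17", "ts": 1645524001}',
}
response_b = {
    "status": 200,
    "headers": [("Date", "Tue, 22 Feb 2022 10:00:02 GMT")],
    "body": b'{"rate": "42.17", "ts": 1645524002}',
}
```

Both responses shrink to the same five-byte body, which also shows the saving in “block bandwidth” mentioned above.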

The design choice to expose a canister method to perform the transformation and not do the transformation directly in replica code has multiple reasons behind it:

  • The computational effort for the transformation can be directly accounted for through consuming the canister’s cycles. Thereby certain kinds of denial of service attacks that would be possible and would need to be addressed for a replica implementation are not possible.
  • It is fully flexible in terms of which transformations can be implemented. Implementations in replica code would use a specific approach, e.g., a templating language, for defining the transformation.

A drawback of the approach of exposing a canister method for the transformation instead of an alternative considered design of allowing for a set of transformation types parameterized by a template as input to the method call is that it may be slightly more effort on the side of the canister author to implement the canister method. However, the tradeoffs have been considered substantially in favour of the approach of exposing a canister method, as in the other approach it would be difficult to charge for the transformation effort and to prevent DoS attacks using long-running transformations.

Roadmap

We plan, assuming a supportive community vote, to build the feature to be ready for a release around the end of Q1 / 2022. We propose a design and scope that is reasonable to implement for a first MVP as outlined in this motion proposal to trigger further discussions and support a community vote.
The implementation cuts through all the layers of the IC protocol stack and thus requires tight collaboration between the core IC engineering teams. Most engineering effort is expected on the consensus layer, next are networking, execution, and message routing in descending order of effort. In order to meet the tight timeline to a Q1 release, the different teams will work in parallel as far as reasonably possible.

In order to ensure the high quality of the feature, we will perform extensive automated testing in our system testing environment and a security review.

Extensions

The envisioned feature implementation is a first MVP that provides the core functionality of allowing canisters to make HTTP requests. We have already identified some enhancements that have been decided to not be implemented as part of the first release, but that we can realize as separate features in the future.

  • POST/PUT requests: Those would be pretty similar in terms of implementation assuming idempotency of the requests. Not assuming this inherently requires us to use a reduced quorum of size 1 to emulate traditional POST/PUT calls or to extend the called API such that it can handle multiple requests for the same POST/PUT to be done and execute them only once.
  • Customizable quorum (unsafe requests): This allows the canister to specify the quorum size to make a tradeoff between performance, resource consumption, and compatibility with traditional HTTP-based servers on the one hand and security on the other hand. The most relevant reduced quorum size in practice will be quorum size 1. This extension, together with the possibility of making POST/PUT requests will enable another large array of use cases without making changes to external services.
  • Persistent connections: This is an extension purely for better performance, and thus left as an extension instead of implementing it already as part of the MVP.
  • Different numerical response values: Some APIs will return slightly different response values if called at slightly different times, which is typically the case when all replicas make the same call to a service. A further extension to the feature can allow such differing numerical response values to be agreed on and returned in an appropriate form, e.g., as their median or as the full set of values, so that the calling canister either receives the “actual” response value directly or can apply an appropriate function to determine it.
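A tiny sketch of the median idea from the last item. It assumes that the subnet has already agreed on the set of reported values, so that every replica computes the same result; the function name, the minimum-report rule, and the numbers are invented.

```python
from statistics import median


def agree_on_value(values, min_reports):
    """Return the median of the reported values, or None if too few reports."""
    if len(values) < min_reports:
        return None
    return median(values)


# Four plausible readings and one wildly wrong one (made-up numbers).
reports = [42.10, 42.12, 42.11, 42.13, 9999.0]
value = agree_on_value(reports, min_reports=4)
```

The outlier does not move the result, which is the main reason to prefer the median over the mean here.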

11 Likes

At first blush, this is a clean solution. I do have a question, though.

What would be the scope/namespace and capabilities of the pre-consensus hook function? That function is SUPER cool, but seems like it could be abused.

What kind of ways of abuse of the function are you envisioning? It’s a function that can only be executed as a query.

Update:
At first glance, I do not see any major means of abusing this. It essentially behaves like a pure function, so no state of the canister can be changed.
It should be similar in terms of abuse potential to any query call that a canister offers.

Proposal is live: Internet Computer Network Status

1 Like

I foresee the need to make http requests from query calls.

1 Like