A proxy "joining" multiple HTTP requests into one

qwertytrewq · May 9, 2024, 3:17am

I am applying for a $5k Developer Grant.

Please review my application and discuss! If you would like to submit an official review, please use the links below to register, see the rubric, and submit a review. Reviews will be collected by DFINITY. When one week passes, DFINITY will release them and they will appear as a new section on this post.

I’m looking forward to everyone’s input!

Reviewer Registration | Rubric for Evaluating | Submit a Review

MY APPLICATION:

REVIEWS:

zensh · May 11, 2024, 3:26am

Hi, qwertytrewq, a month ago I developed the Idempotent Proxy service using Rust, which can solve the problem you described. I did this to enable canister calls to Google’s reCaptcha v3 to prevent bots.

Here is the canister source code: feat: Integrating Google's reCaptcha v3 into ic_panda_luckypool · ldclabs/ic-panda@add51a3 · GitHub

I also tried applying for a $5k Developer Grant yesterday and noticed that you are applying too. What a coincidence!
I suspect that developers might have developed a similar service even earlier.

sea-snake · May 11, 2024, 5:39pm

I’m wondering if you can also make an idempotency proxy with an nginx instance only, avoiding the need to build, maintain and run any custom service. Maybe a combination of cache and proxy pass config will suffice.

A quick GPT chat ended up with the following config:

http {
    proxy_cache_path /path/to/cache levels=1:2 keys_zone=my_cache:10m max_size=10g inactive=60m use_temp_path=off;

    server {
        listen 80;

        location /proxy {
            resolver <your_dns_resolver>;  # Replace with your DNS resolver IP
            set $backend_url $arg_url;
            proxy_pass $backend_url;
            proxy_cache my_cache;
            proxy_cache_valid any 10m;  # Set cache validity time
            proxy_cache_key "$host$request_uri";
            proxy_cache_lock on;
            proxy_cache_lock_timeout 5s;
        }
    }
}

Obviously I would still add some header access check to avoid making it a publicly available nginx proxy.

qwertytrewq · May 11, 2024, 10:45pm

According to Claude.ai, your Nginx proxy config will not work as needed:

Q:

Will the following Nginx config deliver outdated data even when the backend server directs not to by Cache-Control: header?

A:

No, the Nginx reverse proxy configuration you’ve provided will respect the Cache-Control headers set by the backend server.

So custom code (intentionally violating the HTTP standard) is needed.

Also, with custom code it’s possible to use vetKeys to protect proxy traffic from hackers with an IC instance.

qwertytrewq · May 11, 2024, 10:48pm

My software may also be easier to install than Nginx when Apache is already installed.

qwertytrewq · May 11, 2024, 10:49pm

Also, I am not sure whether Nginx can be installed on AWS Lambda (I think not).

sea-snake · May 11, 2024, 10:55pm

I mean, Apache and Nginx both do the same thing, and both can probably be configured for the same end result.

Regarding vetKeys, making HTTP outcalls to a proxy that supports vetKeys-encrypted requests is a very interesting use case you brought up. It basically adds another interesting use case (non-public outcalls) besides HTTP outcalls to non-idempotent API endpoints.

qwertytrewq · May 11, 2024, 11:04pm

My code has some advantages compared to yours:

easier to install
(will) support AWS Lambda
does not require Redis
easily configurable caching time
protection with a proxy secret
option to hard-specify upstream host
passing headers to upstream

Your Idempotent-Key idea is good, but it would be better to follow the HTTP standard and name it X-Idempotent-Key.

I see that Redis is useful to ease load balancing.

@zensh How about an agreement to split the $5k grant into two $2,500 halves, one paid to each of us, whoever wins?

sea-snake · May 11, 2024, 11:17pm

Maybe split the work across different parts, that could depending on each individual depth be seen as separate grants?

I would love to see the vetKeys part instead of relying on unsafe things like secret keys in the header:

verify the incoming call in the proxy is coming from a valid canister based on e.g. canister signature
decrypt incoming request with vetkeys
call external api and encrypt the response with vetkeys
canister receives and decrypts response with vetkeys

As for caching relying on Redis, I would make things like caching configurable so it can rely on different things to suit the environment the proxy is running in.

Creating a proxy is a lot more straightforward, but creating a proxy specific for the IC that uses e.g. canister signatures and vetkeys seems like a cool challenge.

qwertytrewq · May 11, 2024, 11:30pm

After some more thinking: Idempotent-Key is a superfluous idea for our use case (outcalls on IC).

Really, idempotency can be determined by a hash of the request (headers and body). The hash is always the same for the same outcall. Conversely, if the hash is the same, then by the logic of outcalls the returned information should be the same. So the hash of the outcall body and headers can play the role of Idempotency-Key, rendering Idempotency-Key useless.
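To make the idea concrete, here is a minimal Python sketch of such a request hash. It is only an illustration, not code from either proxy: the function name request_hash is hypothetical, and including the method and URL, lowercasing header names, and length-prefixing each field are my own assumptions about what a stable hash would need.

```python
import hashlib


def request_hash(method: str, url: str, headers: dict[str, str], body: bytes) -> str:
    """Return a hex SHA-256 digest that identifies one outcall (hypothetical helper)."""
    digest = hashlib.sha256()

    def put(field: bytes) -> None:
        # Length-prefix every field so that field boundaries are unambiguous.
        digest.update(len(field).to_bytes(8, "big"))
        digest.update(field)

    put(method.upper().encode())
    put(url.encode())
    # Header names are case-insensitive, so lowercase them, and sort the
    # pairs so that header order does not change the hash.
    for name, value in sorted((k.lower(), v.strip()) for k, v in headers.items()):
        put(name.encode())
        put(value.encode())
    put(body)
    return digest.hexdigest()
```

Every replica sending the same outcall would compute the same digest, so the proxy could use it as the cache key without any extra header.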

qwertytrewq · May 11, 2024, 11:34pm

I don’t understand this English phrase. Could you rephrase? What is “individual depth”? (The depth of an individual’s tech expertise?)

You propose to encrypt/decrypt requests. That is superfluous, because they are already encrypted with HTTPS anyway, and the canister that makes an outcall already has access to both the outgoing and the incoming data.

sea-snake · May 11, 2024, 11:53pm

Without vetkeys any canister call or http outcall is public information on the IC. When an http outcall is made, the http request is sent with a system api call to multiple IC nodes (unencrypted publicly) which then in turn make the actual https outcall, afterwards the responses are compared and returned to the canister (unencrypted publicly).

This is why storing, using and sending secrets with http outcalls is insecure. Even if the http request and response don’t need to be a secret, you’d want the manner of verifying if the incoming request is from a trusted canister to be secure, not a token that becomes public information on the IC the moment you send it in the request.

So two additional challenges I see besides proxying requests with idempotency:

Verify that an incoming request is coming from a trusted canister with tECDSA, canister signatures or maybe even vetkeys in case the request is encrypted with vetkeys. (discussed earlier here)
Encrypt/decrypt proxied requests and responses between canister and proxy with vetkeys.

zensh · May 12, 2024, 1:22am

I agree

In some scenarios, the hash of the request header and request body alone cannot determine if it should be idempotent. For example, querying token price information is also time-dependent. I am implementing a general-purpose idempotent reverse proxy service, where users can define their idempotence logic using an Idempotent-Key. Of course, when a request does not include an Idempotent-Key, the hash of the request (url, headers, body) can be considered.
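A rough Python sketch of that fallback rule, for illustration only. The function name cache_key, the "key:" and "hash:" prefixes, and the exact bytes fed to the hash are assumptions, not the actual implementation of the Idempotent Proxy.

```python
import hashlib


def cache_key(url: str, headers: dict[str, str], body: bytes,
              key_header: str = "idempotent-key") -> str:
    """Pick the cache key for a proxied request (hypothetical helper)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    explicit = lowered.get(key_header)
    if explicit:
        # The caller defined its own idempotence logic; trust its key.
        return "key:" + explicit
    # No explicit key: fall back to a hash of (url, headers, body).
    digest = hashlib.sha256()
    for part in (url.encode(), repr(sorted(lowered.items())).encode(), body):
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return "hash:" + digest.hexdigest()
```

With this rule, a token price query could put a time bucket into the Idempotent-Key so that cached replies are shared only among requests in the same time window.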

zensh · May 12, 2024, 1:41am

Absolute confidentiality is an issue, and I have been waiting for ICP’s vetKeys to be deployed in the production environment.

The Idempotent Proxy service resolves issues by validating access tokens based on ECDSA. The canister issues tokens using a private key, and the Idempotent Proxy uses a public key to verify the tokens, ensuring requests come from trusted sources.

The Idempotent Proxy service also supports replacing request URLs (already implemented) and headers. It stores sensitive information like secret keys within the Idempotent Proxy service, rather than in the canister. When the canister makes a request using variable names, the Idempotent Proxy replaces them with the actual secrets before forwarding the request to the target service.

For example, if a canister’s request URL is https://grecaptcha.panda.fans/URL_GRE, the reverse proxy service will replace it with a full URL containing the API key: “https://recaptchaenterprise.googleapis.com/v1/projects/xxx/assessments?key=xxxxxx.”
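The replacement step could look roughly like the Python sketch below. This is an illustration under assumptions: the table SECRET_URLS and the function resolve_upstream are hypothetical names, and treating the URL path as the variable name is my reading of the example above, not a description of the service's actual lookup.

```python
from urllib.parse import urlsplit

# Hypothetical proxy-side table: variable name -> full upstream URL.
# The secret lives only here, never in the canister.
SECRET_URLS = {
    "URL_GRE": "https://recaptchaenterprise.googleapis.com/v1/projects/xxx/assessments?key=xxxxxx",
}


def resolve_upstream(request_url: str, table: dict[str, str]) -> str:
    """Map the variable name in the request path to the real upstream URL."""
    name = urlsplit(request_url).path.lstrip("/")
    if name not in table:
        raise ValueError(f"unknown URL variable: {name!r}")
    return table[name]
```

The canister only ever sends the variable name, so the API key never appears in the publicly visible outcall.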

qwertytrewq · May 12, 2024, 2:18am

Wait, we haven’t given each other any kind of guarantee that it will really be paid. So don’t count me as having agreed to this (yet).

But if we pass Idempotent-Key, it changes the hash of the message and therefore retrieves its new version anyway. There is no need to support Idempotent-Key specifically (except that we can introduce configuration to remove some headers, in particular to remove Idempotent-Key so that it is not sent to the upstream host).

However, your Idempotent-Key has advantages when used outside of IC outcalls, for some other purpose: we can send the same reply for different requests if they have the same Idempotent-Key.

qwertytrewq · May 12, 2024, 2:26am

Apparently, I should also rewrite it in Rust (I know Rust well) to support vetKeys. BTW, I want to make the use of Redis optional (with a plain Rust data structure as the alternative), for user convenience (direct querying of a Rust data structure is also expected to be a little more performant than using Redis).

zensh · May 12, 2024, 2:28am

Ok…
As long as the Idempotent Proxy service can help alleviate some pain points for ICP developers, that’s sufficient. I am also happy to continue improving it.

qwertytrewq · May 12, 2024, 4:27am

The below is a request for comments. I have checked it for errors only “lightly” and am going to check more. Your input is also welcome.

So, I studied vetKeys. My proposal for vetKeys (I replaced vetKeys with plain tECDSA, via an update callback to the canister) in our proxy:

First, rewrite my code in Rust to easily access the needed cryptography. (I know Rust and some Rust Web frameworks well and can easily translate Python code to Rust.) I propose to use my Python code, not @zensh’s code, as the base, because I have the better idea of using a hash rather than a specific header with randomness, which complicates things a little on the IC canister side.

Second, make the use of vetKeys optional (with a Bearer token as the alternative), because apparently vetKeys will consume a lot of cycles.

The proxy’s config will list one or more principals from which outcalls are allowed.

We will consider two canisters: the “keeper” canister (which does the cryptography and which the proxy trusts) and the calling canister, which makes the actual HTTPS outcalls.

The workflow:

The keeper canister generates two nonces: long-time-nonce and short-time-nonce.
The calling canister sends the HTTP outcall with headers Canister: <keeper canister principal> and Canister-Key: <signature (with sign_with_ecdsa) of (keeper principal, long-time-nonce, short-time-nonce)> and with header Nonce: set by the calling canister to contain the two nonces.
The proxy returns access denied if short-time-nonce is repeated compared to a previous call.
The proxy returns access denied if keeper canister isn’t in the list of allowed callers.
The proxy verifies the signed principal of calling canister by using ecdsa_public_key.
The proxy returns access denied if the signature does not match.
The proxy returns a signature of the nonce by a private key stored in the proxy back with the proxied content in Signature: header.
We trap in the transformation function, if Signature header does not match nonce of keeper principal (using public key openly stored in the user’s code).

The above scheme is still insecure in that hacked hardware running canisters (ours and possibly another one) can receive and steal the SK and use it to send either a modified request from our canister (with a fake Canister-Key:) or even a request from another arbitrary canister (with a fake Canister-Key: and possibly a fake Canister:), which our proxy will consider genuine. This way, for example, an amount of OpenAI tokens can be stolen.

But if Nonce: is a unique value, then misbehavior of the canister hardware will be revealed by a trap in the transformation function, through its inability to serve replies with correct nonces (while a majority of other nodes served them correctly), and it can be knocked out of the system. The short-time nonce is used to prevent the hardware from making the correct request soon after a fake one (made with a stolen SK). The long-time nonce is used to prevent the hardware from storing nonce data for a long time and later fetching the stored data to hack the system by repeating the short-time nonce.

Note that the proxy does not need to store the full history of nonces, only the last few minutes of it, because outcalls need to be answered quickly.
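The short-lived nonce history could be kept roughly as in the Python sketch below (Python only for brevity; the same structure translates directly to Rust). The class name NonceStore, the 300-second default, and the injectable clock are assumptions for illustration, and signature checking is deliberately left out.

```python
import time


class NonceStore:
    """Remember short-time nonces for a limited window (hypothetical helper)."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen: dict[str, float] = {}  # nonce -> expiry time

    def check_and_record(self, nonce: str) -> bool:
        """Return True and record the nonce if it is fresh, False if it is a replay."""
        now = self._clock()
        # Forget nonces whose window has passed, so memory stays bounded.
        self._seen = {n: exp for n, exp in self._seen.items() if exp > now}
        if nonce in self._seen:
            return False  # repeated short-time nonce: the proxy denies access
        self._seen[nonce] = now + self._ttl
        return True
```

Once an entry expires, the same short-time nonce would be accepted again, which, as I understand it, is the gap the long-time nonce is meant to cover.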

P.S. Will a canister be knocked out of the system if it repeatedly returns results of HTTPS outcalls that differ from the consensus? (On the one hand this is desirable, because it knocks out hacked canisters; on the other hand it is not, because the owner of the site being called can intentionally harm a canister with a given IP by giving it incorrect results.) This security vulnerability (if it exists) can be mostly solved by DFINITY allocating a big enough pool of IPv6 addresses and allowing canisters to use random addresses from it. With this, a particular canister could still be targeted by its response-time “profile”, but that becomes difficult for an attacker.

qwertytrewq · May 12, 2024, 5:06am

This does not seem to be enough, because the secret key would be stolen (after it is retrieved) by hacked canister hardware and then used to make (possibly repeated) fake requests, such as stealing OpenAI tokens. Even if your way does prevent repeated access with the SK token (which I doubt), the owner of hacked canister hardware could still invisibly steal a large amount of (proxied) OpenAI tokens by making two (or more) requests to OpenAI instead of one when it is prompted to make a request.

One reason I think your approach is not enough is that I developed a complex way of doing this in a competing grant application, and what you explained seems too simple compared to (what I assume is) the right way.

sea-snake · May 12, 2024, 12:52pm

I wouldn’t use vetkeys specifically in the proxy to verify if an incoming request is coming from a trusted canister. Instead vetkeys are mostly useful to encrypt/decrypt the request and response to hide sensitive information from the IC nodes.

For verifying if a proxy request comes from a trusted canister, a tECDSA or canister signature is probably a better approach.
