System Overlord

A blog about security engineering, research, and general hacking.

Security 101: X-Forwarded-For vs. Forwarded vs PROXY

Over time, there have been a number of approaches to indicating the original client and the route that a request took when forwarded across multiple proxy servers. For HTTP(S), the three most common approaches you’re likely to encounter are the X-Forwarded-For and Forwarded HTTP headers, and the PROXY protocol. They’re all a little bit different, but also the same in many ways.

X-Forwarded-For

X-Forwarded-For is the oldest of the 3 solutions, and was probably introduced by the Squid caching proxy server. As the X- prefix implies, it’s not an official standard (i.e., an IETF RFC). The header is an HTTP multi-valued header, which means that it can have one or more values, each separated by a comma. Each proxy server should append the IP address of the host from which it received the request. The resulting header looks something like:

1
X-Forwarded-For: client, proxy1, proxy2

This would be a request that has passed through 3 proxy servers – the IP of the 3rd proxy (the one closest to the application server) would be the IP seen by the application server itself. (Often referred to as the “remote address” or REMOTE_ADDR in many application programming contexts.)

So, you could end up seeing something like this:

1
X-Forwarded-For: 2001:DB8::6, 192.0.2.1

Coming from a TCP connection from 127.0.0.1. This implies that the client had IPv6 address 2001:DB8::6 when connecting to the first proxy, then that proxy used IPv4 to connect from 192.0.2.1 to the final proxy, which was running on localhost. A proxy running on localhost might be nginx splitting between static and application traffic, or a proxy performing TLS termination.

Forwarded

The HTTP Forwarded header was standardized in RFC 7239 in 2014 as a way to better express the X-Forwarded-For header and related X-Forwarded-Proto and X-Forwarded-Port headers. Like X-Forwarded-For, this is a multi-valued header, so it consists of one or more comma-separated values. Each value is, itself, a set of key-value pairs, with pairs separated by semicolons (;) and the keys and values separated by equals signs (=). If the values contain any special characters, the value must be quoted.

The general syntax might look like this:

1
Forwarded: for=client, for=proxy1, for=proxy2

The key-value pairs are necessary to allow expressing not only the client, but the protocol used, the original HTTP Host header, and the interface on the proxy where the request came in. For figuring out the routing of our request (and for parity with the X-Forwarded-For header), we’re mostly interested in the field named for. While you might think it’s possible to just extract this key-value pair and look at the values, the authors of the RFC added some extra complexity here.

The RFC contains provisions for “obfuscated identifiers.” It seems this is mostly intended to prevent revealing information about internal networks when using forward proxies (e.g., to public servers), but you might even see it in operating reverse proxies. According to the RFC, these should be prefixed by an underscore (_), but I can imagine cases where this would not be respected, so you’d need to be prepared for that in parsing the identifiers.

The RFC also contains provisions for unknown upstreams, identified as unknown. This is used to indicate forwarding by a proxy in some manner that prevented identifying the upstream source (maybe it was through a TCP load balancer first).

Finally, there’s also the fact that, unlike the defacto standard of X-Forwarded-For, Forwarded allows for the option of including the port number on which it was received. Because of this, IPv6 addresses are enclosed in square brackets ([]) and quoted.

The example from the X-Forwarded-For section above written using the Forwarded header might look like:

1
Forwarded: for="[2001:DB8::6]:1337", for=192.0.2.1;proto=https

Additional examples taken from the RFC:

1
2
3
4
Forwarded: for="_gazonk"
Forwarded: For="[2001:db8:cafe::17]:4711"
Forwarded: for=192.0.2.60;proto=http;by=203.0.113.43
Forwarded: for=192.0.2.43, for=198.51.100.17

PROXY Protocol

At this point, you may have noticed that both of these headers are HTTP headers, and so can only be modified by L7/HTTP proxies or load balancers. If you use a pure TCP load balancer, you’ll lose the information about the source of the traffic coming in to you. This is particularly a problem when forwarding HTTPS connections where you don’t want to offload your TLS termination (perhaps traffic is going via an untrusted 3rd party) but you still want information about the client.

To that end, the developers of HAProxy developed the PROXY protocol. There are (currently) two versions of this protocol, but I’ll focus on the simpler (and more widely deployed) 1st version. The proxy should add a line at the very beginning of the TCP connection in the following format:

1
PROXY <protocol> <srcip> <dstip> <srcport> <dstport>\r\n

Note that, unlike the HTTP headers, this makes the PROXY protocol not backwards compatible. Sending this header to a server not expecting it will cause things to break. Consequently, this header will be used exclusively by reverse proxies.

Additionally, there’s no support for information about the hops along the way – each proxy is expected to maintain the same PROXY header along the way.

Version 2 of the PROXY protocol is a binary format with support for much more information beyond the version 1 header. I won’t go into details of the format (check the spec if you want) but the core security considerations are much the same.

Security Considerations

If you need (or want) to make use of these headers, there are some key security considerations in how you use them to use them safely. This is particularly of consideration if you use them for any sort of IP whitelisting or access control decisions.

Key to the problem is recognizing that the headers represent untrusted input to your application or system. Any of them could be forged by a client connecting, so you need to consider that.

Parsing Headers

After I spent so long telling you about the format of the headers, here’s where I tell you to disregard it all. Okay, really, you just need to be prepared to receive poorly-formatted headers. Some variation is allowed by the specifications/implementations: optional spaces, varying capitalization, etc. Some of this will be benign but still unexpected: multiple commas, multiple spaces, etc. Some of it will be erroneous: broken quoting, invalid tokens, hostnames instead of IPs, ports where they’re not expected, and so on.

None of this, however, precludes malicious input in the case of these headers. They may contain attempts at SQL Injection, Cross-Site Scripting and other malicious content, so one needs to be cautious in parsing and using the input from these headers.

Running a Proxy

As a proxy, you should consider whether you expect to be receiving these headers in your requests. You will only want that if you are expecting requests to be forwarded from another proxy, and then you should make sure the particular request came from your proxy by validating the source IP of the connection. As untrusted input, you cannot trust any headers from proxies not under your control.

If you are not expecting these headers, you should drop the headers from the request before passing it on. Blindly proxying them might cause downstream applications to trust their values when they come from your proxy, leading to false assertions about the source of the request.

Generally, you should rewrite the appropriate headers at your proxy, including adding the information on the source of the request to your proxy, before passing them on to the next stage. Most HTTP proxies have easy ways to manage this, so you don’t usually need to format the header yourself.

Running an Application

This is where it gets particularly tricky. If you’re using IP addresses for anything of significance (which you probably shouldn’t be, but it’s likely there’s some cases where people still are), you need to figure out whether you can trust these headers from incoming requests.

First off, if you’re not running the proxies: just don’t trust them. (Of course, I count a managed provider as run by you.) Also, if you’re not running the proxy, I hope we’re only talking about the PROXY protocol and you’re not exposing plaintext to untrusted 3rd parties.

If you are running proxies, you need to make sure the request actually came from one of your proxies by checking the IP of the direct TCP connection. This is the “remote address” in most web programming frameworks. If it’s not from your proxy, then you can’t trust the headers.

If it’s your proxy and you made sure not to trust incoming headers in your proxy (see above), then you can trust the full header. Otherwise, you can only trust the incoming hop to your proxy and anything before that is not trustworthy.

Man in the Middle Attacks

All of this disregards MITM attacks of course. If an attacker can inject traffic and spoof source IP addresses into your traffic, all bets on trusting headers are off. TLS will still help with header integrity, but they can still spoof the source address, convincing you to trust the headers in the request.

Bug Bounty Tip

Try inserting a few headers to see if you get different responses. Even if you don’t get a full authorization out of it, some applications will give you debug headers or other interesting behavior. Consider some of the following:

1
2
3
X-Forwarded-For: 127.0.0.1
X-Forwarded-For: 10.0.0.1
Forwarded: for="_localhost"

Security 101: Virtual Private Networks (VPNs)

I’m trying something new – a “Security 101” series. I hope to make these topics readable for those with no security background. I’m going to pick topics that are either related to my other posts (such as foundational knowledge) or just things that I think are relevant or misunderstood.

Today, I want to cover Virtual Private Networks, commonly known as VPNs. First I want to talk about what they are and how they work, then about commercial VPN providers, and finally about common misconceptions.


BSides SF 2020 CTF: Infrastructure Engineering and Lessons Learned

Last weekend, I had the pleasure of running the BSides San Francisco CTF along with friends and co-conspirators c0rg1, symmetric and iagox86. This is something like the 4th or 5th year in a row that I’ve been involved in this, and every year, we try to do a better job than the year before, but we also try to do new things and push the boundaries. I’m going to review some of the infrastructure we used, challenges we faced, and lessons we learned for next year.


Hacker Holiday Gift Guide (HHGG) 2019

I wanted to put together a few thoughts I had on gifts for my fellow hackers this holiday season. I’m including a variety of different things to appeal to almost anyone involved in information security or hardware hacking, but I’m obviously a bit biased to my own areas of interest. I’ve tried to roughly categorize things, but they tend to transcend boundaries somewhat. Got a suggestion I missed? Hit me up on Twitter.


Backing up to Google Cloud Storage with Duplicity and Service Accounts

I wanted to use duplicity to backup to Google Cloud Storage. I looked into it briefly and found that the boto library, originally for AWS, also supports GCS, but only using authorization tokens. I’d rather use a service account, for which authorization tokens are not available.

I looked into the options and the best information I could find was a Medium post, but it also describes using authorization tokens and creating a separate GMail/Google Apps account for the access. I’d really prefer to go with a service account to avoid having to sign up another account, and to be able to use more granular ACLs for the service account.

It turns out there’s a boto plugin for GCS with OAuth2 support, but enabling a boto plugin in duplicity isn’t straight-forward. You can point it to a “plugin directory” that causes duplicity to import any python files in the directory, but this doesn’t work if you point it directly to the gcs_oauth2_boto_plugin directory.

Install Requirements

Install the following:

Create your GCS Bucket

Create a GCS bucket. In my case, I set the default storage class to “nearline” because I expect backups to be infrequently accessed (I hope), and I plan to retain the data for the minimum 30 day retention. It’s also cheaper than standard storage, so a great combination for backups.

GCS Bucket Setup

Create your service account

Next up, you need to create a service account and grant it the appropriate permissions on the bucket. Go through IAM > Service Account and create a new service account. You don’t need to grant it any roles at this time, but at the end, you should select to “Create key” and download a JSON-formatted service account key.

Go back to the bucket you created, and go to the Permissions tab. Add the service account you just created as a “Storage Object Creator” and a “Storage Object Viewer”.

Create the Boto Configuration

For this, you’ll need the Google Cloud SDK tool gsutil. Run gsutil -e -o <path to your new config>, and provide the JSON file when prompted. Note that the JSON file is only referenced by the config, so if you move it somewhere else, you’ll need to update the configuration. (Or move it first, then run it.)

This will create the necessary configuration for boto to authenticate to GCS. You’ll still need to add the support for OAuth2 authentication, so first create an empty directory to serve as your plugin directory. In my case, I created a directory ~/.config/boto/plugins for all my plugins. In it, I created one file called gcs.py whose only contents is the following:

1
import gcs_oauth2_boto_plugin

I then added the following to the bottom of my boto configuration file:

1
2
[Plugin]
plugin_directory = /home/matir/.config/boto/plugins

This will result in boto loading the gcs_oauth2_boto_plugin python module for OAuth2 authentication on GCS when being loaded into duplicity.

Setup the Duplicity Command

At this point, it’s almost like running any duplicity backup. If you chose to place your boto configuration in a non-standard location, just set the environment variable BOTO_CONFIG to point to the configuration file. I run the following:

1
2
3
4
5
6
7
export BOTO_CONFIG=${HOME}/.config/boto/boto_backups

duplicity \
  incremental \
  --full-if-older-than 30D \
  ${HOME} \
  gs://demo-backup-bucket