Over time, there have been a number of approaches to indicating the original
client and the route that a request took when forwarded across multiple proxy
servers. For HTTP(S), the three most common approaches you’re likely to
encounter are the X-Forwarded-For
and Forwarded
HTTP headers, and the PROXY
protocol
. They’re all a little bit different, but also the same in many ways.
X-Forwarded-For
X-Forwarded-For
is the oldest of the 3 solutions, and was probably introduced
by the Squid caching proxy server. As the X-
prefix implies, it’s not an
official standard (i.e., an IETF RFC). The header is an HTTP multi-valued
header, which means that it can have one or more values, each separated by a
comma. Each proxy server should append the IP address of the host from which it
received the request. The resulting header looks something like:
1
X-Forwarded-For: client, proxy1, proxy2
This would be a request that has passed through 3 proxy servers – the IP of
the 3rd proxy (the one closest to the application server) would be the IP seen
by the application server itself. (Often referred to as the “remote address” or
REMOTE_ADDR
in many application programming contexts.)
So, you could end up seeing something like this:
1
X-Forwarded-For: 2001:DB8::6, 192.0.2.1
Coming from a TCP connection from 127.0.0.1
. This implies that the client had
IPv6 address 2001:DB8::6
when connecting to the first proxy, then that proxy
used IPv4 to connect from 192.0.2.1 to the final proxy, which was running on
localhost. A proxy running on localhost might be nginx splitting between static
and application traffic, or a proxy performing TLS termination.
Forwarded
The HTTP Forwarded
header was standardized in RFC
7239 in 2014 as a way to better express
the X-Forwarded-For
header and related X-Forwarded-Proto
and
X-Forwarded-Port
headers. Like X-Forwarded-For
, this is a multi-valued
header, so it consists of one or more comma-separated values. Each value is,
itself, a set of key-value pairs, with pairs separated by semicolons (;
) and
the keys and values separated by equals signs (=
). If the values contain any
special characters, the value must be quoted.
The general syntax might look like this:
1
Forwarded: for=client, for=proxy1, for=proxy2
The key-value pairs are necessary to allow expressing not only the client, but
the protocol used, the original HTTP Host
header, and the interface on the
proxy where the request came in. For figuring out the routing of our request
(and for parity with the X-Forwarded-For
header), we’re mostly interested in
the field named for
. While you might think it’s possible to just extract this
key-value pair and look at the values, the authors of the RFC added some extra
complexity here.
The RFC contains provisions for “obfuscated identifiers.” It seems this is
mostly intended to prevent revealing information about internal networks when
using forward proxies (e.g., to public servers), but you might even see it in
operating reverse proxies. According to the RFC, these should be prefixed by an
underscore (_
), but I can imagine cases where this would not be respected, so
you’d need to be prepared for that in parsing the identifiers.
The RFC also contains provisions for unknown upstreams, identified as
unknown
. This is used to indicate forwarding by a proxy in some manner that
prevented identifying the upstream source (maybe it was through a TCP load
balancer first).
Finally, there’s also the fact that, unlike the defacto standard of
X-Forwarded-For
, Forwarded
allows for the option of including the port
number on which it was received. Because of this, IPv6 addresses are enclosed
in square brackets ([]
) and quoted.
The example from the X-Forwarded-For
section above written using the
Forwarded
header might look like:
1
Forwarded: for="[2001:DB8::6]:1337", for=192.0.2.1;proto=https
Additional examples taken from the RFC:
1
2
3
4
Forwarded: for="_gazonk"
Forwarded: For="[2001:db8:cafe::17]:4711"
Forwarded: for=192.0.2.60;proto=http;by=203.0.113.43
Forwarded: for=192.0.2.43, for=198.51.100.17
PROXY Protocol
At this point, you may have noticed that both of these headers are HTTP headers, and so can only be modified by L7/HTTP proxies or load balancers. If you use a pure TCP load balancer, you’ll lose the information about the source of the traffic coming in to you. This is particularly a problem when forwarding HTTPS connections where you don’t want to offload your TLS termination (perhaps traffic is going via an untrusted 3rd party) but you still want information about the client.
To that end, the developers of HAProxy developed the PROXY protocol. There are (currently) two versions of this protocol, but I’ll focus on the simpler (and more widely deployed) 1st version. The proxy should add a line at the very beginning of the TCP connection in the following format:
1
PROXY <protocol> <srcip> <dstip> <srcport> <dstport>\r\n
Note that, unlike the HTTP headers, this makes the PROXY protocol not backwards compatible. Sending this header to a server not expecting it will cause things to break. Consequently, this header will be used exclusively by reverse proxies.
Additionally, there’s no support for information about the hops along the way – each proxy is expected to maintain the same PROXY header along the way.
Version 2 of the PROXY protocol is a binary format with support for much more information beyond the version 1 header. I won’t go into details of the format (check the spec if you want) but the core security considerations are much the same.
Security Considerations
If you need (or want) to make use of these headers, there are some key security considerations in how you use them to use them safely. This is particularly of consideration if you use them for any sort of IP whitelisting or access control decisions.
Key to the problem is recognizing that the headers represent untrusted input to your application or system. Any of them could be forged by a client connecting, so you need to consider that.
Parsing Headers
After I spent so long telling you about the format of the headers, here’s where I tell you to disregard it all. Okay, really, you just need to be prepared to receive poorly-formatted headers. Some variation is allowed by the specifications/implementations: optional spaces, varying capitalization, etc. Some of this will be benign but still unexpected: multiple commas, multiple spaces, etc. Some of it will be erroneous: broken quoting, invalid tokens, hostnames instead of IPs, ports where they’re not expected, and so on.
None of this, however, precludes malicious input in the case of these headers. They may contain attempts at SQL Injection, Cross-Site Scripting and other malicious content, so one needs to be cautious in parsing and using the input from these headers.
Running a Proxy
As a proxy, you should consider whether you expect to be receiving these headers in your requests. You will only want that if you are expecting requests to be forwarded from another proxy, and then you should make sure the particular request came from your proxy by validating the source IP of the connection. As untrusted input, you cannot trust any headers from proxies not under your control.
If you are not expecting these headers, you should drop the headers from the request before passing it on. Blindly proxying them might cause downstream applications to trust their values when they come from your proxy, leading to false assertions about the source of the request.
Generally, you should rewrite the appropriate headers at your proxy, including adding the information on the source of the request to your proxy, before passing them on to the next stage. Most HTTP proxies have easy ways to manage this, so you don’t usually need to format the header yourself.
Running an Application
This is where it gets particularly tricky. If you’re using IP addresses for anything of significance (which you probably shouldn’t be, but it’s likely there’s some cases where people still are), you need to figure out whether you can trust these headers from incoming requests.
First off, if you’re not running the proxies: just don’t trust them. (Of course, I count a managed provider as run by you.) Also, if you’re not running the proxy, I hope we’re only talking about the PROXY protocol and you’re not exposing plaintext to untrusted 3rd parties.
If you are running proxies, you need to make sure the request actually came from one of your proxies by checking the IP of the direct TCP connection. This is the “remote address” in most web programming frameworks. If it’s not from your proxy, then you can’t trust the headers.
If it’s your proxy and you made sure not to trust incoming headers in your proxy (see above), then you can trust the full header. Otherwise, you can only trust the incoming hop to your proxy and anything before that is not trustworthy.
Man in the Middle Attacks
All of this disregards MITM attacks of course. If an attacker can inject traffic and spoof source IP addresses into your traffic, all bets on trusting headers are off. TLS will still help with header integrity, but they can still spoof the source address, convincing you to trust the headers in the request.
Bug Bounty Tip
Try inserting a few headers to see if you get different responses. Even if you don’t get a full authorization out of it, some applications will give you debug headers or other interesting behavior. Consider some of the following:
1
2
3
X-Forwarded-For: 127.0.0.1
X-Forwarded-For: 10.0.0.1
Forwarded: for="_localhost"