2.6.37 to 3.8.8 Linux Kernel Exploit (PERF_EVENTS) by spender (abacus)

RaT
SX High Council
Joined: 2008/03/12

Just in case you've been living under a rock, please grab it at http://grsecurity.net/~spender/exploits/enlightenment.tgz (abacus is the new exploit).

Here is a write-up that appeared on reddit:
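The vulnerability involves the event_id variable in kernel/events/core.c:perf_swevent_init(), which is a signed int initialized from the 64-bit attr.config and checked only against an upper bound (PERF_COUNT_SW_MAX), so negative values pass the check. When the event is closed, the kernel does: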

static void sw_perf_event_destroy(struct perf_event *event)
{
	u64 event_id = event->attr.config;

	WARN_ON(event->parent);

	static_key_slow_dec(&perf_swevent_enabled[event_id]);
	swevent_hlist_put(event);
}

Here the event_id (provided by the exploit) is used as a u64, so if the upper dword is zero and the lower dword is negative as an int, the effective index lies between 2G and 4G. Userland, however, can put anything in the upper dword, as long as the lower dword (the part truncated into the int in perf_swevent_init()) is >= 0x80000000U, i.e. negative.
The exploit provides a negative event_id. Used as a u64 array index, and combined with the kernel's location at the top of the 64-bit address space, it wraps around to an address in userland within a reasonable range. The exploit prepares for this by mapping that range and filling it with known contents. The decrement performed by the kernel then modifies some data in the mapping, which the exploit can find. An increment is also performed, via the perf_swevent_init() code shown below, but the indexes -1 and -2 were likely chosen so that it does not modify anything important on the kernels tested.
By knowing the address of the modified data, the event_id provided to the kernel, and the size of the array elements, the exploit can compute the base address of the array. It then acquires the IDT base and targets the overflow interrupt vector, this time with different code:

static_key_slow_inc(&perf_swevent_enabled[event_id]);

from perf_swevent_init(), which lets it use its negative index as a true negative offset from the array base. It aims this increment at the overflow vector's IDT entry and increments the most significant dword of the handler address: 0xffffffff wraps to 0x00000000, which turns the whole handler into a userland address. The exploit masks off everything but the top byte of the lower dword of the IDT base it acquired, so that it can map the memory that will be executed once an address in the kernel image has been turned into a userland address. For example, a kernel address of 0xffffffff8dd60000 becomes the userland address 0x000000008dd60000, and the exploit will have roughly 0x000000008d000000 to 0x000000008e000000 mapped. It fills this region with shellcode, which runs when it executes 'int 0x4'. The shellcode finds and modifies the credentials of the current task, replacing the uids/gids with 0 and granting full capabilities.
This is not an exact reading of the source, which has bugs, but a description of how it should work in general. For instance, changing 0x300 in the source to 0x320 sets the exclude_kernel flag on the event and allows the exploit to work when perf_event_paranoid is set to 2.
The address inference via userland would be prevented by UDEREF, the IDT modification would be prevented by KERNEXEC, and SMEP would prevent the execution of shellcode in userland.
Edit, summarizing more information that I posted on Twitter:
Ubuntu is exploitable; only the exploitation is different. The size of the array elements depends on the size of struct static_key. On Ubuntu, CONFIG_JUMP_LABEL makes this struct 24 bytes on x86_64, compared to the 4 bytes assumed in sd's exploit. I'm still able to modify the IDT, though not the overflow vector as done in sd's exploit (which brings additional "challenges"). CONFIG_JUMP_LABEL also introduces a mutex around the atomic_inc, though interestingly only when incrementing from zero; otherwise it returns early via:

if (atomic_inc_not_zero(&key->enabled))
	return;

in kernel/jump_label.c:static_key_slow_inc().
You can avoid the decrement by keeping open the fd returned by perf_event_open(), since the decrement happens upon release/freeing of the event. You can also choose to defer the decrement and use it to your advantage. If 0xffffffff is provided as the upper dword of the event_id from userland, the decrement side uses the same address as the one computed with the sign-extended int index, undoing the increment that occurred. This is equivalent to what happens on 32-bit x86, where sizeof(int) == sizeof(void *). So you can perform your increments while holding the file descriptors open, abuse the new value, then close all the file descriptors and let the kernel do the cleanup work for you ;)
On x86, ARM, and PV Xen you may want to target a NULL function pointer, increment it past mmap_min_addr while keeping all fds open (otherwise, on 32-bit x86, closing the fds undoes each increment, so the whole operation is effectively a no-op), and then have the kernel call it. The number of open file descriptors allowed per process has a low limit, so you'll want to fork off several children. If you can time it right, incrementing security_ops could be interesting ;) (think about it for a minute or two)
-Brad