[NIX Programming] Part 1: Knowing the Roots.

[History and Overview]
It's important to keep an open mind when going back to the roots, because understanding them can unlock a lot about the way things work today. A great deal of what is fundamental about modern computing traces back to a system called UNIX. UNIX was an incredibly powerful operating system born at Bell Labs in 1969. In its early years it was handed out to universities for next to nothing, source code included, but as it grew it turned into a product sold to corporations for big money. Microsoft was one of the licensees: it sold its own UNIX, called Xenix, starting in 1980. Windows itself, however, was never built on UNIX. It was announced in 1983, shipped in 1985, and ran on top of Microsoft's own proprietary MS-DOS, which borrowed a few UNIX ideas (directories, pipes, redirection) and left the rest behind. Around the same time, in 1983, "C with Classes" was renamed C++, and its first commercial release followed in 1985. The other computer company at the time, Apple, sued Microsoft in 1988, claiming that Windows imitated the look and feel of the Lisa and Macintosh interfaces. Apple lost: the courts threw out nearly all of its claims, and Steve Jobs had already left the company three years earlier.

However, we are forgetting the one thing that shaped both towers, and that is UNIX. The Unix philosophy, originated by Ken Thompson, is a set of cultural norms and philosophical approaches to minimalist, modular software development. In practice this means a system full of small tools that each do one job well and can be combined, so the programmer has everything needed to get the job done. That's what it was back then, and surprisingly enough, the two towers received more praise for what they built with these ideas than UNIX itself ever got credit for. Some of the richest billionaires today got their start on UNIX, then sold what they created to others to make millions, and human greed became obviously larger than creativity and ingenuity. AT&T, which owned Bell Labs at the time, bought into a new and hip company called Sun Microsystems in 1988 (I even had a professor during college who worked for the bastards). Corporate greed ruined the computer age and once again took away everything good.

The article I am drawing all of this from (http://www.faqs.org/docs/artu/ch02s01.html) tells the rest of the story:

"AT&T divested its interest in Sun in 1992; then sold its Unix Systems Laboratories to Novell in 1993; Novell handed off the Unix trademark to the X/Open standards group in 1994; AT&T and Novell joined OSF in 1994, finally ending the Unix wars. In 1995 SCO bought UnixWare (and the rights to the original Unix sources) from Novell. In 1996, X/Open and OSF merged, creating one big Unix standards group.

But the conventional Unix vendors and the wreckage of their wars came to seem steadily less and less relevant. The action and energy in the Unix community were shifting to Linux and BSD and the open-source developers. By the time IBM, Intel, and SCO announced the Monterey project in 1998 — a last-gasp attempt to merge One Big System out of all the proprietary Unixes left standing — developers and the trade press reacted with amusement, and the project was abruptly canceled in 2001 after three years of going nowhere.

The industry transition could not be said to have completed until 2000, when SCO sold UnixWare and the original Unix source-code base to Caldera — a Linux distributor. But after 1995, the story of Unix became the story of the open-source movement. There's another side to that story; to tell it, we'll need to return to 1961 and the origins of the Internet hacker culture."

Two of the many towers built upon UNIX kept this ideal, and for many years they kept it for the sole reason that things should be kept simple: Apple and Linux. Apple's macOS sits on a UNIX core, and Linux is a from-scratch reimplementation of the same design rather than UNIX under a new name. Today these two kinds of systems sit underneath a large share of the consumer electronics that run software at all.


So getting comfortable with UNIX removes a lot of what holds you back from excelling with modern technologies, whether you are creating your own or simply hacking into them. Even Apple kept everything that made UNIX what it was, because it was so great. Remember that computers back then were aimed at technical people who wanted to program software, which is why the shell Apple shipped as its primary CLI for years is Bash (the Bourne Again Shell) itself! Yes, you can in fact hack the many users on the planet who depend on MacBook Pros, isn't that great! As such, I've decided to embark on this series using the only book I've ever appreciated on the matter, Advanced UNIX Programming by Marc J. Rochkind:

https://www.amazon.com/Advanced-UNIX-programming-Marc-Rochkind/dp/013011... .

A lot of the calls I used in my post on shellcode programming come straight out of this book. Even though I can call myself a slightly seasoned C programmer, I can say without a doubt that this is the kind of material you will struggle with if you don't know what makes up a computer at the software and kernel level. Anyway, let's begin. I will be using Ubuntu 18.04 in these examples; feel free to follow along as you wish.

[Meeting with the Roots]
A file is the primitive building block of software from both the kernel and the user perspective. An entire operating system can itself be thought of as files: every collection of bytes clumped together is a file, and data in its rawest form, magnetic ones and zeroes on a disk, travels over the buses and reaches you as a file. Every file has an i-number, and an i-number is an index into an array of i-nodes. Each i-node holds information about one file, but not its name and not its data, as listed below:

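-type of file (regular, directory, special, and so on)
-number of links (to be explained later)
-owner's user ID and group ID
-the three sets of access permissions, for the owner, the group, and everyone else (rwx rwx rwx)
-size in bytes
-time of last access
-time of last modification
-time of last status change (when the i-node itself was last changed)
-and lastly the addresses of where the file's data is stored on the disk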
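
Most of the fields above can be read back from user space with the stat family of calls. What follows is a minimal sketch rather than anything taken from the book: the function names file_type and print_inode are made up for this post, and it uses lstat so that a symbolic link is reported as itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return a one-letter type code, like the first column of ls -l. */
char file_type(mode_t mode)
{
    if (S_ISREG(mode))  return '-';
    if (S_ISDIR(mode))  return 'd';
    if (S_ISCHR(mode))  return 'c';
    if (S_ISBLK(mode))  return 'b';
    if (S_ISFIFO(mode)) return 'p';
    if (S_ISLNK(mode))  return 'l';
    if (S_ISSOCK(mode)) return 's';
    return '?';
}

/* Print the i-node fields of path. Returns 0 on success, -1 on error. */
int print_inode(const char *path)
{
    struct stat sb;

    if (lstat(path, &sb) == -1) {
        perror(path);
        return -1;
    }
    printf("i-number:      %llu\n", (unsigned long long)sb.st_ino);
    printf("type:          %c\n", file_type(sb.st_mode));
    printf("links:         %lu\n", (unsigned long)sb.st_nlink);
    printf("uid / gid:     %ld / %ld\n", (long)sb.st_uid, (long)sb.st_gid);
    printf("permissions:   %03o\n", (unsigned)(sb.st_mode & 0777));
    printf("size in bytes: %lld\n", (long long)sb.st_size);
    /* The three timestamps are printed as seconds since the epoch. */
    printf("accessed:      %lld\n", (long long)sb.st_atime);
    printf("modified:      %lld\n", (long long)sb.st_mtime);
    printf("status change: %lld\n", (long long)sb.st_ctime);
    return 0;
}
```

Call print_inode("/etc/passwd") from your own main to try it. The one field you will not find is the disk address: the kernel keeps that to itself.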

We all know what directories are. They were created simply because referring to a file by its i-number is inconvenient (you won't say "file on i-number 1345324352136", instead you say "file in C:\System32\"). Each directory contains a two-column table: one column holds the names of the files, the other holds the i-number of each file. The combination of a name and an i-number is called a link. Directories themselves are files as well, but their i-node type is marked as directory.
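
That two-column table can be read with opendir and readdir. This is a rough sketch only; list_links is a name I made up, and the order in which entries come back is up to the filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>

/* Print every link (i-number, name) stored in a directory.
   Returns the number of entries, or -1 if the directory cannot be opened. */
int list_links(const char *dirpath)
{
    DIR *dir = opendir(dirpath);
    struct dirent *entry;
    int count = 0;

    if (dir == NULL) {
        perror(dirpath);
        return -1;
    }
    while ((entry = readdir(dir)) != NULL) {
        /* d_ino is the i-number column, d_name is the name column */
        printf("%10llu  %s\n", (unsigned long long)entry->d_ino, entry->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

Calling list_links(".") prints roughly what ls -ai shows for the current directory.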

For instance, let's say I want to follow the path memo/july/smith, where memo is contained in our current directory. The kernel first fetches the i-node of the current directory to find its data bytes, then searches those bytes for the name memo. If it finds memo, it takes the i-number stored next to it, fetches that i-node, and searches the data bytes of that directory for the name july, and the cycle repeats for smith. You can examine i-numbers and i-node usage with ls -li and df -hi like so:

$ ls -li
total 56
39845992 drwxr-xr-x 2 ghost ghost 4096 Jan 21 19:38  Desktop
39845996 drwxr-xr-x 6 ghost ghost 4096 Apr 24 00:55  Documents
39845993 drwxr-xr-x 2 ghost ghost 4096 Mar 20 21:03  Downloads
39845891 -rw-r--r-- 1 ghost ghost 8980 Jan 21 19:23  examples.desktop
39848152 drwxrwxr-x 7 ghost ghost 4096 Feb 13 20:36  GNS3
39845997 drwxr-xr-x 3 ghost ghost 4096 Mar 20 20:27  Music
39845998 drwxr-xr-x 4 ghost ghost 4096 Apr 26 13:01  Pictures
39845995 drwxr-xr-x 2 ghost ghost 4096 Jan 21 19:38  Public
39845994 drwxr-xr-x 2 ghost ghost 4096 Jan 21 19:38  Templates
39845999 drwxr-xr-x 2 ghost ghost 4096 Jan 21 19:38  Videos
40239629 drwxrwxr-x 4 ghost ghost 4096 Feb 18 12:39 'VirtualBox VMs'
41287767 drwxrwxr-x 5 ghost ghost 4096 Feb 10 11:55  vmware

^i-number^                                         ^name of the link^

$ df -hi
Filesystem     Inodes IUsed IFree IUse% Mounted on
udev             2.0M   652  2.0M    1% /dev
tmpfs            2.0M  1.3K  2.0M    1% /run
/dev/sda2         44M  282K   44M    1% /
tmpfs            2.0M    99  2.0M    1% /dev/shm
tmpfs            2.0M     7  2.0M    1% /run/lock
tmpfs            2.0M    18  2.0M    1% /sys/fs/cgroup
/dev/loop2        25K   25K     0  100% /snap/gtk-common-themes/1198
/dev/loop5       9.7K  9.7K     0  100% /snap/core18/731
/dev/loop3        271   271     0  100% /snap/gnome-characters/206
/dev/loop4        27K   27K     0  100% /snap/gnome-3-26-1604/74
/dev/loop13      1.3K  1.3K     0  100% /snap/gnome-calculator/260
/dev/loop8        354   354     0  100% /snap/gnome-logs/57
/dev/loop0        271   271     0  100% /snap/gnome-characters/254
/dev/loop9        734   734     0  100% /snap/gnome-system-monitor/77
/dev/loop10       354   354     0  100% /snap/gnome-logs/61
/dev/loop1        11K   11K     0  100% /snap/chromium/645
/dev/loop6        734   734     0  100% /snap/gnome-system-monitor/70
/dev/loop11       27K   27K     0  100% /snap/gnome-3-26-1604/82
/dev/loop12       24K   24K     0  100% /snap/gtk-common-themes/1122
/dev/loop7        11K   11K     0  100% /snap/chromium/691
/dev/loop14      1.6K  1.6K     0  100% /snap/gnome-calculator/352
/dev/loop18       13K   13K     0  100% /snap/core/6531
/dev/loop21      1.6K  1.6K     0  100% /snap/gnome-characters/139
/dev/loop16       11K   11K     0  100% /snap/chromium/660
/dev/loop26       27K   27K     0  100% /snap/gnome-3-26-1604/78
/dev/loop27       747   747     0  100% /snap/gnome-system-monitor/57
/dev/loop23      9.7K  9.7K     0  100% /snap/core18/941
/dev/loop25       28K   28K     0  100% /snap/gnome-3-28-1804/23
/dev/loop15       28K   28K     0  100% /snap/gnome-3-28-1804/36
/dev/loop17       13K   13K     0  100% /snap/core/6673
/dev/loop28      1.6K  1.6K     0  100% /snap/gnome-calculator/406
/dev/loop24       13K   13K     0  100% /snap/core/6405
/dev/loop22      9.7K  9.7K     0  100% /snap/core18/782
/dev/loop20      1.7K  1.7K     0  100% /snap/gnome-logs/45
/dev/loop19       27K   27K     0  100% /snap/gtk-common-themes/818
/dev/sda1           0     0     0     - /boot/efi
tmpfs            2.0M    24  2.0M    1% /run/user/121
tmpfs            2.0M    35  2.0M    1% /run/user/1000

And if you notice, there is a limit to the number of i-nodes a filesystem can have. As an added tip for my fellow hackers: if you max out the i-nodes on a filesystem, no user will be able to create a new file on it, even when there is free disk space left. The kernel knows where to start a lookup from the current directory of the process. A path that starts there is called a relative path; a path that starts at the root directory / is called an absolute path. On *nix filesystems such as ext4 the root directory conventionally has i-number 2, assigned when the filesystem is first created. It is possible for two or more links to refer to the same i-number, meaning that the same file may have more than one name. Removing a link decrements the i-node's link count, and when the link count reaches 0 (and no process still has the file open) the kernel discards the file.
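
The link count can be watched going up and down with link, unlink, and stat. This is only a sketch under my own assumptions: link_count and demo_links are invented names, the two paths you pass in are throwaway placeholders, and both must live on the same filesystem because a hard link cannot cross filesystems.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the link count of path, or -1 if the name does not exist. */
long link_count(const char *path)
{
    struct stat sb;
    return stat(path, &sb) == -1 ? -1 : (long)sb.st_nlink;
}

/* Create a file, give it a second name, then remove both names.
   Returns 0 if the link count went 1, 2, 1 and the file then vanished. */
int demo_links(const char *first, const char *second)
{
    long one, two, back_to_one, gone;
    int fd = open(first, O_WRONLY | O_CREAT | O_EXCL, 0600);

    if (fd == -1)
        return -1;
    close(fd);

    one = link_count(first);            /* one name, count is 1 */
    if (link(first, second) == -1) {    /* second name, same i-node */
        unlink(first);
        return -1;
    }
    two = link_count(first);            /* two names, count is 2 */
    unlink(first);                      /* remove the first name */
    back_to_one = link_count(second);   /* the file survives under the other name */
    unlink(second);                     /* count reaches 0, file is discarded */
    gone = link_count(second);          /* stat now fails */

    printf("link counts: %ld, %ld, %ld, then %ld\n", one, two, back_to_one, gone);
    return (one == 1 && two == 2 && back_to_one == 1 && gone == -1) ? 0 : -1;
}
```

Note that unlink never takes a "delete" flag of any kind: deleting a file is nothing more than removing its last name.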

A special file is either a device or a FIFO (first in, first out), which is a mechanism used to pass data between processes. There are two kinds of device special files, block and character. A block special file presents the device as an array of fixed-size blocks, usually 512 bytes each, and goes through a pool of kernel buffers used as a cache to speed up I/O between the device and the system. Character special files don't follow this convention: they might do I/O in small chunks (like typing on a keyboard) or in big chunks (like reading whole tracks from a disk through its raw interface). The same physical device may have both a block and a character special file. A special file has an i-node, but there aren't any data bytes on the disk for the i-node to point to. In place of data bytes the i-node holds a device number, which the kernel uses as an index into a table to find the collection of subroutines that make up the device driver. Here's an article describing the behavior of the /dev directory and how these special files work in theory: https://www.linuxjournal.com/article/2597.
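
You can tell these kinds apart from a program by looking at the type bits of the i-node. A small sketch, assuming a Linux-like system where /dev/null exists; special_kind and make_and_check_fifo are names I picked for illustration, and the FIFO path is a placeholder.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Classify path: 'b' block special, 'c' character special, 'p' FIFO,
   '-' anything else, '?' if stat fails. */
char special_kind(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return '?';
    if (S_ISBLK(sb.st_mode))
        return 'b';
    if (S_ISCHR(sb.st_mode))
        return 'c';
    if (S_ISFIFO(sb.st_mode))
        return 'p';
    return '-';
}

/* Create a FIFO at path, classify it, and remove it again. */
char make_and_check_fifo(const char *path)
{
    char kind;

    if (mkfifo(path, 0600) == -1) {
        perror(path);
        return '?';
    }
    kind = special_kind(path);
    unlink(path);
    return kind;
}
```

special_kind("/dev/null") should come back as 'c', and on most Linux machines a disk such as /dev/sda comes back as 'b'.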

[Programs and Processes]
A program is simply a collection of instructions and data, kept within a file, that is meant to be fed through a processor. Plain and simple, clear cut and nothing else; you can shoot me for this later. When the bytes of an executable are decoded, they translate into opcodes that tell the processor what to do. I'm not going to get into the specifics of program compilation, since that should have been covered when you learned C before reading this post, so I'll move on to what a process is.

A process is an environment in which a program executes, an instance if you will, and it uses one or more threads to send instructions to the processor. So collectively you can think of a process as a space that holds instructions and data, with threads feeding both of them through the processor.


A process consists of three segments: the instruction segment (.text), which holds the program's machine instructions; the user data segment (.data), which holds the data the program works on; and the system data segment, which the kernel maintains and which holds information about the process (i.e. current directory, open file descriptors, and accumulated CPU time). Now the most important thing about processes is that they come as parents and children: a child is created by the kernel when an existing process asks for one. If a parent has a file open, then the child will also have the same file open, because a child inherits most of the system data attributes of its parent.
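
Here is a sketch of that inheritance, using a pipe as the open file because it keeps the example short. The function name and the message are my own; the point is only that the child never opens anything itself and still has a working descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The parent opens a pipe and forks. The child writes through the
   descriptor it inherited, and the parent reads the message into buf.
   Returns the number of bytes read, or -1 on error. */
ssize_t read_from_child(char *buf, size_t buflen)
{
    const char *msg = "hello from the child";
    int fds[2];
    pid_t pid;
    ssize_t n;

    if (buflen == 0 || pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* child: fds[1] is open here only because the parent had it open */
        if (write(fds[1], msg, strlen(msg)) == -1)
            _exit(1);
        _exit(0);
    }

    /* parent: keep the read end, collect the message, reap the child */
    close(fds[1]);
    n = read(fds[0], buf, buflen - 1);
    if (n >= 0)
        buf[n] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

The same thing holds for a descriptor returned by open on a regular file, where parent and child even share the file offset.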

Signals are how the kernel interrupts a process to tell it that something happened. A signal can originate from a process sending one to itself, from one process sending one to another, or from the kernel itself, such as the segmentation fault we programmers often get when we make mistakes. Another example is pressing the interrupt key on your keyboard (usually Ctrl-C), which makes the kernel send an interrupt signal to the running program. Simple.
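
The first case, a process signalling itself, fits in a few lines. This is a minimal sketch with names of my own choosing (on_signal, send_signal_to_self); it picks SIGUSR1 only because that signal has no meaning of its own.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t last_signal = 0;

static void on_signal(int signo)
{
    last_signal = signo;
}

/* Install a handler for SIGUSR1, send that signal to ourselves, and
   return the signal number the handler recorded (-1 on error). */
int send_signal_to_self(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    last_signal = 0;
    /* raise() does not return until the handler has run */
    if (raise(SIGUSR1) != 0)
        return -1;
    return last_signal;
}
```

Sending to another process works the same way, except that you call kill with the other process's PID instead of raise.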

[Process IDs and Process groups]
Every process is given a process ID or PID, a positive integer, and every process except one has a parent. The exception is the process with PID 0, which belongs to the kernel itself and was traditionally used for swapping (when RAM runs out of space, program data is moved out to disk, to what Linux calls a swap partition or swap file).

The focus here is parents and their children. A process's system data records the ID of its parent, known as the parent process ID. When a parent throws its kid out in the streets, someone finds the kid and sends the kid to foster care, and a new parent adopts it. Likewise, when a parent process terminates and orphans its child, the kernel sets the child's parent process ID to 1. Why PID 1? Because PID 1 belongs to the process known as init, the initialization process that starts everything else on the system and is the eventual ancestor of all processes. (On a modern Linux desktop the kernel may hand the orphan to a designated "subreaper" process instead of PID 1, but the idea is the same.)
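
The parent process ID is easy to check for yourself with getpid and getppid. A small sketch, with a function name I made up; the child reports what it saw through its exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child and let it compare getppid() with the PID of the
   process that created it. Returns 1 if they match, 0 if they do
   not, and -1 on error. */
int child_sees_parent_pid(void)
{
    pid_t parent = getpid();
    pid_t pid = fork();
    int status;

    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* child: the parent is still alive, so no adoption has happened */
        _exit(getppid() == parent ? 0 : 1);
    }

    printf("parent %ld created child %ld\n", (long)parent, (long)pid);
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

If the parent exited before the child called getppid, the child would see the PID of whatever process adopted it instead.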

Programmers might choose to implement a subsystem as a group of related processes. Such a process group has a process group ID. One member is the group leader, and every member of the group has the group leader's PID as its process group ID. The kernel provides a system call that sends a signal to every member of a group at once. We will get into this later, but as an example, to shut down a whole subsystem without leaving loose processes behind, you broadcast one terminate signal to the group instead of killing only the group leader. A process may also leave its group and become the leader of a new one.
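
As a rough sketch of that broadcast (not the book's code, and terminate_group is an invented name): the parent starts two children, puts both into a group led by the first one with setpgid, and then passes a negative PID to kill, which means "everyone in that group".

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start two children in one process group and terminate the whole
   group with a single signal. Returns how many children died from
   SIGTERM (2 when all goes well), or -1 on error. */
int terminate_group(void)
{
    pid_t kids[2] = {0, 0};
    int i, status, killed = 0;

    for (i = 0; i < 2; i++) {
        kids[i] = fork();
        if (kids[i] == 0) {
            for (;;)
                pause();        /* child: sleep until a signal arrives */
        }
        /* parent: the first child leads the group, the second joins it */
        if (kids[i] == -1 || setpgid(kids[i], kids[0]) == -1) {
            /* clean up whatever was started, one process at a time */
            for (; i >= 0; i--) {
                if (kids[i] > 0) {
                    kill(kids[i], SIGKILL);
                    waitpid(kids[i], NULL, 0);
                }
            }
            return -1;
        }
    }

    /* a negative PID addresses the process group with that ID */
    if (kill(-kids[0], SIGTERM) == -1)
        return -1;

    for (i = 0; i < 2; i++) {
        if (waitpid(kids[i], &status, 0) == kids[i] &&
            WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
            killed++;
    }
    return killed;
}
```

Your shell does the same thing when you press Ctrl-C on a pipeline: every process in the foreground group gets the signal, not just the first command.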

Just like processes, users are also given IDs. The user ID or UID is stored in the /etc/passwd file, and when you log in, the first process started for you (your login shell) runs with that UID, which every process you start from there inherits. Likewise, groups are also given an ID, and this is stored in the /etc/group file. I won't get into the specifics, but as a seasoned programmer might know, every file carries permissions for its owner, its group, and everyone else, and you can use the chmod command to adjust those permissions. A seasoned programmer might also know that a process has two kinds of user ID, a real user ID and an effective user ID, and the same goes for the group. Normally they are the same, but sometimes they aren't. The effective IDs are used to check permissions, and the real IDs are used for accounting and user-to-user communication.
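
A process can ask the kernel for all four IDs. The following is a minimal sketch; ids_match is a hypothetical helper name, and the user name lookup through getpwuid may come back empty on systems where the UID has no passwd entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Print the real and effective user and group IDs of this process.
   Returns 1 if real and effective IDs are the same, 0 otherwise. */
int ids_match(void)
{
    uid_t ruid = getuid(), euid = geteuid();
    gid_t rgid = getgid(), egid = getegid();
    /* look the real UID up in the passwd database to get a name */
    struct passwd *pw = getpwuid(ruid);

    printf("real uid %ld (%s), effective uid %ld\n",
           (long)ruid, pw != NULL ? pw->pw_name : "no passwd entry", (long)euid);
    printf("real gid %ld, effective gid %ld\n", (long)rgid, (long)egid);
    return ruid == euid && rgid == egid;
}
```

Run from an ordinary program started in your shell, the real and effective values should be identical.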

User IDs are important because they decide what a process is allowed to touch. Take passwd: any user can run it to change their own password, yet the password database can only be written by root. It works because the passwd program is owned by root and has the set-user-ID bit turned on, so while it runs the effective user ID becomes 0, root's. I'll let this sink in for a moment. A particular loophole used to exist, and doesn't anymore, where you could copy the sh command to a directory of your own, use chmod to turn on the set-user-ID bit, and use chown to change the copy's owner to root. Executing that copy of sh would then give you the privileges of root. On Linux today an ordinary user cannot chown a file to root at all.
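
Whether a program will run with its owner's ID is recorded in the mode bits of its i-node, so it can be checked with stat. A short sketch with helper names of my own; /usr/bin/passwd is where Ubuntu keeps the program, but treat that path as an assumption on other systems.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <sys/types.h>

/* Return 1 if path has the set-user-ID bit on, 0 if it does not,
   and -1 if the file cannot be examined. */
int has_setuid_bit(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return -1;
    return (sb.st_mode & S_ISUID) ? 1 : 0;
}

/* Return 1 if running path would give the process an effective user
   ID of 0: the file must be owned by root and have set-user-ID on.
   Returns 0 if not, and -1 if the file cannot be examined. */
int runs_as_root(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return -1;
    return (sb.st_uid == 0 && (sb.st_mode & S_ISUID)) ? 1 : 0;
}
```

On a stock Ubuntu install, runs_as_root("/usr/bin/passwd") is expected to report 1, which matches the s you see in the ls -l output for that file.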

This should be good enough for now. Next up on our list of things to do is File I/O. Before that, I recommend doing some research online about what system calls are and how to use them. Once you have figured that out, I can introduce you to this lovely table full of them: http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/