Audience: Developers, nerds, and anyone who is interested in processors and what happens under the hood when a program runs.
A little bit of C:
I hope you are already familiar with C syntax. For those who are not, here is a simple hello world program.
To execute this program, you need to compile it with a C compiler. I can recommend GCC. After installing it, you should be able to run the following commands in your shell/terminal (I assume that you have created a helloWorld.c file and copied the code above into it):
- “gcc -o hello.out helloWorld.c” will create an executable binary
- “./hello.out” will execute your binary and should print “hello world” in your console.
Perfect. Now we can start with some simple data structures and figure out how a compiler and your operating system handle all this magic stuff behind the scenes.
But first, what is your perspective on the program?
When you write a program, do you think in source code? Or have you ever asked yourself what really happens when you execute your program? (If it was the second, great. If not, try thinking about it that way.)
In fact, your computer doesn’t understand source code. Syntaxes like the C syntax only exist so that humans can read and write programs. Your computer (CPU) only understands binary code, for example “011101101”. This is where the compiler joins the game: it translates your human-readable source into something called “assembly”, which an assembler then turns into binary machine code.
Oh wait, you said it would be compiled into binary code? Are you kidding me? No, assembly is only another abstraction that makes machine code readable. It is a set of names for machine code instructions, so that you can read something like “01010101” as “mov” or “push”.
Let’s see how the compile job works
If you add “-v” (verbose) to the GCC command, it will print every step it runs to the console.
Let’s see the final result
- “objdump -D hello.out | grep -A15 main” will disassemble your binary and print each line that mentions main, plus the 15 lines after it.
Each byte is shown as a hexadecimal value. Do you know why it uses hexadecimal notation? Yes, you’re right: every byte consists of 8 bits, so it has 2^8 = 256 possible states. The hexadecimal system has a base of 16, so two hexadecimal digits cover 16 × 16 = 256 values, which is exactly enough to describe every possible state of a byte. (So cool.)
Notice that the hex values on the left are the addresses where the instruction bytes are stored (for example “100000f60”, or “100000000000000000000111101100000” in binary). Every byte has its own unique address.
A little more information before we jump in
Each processor uses some internal variables, better known as registers. On a 32-bit x86 processor, the most important ones are:
- EAX (Accumulator)
- ECX (Counter)
- EDX (Data)
- EBX (Base)
- ESP (Stack Pointer)
- EBP (Base Pointer)
- ESI (Source Index)
- EDI (Destination Index)
- EIP (Instruction Pointer)
The EIP, or instruction pointer, register holds the address of the next instruction that the processor will execute. We’ll take a look at it with GDB and disassemble the program. First, make sure that you have GDB installed on your system.
You can check this by typing “gdb -v” in your terminal.
To start a disassembly session, use the following commands.
(If you run macOS Mojave and have some trouble, check this on Stack Overflow.)
Start GDB with the following command: “gdb -q ./hello.out”. The “-q” (quiet) argument suppresses GDB’s startup banner with its version and copyright messages.
Please make sure that you run GDB with Intel syntax.
To configure that, put the following line into a file named .gdbinit:
- “set disassembly-flavor intel”
Make sure that .gdbinit is in your home directory.
OK, let’s see GDB in action
In this section, I’ll show you the basic commands to get familiar with gdb.
Set a breakpoint and run
If you have started a debugging session and you want to set a breakpoint in the main function, enter the following command:
- “break main”
After that, you can trigger that breakpoint with the command “run”. GDB then stops execution at the start of the main function.
Get information from registers
As I explained earlier in the article, the processor has some registers like EIP, EAX and so on. To get the current state of these registers, use the command “info registers”.
The command “i r” is the shortcut for “info registers” (and we love shortcuts).
Get the current value of a specific register
With the command “i r $eip”, GDB shows the current value of just the target register.
Get the value from the target
“x” is the shortcut for “examine”. With examine, you can look at the memory that a register points to. For example, “x/4bx $eip” prints the first 4 bytes at the address stored in EIP, in hexadecimal notation.
Examine can also display values as decimal (“d”), characters (“c”) or octal (“o”) instead of hexadecimal (“x”). It can also read larger units than a single byte, such as a halfword (“h”, 2 bytes) or a word (“w”, 4 bytes, a DWORD in Intel terms).
What comes in the next articles?
In the next article we will discuss:
- The difference between 32-bit and 64-bit systems and why this matters for data types like int. And, of course, why they are so much faster than 32-bit systems.
- More about bytes/bits, hexadecimal, octal and so on
- “Really” working with GDB; this was just a little introduction
- Discuss stuff like heap, stack, and buffer overflows
- Maybe try to overwrite some values and compile a new executable
In the meantime, feel free to follow me on Twitter @diClNeEASY!