Explanation Of The Processor, Compiler & Assembler with C

Audience: Developers, nerds and everyone who is interested in processors and related low-level stuff.

A little bit of C:

I hope you are reasonably familiar with C syntax. For those who are not, here is a simple hello world program.

hello world example in c

If you want to execute this program, you need to compile it with a C compiler. I can recommend GCC. After installation, you should be able to execute the following instructions in your shell/terminal (I presuppose that you have created a helloWorld.c file and copied the code above into it):

  • “gcc -o hello.out helloWorld.c” will create an executable binary
create executable binary example
  • “./hello.out” will execute your binary and should print out “hello world” in your console.

Perfect. Now we can start with some simple data structures and figure out how a compiler and your operating system handle all this magic stuff behind the scenes.

But first, what is your perspective on the program?

When you write a program, do you think in source code? Or have you ever asked yourself what really happens when you execute your program? (If it is the second, great. If not, try looking at it that second way.)

In fact, your computer doesn’t understand source code. Syntaxes like that of C exist only so that humans can read and write programs. Your computer (CPU) only understands binary machine code, for example 01010101. This is where the compiler joins the game: it translates your human-readable source into something called “assembly”, which a tool called the assembler then turns into machine code.

Oh wait, you said it would be compiled into binary code? Are you kidding me?? No, assembly is only another abstraction that makes machine code readable. Assembly gives every machine code instruction a name, so that you can read a byte like “01010101” as “push” and other bit patterns as “mov” or “or”.
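To make the idea of "names for byte patterns" concrete, here is a toy lookup table in C. It is only a sketch: the function name mnemonic_for and the table are made up for this illustration, the four opcodes are hand-picked one-byte x86-64 instructions, and a real disassembler has to handle multi-byte instructions and operands as well.

```c
#include <stddef.h>

/* A tiny, hand-picked table of one-byte x86-64 opcodes (illustration only). */
struct opcode_name {
    unsigned char byte;
    const char *mnemonic;
};

static const struct opcode_name OPCODES[] = {
    { 0x55, "push rbp" },  /* binary 01010101 */
    { 0x5d, "pop rbp"  },
    { 0x90, "nop"      },
    { 0xc3, "ret"      },
};

/* Return the mnemonic for a byte, or "?" if it is not in the toy table. */
const char *mnemonic_for(unsigned char byte)
{
    size_t count = sizeof OPCODES / sizeof OPCODES[0];
    for (size_t i = 0; i < count; i++) {
        if (OPCODES[i].byte == byte)
            return OPCODES[i].mnemonic;
    }
    return "?";
}
```

Calling mnemonic_for(0x55) returns "push rbp", while a byte that is not in the table gives "?".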

If you are interested in an x86 cheat sheet, check this out.

Let’s see how the compile task works

If you add “-v” (verbose) to the GCC compile job, it will print every step it runs (preprocessing, compiling, assembling and linking) to the console.

Let’s see the final result 

  • “objdump -D hello.out | grep -A15 main.” will disassemble your binary and print each line that matches “main.” together with the 15 lines that follow it.

disassemble binary example

Each byte is shown as a hexadecimal value. Do you know why it uses hexadecimal notation? Yes, you’re right: every byte consists of 8 bits, so it has 2^8 = 256 possible states. The hexadecimal system has a base of 16, and two hexadecimal digits give 16^2 = 256 combinations. That means two hexadecimal digits are exactly enough to describe every possible state of a byte. (So cool.)
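The "one byte, two hex digits" rule can be sketched in a few lines of C. The function name byte_to_hex is made up for this example, and it assumes the usual 8-bit byte: the upper 4 bits pick the first digit, the lower 4 bits pick the second.

```c
/* Write the two hexadecimal digits of `byte` into out[0..1], plus a
 * terminating '\0'. Assumes an 8-bit byte. */
void byte_to_hex(unsigned char byte, char out[3])
{
    static const char DIGITS[] = "0123456789abcdef";
    out[0] = DIGITS[byte >> 4];    /* upper 4 bits -> first digit  */
    out[1] = DIGITS[byte & 0x0f];  /* lower 4 bits -> second digit */
    out[2] = '\0';
}
```

For example, the byte 0x55 (binary 01010101) becomes "55" and the largest byte value 255 becomes "ff".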

Notice that the hex values on the left are the addresses where the bytes of your instructions are stored (for example “100000f60”, which is “100000000000000000000111101100000” in binary). Every byte has its own unique address.
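You can watch "every byte has its own address" from inside a C program too. The following is a sketch with made-up function names: it walks over the 4 bytes of a 32-bit value and shows that neighbouring bytes sit at neighbouring addresses. The addresses printed will differ on every machine and run, and the order of the bytes depends on the CPU's endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Print each byte of `value` together with the address it is stored at. */
void print_bytes_with_addresses(const uint32_t *value)
{
    const unsigned char *bytes = (const unsigned char *)value;
    for (size_t i = 0; i < sizeof *value; i++)
        printf("%p: %02x\n", (void *)(bytes + i), (unsigned)bytes[i]);
}

/* Distance, in addresses, between byte `i` of the value and its first byte. */
size_t address_offset(const uint32_t *value, size_t i)
{
    const unsigned char *bytes = (const unsigned char *)value;
    return (size_t)((uintptr_t)(bytes + i) - (uintptr_t)bytes);
}
```

On common platforms with a flat address space, address_offset(&v, 1) is 1: the second byte lives exactly one address after the first.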

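So, do you know what “100000f60: 55 push %rbp” means? Reading from left to right: “100000f60” is the address, “55” is the machine code byte stored there, and “push %rbp” is the assembly instruction that byte stands for (it saves the base pointer register on the stack). At this point, let’s say thanks to Intel and AT&T for their assembly syntaxes.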

A little more information before we jump in

Each processor uses some internal variables, better known as registers. On a 32-bit x86 processor, the most important ones are the following (on 64-bit x86 their names start with R instead of E, for example RBP or RIP):

  • EAX (Accumulator)
  • ECX (Counter)
  • EDX (Data)
  • EBX (Base)
  • ESP (Stack Pointer)
  • EBP (Base Pointer)
  • ESI (Source Index)
  • EDI (Destination Index)
  • EIP (Instruction Pointer)

The EIP or instruction pointer register holds the address of the next instruction that the processor will execute. We’ll take a look at it with GDB and disassemble the program. First, make sure that you have GDB installed on your system.
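To picture what an instruction pointer does, here is a toy model in C. Everything in it is invented for illustration: the three one-byte "instructions" are not real x86 opcodes and run_toy_program is a hypothetical name. The only point is the variable ip, which always holds the position of the next instruction to fetch, just like the real register.

```c
#include <stddef.h>

/* Made-up one-byte instructions for a toy machine (not real x86 opcodes). */
enum { OP_HALT = 0, OP_INC = 1, OP_DEC = 2 };

/* Run `program` and return the final accumulator value.
 * `ip` plays the role of the instruction pointer: it always holds the
 * position of the next instruction to fetch. */
int run_toy_program(const unsigned char *program, size_t length)
{
    int accumulator = 0;
    size_t ip = 0;

    while (ip < length) {
        unsigned char instruction = program[ip]; /* fetch */
        ip++;                                    /* point at the next instruction */

        if (instruction == OP_INC)
            accumulator++;
        else if (instruction == OP_DEC)
            accumulator--;
        else                                     /* OP_HALT or unknown byte: stop */
            break;
    }
    return accumulator;
}
```

Running the program { OP_INC, OP_INC, OP_DEC, OP_HALT, OP_INC } returns 1: two increments, one decrement, and the last byte is never reached because the machine halts first.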

You can check this by typing “gdb -v” in your terminal.

terminal state screenshot

To start a disassembly session, use the following commands:

disassembly session

(If you run macOS Mojave and have some trouble, check this on Stack Overflow.)

Start GDB with the following command: “gdb -q ./hello.out”. The argument “-q” stands for “Do not print the introductory and copyright messages. These messages are also suppressed in batch mode.” You can find all the arguments and available modes here.

Please make sure that you run GDB with Intel syntax

To configure that, please run the following instructions
disassembly example 2

Make sure that .gdbinit is in your home directory and contains the GDB setting “set disassembly-flavor intel”. You can check with “cat ~/.gdbinit”.

OK, let’s see GDB in action

In this section, I’ll show you the basic commands to get familiar with gdb.

set breakpoint
gdb example

set breakpoint and run
If you have started a debugging session and you want to set a breakpoint in the main function, enter the following command:

  • “break main”

After that, you can trigger that breakpoint with the command “run”. GDB then pauses execution at the beginning of the main function.

get information from registers

As I explained earlier in the article, the processor has some registers like EIP, EAX and so on. To get the current state of these registers, use the commands in the example below:

register current state

The command “i r” is the shortcut for “info registers” (and we love shortcuts).

get the current value from specific register
specific register output

With the command “i r $eip”, GDB will show you the current value of the target register (in a 64-bit program the register is called $rip).

get value from the target
gdb target output

“x” is the shortcut for “examine”. With examine, you can see what is currently stored in memory at the address held in the target register. As you can see, “x/4bx $eip” gives you the first 4 bytes at that address in hexadecimal notation.

Examine can also display the same memory as decimal (d), characters (c) or octal (o). It is also possible to change the unit size from bytes (b) to halfwords (h, 2 bytes), words (w, 4 bytes, a DWORD) or giant words (g, 8 bytes).
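If you want a feeling for what these format letters do, here is a rough imitation in C. It is a sketch, not GDB's implementation: format_bytes is a made-up function that takes some bytes and a GDB-style format letter and writes them out as hex, signed decimal, octal or characters. The exact spacing of real GDB output is different.

```c
#include <stddef.h>
#include <stdio.h>

/* Format `count` bytes starting at `memory` into `out`, separated by spaces.
 * `format` mimics GDB's format letters: 'x' hex, 'd' signed decimal,
 * 'o' octal, 'c' character (non-printable bytes are written as they are).
 * Returns the number of characters written, or -1 on error. */
int format_bytes(const unsigned char *memory, size_t count, char format,
                 char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        const char *sep = (i == 0) ? "" : " ";
        char *dst = out + used;
        size_t room = out_size - used;
        int n;

        switch (format) {
        case 'x': n = snprintf(dst, room, "%s0x%02x", sep, (unsigned)memory[i]); break;
        /* values above 127 become negative on common platforms */
        case 'd': n = snprintf(dst, room, "%s%d", sep, (signed char)memory[i]); break;
        case 'o': n = snprintf(dst, room, "%s%#o", sep, (unsigned)memory[i]); break;
        case 'c': n = snprintf(dst, room, "%s'%c'", sep, memory[i]); break;
        default:  return -1; /* unknown format letter */
        }
        if (n < 0 || (size_t)n >= room)
            return -1;       /* output buffer too small */
        used += (size_t)n;
    }
    return (int)used;
}
```

With the bytes 0x55 and 0x48, the letter 'x' produces "0x55 0x48", 'd' produces "85 72", 'o' produces "0125 0110" and 'c' produces "'U' 'H'". It is the same memory every time, only the notation changes.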

What comes in the next articles?

In the next article we will discuss:

  • The difference between 32-bit and 64-bit systems, why it matters for data types like int and, of course, why 64-bit systems are so much faster than 32-bit ones
  • More about bytes/bits, hexadecimal, octal and so on
  • Working “really” with GDB, this was just a little introduction
  • Discuss stuff like heap, stack, and buffer overflows
  • Maybe try to overwrite some values and compile a new executable

In the meantime, feel free to follow me on Twitter @diClNeEASY!

The Author: AICDEV
