Since about 2003, we also have the so-called long mode--64-bit registers/addressing (although, not all 64 bits are used for addressing yet), flat memory model, and RIP-based addressing (addressing relative to the instruction pointer register). In this book, we will work with 32-bit protected (there is such a thing as the 16-bit protected mode, but that is out of scope) and Long, which is a 64-bit mode of operation. The long mode may be considered a 64-bit extension of the protected mode, which evolved from 16-bit to 32-bit. It is important to know that registers accessible in the earlier mode are also accessible in the newer mode, meaning that the registers that were accessible in the real mode are also accessible in the protected mode, and that registers accessible in the protected mode would also be accessible in the long mode (if long mode is supported by the processor). There are a few exceptions regarding the bit width of certain registers and we will look at this soon in this chapter. However, since 16-bit modes (real and 16-bit protected modes) are no longer used by application developers (with minor possible exceptions), in this book, we will work on protected and long modes only.
General purpose registers
Depending on the mode of the operation (protected or long), there are 8 to 16 available general purpose registers in modern Intel processors. Each register is divided into subregisters, allowing access to data with a bit width lower than the width of the register.
The following table shows general purpose registers (further referred to as GPR):
Table 1: x86/x86_64 registers
Note
All R* registers are only available in the long mode. Registers SIL, DIL, BPL, and SPL are only available in the long mode. Registers AH, BH, CH, and DH cannot be used in instructions that are not valid outside the long mode.
For convenience, we will refer to the registers by their 32-bit names (such as EAX, EBX, and so on) when we do not need to explicitly refer to a register of a certain bit width. The preceding table shows all general purpose registers available on the Intel platform. Some of them are only available in the long mode (all 64-bit registers, R* registers, and a few of the 8-bit registers) and certain combinations are not allowed. However, despite the fact that we can use those registers for any purpose, some of them do have a special meaning in certain circumstances.
The EAX register is also known as an accumulator and is used with multiplication and division operations, both as implied and target operands. It is important to mention that the result of a binary multiplication is twice the size of the operands and the result of a binary division consists of two parts (quotient and remainder), each of which has the same bit width as the operands. Since the x86 architecture began with 16-bit registers and for the sake of backward compatibility, the EDX register is used for storing partial results when the values of the operands are larger than could fit into 8 bits. For example, if we want to multiply two bytes, 0x50
and 0x04
, we would expect the result to be 0x140
, which cannot be stored in a single byte. However, since the operands were 8 bits in size, the result is stored into the AX register, which is 16 bits. But if we want to multiply 0x150
by 0x104
, the result would need 17 bits to be stored (0x150 * 0x104 = 0x15540
) and, as we have mentioned already, the first x86 registers were only 16 bits. This is the reason for using an additional register; in the case of the Intel architecture, this register is EDX (to be more precise, only the DX part would be used in this specific case). As a verbal explanation may sometimes be too generalized, it would be better to simply demonstrate the rule.
Division implies a slightly different rule. To be more precise, this is the inverted multiplication rule, meaning that the result of the operation is half the bit width of the dividend, which in turn means that the largest dividend in the long mode may be 128-bit wide. The smallest dividend value remains the same as in the smallest value of the source operand in the case of multiplication--8 bits.
ECX register - also known as counter register. This register is used in loops as a loop iteration counter. It is first loaded with a number of iterations, and then decremented each time the loop instruction is executed until the value stored in ECX becomes zero, which instructs the processor to break out of the loop. We can compare this to the do{...}while()
clause in C:
int ecx = 10;
do
{
// do your stuff
ecx--;
}while(ecx > 0);
Another common usage of this register, actually the usage of its least significant part, CL, is in bitwise shift operations, where it contains the number of bits in which the source operand should be shifted. Consider the following code, for example:
mov eax, 0x12345
mov cl, 5
shl eax, cl
This would result in the register EAX being shifted 5 bits to the left (having the value of 0x2468a0
as a result).
An ESP register is the stack pointer. This register, together with the SS register (the SS register is explained a bit later in this chapter), describes the stack area of a thread, where SS contains the descriptor of the stack segment and ESP is the index that points to the current position within the stack.
Source and destination indices
ESI and EDI registers serve as source and destination index registers in string operations, where ESI contains the source address and EDI, obviously, the destination address. We will talk about these registers a bit more in Chapter 3, Intel Instruction Set Architecture (ISA).
EBP. This register is called the base pointer as its most common use is to point to the base of a stack frame during function calls. However, unlike the previously discussed registers, you may use any other register for this purpose if needed.
Another register worth mentioning here is EBX, which, in the good old days of 16-bit modes (when it was still just a BX register), was one of the few registers that we could use as a base for addressing. Unlike EBP, EBX was (in the case of the XLAT instruction, which by default uses DS:EBX, still is) intended to point to a data segment.
There is one more special register that cannot be used for data storage--EIP (IP in the real mode or RIP in the long mode). This is the instruction pointer and contains the address of the instruction after the instruction currently being executed. All instructions are implicitly fetched from the code segment by the CPU; thus the full address of the instruction following the one being executed should be described as CS:IP. Also, there is no regular way to modify its content directly. It is not impossible, but we can't just use a mov
instruction in order to load a value into EIP.
All the other registers have no special meaning from the processor's perspective and may be used for any purpose.
The CPU itself has no means for floating point arithmetic operations. In 1980, Intel announced the Intel 8087 - the floating point coprocessor for the 8086 line. 8087 remained as a separate installable device until 1989, when Intel came up with the 80486 (i486) processor, which had an integrated 8087 circuit. However, when talking about floating point registers and floating point instructions, we still refer to 8087 as a floating-point unit (FPU) or, sometimes, still as a floating-point coprocessor (however, the latter is becoming more and more rare).
8087 has eight registers, 80 bits each, arranged in a stack fashion, meaning that operands are pushed onto this stack from the memory and results are popped from the topmost register to the memory. These registers are named ST0 to ST7 (ST--stack) and the most used one, that is, the ST0 register, may be referred to as simply ST.
The floating-point coprocessor supports several data types:
- 80-bit extended-precision real
- 64-bit double-precision real
- 32-bit single-precision real
- 18-digit decimal integer
- 64-bit binary integer
- 32-bit binary integer
- 16-bit binary integer
The floating-point coprocessor will be discussed in more detail in Chapter 3, Intel Instruction Set Architecture (ISA).
Segment registers and memory organization
Memory organization is one of the most important aspects of CPU design. The first thing to note is that when we say "memory organization", we do not mean its physical layout on memory chips/boards. For us, it is much more important how the CPU sees memory and how it communicates with it (on a higher level, of course, as we are not going to dive into the hardware aspects of the architecture).
However, as the book is dedicated to application programming, rather than operating system development, we will further consider the most relevant aspects of memory organization and access in this section.
Segment registers are a rather interesting topic, as they are the ones that tell the processor which memory areas may be accessed and how exactly they may be accessed. In real mode, segment registers used to contain a 16-bit segment address. The difference between a normal address and segment address is that the latter is shifted 4 bits to the right when stored in the segment register. For example, if a certain segment register was loaded with the 0x1234
value, it, in fact, was pointing to the address 0x12340
; therefore, pointers in real mode were rather offsets into segments pointed to by segment registers. As an example, let's take the DI register (as we are talking about a 16-bit real mode now), which is used with the DS (data segment) register automatically, and load it with, let's say, 0x4321
when the DS register is loaded with the 0x1234
value. Then the 20-bit address would be 0x12340 + 0x4321 = 0x16661
. Thus, it was possible to address at most 1 MB of memory in real mode.
There are in total six segment registers:
- CS: This register contains the base address of the currently used code segment.
- DS: This register contains the base address of the currently used data segment.
- SS: This register contains the base address of the currently used stack segment.
- ES: This is the extra data segment for the programmer's use.
- FS and GS: These were introduced with the Intel 80386 processor. These two segment registers have no specific hardware-defined function and are for the programmer's use. It is important to know that they do have specific tasks in Windows and Linux, but those tasks are operating system dependent only and have no connection to hardware specifications.
The CS register is used together with the IP register (the instructions pointer, also known as the program counter on other platforms), where the IP (or EIP in protected mode and RIP in long mode) points to the offset of the instruction in the code segment following the instruction currently being executed.
DS and ES are implied when using SI and DI registers, respectively, unless another segment register is implicitly specified in the instruction. For example, the lodsb
instruction, although, it is written with no operands, loads a byte from the address specified by DS:SI into the AL register and the stosb
instruction (which has no visible operands either) stores a byte from the AL register at the address specified by ES:DI. Using SI/DI registers with other segments would require explicitly mentioning those segments with the relevant segment register. Consider the following code, for example:
mov ax, [si]
mov [es:di], ax
The preceding code loads a double word from the location pointed by DS:SI and stores it to another location pointed by ES:DI.
The interesting thing about segment registers and segments at all is that they may peacefully overlap. Consider a situation where you want to copy a portion of code to either another place in the code segment or into a temporary buffer (for example, for decryptor). In such a case, both CS and DS registers may either point to the same location or the DS register may point somewhere into the code segment.
Protected mode - segmentation
While it was all fine and simple in real mode, things become a bit more complicated when it comes to protected mode. Unfortunately, memory segmentation is still intact, but the segment register no longer contain addresses. Instead, they are loaded with the so-called selectors, which are, in turn, the indices into the descriptor table multiplied by 8 (shifted 3 bits to the left). The two least significant bits designate the requested privilege level (0 for kernel space to 3 for user land). The third bit (at index 2) is the TI bit (table indicator), which indicates whether the descriptor being referred is in a global descriptor table (0) or in a local descriptor table (1). The memory descriptor is a tiny 8-byte structure, which describes the range of physical memory, its access rights, and some additional attributes:
Table 2: Memory descriptor structure
Descriptors are stored in at least two tables:
- GDT: Global descriptor table (used by the operating system)
- LDT: Local descriptor table (per task descriptor table)
As we may conclude, the organization of memory in protected mode is not that different from that in real mode after all.
There are other types of descriptors--interrupt descriptors (stored in the interrupt description table (IDT)) and system descriptors; however, since these are in use in kernel space only, we will not discuss them, as that falls out of the scope of this book.
Paging is a more convenient memory management scheme introduced in 80386 and has been a bit enhanced since then. The idea behind paging is memory virtualization--this is the mechanism that makes it possible for different processes to have the same memory layout. In fact, the addresses we use in pointers (if we are writing in C, C++, or any other high-level language that compiles into native code) are virtual and do not correspond to physical addresses. The translation of a virtual address into a physical address is implemented in hardware and is performed by the CPU (however, some operating system interventions are possible).
By default, a 32-bit CPU uses a two-level translation scheme for the derivation of a physical address from the supplied virtual one.
The following table explains how a virtual address is used in order to find a physical address:
Table 3: Virtual address to physical address translation
Most, if not all, modern processors based on the Intel architecture also support Page Size Extension (PSE), which makes it possible to use the so-called large pages of 4 MB. In this case, the translation of a virtual address into a physical address is a bit different, as there is no page table any more. The following table shows the meaning of bits in a 32-bit virtual address:
Table 4: Virtual address to physical address translation with PSE enabled
Furthermore, the Physical Address Extension (PAE) was introduced, which significantly changes the scheme and allows access to a much bigger range of memory. In protected mode, PAE adds a page directory pointer table of four entries and the virtual to physical address conversion would be as per the following table:
Table 5: Virtual to physical address translation with PAE enabled (no PSE)
Enabling PSE in addition to PAE forces each entry in the page directory to point directly to a 2 MB page instead of an entry in a page table.
The only address virtualization allowed in long mode is paging with PAE enabled; however, it adds one more table--the page map level 4 table as the root entry; therefore, the conversion of a virtual address to a physical address uses the bits of a virtual address in the way described in the following table:
Table 6: Virtual to physical address translation in long mode
It is, however, important to mention that despite the fact that it is a 64-bit architecture, the MMU only uses the first 48 bits of the virtual address (also called the linear address).
The whole process of address resolution is performed by the memory management unit (MMU) in the CPU itself, and the programmer is only responsible for actually building these tables and enabling PAE/PSE. However, this topic is much wider than may be covered in a single chapter and falls a bit out of the scope of this book.
In addition to control registers, processors also have a set of so-called debug registers, which are mostly used by debuggers for setting the so-called hardware breakpoints. These registers are in fact a very powerful tool when it comes to control over other threads or even processes.
Debug address registers DR0 - DR3
Debug registers 0 to 3 (DR0, DR1, DR2, and DR3) are used to store virtual (linear) addresses of the so-called hardware breakpoints.
Debug control register (DR7)
DR7 defines how the breakpoints set in Debug Address Registers should be interpreted by the processor and whether they should be interpreted at all.
The bits layout of this register is shown in the following table:
Table 3: DR7 bit layout
L* bits, when set to 1, enable breakpoint at the address which is specified in the corresponding Debug Address Register locally--within a task. These bits are reset by the processor on each task switch. G* bits, on the contrary, enable breakpoints globally--for all tasks, meaning that these bits are not reset by the processor.
The R/W* bits specify breakpoint conditions, as follows:
00
: Break on instruction execution01
: Break when the specified address is accessed for writing only10
: Undefined11
: Break on either read or write access or when an instruction at the specified address is executed
The LEN* bits specify the size of a breakpoint in bytes, thus, allowing coverage of more than one instruction or more than one byte of data:
00
: Breakpoint is 1-byte long01
: Breakpoint is 2-bytes long10
: Breakpoint is 8-bytes long (long mode only)11
: Breakpoint is 4-bytes long
Debug status register (DR6)
When an enabled breakpoint is triggered, the corresponding bit of the four low-order bits in DR6 is set to 1 before entering the debug handler, thus, providing the handler with information about the triggered breakpoint (bit 0 corresponds to the breakpoint in DR0, bit 1 to the breakpoint in DR1, and so on).
It would have been impossible to write programs in any language for a given platform if the processor had no means to report its status and/or the status of the last operation. More than that, the processor itself needs this information from time to time. Try to imagine a processor unable to conditionally control the execution flow of a program--sounds like a nightmare, doesn't it?
The most common way for a program to obtain information on the last operation or on a certain configuration of an Intel-based processor is through the EFlags register (E stands for extended). This register is referred to as Flags in real mode, EFlags in protected mode, or RFlags in long mode.
Let's take a look at the meaning of the individual bits (also referred to as flags) of this register and its usage.
The carry flag (CF) is mostly used for the detection of carry/borrow in arithmetic operations and is set if the bit width result of the last such operation (such as addition and subtraction) exceeds the width of the ALU. For example, the addition of two 8-bit values, 255 and 1, would result in 256, which requires at least nine bits to be stored. In such a case, bit eight (the ninth bit) is placed into the CF, thus, letting us and the processor know that the last operation ended with carry.
The parity flag (PF) is set to 1 in case the number of 1s in the least significant byte is even; otherwise, the flag is set to zero.
The adjust flag (AF) signals when a carry or borrow occurred in the four least significant bits (lower nibble) and is primarily used with binary coded decimal (BCD) arithmetics.
The zero flag (ZF) is set when the result of an arithmetic or bitwise operation is 0. This includes operations that do not store the result (for example, comparison and bit test).
The sign flag (SF) is set when the last mathematical operation resulted in a negative number; in other words, when the most significant bit of the result was set.
When set, the trap flag (TF) causes a single step interrupt after every executed instruction.
Bit #9 - interrupt enable flag
The interrup enable flag (IF) defines whether processor will or will not react to incoming interrupts. This flag is only accessible in real mode or at the Ring 0 protection level in other modes.
The direction flag (DF) controls the direction of string operations. An operation is performed from the lower address to the higher address if the flag is reset (is 0) or from the higher address to the lower address if the flag is set (is 1).
The overflow flag (OF) is sometimes perceived as two's complement form of the carry flag, which is not really the case. OF is set when the result of the operation is either too small or too big a number to fit into the destination operand. For example, consider the addition of two 8-bit positive values, 0x74 and 0x7f. The resulting value of such an addition is 0xf3, which is still 8-bit, which is fine for unsigned numbers, but since we added two values that we considered to be signed, there has to be the sign bit and there are no more bits to store the 9-bit signed result. The same would happen if we try to add two negative 8-bit values, 0x82 and 0x81. The meaning of the addition of two negative numbers is in the fact subtraction of a positive number from a negative number, which should result in an even smaller number. Thus, 0x82 + 0x81 would result in 0x103, where the ninth bit, 1, is the sign bit, but it cannot be stored in an 8-bit operand. The same applies to larger operands (16, 32, and 64-bit).
The remaining 20 bits of the EFlags register are not that important for us while in user-land except, probably, the ID bit (bit #21). The ID flag indicates whether we can or cannot use the CPUID instruction.
Bits 32 - 63 of the RFlags register in long mode would be all 0s.