You're reading from Mastering Malware Analysis The complete malware analyst's guide to combating malicious software, APT, cybercrime, and IoT attacks

Product type Paperback

Published in Jun 2019

Publisher Packt

ISBN-13 9781789610789

Length 562 pages

Edition 1st Edition

Languages

Python

Concepts

Malware Analysis

Authors (2):

Alexey Kleymenov

Amr Thabet

View More author details

Basic concepts

Most people don't really understand that the processor is pretty much a smart calculator. If you look at most of its instructions (whatever the assembly language is), you will find many of them dealing with numbers and doing some calculations. However, there are multiple features that actually differentiate processors from usual calculators:

Processors have access to a bigger memory space compared to traditional calculators. This memory space gives them the ability to store billions of values, which allows them to perform more complex operations. Additionally, they have multiple fast and small memory storage units embedded inside the processors' chip called registers.
Processors support many instruction types other than arithmetic instructions, such as changing the execution flow based on certain conditions.
Processors are able to communicate with other devices (such as speakers, mics, hard disks, graphics card, and so on).

Armed with such features in conjunction with great flexibility, processors became the go-to smart machines for technologies such as AI, machine learning, and others. In the following sections, we will explore these features and later will dive deeper into different assembly languages and how these features are manifested in these languages' instruction set.

Registers

As most of the processors have access to a huge memory space storing billions of values, it takes longer for the processor to access the data (and it gets complex, as we will see later). So, to speed up the processor operations, they contain small and fast internal memory storage units called registers.

Registers are built into the processor chip and are able to store the immediate values that are needed while performing calculations and data transfer from one place to another.

Registers may have different names, sizes, and functions, depending on the architecture. Here are some of the types that are widely used:

General data registers: General data registers are registers that are used to save values or results from different arithmetic and logical operations.
Stack and frame pointers: These are registers that are used to point to the beginning and the end of the stack.
Instruction pointer/program counter: The instruction pointer is used to point to the start of the next instruction to be executed by the processor.

Memory

Memory plays an important role in the development of all smart devices that we see nowadays. The ability to manage lots of values, text, images, and videos on a fast and volatile memory allows processors to process more information and display graphical interfaces in 3D and virtual reality.

Virtual memory

In modern operating systems, whether they are 32-bit, 64-bit, or whatever the size of the physical memory, the operating system allocates a fixed size, isolated virtual memory (in which its pages are mapped to the physical memory pages) for each application to secure the operating system's and the other applications' data.

Each application only has the ability to access their own virtual memory. They have the ability to read, write, or execute instructions in their virtual memory pages. Each virtual memory page has a set of permissions assigned to it that represent the type of operations that the application is allowed to execute on this page. These permissions are read, write, and execute. Additionally, multiple permissions can be assigned to each memory page.

For an application to access any stored value inside a memory address, it needs a virtual address, which is basically the address of where this value is stored in memory.

Despite knowing the virtual address, access can be hindered by another issue, which is storing this virtual address. The size of the virtual address in 32-bit systems is 4 bytes and in 64-bit systems is it 8 bytes. This means we need to allocate another space in memory to store that virtual address. For this new space in memory, we will need to store its own memory address in another memory space that will lead us to an infinite loop, as shown in the following figure:

Figure 1: Virtual memory addresses

To solve this condition, multiple solutions are used nowadays, and in the next section, we will cover one of them, which is the stack.

Stack

Stack literally means a pile of objects. In computer science, a stack is basically a data structure that helps to save different values in memory with the same size in a pile structure using the principle of Last in First Out (LIFO).

A stack is pointed to by two registers (the frame pointer points to its top and the stack pointer points to its bottom).

A stack is common between all known assembly languages and it has several functions. For example, it may help in solving mathematical equations, such as X = 5*6 + 6*2 + 7(4 + 6), by storing each calculated value and pushing each one in the stack, and later pop ping (or pulling) them back to calculate the sum of all of them and saving them in variable X.

It is also commonly used to pass arguments (especially if there are a lot of them) and store local variables.

A stack is also used to save the return addresses just before calling a function or a subroutine. So, after this routine finishes, it pops the return address back from the top of the stack and returns it to where it was called from to continue the execution.

While the stack pointer is generally pointing to the current top of the stack, the frame pointer is keeping the address of the top of the stack before the subroutine call, so it can be easily restored after it is returned.

Branches, loops, and conditions

The second feature that processors have is the ability to change the execution flow of a program based on a given condition. In every assembly language, there are multiple comparison instructions and flow control instructions. The flow control instructions can be divided into the following categories:

Unconditional jump: This is a type of instruction that forcefully changes the flow of the execution to another address (without any given condition).
Conditional jump: This is like a logical gate that switches to another branch based on the given condition (such as equal to zero, greater than, or lower than), as shown in the following figure:

Figure 2: An example of a conditional jump

Call: This changes the execution to another function and saves the return address in the stack.

Exceptions, interrupts, and communicating with other devices

In assembly language, communication with different hardware devices is done through what's called interrupts.

An interrupt is a signal to the processor sent by the hardware or software indicating that there's something happening or there is a message to be delivered. The processor suspends its current running process, saving its state, and executes a function called an interrupt handler to deal with this interrupt. Interrupts have their own notation and are widely used to communicate with hardware for sending requests and dealing with their responses.

There are two types of interrupts. Hardware interrupts are generally used to handle external events when communicating with hardware. Software interrupts are caused by software, usually by calling a particular instruction. The difference between an interrupt and an exception is that exceptions take place within the processor rather than externally. An example of an operation generating an exception can be a division by zero.