You're reading from Mastering Malware Analysis The complete malware analyst's guide to combating malicious software, APT, cybercrime, and IoT attacks

Product type Paperback

Published in Jun 2019

Publisher Packt

ISBN-13 9781789610789

Length 562 pages

Edition 1st Edition

Languages

Python

Concepts

Malware Analysis

Authors (2):

Alexey Kleymenov

Amr Thabet

Becoming familiar with x86 (IA-32 and x64)

Intel x86 (IA-32 and x64) is the most common architecture in PCs and powers many servers, so it is no surprise that most of the malware samples available today target it. x86 is a CISC architecture: it includes many complex instructions in addition to simple ones. In this section, we will introduce the most common of them, along with how compilers take advantage of them in their calling conventions.

Registers

Here is a table showing the relationship between registers in IA-32 and x64 architectures:

Figure 3: Registers used in the x86 architecture

r8 to r15 are available only in x64 and not in IA-32, and spl, bpl, sil, and dil can be accessed only in x64.
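
As a rough illustration of how the narrower register names relate to the 64-bit ones, the following Python sketch extracts the eax, ax, ah, and al views from a rax value. The function name sub_registers is hypothetical, and the sketch models only reads, not the side effects of writing to a sub-register.

```python
def sub_registers(rax: int) -> dict:
    """Return the narrower views of a 64-bit rax value (reads only)."""
    rax &= 0xFFFFFFFFFFFFFFFF
    return {
        "rax": rax,
        "eax": rax & 0xFFFFFFFF,   # low 32 bits
        "ax": rax & 0xFFFF,        # low 16 bits
        "ah": (rax >> 8) & 0xFF,   # bits 8..15
        "al": rax & 0xFF,          # low 8 bits
    }


views = sub_registers(0x1122334455667788)
print({name: hex(value) for name, value in views.items()})
```

The same masking applies to the other legacy registers (for example, rcx/ecx/cx/ch/cl).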

The first four registers (rax, rbx, rcx, and rdx) are General-Purpose Registers (GPRs), but some of them have a special use in certain instructions:

rax/eax: This is used to store information, and it's an implicit operand and result register for some calculations (such as mul and div)
rcx/ecx: This is used as a counter register in loop instructions
rdx/edx: This is used in division to return the remainder (modulus)

In x64, the registers r8 to r15 were added as additional GPRs.

The rsp/esp register is used as a stack pointer that points to the top of the stack. Because the stack grows toward lower addresses, rsp/esp is decremented when a value is pushed onto the stack and incremented when a value is pulled out of it. The rbp/ebp register is used as a frame pointer, which means it points to the base of the current function's stack frame, and it's helpful for accessing the function's arguments and local variables, as we will see later in this section. In addition to this, rbp/ebp is sometimes used as a GPR for storing any kind of data.

rsi/esi and rdi/edi are used mostly to define the addresses when copying a group of bytes in memory. The rsi/esi register always plays the role of the source and the rdi/edi register plays the role of the destination. Both registers are also GPRs, and both are non-volatile in the Windows calling conventions.

The instruction structure

For Intel x86 assembly (IA-32 or x64), the common structure of an instruction is opcode dest, src.

Let's get deeper into them.

opcode

opcode is the name of the instruction. Some instructions consist only of an opcode, without any dest or src, such as the following:

nop, pushad, popad, movsb

pushad and popad are not available in x64.

dest

dest represents the destination, that is, where the result of the calculation will be saved. It can also take part in the calculation itself, like this:

add eax, ecx ;eax = (eax + ecx)
 sub rdx, rcx ;rdx = (rdx - rcx)

Also, it could play the role of both a source and a destination in instructions that take only dest without a source:

inc eax
 dec ecx

Or, it could be only the source or only the destination, as in these instructions that save a value to the stack and pull a value out of it:

push rdx ;rdx is only a source
 pop rcx ;rcx is only a destination

dest could look like the following:

REG: A register such as eax and edx.
r/m: A place in memory such as the following:

DWORD PTR [00401000h]
 BYTE PTR [EAX + 00401000h]
 WORD PTR [EDX*4 + EAX+ 30]

A value in the stack (used to represent local variables), such as the following:

DWORD PTR [ESP+4]
 DWORD PTR [EBP-8]

src

src represents the source, that is, the other value in the calculation; the result is not saved to it afterward. It may look like this:

REG: For instance, add rcx, r8
r/m: For instance, add ecx, dword ptr [00401000h]
imm: An immediate value, such as mov eax, 00100000h
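
To make the opcode dest, src structure concrete, here is a minimal Python sketch that splits a single line of Intel-syntax assembly into its three parts. The helper name split_instruction is hypothetical, and the sketch assumes one instruction per line, an optional ; comment, and no commas inside the operands themselves.

```python
def split_instruction(line: str) -> tuple:
    """Split 'opcode dest, src ;comment' into (opcode, dest, src)."""
    code = line.split(";", 1)[0].strip()   # drop the trailing comment
    if not code:
        raise ValueError("no instruction on this line")
    parts = code.split(None, 1)            # opcode, then the operand text
    opcode = parts[0].lower()
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    dest = operands[0] if operands else None
    src = operands[1] if len(operands) > 1 else None
    return opcode, dest, src


print(split_instruction("add eax, ecx ;eax = (eax + ecx)"))
print(split_instruction("inc eax"))
print(split_instruction("nop"))
```

Real disassemblers work on the encoded bytes rather than on text, so this is only an aid for reading listings.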

The instruction set

Here, we will cover the different types of instructions that we listed in the previous section.

Data manipulation instructions

Some of the arithmetic instructions are as follows:

Instruction	Structure	Description
`add`/`sub`	`add`/`sub dest, src`	`dest = dest + src`/`dest = dest - src`
`inc`/`dec`	`inc`/`dec dest`	`dest = dest + 1`/`dest = dest - 1`
`mul`	`mul src`	(Unsigned multiply) `rdx:rax = rax * src`
`div`	`div src`	(Unsigned divide) `rdx:rax`/`src` (returns the quotient in `rax` and the remainder/modulus in `rdx`)
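
The implicit use of rdx:rax in mul and div is easy to misread in a disassembly, so here is a small Python sketch of the 64-bit unsigned forms. The function names mul64 and div64 are hypothetical, and CPU flags are not modeled.

```python
MASK64 = (1 << 64) - 1


def mul64(rax: int, src: int) -> tuple:
    """Model unsigned `mul src`: return (rdx, rax) holding the 128-bit product."""
    product = (rax & MASK64) * (src & MASK64)
    return (product >> 64) & MASK64, product & MASK64


def div64(rdx: int, rax: int, src: int) -> tuple:
    """Model unsigned `div src`: return (rax, rdx) = (quotient, remainder)."""
    src &= MASK64
    if src == 0:
        raise ZeroDivisionError("a real CPU raises a divide error here")
    dividend = ((rdx & MASK64) << 64) | (rax & MASK64)
    quotient, remainder = divmod(dividend, src)
    if quotient > MASK64:
        raise OverflowError("quotient does not fit in rax (divide error on a real CPU)")
    return quotient, remainder


print(mul64(1 << 63, 4))   # high half ends up in rdx
print(div64(0, 17, 5))     # quotient in rax, remainder in rdx
```

This is also why compilers usually clear rdx/edx (or sign-extend into it) before a division.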

Additionally, for logic and bit manipulation, they are like this:

Instruction	Structure	Description
`or`/`and`/`xor`	`or`/`and`/`xor dest, src`	`dest = dest | src`/`dest = dest & src`/`dest = dest ^ src`
`not`	`not dest`	`dest = ~dest` (the bits are flipped)

And, lastly, for shifts and rotations, they are like this:

Instruction	Structure	Description
`shl`/`shr`	`shl`/`shr dest, imm` or `cl` (the count is at most the dest register's number of bits, such as 32 or 64)	`dest = dest << src`/`dest = dest >> src` (shifts the dest register's bits to the left or the right, which has the same effect as multiplying or dividing an unsigned value by two src times)
`rol`/`ror`	`rol`/`ror dest, imm` or `cl` (same as `shl` and `shr`)	Rotates the dest register's bits left or right
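
The difference between a shift and a rotation is that shifted-out bits are lost, while rotated-out bits re-enter on the other side. The following Python sketch models both for a fixed register width. The function names and the bits parameter are hypothetical, and reducing the rotation count modulo the width is a simplification of how the CPU masks the count.

```python
def shl(value: int, count: int, bits: int = 32) -> int:
    """Shift left; bits pushed past the register width are lost."""
    return (value << count) & ((1 << bits) - 1)


def shr(value: int, count: int, bits: int = 32) -> int:
    """Logical shift right; zeros enter from the left."""
    return (value & ((1 << bits) - 1)) >> count


def rol(value: int, count: int, bits: int = 32) -> int:
    """Rotate left; bits leaving on the left re-enter on the right."""
    mask = (1 << bits) - 1
    value &= mask
    count %= bits
    return ((value << count) | (value >> (bits - count))) & mask


def ror(value: int, count: int, bits: int = 32) -> int:
    """Rotate right; bits leaving on the right re-enter on the left."""
    mask = (1 << bits) - 1
    value &= mask
    count %= bits
    return ((value >> count) | (value << (bits - count))) & mask


print(hex(shl(0x80000001, 1)))  # the top bit is lost
print(hex(rol(0x80000001, 1)))  # the top bit wraps around to bit 0
```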

Data transfer instructions

There's a mov instruction, which copies a value from src to dest. This instruction has multiple forms, as we can see in this table:

Instruction	Structure	Description
`mov`	`mov dest, src`	`dest = src`
`movsx`/`movzx`	`movsx`/`movzx dest, src`	`src` is smaller than `dest` (for example, `src` is 16 bits and `dest` is 32 bits). `movzx`: Sets the remaining bits in `dest` to zero. `movsx`: Preserves the sign of the `src` value
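
The effect of movzx and movsx is easiest to see on a value whose top bit is set. Here is a minimal Python sketch, assuming a 16-bit source and a 32-bit destination by default; the function names and parameters are hypothetical.

```python
def movzx(src: int, src_bits: int = 16) -> int:
    """Zero-extend: the upper bits of dest are simply zero."""
    return src & ((1 << src_bits) - 1)


def movsx(src: int, src_bits: int = 16, dest_bits: int = 32) -> int:
    """Sign-extend: the upper bits of dest copy the sign bit of src."""
    src &= (1 << src_bits) - 1
    if src & (1 << (src_bits - 1)):      # sign bit set, so the value is negative
        src -= 1 << src_bits             # convert to a negative Python int
    return src & ((1 << dest_bits) - 1)  # re-encode at the destination width


print(hex(movzx(0xFFFE)))  # 0xfffe
print(hex(movsx(0xFFFE)))  # 0xfffffffe (still -2 as a signed 32-bit value)
```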

Other instructions, related to the stack, are like this:

Instruction	Structure	Description
`push`/`pop`	`push`/`pop dest`	Pushes the value onto the top of the stack (`esp = esp - 4`, or `rsp = rsp - 8` in x64)/pulls the value out of the stack (`esp = esp + 4`, or `rsp = rsp + 8` in x64)
`pushad`/`popad`	`pushad`/`popad`	Saves all general-purpose registers to the stack/pulls them back out of the stack (in IA-32 only)

For string manipulation, they are like this:

Instruction	Structure	Description
`lodsb`/`lodsw`/`lodsd`/`lodsq`	`lodsb`/`lodsw`/`lodsd`/`lodsq`	Loads a byte, 2 bytes, 4 bytes, or 8 bytes from the address in `rsi`/`esi` into `al`/`ax`/`eax`/`rax`
`stosb`/`stosw`/`stosd`/`stosq`	`stosb`/`stosw`/`stosd`/`stosq`	Stores a byte, 2 bytes, 4 bytes, or 8 bytes from `al`/`ax`/`eax`/`rax` at the address in `rdi`/`edi`
`movsb`/`movsw`/`movsd`/`movsq`	`movsb`/`movsw`/`movsd`/`movsq`	Copies a byte, 2 bytes, 4 bytes, or 8 bytes from the address in `rsi`/`esi` to the address in `rdi`/`edi`

Flow control instructions

Some of the unconditional redirections are as follows:

Instruction	Structure	Description
`jmp`	`jmp <relative address>` `jmp DWORD/QWORD ptr [Absolute Address]`	The relative address is calculated from the start of the next instruction after `jmp` to the destination
`call`	`call <relative address>` `call DWORD/QWORD ptr [Absolute Address]`	Same as `jmp` but it saves the return address in the stack
`ret`/`retn`	`ret imm`	Pulls the return address from the stack, cleans the stack from the pushed arguments, and jumps to that address
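
The relative-address rule above can be sketched in Python for the 5-byte near jump, jmp rel32 (opcode byte 0xE9). The function name jmp_rel32_target and the example address 0x401000 are hypothetical, and the result is truncated to 32 bits on the assumption that the code is IA-32.

```python
import struct


def jmp_rel32_target(address: int, code: bytes) -> int:
    """Resolve the destination of a 5-byte `jmp rel32` located at `address`."""
    if len(code) < 5 or code[0] != 0xE9:
        raise ValueError("expected a 5-byte jmp rel32 instruction")
    (rel32,) = struct.unpack_from("<i", code, 1)  # signed, little-endian offset
    next_instruction = address + 5                # offsets count from here
    return (next_instruction + rel32) & 0xFFFFFFFF


# A forward jump of 0x10 bytes and a backward jump onto itself.
print(hex(jmp_rel32_target(0x401000, bytes.fromhex("e910000000"))))
print(hex(jmp_rel32_target(0x401000, bytes.fromhex("e9fbffffff"))))
```

The same calculation applies to call rel32 (opcode byte 0xE8), which additionally pushes the address of the next instruction as the return address.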

Some of the conditional redirections are as follows:

Instruction	Structure	Description
`jnz`/`jz`/`jb`/`ja`	`jz`/`jnz` `<relative address>`	Similar to `jmp`, but jumps based on a condition
`loop`	`loop` `<relative address>`	Similar to `jmp`, but it decrements `rcx`/`ecx` and jumps if it didn't reach zero (uses `rcx`/`ecx` as a loop counter)
`rep`	`rep opcode dest, src` (operands only if needed)	`rep` is a prefix that is used with string instructions; it decrements `rcx`/`ecx`, and repeats the instruction until `rcx`/`ecx` reaches zero
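
As an illustration of how rep combines with a string instruction, the following Python sketch models rep movsb over a flat bytearray. The function name rep_movsb is hypothetical, and the sketch assumes the direction flag is cleared, so both pointers move forward.

```python
def rep_movsb(memory: bytearray, esi: int, edi: int, ecx: int) -> tuple:
    """Model `rep movsb`: copy ecx bytes from [esi] to [edi], one at a time."""
    while ecx != 0:                # rep stops once the counter reaches zero
        memory[edi] = memory[esi]  # movsb copies a single byte
        esi += 1                   # both pointers advance after each byte
        edi += 1
        ecx -= 1
    return esi, edi, ecx


memory = bytearray(b"MALWARE" + bytes(7))
registers = rep_movsb(memory, esi=0, edi=7, ecx=7)
print(memory, registers)
```

Seeing ecx loaded with a length and esi/edi loaded with two addresses just before a rep movsb is therefore a strong hint of an inlined memory copy.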

Arguments, local variables, and calling conventions (in x86 and x64)

There are multiple ways in which compilers represent functions, calls, local variables, and more. We will not cover all of them. We will start with the standard call (stdcall), which is only used in x86, and then cover the differences between the other conventions and stdcall.

stdcall

The stack, rsp/esp, and rbp/ebp registers do most of the work when it comes to arguments and local variables. The call instruction saves the return address at the top of the stack before transferring the execution to the new function, and the ret instruction at the end of the function returns the execution back to the caller function using the return address saved in the stack.

Arguments

For stdcall, the arguments are also pushed onto the stack from the last argument to the first, like this:

push Arg02
 push Arg01
 call Func01

In the called function, the arguments can be accessed through rsp/esp, keeping in mind how many values have been pushed onto the stack since the function started, with something like this:

mov eax, [esp + 4] ;Arg01
 push eax
 mov ecx, [esp + 8] ; Arg01 keeping in mind the previous push

In this case, the value located at the address specified by the value inside the square brackets is transferred. Fortunately, modern static analysis tools, such as IDA Pro, can detect which argument is being accessed in each instruction, as in this case.
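
To see why the offset of the same argument changes after a push, here is a small Python model of a 32-bit stack. The Stack class and the starting esp value of 0x1000 are hypothetical; only the 4-byte slot arithmetic matters.

```python
class Stack:
    """A toy 32-bit stack that grows toward lower addresses."""

    def __init__(self, esp: int = 0x1000):
        self.esp = esp
        self.memory = {}

    def push(self, value):
        self.esp -= 4                  # push moves esp down by one slot
        self.memory[self.esp] = value

    def read(self, offset: int):
        return self.memory[self.esp + offset]   # models [esp + offset]


stack = Stack()
stack.push("Arg02")
stack.push("Arg01")
stack.push("return address")   # pushed implicitly by call
first = stack.read(4)          # mov eax, [esp + 4]
stack.push("eax")              # push eax
second = stack.read(8)         # mov ecx, [esp + 8]
print(first, second)
```

Both reads return Arg01, which is the bookkeeping that static analysis tools perform automatically.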

The most common way to access arguments, as well as local variables, is by using rbp/ebp. First, the called function saves the caller's rbp/ebp on the stack and copies the current rsp/esp into rbp/ebp; it can then access the arguments this way:

push ebp
 mov ebp, esp
 ...
 mov ecx, [ebp + 8] ;Arg01
 push eax
 mov ecx, [ebp + 8] ;still Arg01 (no changes)

And, at the end of the called function, it restores the original values of rsp/esp and rbp/ebp like this:

mov esp,ebp
 pop ebp
 ret

As it's a common function epilogue, Intel created a special instruction for it, which is leave, so it became this:

leave
 ret

Local variables

For local variables, the called function allocates space for them by moving the rsp/esp register toward lower addresses. To allocate space for two variables of four bytes each, the code will be this:

push ebp
 mov ebp,esp
 sub esp, 8

Additionally, the end of the function will be this:

mov esp,ebp
 pop ebp
 ret

Figure 4: An example of a stack change at the beginning and at the end of the function
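
The resulting frame layout can be summarized as a mapping from ebp-relative offsets to their meaning. The following Python sketch assumes a 32-bit frame with 4-byte arguments and locals, set up with push ebp / mov ebp, esp as shown above; the function name and the ArgNN/LocalNN labels are hypothetical.

```python
def describe_ebp_offset(offset: int, num_args: int, num_locals: int) -> str:
    """Name the stack slot found at [ebp + offset] in a standard 32-bit frame."""
    if offset % 4 != 0:
        raise ValueError("this model only handles 4-byte aligned slots")
    if offset == 0:
        return "saved ebp"
    if offset == 4:
        return "return address"
    if offset >= 8:
        index = (offset - 8) // 4 + 1      # [ebp + 8] is the first argument
        if index <= num_args:
            return f"Arg{index:02d}"
    if offset < 0:
        index = -offset // 4               # [ebp - 4] is the first local
        if index <= num_locals:
            return f"Local{index:02d}"
    return "outside this frame model"


for offset in (12, 8, 4, 0, -4, -8):
    print(f"[ebp{offset:+d}] -> {describe_ebp_offset(offset, 2, 2)}")
```

Positive offsets from ebp therefore point at arguments, and negative offsets point at local variables.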

Additionally, if there are arguments, the ret instruction cleans them from the stack. Its operand is the number of bytes to remove from the top of the stack after the return address has been pulled out, like this:

ret 8 ;2 Arguments, 4 bytes each

cdecl

cdecl (which stands for C declaration) is another calling convention that is used by many C compilers in x86. It's very similar to stdcall, with the only difference being that the caller cleans the stack after the callee (the called function) returns, like this:

Caller:
    push Arg02
    push Arg01
    call Callee
    add esp, 8 ;cleans the stack
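
When reading a disassembly, the cleanup instruction is often the quickest way to tell stdcall and cdecl apart. The Python sketch below summarizes which side removes the pushed arguments. The function name stack_cleanup is hypothetical, and 4-byte arguments are assumed.

```python
def stack_cleanup(convention: str, num_args: int, arg_size: int = 4) -> dict:
    """Return the instruction each side uses to remove the pushed arguments."""
    total = num_args * arg_size
    if convention == "stdcall":
        # The callee removes the arguments as part of its return.
        return {"callee": f"ret {total}" if total else "ret", "caller": None}
    if convention == "cdecl":
        # The callee returns plainly; the caller adjusts esp afterward.
        return {"callee": "ret", "caller": f"add esp, {total}" if total else None}
    raise ValueError(f"unsupported convention: {convention}")


print(stack_cleanup("stdcall", 2))
print(stack_cleanup("cdecl", 2))
```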

fastcall

The __fastcall calling convention is also widely used by different compilers, including the Microsoft C++ compiler and GCC. This calling convention passes the first two arguments in ecx and edx, and pushes the remaining arguments onto the stack. It's only used in x86, as x64 has its own register-based calling convention.

thiscall

For object-oriented programming and for non-static member functions (such as a class's methods), the C++ compiler needs to pass the address of the object whose attributes will be accessed or manipulated by the function as an argument.

In the GCC compiler, thiscall is almost identical to the cdecl calling convention, and it passes the object address as the first argument. But in the Microsoft C++ compiler, it's similar to stdcall, and it passes the object address in ecx. It's common to see such patterns in some object-oriented malware families.

The x64 calling convention

In x64, the calling convention is more dependent on the registers. For Windows, the caller passes the first four arguments in registers in this order: rcx, rdx, r8, r9, and the rest are pushed onto the stack. For the other operating systems (those following the System V AMD64 ABI, such as Linux), the first six arguments are usually passed in registers in this order: rdi, rsi, rdx, rcx, r8, r9, and the remaining ones go on the stack.

In both cases, the caller is responsible for cleaning the stack after the call returns, so the called function ends with a plain ret. These are the default calling conventions for these operating systems in x64.
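
A quick way to remember where each argument lives is a lookup table. The Python sketch below assumes the integer/pointer argument registers of the Microsoft x64 convention (rcx, rdx, r8, r9) and of the System V AMD64 ABI (rdi, rsi, rdx, rcx, r8, r9); the names argument_locations, "windows", and "sysv" are hypothetical, and floating-point arguments are not modeled.

```python
INTEGER_ARG_REGISTERS = {
    "windows": ["rcx", "rdx", "r8", "r9"],
    "sysv": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
}


def argument_locations(abi: str, num_args: int) -> list:
    """Return where each integer/pointer argument is passed, in order."""
    registers = INTEGER_ARG_REGISTERS[abi]
    locations = []
    for index in range(num_args):
        if index < len(registers):
            locations.append(registers[index])
        else:
            # Remaining arguments go on the stack, numbered from zero here.
            locations.append(f"stack[{index - len(registers)}]")
    return locations


print(argument_locations("windows", 5))
print(argument_locations("sysv", 3))
```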