x86 Assembly

x86 (32-bit) and x86-64 (64-bit) assembly is a human-readable representation of the machine code executed by the CPU. There are two common syntaxes, Intel and AT&T; Intel syntax is generally considered easier to read and is the one used on this page.

Assembly Introduction Reference

Wikibooks Reference

Registers

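There are 16 general-purpose registers plus the instruction pointer. Of the registers listed below, three are special purpose: rbp (base pointer), rsp (stack pointer), and rip (instruction pointer).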

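| 8-Byte Register | Lower 4 Bytes | Lower 2 Bytes | Lower Byte |
|---|---|---|---|
| rbp | ebp | bp | bpl |
| rsp | esp | sp | spl |
| rip | eip | | |
| rax | eax | ax | al |
| rbx | ebx | bx | bl |
| rcx | ecx | cx | cl |
| rdx | edx | dx | dl |
| rsi | esi | si | sil |
| rdi | edi | di | dil |
| r8 | r8d | r8w | r8b |
| r9 | r9d | r9w | r9b |
| r10 | r10d | r10w | r10b |
| r11 | r11d | r11w | r11b |
| r12 | r12d | r12w | r12b |
| r13 | r13d | r13w | r13b |
| r14 | r14d | r14w | r14b |
| r15 | r15d | r15w | r15b |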
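
The relationship between the columns can be modeled with bit masks. The following is a minimal Python sketch, not real register access: `read_view` and `MASKS` are hypothetical names, and only the rax family is shown.

```python
# Masks for the sub-register views of rax listed in the table above.
MASKS = {
    "rax": 0xFFFF_FFFF_FFFF_FFFF,  # full 8 bytes
    "eax": 0xFFFF_FFFF,            # lower 4 bytes
    "ax": 0xFFFF,                  # lower 2 bytes
    "al": 0xFF,                    # lower byte
}

def read_view(rax_value, name):
    """Return what the named sub-register holds for a given 64-bit rax value."""
    return rax_value & MASKS[name]

rax = 0x1122334455667788
for name in ("rax", "eax", "ax", "al"):
    print(f"{name:>3} = {read_view(rax, name):#x}")
# rax = 0x1122334455667788
# eax = 0x55667788
#  ax = 0x7788
#  al = 0x88
```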

Segment Registers

Early x86 CPUs had special registers for memory segmentation, which allowed them to address more memory than a single 16-bit register could. The x86-64 architecture largely disables segmentation in 64-bit mode, though the registers remain for backwards compatibility. In 64-bit mode, the segment bases of cs, ds, es, and ss are treated as 0. Only fs and gs keep a usable base address, which allows the OS to use them for special purposes such as thread-local storage.

| Register (2 Bytes) | Description |
|---|---|
| cs | Code segment |
| ds | Data segment |
| ss | Stack segment |
| es | Extra segment |
| fs | Extra segment |
| gs | Extra segment |
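
To illustrate how segmentation extended the address range, here is a small sketch. It assumes the classic 8086 real-mode scheme, which the text above does not spell out: the 16-bit segment is shifted left by 4 bits and added to the 16-bit offset, giving a 20-bit physical address. The function name is hypothetical.

```python
def real_mode_address(segment, offset):
    """Physical address in 8086 real mode: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

# Two 16-bit values combine to reach addresses above 0xFFFF.
print(hex(real_mode_address(0xB800, 0x0000)))  # 0xb8000
print(hex(real_mode_address(0x1234, 0x0010)))  # 0x12350
```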

Reference

Calling Conventions

When a function is called, the following generally happens:

  1. The caller saves any caller-saved registers it still needs
  2. The caller passes the arguments: typically pushed onto the stack in 32-bit code, and mostly placed in registers in 64-bit code
  3. The call instruction pushes the return address onto the stack
  4. The callee pushes the old base pointer onto the stack
  5. The callee sets the base pointer to the current stack pointer
  6. The callee subtracts from the stack pointer to make room for local variables
  7. When the callee returns, the return value is in rax (64-bit) or eax (32-bit)
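
The stack bookkeeping in these steps can be traced with a toy model. This is a simplified Python sketch under stated assumptions: 8-byte stack slots, the conventional prologue (push rbp; mov rbp, rsp; sub rsp, N), and made-up addresses. The `Machine` class is hypothetical and models only rsp, rbp, and rip.

```python
class Machine:
    """Toy model of the stack-related registers during a call."""

    def __init__(self, stack_top=0x7FFF_FFFF_E000):
        self.rsp = stack_top
        self.rbp = 0
        self.rip = 0
        self.mem = {}  # address -> 8-byte value

    def push(self, value):
        self.rsp -= 8               # the stack grows toward lower addresses
        self.mem[self.rsp] = value

    def pop(self):
        value = self.mem[self.rsp]
        self.rsp += 8
        return value

    def call(self, target, return_address):
        self.push(return_address)   # step 3: push the return address
        self.rip = target

    def prologue(self, local_bytes):
        self.push(self.rbp)         # step 4: push rbp
        self.rbp = self.rsp         # step 5: mov rbp, rsp
        self.rsp -= local_bytes     # step 6: sub rsp, N

    def leave(self):
        self.rsp = self.rbp         # mov rsp, rbp
        self.rbp = self.pop()       # pop rbp

    def ret(self):
        self.rip = self.pop()       # pop the return address into rip


m = Machine()
start_rsp = m.rsp
m.call(target=0x401000, return_address=0x400123)
m.prologue(local_bytes=32)
frame_size = start_rsp - m.rsp      # return address + saved rbp + locals
m.leave()
m.ret()
print(frame_size, hex(m.rip), m.rsp == start_rsp)  # 48 0x400123 True
```

After `leave` and `ret`, the stack pointer and base pointer are back to their values from before the call, which is what lets the caller continue as if nothing had happened.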

64-bit Conventions

Reference

32-bit Conventions

Reference

System Calls

System calls are how a program asks the kernel to do something on its behalf, such as opening, reading, and writing files, or forking and executing processes.

On 64-bit Linux, the syscall instruction is used with the syscall number in rax. Arguments are passed in rdi, rsi, rdx, r10, r8, and r9 (r10 takes the place of rcx, which ordinary function calls use for the fourth argument). The instruction clobbers rcx and r11, and the return value is placed in rax.

On 32-bit Linux, the int 0x80 instruction is used with the syscall number in eax. Arguments are passed in ebx, ecx, edx, esi, edi, and ebp. The return value is placed in eax.

Reference

Common System Call Numbers

| Platform | Number | Description |
|---|---|---|
| 32-bit | 0xb | execve(char *path, char *argv[], char *envp[]) |
| 64-bit | 0x3b | execve(char *path, char *argv[], char *envp[]) |
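
As a sketch of how the pieces fit together, the following Python snippet computes which register would hold which value before the system call instruction runs. It assumes the Linux conventions; note that the fourth 64-bit argument goes in r10, because syscall overwrites rcx. The helper `syscall_registers` and the pointer values are hypothetical.

```python
# Register conventions for Linux system calls (assumption: Linux ABI).
SYSCALL_ABI = {
    "64-bit": {"instruction": "syscall", "number": "rax",
               "args": ["rdi", "rsi", "rdx", "r10", "r8", "r9"]},
    "32-bit": {"instruction": "int 0x80", "number": "eax",
               "args": ["ebx", "ecx", "edx", "esi", "edi", "ebp"]},
}

# execve numbers from the table above.
EXECVE = {"32-bit": 0xB, "64-bit": 0x3B}

def syscall_registers(platform, number, *args):
    """Return the register assignment needed before issuing the system call."""
    abi = SYSCALL_ABI[platform]
    if len(args) > len(abi["args"]):
        raise ValueError("at most six arguments fit in registers")
    regs = {abi["number"]: number}
    regs.update(zip(abi["args"], args))
    return regs

# Hypothetical pointers to the path, argv, and envp arrays.
path_ptr, argv_ptr, envp_ptr = 0x1000, 0x2000, 0x3000
print(syscall_registers("64-bit", EXECVE["64-bit"], path_ptr, argv_ptr, envp_ptr))
# {'rax': 59, 'rdi': 4096, 'rsi': 8192, 'rdx': 12288}
```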

32-Bit Reference

64-Bit Reference

Flags

Flags are stored in the special 32-bit EFLAGS register. In 64-bit mode it is extended to the 64-bit RFLAGS register, whose upper 32 bits are reserved.

| Bit | Name | Description |
|---|---|---|
| 0 | CF | Carry Flag; set if the last arithmetic operation carried or borrowed a bit beyond the size of the register |
| 2 | PF | Parity Flag; set if the number of set bits in the least significant byte of the result is even |
| 4 | AF | Adjust Flag; set on a carry or borrow out of the low 4 bits, used for binary-coded decimal (BCD) arithmetic |
| 6 | ZF | Zero Flag; set if the result of an operation is 0 |
| 7 | SF | Sign Flag; set if the result of an operation is negative (its most significant bit is 1) |
| 8 | TF | Trap Flag; set to enable single-step debugging |
| 9 | IF | Interrupt Flag; set if maskable interrupts are enabled |
| 10 | DF | Direction Flag; stream direction. If set, string operations decrement their pointer rather than incrementing it, reading memory backwards |
| 11 | OF | Overflow Flag; set if a signed arithmetic operation produces a result that does not fit in the register |
| 12-13 | IOPL | I/O Privilege Level field (2 bits); I/O privilege level of the current process |
| 14 | NT | Nested Task flag; controls chaining of interrupts. Set if the current task is linked to the previously executed task |
| 16 | RF | Resume Flag; controls the response to debug exceptions |
| 17 | VM | Virtual-8086 Mode; set if in virtual-8086 compatibility mode |
| 18 | AC | Alignment Check; set if alignment checking of memory references is enabled |
| 19 | VIF | Virtual Interrupt Flag; virtual image of IF |
| 20 | VIP | Virtual Interrupt Pending flag; set if an interrupt is pending |
| 21 | ID | Identification flag; if software can change this bit, the CPUID instruction is supported |
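
To make the status flags concrete, the sketch below computes them for an 8-bit addition (such as add al, bl) and packs them at the bit positions from the table. It is an illustrative Python model, not CPU output: the function names are hypothetical, and only the six status flags are modeled.

```python
# Bit positions of the status flags, taken from the table above.
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "OF": 11}

def add8_flags(a, b):
    """Add two 8-bit values; return (result, flags word with status flags set)."""
    a &= 0xFF
    b &= 0xFF
    full = a + b
    result = full & 0xFF
    flags = {
        "CF": full > 0xFF,                               # carry out of bit 7
        "PF": bin(result).count("1") % 2 == 0,           # even number of set bits
        "AF": (a & 0xF) + (b & 0xF) > 0xF,               # carry out of bit 3
        "ZF": result == 0,
        "SF": bool(result & 0x80),                       # most significant bit
        "OF": bool((a ^ result) & (b ^ result) & 0x80),  # signed overflow
    }
    word = 0
    for name, is_set in flags.items():
        if is_set:
            word |= 1 << FLAG_BITS[name]
    return result, word

def set_flag_names(word):
    """List the modeled flags that are set in a flags word."""
    return [name for name, bit in FLAG_BITS.items() if (word >> bit) & 1]

result, word = add8_flags(0x7F, 0x01)   # 127 + 1 overflows the signed range
print(hex(result), set_flag_names(word))  # 0x80 ['AF', 'SF', 'OF']

result, word = add8_flags(0xFF, 0x01)   # 255 + 1 wraps around to 0
print(hex(result), set_flag_names(word))  # 0x0 ['CF', 'PF', 'AF', 'ZF']
```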

Instructions

Instructions are variable length (1 to 15 bytes). Their byte encodings are not included here, and this is not an exhaustive list.

| Instruction | Description |
|---|---|
| mov dst, src | Copy data from register src to register dst |
| mov dst, [src] | Copy data from the memory address held in register src to register dst |
| mov [dst], src | Copy data from register src to the memory address held in register dst |
| lea dst, [src] | Load effective address; compute the address described by src and store it in dst without accessing memory |
| add dst, src | Add src to dst and store the result in dst |
| sub dst, src | Subtract src from dst and store the result in dst |
| xor dst, src | Xor dst and src and store the result in dst |
| push src | Grow the stack (subtract from the stack pointer) and copy the value in src to the top of the stack |
| pop dst | Copy the value at the top of the stack into dst and shrink the stack (add to the stack pointer) |
| jmp addr | Set the instruction pointer (rip) to addr |
| call addr | Push the return address (the address of the next instruction) onto the stack, then jump to addr |
| leave | Reset the stack frame; set the stack pointer to the base pointer, then pop the base pointer from the stack |
| ret | Pop the return address from the stack into rip |
| jnz addr | Jump to addr if the zero flag is not set |
| jz addr | Jump to addr if the zero flag is set |
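
The difference between lea and a memory-reading mov is easy to miss, so here is a small Python sketch. It assumes the general x86 addressing form [base + index*scale + disp], which goes beyond the single-register [src] form in the table. The helper name and memory contents are hypothetical.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """Address described by [base + index*scale + disp], wrapped to 64 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFF_FFFF_FFFF_FFFF

memory = {0x1010: 0xDEADBEEF}  # hypothetical memory contents
rbx, rcx = 0x1000, 2

# lea rax, [rbx + rcx*8]: rax receives the address itself, memory is not read
lea_result = effective_address(base=rbx, index=rcx, scale=8)

# mov rax, [rbx + rcx*8]: rax receives the value stored at that address
mov_result = memory[effective_address(base=rbx, index=rcx, scale=8)]

print(hex(lea_result), hex(mov_result))  # 0x1010 0xdeadbeef
```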

Reference

PIE and RELRO

Position Independent Executable (PIE) and Relocation Read-Only (RELRO) are security measures that make exploiting binaries harder.

PIE means that the executable's machine code is position independent, so the loader can place it at a randomized base address as part of address space layout randomization (ASLR). PIE only concerns the main executable (i.e. not dynamically linked code). In most cases, ASLR is applied to shared libraries, the stack, and the heap regardless of PIE. Because library addresses are not known until load time, calls into them go through the PLT and GOT.

The PLT (Procedure Linkage Table) is a table of function stubs that are called in place of the actual, dynamically linked functions. The address of the actual dynamically linked function is stored in the GOT (Global Offset Table), which is populated by the dynamic linker at runtime. By default, the GOT is populated lazily: each entry is resolved the first time its PLT stub is called. This behavior can be controlled at link time with the -zlazy and -znow flags to ld or gcc.
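
Lazy binding can be pictured with a small model. The sketch below is a deliberately simplified Python analogy, not how the dynamic linker is implemented: a dictionary stands in for the GOT, Python functions stand in for library code, and the `LazyGot` class is hypothetical.

```python
class LazyGot:
    """Toy model of PLT stubs filling GOT entries on first use."""

    def __init__(self, libraries):
        self.libraries = libraries  # symbol name -> "real" function
        self.got = {}               # stand-in for the GOT, filled on demand
        self.resolutions = 0        # how often the resolver had to run

    def _resolve(self, symbol):
        self.resolutions += 1
        return self.libraries[symbol]

    def plt_call(self, symbol, *args):
        """Call through the stub; resolve the symbol only the first time."""
        if symbol not in self.got:
            self.got[symbol] = self._resolve(symbol)
        return self.got[symbol](*args)

    def bind_now(self):
        """Resolve every symbol up front, as with immediate binding."""
        for symbol in self.libraries:
            if symbol not in self.got:
                self.got[symbol] = self._resolve(symbol)


linker = LazyGot({"strlen": len, "abs": abs})
print(linker.plt_call("strlen", "hello"), linker.resolutions)  # 5 1
print(linker.plt_call("strlen", "hi"), linker.resolutions)     # 2 1
```

The second call reuses the cached entry, so the resolver runs once per symbol. Calling `bind_now()` before any call would move all of that work to startup.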

An executable can be checked for PIE and linkage flags with:

readelf --dynamic $EXE
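
A rough programmatic check is also possible by reading the ELF header directly. The following Python sketch inspects only the e_type field, so it is a heuristic under a stated assumption: PIE executables are marked ET_DYN, but so are shared libraries. The function names and the synthetic header are illustrative.

```python
import struct

ET_EXEC, ET_DYN = 2, 3  # e_type values from the ELF specification

def elf_type(header):
    """Return the e_type field from the first 18 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return e_type

def looks_like_pie(header):
    """True if the file is ET_DYN (a PIE executable or a shared library)."""
    return elf_type(header) == ET_DYN

# Synthetic 64-bit little-endian header for demonstration purposes.
demo = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<H", ET_DYN)
print(looks_like_pie(demo))  # True
```

For a real file, the header can be read with `open(path, "rb").read(18)` and passed to `looks_like_pie`.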

Additionally, there are two types of RELRO: partial and full. Partial RELRO is the usual default and places the GOT before the BSS in memory (see ELF), which prevents a buffer overflow of a global variable from overwriting GOT entries. Full RELRO makes the entire GOT read-only. Because all symbols must then be resolved before the program starts, startup can be significantly slower, which is why full RELRO is often not the default.
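
The effect of full RELRO can be pictured as resolving every symbol at startup and then freezing the table. This is only a Python analogy with hypothetical names: a read-only mapping stands in for the write-protected GOT, and a TypeError stands in for the fault a real process would receive when writing to a read-only page.

```python
from types import MappingProxyType

def load_with_full_relro(libraries):
    """Resolve all symbols up front, then return a read-only view of the table."""
    got = dict(libraries)          # eager resolution: every entry filled at startup
    return MappingProxyType(got)   # no writes allowed through this view

def try_overwrite(got, symbol, target):
    """Attempt to redirect a GOT entry; return True if the write succeeded."""
    try:
        got[symbol] = target
        return True
    except TypeError:
        return False

got = load_with_full_relro({"strlen": len})
print(try_overwrite(got, "strlen", print))  # False
print(got["strlen"]("hello"))               # 5
```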