//Assembly// is a class of low level programming languages in which instructions map directly to machine code instructions that the CPU can understand. In other words, an assembly language is the human readable equivalent of machine code. While //Assembly// languages share a lot of similarities, they are architecture dependent. //x86 Assembly// is different from //ARM Assembly//. Because of this, assembly is far less portable than higher level languages, such as C: you can write a C program that can be run on both x86 and ARM (after recompilation), but this is not possible with //Assembly//. For convenience, in the rest of this series, we will refer to //x86 Assembly// as simply //Assembly//, but it is important to understand the difference. //Assembly// code is translated into machine code by a tool called an //assembler.// The main difference between an //assembler// and a //compiler //is that an assembler simply translates the human readable instructions into machine code, while compilers have a lot more work to do; an instruction in a high level language such as C might translate into many machine code instructions. ====== Instruction syntax ====== Assembly instructions have the following syntax: %%mnemonic operand1, operand2, operand3...%% The mnemonic is a keyword that represents a specific instruction. The number of operands depends on the instruction. A lot of instructions, such as %%mov%% which copies data from one place to another, or %%add%% which adds up numbers, will also use operand 1 as the destination. For example: %%add eax, 20%% This will replace the contents of %%eax%% with the result of %%eax + 20%%. Written in C, this code would look like this: %%eax = eax + 20;%% ===== Intel vs AT&T syntax ===== What I have shown so far is using the Intel syntax. There are in fact 2 flavors of x86 assembly, one developed by Intel, and the other developed by AT&T Bell Labs. //GAS//, the assembler that comes with //GCC//, as well as the //GDB //debugger will use the AT&T syntax by default, but it can be changed. //NASM//, which is the assembler we will use, uses the Intel syntax. Here is a small example, highlighting the main differences: === Intel === %%mov eax, 5%% %%add esp, 24h%% %%mov eax, [ebx + ecx*4 + offset]%% * Destination always on the left * No prefixes for registers and constants * Instruction size is determined automatically based on operand size === AT&T === %%movl $5, %eax%% %%addl $0x24, %esp%% %%movl offset(%ebx,%ecx,4), %eax%% * Destination always on the right * Registers are prefixed with %, constants with $ * Mnemonics suffixed with a letter indicating the size of the operands You can read more details about this subject on Wikipedia. In this tutorial, we will only use the Intel syntax which in my personal opinion is easier to read. ====== Some important instructions ======