Assembly

Assembly refers to the lowest-level human-readable language of a computer architecture. Every architecture has its own dialect (or instruction set) of assembly, and even virtual machine platforms like .NET and the JVM have their own instruction sets, so assembly code is not portable. Assembly used to be required for gamedev because graphics libraries weren't as common as they are now. It is still used for debugging and optimizing more advanced games, particularly console games: since you know exactly what hardware a console has, portability is less of an issue and optimization is more valuable.

x86 has one of the largest instruction sets of any architecture in common use.

Pros and Cons

It is generally not recommended to write games entirely in assembly language unless you are targeting very old, weak, or obscure hardware like the NES or Commodore 64. However, it is still a good language to know and useful when you really need to take full advantage of the hardware you are using, or for debugging since you can get a deeper understanding of your program at a lower level.

Pros

  • Full advantage of the hardware
  • Minimal CPU overhead
  • Embeddable in some higher-level languages via inline assembly
  • Ability to target much older and weaker hardware, making it ideal for homebrew development and ROM hacking
  • Gives you a much better understanding of the inner workings of your program and the hardware (or VM) that you are working with
  • Useful for debugging and reverse engineering

Cons

  • Not beginner friendly
  • Requires more advanced knowledge about the hardware you are targeting
  • Requires much more preparation and planning before you can write anything, otherwise you will end up with spaghetti code
  • Not portable
  • Modern compilers often produce better assembly than most humans, so higher-level code may have better optimization than hand-written assembly

Properties

  • Low-level
  • Unstructured - no blocks for if-statements, loops, etc.
  • No real variables; only registers

Resources

Tutorials

Instruction Set References

Instruction Format References

Assemblers

Basics

These code examples use MIPS assembly, but most instruction sets are similar enough that learning another one is easy once you know one. They are not meant to teach you how to write actual programs, but rather the basic concepts of assembly, since it is very different from mid- to high-level languages. The examples also assume that you have some basic C programming knowledge.

Instructions

Instructions in assembly are the equivalent of statements in mid- to high-level languages. They are a 1:1, human-readable representation of the binary machine code that the CPU understands. Each line of code holds at most one instruction, and an instruction cannot span multiple lines. An instruction's operands are separated by commas.

<instruction name> <destination register>, <register1>, <register2>  # Comments go after the instruction and use the number sign
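To make the 1:1 mapping between an instruction and its binary form concrete, here is a minimal C sketch that packs a register-to-register (R-type) MIPS instruction into a 32-bit word. The helper name encode_r_type is made up for illustration and is not part of any assembler; the field layout (opcode, rs, rt, rd, shamt, funct) is the standard R-type format.

```c
#include <stdint.h>

/* Pack the fields of a MIPS R-type instruction into one 32-bit word.
   Layout, most to least significant bit:
   opcode:6  rs:5  rt:5  rd:5  shamt:5  funct:6
   encode_r_type is a hypothetical helper name used only for this sketch. */
uint32_t encode_r_type(uint32_t rs, uint32_t rt, uint32_t rd,
                       uint32_t shamt, uint32_t funct)
{
    const uint32_t opcode = 0;  /* R-type instructions use opcode 0 */

    return (opcode          << 26) |
           ((rs    & 0x1Fu) << 21) |   /* first source register  */
           ((rt    & 0x1Fu) << 16) |   /* second source register */
           ((rd    & 0x1Fu) << 11) |   /* destination register   */
           ((shamt & 0x1Fu) <<  6) |   /* shift amount (0 for add/sub) */
           (funct  & 0x3Fu);           /* selects the operation  */
}
```

For example, add $s0, $s1, $s2 uses register numbers 16, 17 and 18 and the funct code 0x20, so encode_r_type(17, 18, 16, 0, 0x20) gives the word 0x02328020.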

Registers

Assembly has no variables, at least not in the same way that higher-level languages do. Instead you have registers, which are small storage locations inside the CPU itself. MIPS has 32 general-purpose registers, 18 of which are meant for holding your integer values: 8 saved registers ($s0-$s7) and 10 temporary registers ($t0-$t9). Saved registers preserve their value across function calls, while this is not guaranteed for temporary registers. Registers are denoted by a dollar sign followed by a letter for the type of register and a number. Some registers are reserved: $zero always reads as 0 and cannot be changed, $at is reserved for the assembler, and $k0-$k1 are reserved for the operating system kernel. Everything else lives in RAM and has to be loaded and stored manually. MIPS loads and stores values in words (4 bytes or 32 bits) using the lw (load word) and sw (store word) instructions:

sw $s0, 20($s1) # Store the value of $s0 at the memory address in $s1 plus an offset of 20
lw $s0, 20($s1) # Load the word at the memory address in $s1 plus an offset of 20 into $s0

In the code example above, the memory address is held in the register $s1, which is referred to as the base register. The offset is a 16-bit signed integer, and the sum of the two values gives the exact memory address. This scheme exists because every MIPS instruction is 32 bits wide, the same size as a memory address, so a full address cannot fit inside an instruction alongside the opcode and register fields. Instead, the instruction names a base register that holds an address and encodes only a 16-bit offset from it.
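The base-plus-offset addressing can be modelled in a few lines of C. This is only a toy sketch: the memory array, its size, and the function names are assumptions made for illustration, and it assumes word-aligned addresses as lw and sw require.

```c
#include <stdint.h>

#define MEM_WORDS 64                 /* size of the toy RAM, chosen arbitrarily */

static uint32_t memory[MEM_WORDS];   /* toy RAM: 64 words of 4 bytes each */

/* Effective address = value of the base register + sign-extended 16-bit offset. */
uint32_t effective_address(uint32_t base, int16_t offset)
{
    return base + (uint32_t)(int32_t)offset;
}

/* Toy model of: sw <value register>, offset(<base register>) */
void store_word(uint32_t value, uint32_t base, int16_t offset)
{
    uint32_t addr = effective_address(base, offset);
    memory[(addr / 4) % MEM_WORDS] = value;  /* byte address -> word index, wrapped into the toy RAM */
}

/* Toy model of: lw <destination register>, offset(<base register>) */
uint32_t load_word(uint32_t base, int16_t offset)
{
    uint32_t addr = effective_address(base, offset);
    return memory[(addr / 4) % MEM_WORDS];
}
```

Because the offset is signed, it can reach memory on either side of the base address: a base of 100 with an offset of -4 refers to address 96.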

Labels

Labels allow you to jump to an instruction without referring to its address. This is useful because addresses naturally change as your code changes. In most assembly languages, including MIPS, a label is declared by its identifier followed by a colon.

main:
add $s0, $s1, $s2

When the code is assembled into machine code, the assembler replaces every reference to a label with the address of the instruction it marks, much like how compilers replace constants with their literal values during compilation.
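As a rough idea of what the assembler does with a label, the C sketch below walks over a list of source lines and works out the address a label stands for. It is a simplification under stated assumptions: each entry is either a label on its own line or a single instruction, every instruction takes 4 bytes, and resolve_label is a hypothetical name, not a real assembler API.

```c
#include <stddef.h>
#include <string.h>

/* Return the address a label resolves to, or -1 if the label is not found.
   Assumptions for this sketch: lines[] holds either a label ("name:") or one
   instruction per entry, every instruction occupies 4 bytes, and the first
   instruction is placed at address base. */
long resolve_label(const char *const lines[], size_t count,
                   const char *label, long base)
{
    long address = base;
    size_t label_len = strlen(label);

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(lines[i]);
        int is_label = len > 0 && lines[i][len - 1] == ':';

        if (is_label) {
            /* A label takes up no space; it names the next instruction's address. */
            if (len - 1 == label_len && strncmp(lines[i], label, label_len) == 0)
                return address;
        } else {
            address += 4;  /* one 32-bit instruction */
        }
    }
    return -1;
}
```

Inserting or removing an instruction above a label changes the address this function returns, which is exactly why you refer to the label instead of the address.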

Arithmetic Expressions

Assembly can only do one mathematical operation per instruction, so a mathematical expression that requires multiple operations takes multiple instructions. Take this line of C code for example:

f = (a + b) - (c * d);

In MIPS Assembly, it would look something like this:

add $t0, $s0, $s1   # $s0 and $s1 are the registers for a and b respectively. Add them and store them in $t0
mul $t1, $s2, $s3   # $s2 and $s3 are the registers for c and d respectively. Multiply them and store the result in $t1
sub $s4, $t0, $t1    # subtract $t1 from $t0 and store it into $s4
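It can help to write the same decomposition back in C, with one operation per statement and the temporaries spelled out. The function name compute_f is made up for this sketch; t0 and t1 play the role of the temporary registers.

```c
/* f = (a + b) - (c * d), written one operation per statement
   to mirror the three assembly instructions. */
int compute_f(int a, int b, int c, int d)
{
    int t0 = a + b;    /* add: t0 = a + b   */
    int t1 = c * d;    /* mul: t1 = c * d   */
    int f  = t0 - t1;  /* sub: f  = t0 - t1 */
    return f;
}
```

A compiler performs roughly this kind of splitting on every expression before it picks registers for the temporaries.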

Branch Instructions

Assembly is an unstructured language, so there are no code blocks for if-statements, loops, functions, etc. Instead, branch instructions conditionally jump to a label: the jump is taken if the condition is true, otherwise execution continues with the next instruction. The commonly used branch instructions in MIPS assembly are bne (branch if not equal), beq (branch if equal), blt (branch if less than), and bgt (branch if greater than). So an if-else statement in C like this:

if (a == b) {
    c = a + b;
}
else {
    c = a - b;
}

… would be written like this in MIPS assembly:

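bne $s0, $s1, Else           # If $s0 (a) does not equal $s1 (b), goto Else
add $s2, $s0, $s1            # Add $s0 and $s1 and store it into $s2 (c)
j EndIf                      # Skip the else part so it does not overwrite $s2
Else:  sub $s2, $s0, $s1  # Subtract $s1 from $s0 and store it into $s2
EndIf:                       # Code after the if-else goes here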
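If the control flow is hard to follow, the same logic can be written in C using only goto, which is close to how the branches behave. This is a sketch for illustration; the function name and the EndIf label are made up. Note the unconditional jump after the "then" part: without it, execution would fall straight through into the else part.

```c
/* The if-else from above, written with goto so that each line
   corresponds to one branch, jump, or arithmetic instruction. */
int if_else_with_goto(int a, int b)
{
    int c;

    if (a != b) goto Else;  /* bne: skip the "then" part when a != b */
    c = a + b;              /* add */
    goto EndIf;             /* j: jump over the else part */
Else:
    c = a - b;              /* sub */
EndIf:
    return c;
}
```

Calling it with equal arguments takes the add path, and with unequal arguments takes the sub path.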

Loops

As stated earlier, assembly is an unstructured language, so there are no loop statements. Loops need to be built manually from branch and jump instructions. Take this while loop in C:

while (a < b) {
    a++;
}

In MIPS assembly, it looks something like this:

Loop: bge $s0, $s1, Exit    # If $s0 is greater than or equal to $s1, goto Exit
addi $s0, $s0, 1    # Add 1 to $s0 (addi is used instead of add when using a constant)
j Loop    # Goto loop
Exit:    # Code after loop goes here
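The same loop can be written in C with goto to show the shape the assembly takes: the condition is inverted and tested at the top, and an unconditional jump at the bottom goes back to the test. The function name is made up for this sketch.

```c
/* while (a < b) a++;  written with goto to mirror the assembly. */
int loop_with_goto(int a, int b)
{
Loop:
    if (a >= b) goto Exit;  /* bge: leave the loop once a < b is false */
    a = a + 1;              /* addi */
    goto Loop;              /* j Loop */
Exit:
    return a;               /* code after the loop */
}
```

If the condition is already false on entry, the body never runs, just like the original while loop.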

Subroutine and Function Calls

Note: before going into this, functions and subroutines are strictly speaking two different things, contrary to what most modern languages would lead you to believe. Functions return a value while subroutines only perform a set of instructions. In assembly the two are called in exactly the same way; the only difference is whether the callee leaves a value in the return-value registers.

In MIPS assembly, both are called with the jal (jump and link) instruction, which saves the address of the following instruction before jumping so that the callee can return to it. A plain j (jump) does not save a return address, so it is used for ordinary jumps rather than calls. Arguments are passed through the registers $a0-$a3 and the return value is placed in the registers $v0 and $v1. If there are any more arguments, or a struct is being passed by value, they need to be stored on the stack. Since assembly is an unstructured language, functions and subroutines aren't declared with any special syntax; each one is just a label that you jump to. Let's use a simple C function that returns the sum of two arguments:

int main() {
    int a = 3;
    int b = 2;
    int c;
 
    c = sum(a, b);
 
    return 0;
}
 
int sum(int x, int y)  {
    return x + y;
}

A simplified MIPS assembly version could be structured like this (a real compiler would also save $ra and any saved registers it uses on the stack):

main: li $s0, 3 # Set $s0 (a) to 3
li $s1, 2 # Set $s1 (b) to 2
move $a0, $s0 # Copy the value of $s0 into $a0
move $a1, $s1 # Copy the value of $s1 into $a1
jal sum # Jump and link to label "sum"
move $s2, $v0 # Copy the return value in $v0 into $s2 (c)
j end # Skip over the body of sum so that main does not fall into it

sum: add $v0, $a0, $a1 # Add $a0 (x) and $a1 (y) and store the value into $v0 (return value)
jr $ra # Return to the address stored in $ra

end: # main is done here (return 0)

The address of the instruction after the jal is stored in the register $ra (return address), and the instruction jr $ra jumps back to it. The instruction move copies a value from one register into another, while the instruction li sets a register to a numerical value. Also, values stored in temporary registers are not guaranteed to survive a subroutine call (which is why they are called "temporary"), so any values in those registers that you want to keep should be copied to RAM or to one of the saved registers ($s0-$s7) first.
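To see the calling convention without any C function-call syntax hiding it, the sketch below models the argument and return-value registers as plain global variables. The names reg_a0, reg_a1, reg_v0 and call_sum are invented for this illustration; the real registers are of course not C variables.

```c
/* Toy model of the argument and return-value registers. */
static int reg_a0, reg_a1;  /* stand-ins for $a0 and $a1 (arguments) */
static int reg_v0;          /* stand-in for $v0 (return value)       */

/* Callee: takes no C parameters; it reads the "registers" instead. */
static void sum(void)
{
    reg_v0 = reg_a0 + reg_a1;  /* add $v0, $a0, $a1 */
}                              /* returning here plays the role of jr $ra */

/* Caller: set up the arguments, call, then collect the result. */
int call_sum(int a, int b)
{
    reg_a0 = a;     /* move $a0, $s0 */
    reg_a1 = b;     /* move $a1, $s1 */
    sum();          /* jal sum */
    return reg_v0;  /* move $s2, $v0 */
}
```

The caller and callee never exchange values directly; they only agree on which registers hold the arguments and the result, which is all a calling convention is.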