This issue of Practicing Ruby was directly inspired by Nick Morgan’s Easy 6502 tutorial. While the Ruby code in this article is my own, the bytecode for the Snake6502 game was shamelessly stolen from Nick. Be sure to check out Easy 6502 if this topic interests you; it’s one of the best programming tutorials I’ve ever seen.

The sea of numbers you see below is about as close to the metal as programming gets:

0600: 20 06 06 20 38 06 20 0d 06 20 2a 06 60 a9 02 85 
0610: 02 a9 04 85 03 a9 11 85 10 a9 10 85 12 a9 0f 85 
0620: 14 a9 04 85 11 85 13 85 15 60 a5 fe 85 00 a5 fe 
0630: 29 03 18 69 02 85 01 60 20 4d 06 20 8d 06 20 c3 
0640: 06 20 19 07 20 20 07 20 2d 07 4c 38 06 a5 ff c9 
0650: 77 f0 0d c9 64 f0 14 c9 73 f0 1b c9 61 f0 22 60 
0660: a9 04 24 02 d0 26 a9 01 85 02 60 a9 08 24 02 d0 
0670: 1b a9 02 85 02 60 a9 01 24 02 d0 10 a9 04 85 02 
0680: 60 a9 02 24 02 d0 05 a9 08 85 02 60 60 20 94 06 
0690: 20 a8 06 60 a5 00 c5 10 d0 0d a5 01 c5 11 d0 07 
06a0: e6 03 e6 03 20 2a 06 60 a2 02 b5 10 c5 10 d0 06 
06b0: b5 11 c5 11 f0 09 e8 e8 e4 03 f0 06 4c aa 06 4c 
06c0: 35 07 60 a6 03 ca 8a b5 10 95 12 ca 10 f9 a5 02 
06d0: 4a b0 09 4a b0 19 4a b0 1f 4a b0 2f a5 10 38 e9 
06e0: 20 85 10 90 01 60 c6 11 a9 01 c5 11 f0 28 60 e6 
06f0: 10 a9 1f 24 10 f0 1f 60 a5 10 18 69 20 85 10 b0 
0700: 01 60 e6 11 a9 06 c5 11 f0 0c 60 c6 10 a5 10 29 
0710: 1f c9 1f f0 01 60 4c 35 07 a0 00 a5 fe 91 00 60 
0720: a2 00 a9 01 81 10 a6 03 a9 00 81 10 60 a2 00 ea 
0730: ea ca d0 fb 60 

Although you probably can’t tell by looking at it, what you see here is assembled machine code for the venerable 6502 processor that powered many of the classic video games of the 1980s. When executed in simulated environment, this small set of cryptic instructions produces a minimal version of the Snake arcade game, as shown below:

In this article, we will build a stripped down 6502 simulator in JRuby that is complete enough to run this game. If you haven’t done much low-level programming before, don’t worry! Most of what follows is just ordinary Ruby code. I will also be showing you a ton of examples along the way, and those should help keep you on track. You might also want to grab full source code for the simulator, so that you can experiment with it while reading through this article.

Warmup exercise: Reverse engineering Snake6502

An interesting property of machine code is that if you know its structure, you can convert it back into assembly language. Among other things, the ability to disassemble machine code is useful for debugging and exploration purposes. Let’s try this out on Snake6502!

The output below shows memory locations, machine code, and assembly code for the first 28 instructions of the game. These instructions are responsible for initializing the state of the snake and the apple before the main event loop kicks off. You don’t need to understand exactly how they work right now, just try to get a feel for how the code in the hexdump column corresponds to the code in the assembly column:

address  hexdump     assembly
------------------------------
$0600    20 06 06    JSR $0606
$0603    20 38 06    JSR $0638
$0606    20 0d 06    JSR $060d
$0609    20 2a 06    JSR $062a
$060c    60          RTS
$060d    a9 02       LDA #$02
$060f    85 02       STA $02
$0611    a9 04       LDA #$04
$0613    85 03       STA $03
$0615    a9 11       LDA #$11
$0617    85 10       STA $10
$0619    a9 10       LDA #$10
$061b    85 12       STA $12
$061d    a9 0f       LDA #$0f
$061f    85 14       STA $14
$0621    a9 04       LDA #$04
$0623    85 11       STA $11
$0625    85 13       STA $13
$0627    85 15       STA $15
$0629    60          RTS
$062a    a5 fe       LDA $fe
$062c    85 00       STA $00
$062e    a5 fe       LDA $fe
$0630    29 03       AND #$03
$0632    18          CLC
$0633    69 02       ADC #$02
$0635    85 01       STA $01
$0637    60          RTS

If you look at the output carefully, you’ll be able to notice some patterns even if you don’t understand what the instructions themselves are meant to do. For example, each instruction is made up of between 1-3 bytes of machine code. The first byte in each instruction tells us what operation it is, and the remaining bytes (if any) form its operand.

If you take a look at the first four instructions, it is easy to see that the opcode 20 corresponds to the JSR instruction. Forming its operand is similarly straightforward, because it’s the same number in both places, just with opposite byte order:

20 06 06 -> JSR $0606  
20 38 06 -> JSR $0638
20 0d 06 -> JSR $060d
20 2a 06 -> JSR $062a

If you ignore the symbols in front of the numbers for the moment, mapping single byte operands is even easier, because they’re represented the same way in both the machine code and the assembly code. Knowing that the 85 opcode maps to the STA operation, it should be easy to see how 11, 13, 15 map to $11, $13, $15 in the following example:

85 11  -> STA $11
85 13  -> STA $13
85 15  -> STA $15

But the symbols in front of the numbers in assembly language obviously mean something. If you carefully look at the machine code, you’ll be able to find that the same operation can have multiple different opcodes, each of which identify a particular kind of operand:

a9 0f -> LDA #$0f
a5 fe -> LDA $fe

Without getting into too much detail here, the example above shows us that both a9 and a5 correspond to the LDA instruction. The difference between the two opcodes is that a9 treats its operand as an immediate value, and a5 interprets it as a memory address. In assembly code, this difference is represented syntactically (#$xx vs. $xx), but in the machine code we must rely on numbers alone.

The various ways of interpreting operands (called “addressing modes”) are probably the most confusing part of working with 6502 code. There are about a dozen of them, and to get Snake6502 running, we need to implement most of them. The good news is that every addressing mode is just a roundabout way of converting an operand into a particular address in memory, and once you have that address, the operations themselves do not care about how you computed it. Once you sweep all that stuff under the rug, you can end up with clean operation definitions like this:

# NOTE: 'e' refers to the address that was computed from the instruction's
# operand and addressing mode.

LDA { cpu[:a] = mem[e]  } 
STA { mem[e]  = cpu[:a] }

This realization also tells us that the memory module will not need to take addressing modes into account as long as they’re precomputed elsewhere. With that in mind, let’s get started building a storage model for our simulator. We’ll deal with the hairy problem of addressing modes later.

Memory

Except for a few registers that are used to store intermediate computations, the 6502 processor relies on its memory for pretty much everything. Program code, data, and the system stack all reside in the same 16-bit addressing space. Even flow control is entirely dependent on memory: the program counter itself is nothing more than an address that is used to look up the next instruction to run.

This “all in one bucket” approach is a double-edged sword. It makes it harder to write safe programs, but the tradeoff is that the storage model itself is very simple. Conceptually, the memory module is nothing more than a mapping between 16-bit addresses and 8-bit values:

describe "Storage" do
  let(:mem) { Vintage::Storage.new }

  it "can get and set values" do
    mem[0x1337] = 0xAE

    mem[0x1337].must_equal(0xAE)
  end

  # ...
end

But because the program counter keeps track of a ‘current location’ in memory at any point in time, there is a lot more we can do with this simple structure. Let’s walk through the remaining tests for Vintage::Storage to see what else it implements.

Program loading

When a program is loaded into memory, there is nothing special about the way it is stored, it’s just like any other data. In a real 6502 processer, a register is used to store the address of the next instruction to be run, and that address is used to read an opcode from memory. In our simulator, we can let the Storage class keep track of this number for us, incrementing it whenever we call the Storage#next method.

The following test shows how to load a program and then walk its code one byte at a time:

it "can load a bytecode sequence into memory and traverse it" do
  bytes = [0x20, 0x06, 0x06]

  mem.load(bytes)
  mem.pc.must_equal(program_offset) # load() does not increment counter

  bytes.each { |b| mem.next.must_equal(b) }

  mem.pc.must_equal(program_offset + 3)
end

The starting position of the program can be an arbitrary location, but to maintain compatibility with the simulator from the Easy6502 tutorial, we initialize the program counter to 0x600:

let(:program_offset) { Vintage::Storage::PROGRAM_OFFSET }

it "sets an initial position of $0600" do
  program_offset.must_equal(0x0600)

  mem.pc.must_equal(program_offset)
end

Flow control + branching

Very rudimentary flow control is supported by setting the program counter to a particular address, which causes the processor to jump to the instruction at that address:

it "implements jump" do
  mem.jump(program_offset + 0xAB)

  mem.pc.must_equal(program_offset + 0xAB)
end

Branching can be implemented by only calling jump when a condition is met:

it "implements conditional branching" do
  big   = 0xAB
  small = 0x01

  # a false condition does not affect mem.pc
  mem.branch(small > big, program_offset + 5)
  mem.pc.must_equal(program_offset)

  # true condition jumps to the provided address
  mem.branch(big > small, program_offset + 5)
  mem.pc.must_equal(program_offset + 5)
end

This test case is a bit contrived, so let’s take a look at some real Snake6502 code that illustrates how branching meant to be used:

$064d    a5 ff     LDA $ff      # read the last key pressed on the keyboard
$064f    c9 77     CMP #$77     # check if the key was "w" (ASCII code 0x77)
$0651    f0 0d     BEQ $0660    # if so, jump forward to $0660 
$0653    c9 64     CMP #$64     # check if the key was "d" (ASCII code 0x64)
$0655    f0 14     BEQ $066b    # if so, jump forward to $066b
$0657    c9 73     CMP #$73     # check if the key was "s" (ASCII code 0x73)
$0659    f0 1b     BEQ $0676    # if so, jump forward to $0676
$065b    c9 61     CMP #$61     # check if the key was "a" (ASCII code 0x61)
$065d    f0 22     BEQ $0681    # if so, jump forward to $0681

Presumably, the code at $0660 starts a procedure that moves the snake’s head up, the code at $066b moves it to the right, and so on. In other words, if one of these BEQ instructions finds a match, it will jump to the right place in the code to handle the relevant condition. But if no match is found, the processor will happily continue on to whatever code comes after this set of instructions in the program.

The tricky thing about using instructions that rely on jump (and consequently, branch) is that they are essentially GOTO statements. When you see one of these statements in the code, you know exactly what instruction will be executed next, but there’s no way of telling if it will ever return to the location it was called from. To get around this problem, we need support for subroutines that know how to return to where they’ve been called from. And to implement those, we need a system stack.

Stack operations

Here are the tests for how we’d like our stack to behave:

let(:stack_origin) { Vintage::Storage::STACK_ORIGIN }
let(:stack_offset) { Vintage::Storage::STACK_OFFSET }

it "has a 256 element stack between 0x0100-0x01ff" do
  stack_offset.must_equal(0x0100)
  stack_origin.must_equal(0xff) # this value gets added to the offset
end

it "implements stack-like behavior" do
  mem.sp.must_equal(stack_origin)

  mem.push(0x01)
  mem.push(0x03)
  mem.push(0x05)

  mem.sp.must_equal(stack_origin - 3)

  mem.pull.must_equal(0x05)
  mem.pull.must_equal(0x03)
  mem.pull.must_equal(0x01)

  mem.sp.must_equal(stack_origin)
end

As the tests indirectly suggest, the stack is a region in memory between$0100 and $01ff, indexed by a stack pointer (sp). Each time a value is pushed onto the stack, the value of the stack pointer is decremented, and each time a value is pulled, the pointer is incremented. This makes it so that the stack pointer always tells you where the “top of the stack” is.

Subroutines

With a stack in place, we’ll have most of what we need to implement “Jump to subroutine” (jsr) and “Return from subroutine” (rts) functionality. The behavior of these features will end up looking something like this:

it "implements jsr/rts" do
  mem.jsr(0x0606)
  mem.jsr(0x060d)

  mem.pc.must_equal(0x060d)

  mem.rts
  mem.pc.must_equal(0x0606)

  mem.rts
  mem.pc.must_equal(program_offset)
end

To make the above test pass, jsr needs to push the current program counter onto the stack before executing a jump to the specified address. Later when rts is called, the address is pulled out of the stack, and then another jump is executed to bring you back to where the last jsr command was executed. This works fine even in nested subroutine calls, due to the nature of how stacks work.

The only tricky part is that addresses are 16-bit values, but stack entries are limited to single byte values. To get around this problem, we need a couple helper functions to convert a 16-bit number into two bytes, and vice-versa:

it "can convert two bytes into a 16 bit integer" do
  mem.int16([0x37, 0x13]).must_equal(0x1337)
end

it "can convert a 16 bit integer into two bytes" do
  mem.bytes(0x1337).must_equal([0x37, 0x13])
end

These helpers will also come in handy later, when we need to deal with addressing modes.

Implementation

Behavior-wise, there is a lot of functionality here. In a high level environment it would feel a lot like we were mixing distinct concerns, but at the low level we’re working at it’s understandable that nearly infinite flexibility is desireable.

Despite the conceptual complexity, the Storage class is extremely easy to implement. In fact, it takes less than 80 lines of code if you don’t worry about validations and robustness:

module Vintage
  class Storage
    PROGRAM_OFFSET = 0x0600
    STACK_OFFSET   = 0x0100
    STACK_ORIGIN   = 0xff

    def initialize
      @memory = Hash.new(0)
      @pc     = PROGRAM_OFFSET
      @sp     = STACK_ORIGIN
    end

    attr_reader :pc, :sp

    def load(bytes)
      index = PROGRAM_OFFSET

      bytes.each_with_index { |c,i| @memory[index+i] = c }
    end

    def [](address)
      @memory[address]
    end

    def []=(address, value)
      @memory[address] = (value & 0xff)
    end

    def next
      @memory[@pc].tap { @pc += 1 }
    end

    def jump(address)
      @pc = address
    end

    def branch(test, address)
      return unless test

      @pc = address
    end

    def jsr(address)
      low, high = bytes(@pc)

      push(low)
      push(high)

      jump(address)
    end

    def rts
      h = pull
      l = pull

      @pc = int16([l, h])
    end

    def push(value)
      @memory[STACK_OFFSET + @sp] = value
      @sp -= 1
    end

    def pull
      @sp += 1

      @memory[STACK_OFFSET + @sp]
    end

    def int16(bytes)
      bytes.pack("c*").unpack("v").first
    end

    def bytes(num)
      [num].pack("v").unpack("c*")
    end
  end
end

For such boring code, its a bit surprising to think that it can be a fundamental building block for generic computing. Keep in mind of course that we’re building a simulation and not a real piece of hardware, and we’re doing it in one of the highest level languages you can use.

If it already feels like we’re cheating, just wait until you see the next trick!

Memory-mapped I/O

To implement Snake6502, our simulator needs to be able to generate random numbers, read keyboard input, and also display graphics on the screen. None of these features are directly supported by the 6502 instruction set, so that means that every individual system had to come up with its own way of doing things. This is one of many things that causes machine code (especially old-school machine code) to not be directly portable from one system to another.

Because we’re trying to get Snake6502 to run in our simulator without modifying its bytecode, we’re more-or-less constrained to following the approach used by the Easy6502 simulator: memory-mapped I/O.

This approach is actually very easy to implement in a simulated environment: you add hooks around certain memory addresses so that when they are accessed, they execute some custom code rather than directly reading or writing a value to memory. In the case of Snake6502, we expect the following behaviors:

  • Reading from $fe returns a random 8-bit integer.
  • Reading from $ff retrieves the ASCII code of the last key pressed on the keyboard.
  • Writing to addresses between $0200 to $05ff will render pixels to the screen. ($0200 is the top-left corner of the 32x32 display, and $05ff is the bottom-right corner.)

These features could be added directly to the Storage class, but it would feel a bit awkward to clutter up a generic module with some very specific edge cases. For that reason, it is probably better to implement them as a module mixin:

module Vintage
  module MemoryMap
    RANDOMIZER  = 0xfe
    KEY_PRESS   = 0xff
    PIXEL_ARRAY = (0x0200..0x05ff)

    attr_accessor :ui

    def [](address)
      case address
      when RANDOMIZER
        rand(0xff)
      when KEY_PRESS
        ui.last_keypress
      else
        super
      end
    end

    def []=(k, v)
      super

      if PIXEL_ARRAY.include?(k)
        ui.update(k % 32, (k - 0x0200) / 32, v % 16)
      end
    end
  end
end

You probably already have a good idea of how MemoryMap works from seeing its implementation, but it wouldn’t hurt to see an example of how it is used before we move on. Here’s how to display a single pixel on the screen, randomly varying its color until the spacebar (ASCII code 0x20) is pressed:

mem = Vintage::Storage.new
mem.extend(Vintage::MemoryMap)

mem.ui = Vintage::Display.new 

(mem[0x0410] = mem[0xfe]) until mem[0xff] == 0x20 

It’s worth noting that this is the only code in the entire simulator that directly depends on a connection to some sort of user interface, and the protocol consists of just two methods: ui.update(x, y, color) and ui.last_keypress. In our case, we use a JRuby-based GUI, but anything else could be substituted as long as it implemented these two methods.

At this point, our storage model is pretty much complete. We now can turn our attention to various number crunching features.

Registers and Flags

In order to get Snake6502 to run, we need all six of the programmable registers that the processor provides. We’ve handled two of them already (the stack pointer and the program counter), so we just have four more to implement: A, X, Y, and P. A few design constraints will help make this work go a whole lot faster:

  • Most of the operations that can be done on A are done the same way on X and Y, so we can implement some generic functions that operate on all three of them.

  • We can implement the status register (P) as a collection of individual attributes, rather than seven 1-bit flags packs into a single byte.

  • Because Snake6502 only relies on the (c)arry, (n)egative, and (z)ero flags from the status register, we can skip implementing the other four status flags and still have a playable game.

With those limitations in mind, let’s work through some specs to understand how this model ought to behave. For starters, we’ll be building a Vintage::CPU that implements three registers and three flags, initializing them all to zero by default:

describe "CPU" do
  let(:cpu) { Vintage::CPU.new }

  let(:registers) { [:a, :x, :y] }
  let(:flags)     { [:c, :n, :z] }
  
  it "initializes registers and flags to zero" do
    (registers + flags).each { |e| cpu[e].must_equal(0) }
  end

   #...
end

It will be possible to directly set registers via the #[]= method, because the behavior will be the same for all three registers:

it "allows directly setting registers" do
  registers.each do |e|
    value  = rand(0xff)

    cpu[e] = value
    cpu[e].must_equal(value)
  end
end

However, because flags don’t have the same update semantics as registers, we will not allow directly setting them via #[]=:

it "does not allow directly setting flags" do
  flags.each do |e|
    value  = rand(0xff)

    err = -> { cpu[e] = value }.must_raise(ArgumentError)
    err.message.must_equal "#{e.inspect} is not a register"
  end
end

The carry flag (c) can toggled via the set_carry and clear_carry methods. We’ll need this later for getting the CPU into a clean state whenever we do addition and subtraction operations:

it "allows setting the c flag via set_carry and clear_carry" do
  cpu.set_carry
  expect_flags(:c => 1)

  cpu.clear_carry
  expect_flags(:c => 0)
end

Some other instructions will require us to set the carry flag based on arbitrary conditions, so we’ll need support for that as well:

it "allows conditionally setting the c flag via carry_if" do
  # true condition
  x = 3
  cpu.carry_if(x > 1)

  expect_flags(:c => 1)

  # false condition
  x = 0
  cpu.carry_if(x > 1)

  expect_flags(:c => 0)
end

The N and Z flags are set based on whatever result the CPU last processed:

it "sets z=1 when a result is zero, sets z=0 otherwise" do
  cpu.result(0)
  expect_flags(:z => 1)

  cpu.result(0xcc)
  expect_flags(:z => 0)
end

it "sets n=1 when result is 0x80 or higher, n=0 otherwise" do
  cpu.result(rand(0x80..0xff))
  expect_flags(:n => 1)

  cpu.result(rand(0x00..0x7f))
  expect_flags(:n => 0)
end

The result method also returns a number truncated to fit in a single byte, because pretty much every place we could store a number in this system expects 8-bit integers:

it "truncates results to fit in a single byte" do
  cpu.result(0x1337).must_equal(0x37)
end  

To help keep the CPU in a consistent state and to simplify the work involved in many of the 6502 instructions, we automatically call cpu.result whenever a register is set via CPU#[]=. The tests below show the the effects of that behavior:

  it "implicitly calls result() when registers are set" do
    registers.each do |e|
      cpu[e] = 0x100
      
      cpu[e].must_equal(0)
      expect_flags(:z => 1, :n => 0)

      cpu[e] -= 1
      
      cpu[e].must_equal(0xff)
      expect_flags(:z => 0, :n => 1)
    end
  end

Here’s an implementation that satisfies all of the tests we’ve seen so far:

module Vintage
  class CPU
    def initialize
      @registers = { :a => 0, :x => 0, :y => 0 }
      @flags     = { :z => 0, :c => 0, :n => 0 }
    end

    def [](key)
      @registers[key] || @flags.fetch(key)
    end

    def []=(key, value)
      unless @registers.key?(key)
        raise ArgumentError, "#{key.inspect} is not a register" 
      end

      @registers[key] = result(value)
    end

    def set_carry
      @flags[:c] = 1
    end

    def clear_carry
      @flags[:c] = 0
    end

    def carry_if(test)
      test ? set_carry : clear_carry
    end

    def result(number)
      number &= 0xff

      @flags[:z] = (number == 0 ? 1 : 0)
      @flags[:n] = number[7]

      number
    end
  end
end

Putting it all together, the role of the CPU class is mostly just to do some basic numerical housekeeping that will make implementing 6502 instructions easier. Consider for example, the CMP and BEQ operations, which can be used together to form a primitive sort of if statement. We saw these two operations used together in the earlier example of keyboard input handling:

$064f    c9 77     CMP #$77     # check if the key was "w" (ASCII code 0x77)
$0651    f0 0d     BEQ $0660    # if so, jump forward to $0660 

Using a combination of the CPU and Storage objects we’ve already built, we’d be able to define the CMP and BEQ operations as shown below:

CMP do 
  cpu.carry_if(cpu[:a] >= mem[e])

  cpu.result( cpu[:a] - mem[e] )
end

BEQ { mem.branch(cpu[:z] == 1, e) }

Even if we ignore the cpu.carry_if call, we know from what we’ve seen already that if CPU#result is called with a zero value, it will set the Z flag to 1. We also know that when Storage#branch is called with a true value, it will jump to the specified address, otherwise it will do nothing at all. Putting those two facts together with the Snake6502 shown above tells us that if the value in the A register is 0x77, execution will jump to $0600.

At this point, we’re starting to see how 6502 instructions can be mapped onto the objects we’ve already built, and that means we’re close to the finish line. Before we get there, we only have two obstacles to clear: implementing addressing modes to handle operands, and building a program runner that knows how to map raw 6502 code to the operation definitions shown above.

Addressing Modes

NOTE: The explanation that follows barely scrapes the surface of this topic. If you want to really understand 6502 addressing modes, you should check out the relevant section in the Easy6502 tutorial.

In the very first exercise where we disassembled the first few instructions of Snake6502, we discovered the presence of several addressing modes that cause operands to be interpreted in various different ways. To get the game running, we will need to handle a total of eight different addressing modes.

This is a lot of different ways to generate an address, and its intimidating to realize we’re only implementing an incomplete subset of what the 6502 processor provides. However, its important to keep in mind that the only data structure we have to work with is a simple mapping from 16-bit integers to 8-bit integers. Among other things, clever indexing can give us the functionality we’d expect from variables, references, and arrays – all the stuff that doesn’t have a direct representation in machine code.

I’m going to show the definitions for all of the addressing modes used by Snake6502 below, which probably won’t make much sense at first glance. But try to see if you can figure out what some of this code doing:

module Vintage
  module Operand
    def self.read(mem, mode, x, y)
      case mode
      when "#" # Implicit 
        nil
      when "@" # Relative
        offset = mem.next

        mem.pc + (offset <= 0x80 ? offset : -(0xff - offset + 1)) 
      when "IM" # Immediate
        mem.pc.tap { mem.next }
      when "ZP" # Zero Page
        mem.next
      when "ZX" # Zero Page, X
        mem.next + x
      when  "AB" # Absolute
        mem.int16([mem.next, mem.next])
      when "IX" # Indexed Indirect
        e = mem.next

        mem.int16([mem[e + x], mem[e + x + 1]])
      when "IY" # Indirect Indexed
        e = mem.next

        mem.int16([mem[e], mem[e+1]]) + y
      else
        raise NotImplementedError, mode.inspect
      end
    end
  end
end

Now let’s walk through them one-by-one. You can refer to the source code above as needed to make sense of the following examples.

1) The implicit addressing mode is meant for instructions that either don’t operate on a memory address at all, or can infer the address internally. An example we’ve already seen is the RTS operations that is used to return from a subroutine – it gets its data from the stack rather than from an operand, making it a single byte instruction.

2) The relative addressing mode is used by branches only. Consider the following example:

$0651    f0 0d     BEQ $0660    # if Z=1, jump to $0660 

By the time the $0d operand is read, the program counter will be set to $0653. If you add these two numbers together, you get the address to jump to if Z=1: $0660.

3) Immediate addressing is used when you want to have an instruction work on the operand itself. To do so, we return the operand’s address, then increment the program counter as normal. In the example below, the computed address (e) is 0x0650, and mem[e] == 0x77:

$064f    c9 77     CMP #$77

4) Zero page addressing is straightforward, it is simply refers to any address between $00 and $ff. These are convenient for storing program data in, and are faster to access because they do not require combining two bytes into a 16 bit integer. We’ve already seen copious use of this address mode throughout the examples in this article, particularly when working with keyboard input ($ff) and random number generation ($fe).

5) Zero page, X indexing is used for iterating over some simple sequences in memory. For example, Snake6502 stores the position of each part of the snakes body in byte pairs starting at memory location $10. Using this addressing mode, it is possible to walk over the array by simply incrementing the X register as you go.

6) We’ve also seen plenty of examples of absolute addressing, especially when looking at JSR operations. The only complication involved in processing these addresses is that two bytes need to be read and then assembled into a 16bit integer. But since we’ve had to do that in several places already, it should be easy enough to understand.

7) Indexed indirect addressing gives us a way to dynamically compute an address from other addresses that we’ve stored in memory. That sounds really confusing, but the following example should help clear it up. The code below is responsible for moving the snake by painting a white pixel at its updated head position, and painting a black pixel at its old tail position:

$0720    a2 00       LDX #$00
$0722    a9 01       LDA #$01
$0724    81 10       STA ($10,X) 
$0726    a6 03       LDX $03
$0728    a9 00       LDA #$00
$072a    81 10       STA ($10,X) 

The first three lines are hardcoded to look at memory locations $10 and $11 to form an address in the pixel array that refers to the new head of the snake. The next three lines do something similar for the tail of the snake, but with a twist: because the length of the snake is dynamic, it needs to be looked up from memory. This value is stored in memory location $03. So to unpack the whole thing, STA ($10, X) will take the address $10, add to it the number of bytes in the whole snake array, and then look up the address stored in the last position of that array. That address points to the snake’s tail in the pixel array, which ends up getting set to black by this instruction.

8) Indirect indexed addressing gives us yet another way to walk over multibyte structures. In nake6502, this addressing mode is only used for drawing the apple on the screen. Its position is stored in a 16-bit value stored in $00 and $01, and the following code is used to set its color to a random value:

$0719    a0 00       LDY #$00
$071b    a5 fe       LDA $fe
$071d    91 00       STA ($00),Y

There are bound to be more interesting uses of these addressing modes, but we we’ve certainly covered enough ground for now! Don’t worry if you didn’t understand this section that well, it took me many times reading the Easy6502 tutorial and the source code for Snake6502 before I figured these out myself.

6502 Simulator (finally!)

We are now finally at the point where all the hard stuff is done, and all that remains is to wire up the simulator itself. In other words, it’s time for the fun part of the project.

The input for the simulator will be a binary file containing the assembled program code for Snake6502. The bytes in that file not meant to be read as printable characters, but they can be inspected using a hex editor:

$ hexdump examples/snake.rom
0000000 20 06 06 20 38 06 20 0d 06 20 2a 06 60 a9 02 85
0000010 02 a9 04 85 03 a9 11 85 10 a9 10 85 12 a9 0f 85
0000020 14 a9 04 85 11 85 13 85 15 60 a5 fe 85 00 a5 fe
0000030 29 03 18 69 02 85 01 60 20 4d 06 20 8d 06 20 c3
0000040 06 20 19 07 20 20 07 20 2d 07 4c 38 06 a5 ff c9
0000050 77 f0 0d c9 64 f0 14 c9 73 f0 1b c9 61 f0 22 60
0000060 a9 04 24 02 d0 26 a9 01 85 02 60 a9 08 24 02 d0
0000070 1b a9 02 85 02 60 a9 01 24 02 d0 10 a9 04 85 02
0000080 60 a9 02 24 02 d0 05 a9 08 85 02 60 60 20 94 06
0000090 20 a8 06 60 a5 00 c5 10 d0 0d a5 01 c5 11 d0 07
00000a0 e6 03 e6 03 20 2a 06 60 a2 02 b5 10 c5 10 d0 06
00000b0 b5 11 c5 11 f0 09 e8 e8 e4 03 f0 06 4c aa 06 4c
00000c0 35 07 60 a6 03 ca 8a b5 10 95 12 ca 10 f9 a5 02
00000d0 4a b0 09 4a b0 19 4a b0 1f 4a b0 2f a5 10 38 e9
00000e0 20 85 10 90 01 60 c6 11 a9 01 c5 11 f0 28 60 e6
00000f0 10 a9 1f 24 10 f0 1f 60 a5 10 18 69 20 85 10 b0
0000100 01 60 e6 11 a9 06 c5 11 f0 0c 60 c6 10 a5 10 29
0000110 1f c9 1f f0 01 60 4c 35 07 a0 00 a5 fe 91 00 60
0000120 a2 00 a9 01 81 10 a6 03 a9 00 81 10 60 a2 00 ea
0000130 ea ca d0 fb 60
0000135

The challenge that is left to be completed is to process the opcodes and operands in this file and turn them into a running program. To do that, we will make use of a CSV file that lists the operation name and addressing mode for each opcode found in file:

00,BRK,#
10,BPL,@
18,CLC,#
20,JSR,AB
# ... rest of instructions go here ...
E6,INC,ZP
E8,INX,#
E9,SBC,IM
F0,BEQ,@

Once we know the addressing mode for a given operation, we can read its operand and turn it into an address (denoted by e). And once we have that, we can execute the commands that are defined in following DSL:

# NOTE: This file contains definitions for every instruction used 
# by Snake6502. Most of the functionality here is a direct result
# of simple calls to Vintage::Storage and Vintage::CPU instances.

NOP { }
BRK { raise StopIteration }


LDA { cpu[:a] = mem[e] }
LDX { cpu[:x] = mem[e] }
LDY { cpu[:y] = mem[e] }

TXA { cpu[:a] = cpu[:x] }

STA { mem[e] = cpu[:a] }

## Counters

INX { cpu[:x] += 1 }
DEX { cpu[:x] -= 1 }

DEC { mem[e] = cpu.result(mem[e] - 1) }
INC { mem[e] = cpu.result(mem[e] + 1) } 

## Flow control

JMP { mem.jump(e) }

JSR { mem.jsr(e) }
RTS { mem.rts }

BNE { mem.branch(cpu[:z] == 0, e) }
BEQ { mem.branch(cpu[:z] == 1, e) }
BPL { mem.branch(cpu[:n] == 0, e) }
BCS { mem.branch(cpu[:c] == 1, e) }
BCC { mem.branch(cpu[:c] == 0, e) }

## Comparisons

CPX do 
  cpu.carry_if(cpu[:x] >= mem[e])

  cpu.result(cpu[:x] - mem[e]) 
end

CMP do 
  cpu.carry_if(cpu[:a] >= mem[e])

  cpu.result(cpu[:a] - mem[e]) 
end


## Bitwise operations

AND { cpu[:a] &= mem[e] }
BIT { cpu.result(cpu[:a] & mem[e]) }

LSR do
  t = (cpu[:a] >> 1) & 0x7F
 
  cpu.carry_if(cpu[:a][0] == 1)
  cpu[:a] = t
end

## Arithmetic

SEC { cpu.set_carry   }
CLC { cpu.clear_carry }

ADC do 
  t = cpu[:a] + mem[e] + cpu[:c]

  cpu.carry_if(t > 0xff)
  cpu[:a] = t
end

SBC do
  t  = cpu[:a] - mem[e] - (cpu[:c] == 0 ? 1 : 0)

  cpu.carry_if(t >= 0)
  cpu[:a] = t
end

We can treat both the opcode lookup CSV and the instructions definitions DSL as configuration files, to be loaded into the configuration object shown below:

require "csv"

module Vintage
  class Config
    CONFIG_DIR = "#{File.dirname(__FILE__)}/../../config"

    def initialize(name)
      load_codes(name)
      load_definitions(name)
    end

    attr_reader :definitions, :codes

    private

    def load_codes(name)
      csv_data = CSV.read("#{CONFIG_DIR}/#{name}.csv")
                    .map { |r| [r[0].to_i(16), [r[1].to_sym, r[2]]] }

      @codes = Hash[csv_data]
    end

    def load_definitions(name)
      @definitions = {}

      instance_eval(File.read("#{CONFIG_DIR}/#{name}.rb"))
    end

    def method_missing(id, *a, &b)
      return super unless id == id.upcase

      @definitions[id] = b
    end
  end
end

Then finally, we can tie everything together with a Simulator object that instantiates all the objects we need, and kicks off a program execution loop:

module Vintage
  class Simulator
    EvaluationContext = Struct.new(:mem, :cpu, :e)
      
    def self.run(file, ui)
      config = Vintage::Config.new
      cpu    = Vintage::CPU.new
      mem    = Vintage::Storage.new

      mem.extend(MemoryMap)
      mem.ui = ui
      
      mem.load(File.binread(file).bytes)

      loop do
        code = mem.next

        op, mode = config.codes[code]
        if name
          e = Operand.read(mem, mode, cpu[:x], cpu[:y])

          EvaluationContext.new(mem, cpu, e)
                           .instance_exec(&config.definitions[op])
        else
          raise LoadError, "No operation matches code: #{'%.2x' % code}"
        end
      end
    end
  end
end

At this point, you’re ready to play Snake! Or if you’ve been following closely along with this article all the way to the end, you’re probably more likely to have a cup of coffee or take a nap from information overload. Either way, congratulations for making it all the way through this long and winding issue of Practicing Ruby!

Further Reading

This article and the Vintage simulator is built on top of a ton of other people’s ideas and learning resources. Here are some of the works I referred to while researching this topic: