Aside from the work I’ve been doing in the past year on developing Aberrant, my latest notable project has been the development of a (rather crummy) NES emulator.
The code is on my sourcehut: https://git.sr.ht/~jos/nesemu.
The project is very much incomplete. It only runs a few games, and it doesn’t have a working audio driver. But I’ve learned quite a few interesting things from it.
This is the third emulator I’ve written, if you count the technically-not-an-emulator for Chip8. And it seems to me that, as a rule, the CPU instructions themselves are the easiest part to implement: You decode the instruction, execute it to match the documented behavior, and increment the PC accordingly.
What this simple explanation leaves out is that in real hardware, time passes: it takes multiple clock cycles to execute an instruction, and different instructions take different numbers of cycles.
In my GameBoy emulator, the CPU’s step()
function externalizes all
of that time handling by simply returning how many cycles the single
instruction step is supposed to take. It is then up to the caller to
manage the timing.
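That design can be sketched roughly like this (the names and opcodes here are illustrative, not the actual emulator code):

```cpp
#include <cstdint>

// Sketch of the "externalized timing" approach: step() executes exactly
// one instruction and returns how many cycles it took. The caller is
// responsible for keeping the rest of the system in sync.
struct Cpu {
    uint8_t memory[0x10000] = {};
    uint16_t pc = 0;

    // Executes one instruction; returns the cycle count it consumed.
    int step() {
        uint8_t opcode = memory[pc++];
        switch (opcode) {
        case 0x00: return 4;  // NOP: nothing to do, 4 cycles
        // ... decode and execute the remaining opcodes ...
        default:   return 4;
        }
    }
};

// The caller owns the timing loop:
//   int cycles = cpu.step();
//   // advance the PPU/timers/etc. by `cycles` here
```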
For the NES CPU, I’ve tried to make execution more “cycle accurate” and have the CPU abstraction keep track of time itself. Instead of one logical “step”, the CPU “ticks” one cycle at a time (as if with a crystal or oscillator circuit). Of course, the CPU still executes instructions “all-at-once” (i.e. it doesn’t split the actual computation of an instruction based on the tick), but it tracks each tick as a unit of time.
This approach also allows for a feature that the GameBoy emulator’s design does not: Direct Memory Access (DMA) from CPU memory to the PPU. Since DMA effectively preempts the processor, a CPU that doesn’t take “tick” as its unit of time would have to push DMA handling out to the caller as well.
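A minimal sketch of what that looks like, with illustrative names (not the project's actual code):

```cpp
#include <cstdint>

// Sketch of how owning the "tick" lets the CPU model DMA preemption
// internally instead of delegating it to the caller.
struct TickingCpu {
    long cycle_count = 0;
    int dma_cycles_remaining = 0;

    // NES OAM DMA ($4014) halts the CPU for roughly 513 cycles while
    // 256 bytes are copied into the PPU's sprite memory.
    void start_oam_dma() { dma_cycles_remaining = 513; }

    void tick() {
        ++cycle_count;
        if (dma_cycles_remaining > 0) {
            --dma_cycles_remaining;  // this cycle goes to the DMA copy
            return;                  // the CPU is preempted
        }
        // ...otherwise this tick advances normal instruction execution.
    }
};
```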
Coroutines
I have found that in some situations, coroutines can help replace the complex logic of asynchronous state machines with simpler, linear, imperative code.
Since the PPU co-processor’s output drives a CRT television, it works in scanlines and includes several periods of operation that, though not relevant to our modern displays, do have observable properties that can be exploited by NES programs. The behavior can be described by nested loops, roughly like this:
for (;; ++frame_count) {
    for (scanline = -1; scanline < 0; ++scanline) {
        for (pixel = 0; pixel < 1; ++pixel) {
            // Idle cycle
        }
        for (pixel = 1; pixel <= 256; ++pixel) {
            // Drawing the pixels on a line
        }
        for (pixel = 257; pixel <= 320; ++pixel) {
            // HBLANK period. Sprite fetches.
        }
        for (pixel = 321; pixel <= 336; ++pixel) {
            // HBLANK period. Pre-fetch.
        }
        for (pixel = 337; pixel <= 340; ++pixel) {
            // HBLANK period. Dummy reads.
        }
    }
    // ... etc
}
Instead of trying to keep track of “state” in order to know what to do on any given cycle, we can simply execute the procedure linearly and insert a sort of “end of step” at every point where a cycle elapses:
co_await std::suspend_always{};
++pixel;
This is a little ugly, but I think that’s mostly a result of my not yet understanding how to compose functions and coroutines. That is, ideally, there would be a way to defer suspending the coroutine, but I haven’t figured out how to do it. Perhaps the best option is a preprocessor macro.
For example, reading from the nametable takes the PPU two cycles (for
context, the V
register in part tracks the current “position”):
auto nametable_addr = 0x2000 | (V & 0x0FFF);
auto next_tile_id = read_ppu_ram(nametable_addr);
consume_cycles(2);
// Instead of:
/*
co_await std::suspend_always{};
++pixel;
co_await std::suspend_always{};
++pixel;
*/
Testing
I try to do test-driven development when I can, but sometimes it really feels like it gets in the way of the cleanest, most straightforward version of the code. In this case, I mostly eschewed testing at that level in favor of running test ROMs, but eventually the lack of tests started to hinder me.
The burden of test-driven development comes from the need to introduce levels of indirection everywhere in order to isolate the code under test. However, this drawback is minor compared to the benefit of knowing that the system matches the strict requirements created as part of TDD.
Particularly for a system where most of the functional requirements are quite well understood, not using TDD was a mistake. I might eventually revisit this project and rebuild it using TDD.