BlueRetro Memory Card Emulation

TL;DR

I added support in BlueRetro for the SEGA Dreamcast’s memory card, so users no longer need a real controller and VMU to save games. I also built a version of the BlueRetro hardware by hand.

The BlueRetro is a device that lets you connect Bluetooth controllers to a wide array of retro game consoles. At its heart is an ESP32 microcontroller, which supports Bluetooth and WiFi. Interface boards for the different consoles can be connected to the BlueRetro, and there are console-specific versions that integrate the ESP32 and interface boards into one.

The BlueRetro supports the SEGA Dreamcast. One interesting quirk of the Dreamcast is that the ports for certain user peripherals like memory cards were located not on the console itself (as with the Gamecube and Playstation 2), but on the controller.
From the perspective of a BlueRetro user, this should be great news, since the BlueRetro might be able to emulate more than just the controller. Sadly, it did not support emulating the memory card, so you’d need a real controller with a memory card plugged into one of the console’s controller ports in order to save games.

Building a BlueRetro

I first got introduced to BlueRetro while working on getting an ESP32 to bit-bang the Maple protocol, which the Dreamcast uses to talk to the controller and controller-connected peripherals. The lead developer, Jacques Gagnon, had written some posts about disabling FreeRTOS on one of the ESP32’s CPU cores for low-latency, timing-sensitive applications such as bit-banging. It’s quite a coincidence that he was interested in this topic for basically the same reason as myself: bit-banging game console controller protocols.

Once I had finished with my VMU reader/writer, I decided to try to build a BlueRetro for the Dreamcast as a quick hobby project: there were some multiplayer games that I wanted to play with friends, and why would I spend money on more Dreamcast controllers when I already had what was needed to build a BlueRetro?

BlueRetro has two hardware specifications, HW1 and HW2. As the HW1 is significantly simpler to wire up, I went with that, at the cost of automatic port and console detection. I first built the device on a breadboard by cross-referencing the instructions for the Dreamcast adapter cable and the BlueRetro HW1. Since I only had one controller plug, I had to tie a few pins to ground or 3.3V. After a bit of trial and error, I had a nest of wires that kind-of worked.

It generally wouldn’t boot properly if only powered by the Dreamcast’s controller port, but powering it by USB was a convenient enough workaround. Later, I tried rewiring it with a different ESP32 board and it worked fairly consistently, so I got around to building a slightly more permanent version.

Figuring out the current state of things

The Memory Card backend

While the BlueRetro didn’t support the Dreamcast memory card, it did already support the Nintendo 64’s Controller Pak. Jacques describes how it works in his devlog.
128KB of RAM is dynamically allocated in 4KB blocks, which is enough for 4 N64 Controller Paks, or a single Dreamcast VMU. This is used as a cache for a file stored on the flash’s filesystem. If no writes have occurred for 1 second, the blocks are individually written back to the filesystem, so as to avoid blocking the CPU long enough to cause problems. All of this complexity is handled on the backend. Once a memory card is loaded into RAM, you can write to the memory card with mc_write() and read from it using mc_read().
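
The write-back behaviour described above can be sketched roughly as follows. This is only an illustration, not BlueRetro's actual code: every name here (`sketch_cache`, `sketch_write`, `sketch_next_flush`, `sketch_mark_clean`) is hypothetical, the cache is statically sized rather than dynamically allocated, and the caller is assumed to supply the current time in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE  4096u   /* 4KB cache blocks */
#define SKETCH_BLOCK_COUNT 32u     /* 32 x 4KB = 128KB */
#define SKETCH_CACHE_SIZE  (SKETCH_BLOCK_SIZE * SKETCH_BLOCK_COUNT)
#define SKETCH_IDLE_MS     1000u   /* write back after 1 s without writes */

/* Hypothetical cache state; statically sized to keep the sketch short. */
struct sketch_cache {
    uint8_t  data[SKETCH_CACHE_SIZE];
    bool     dirty[SKETCH_BLOCK_COUNT];
    uint32_t last_write_ms;
};

/* Copy len bytes into the cache and mark every 4KB block touched as dirty. */
static void sketch_write(struct sketch_cache *c, uint32_t addr,
                         const uint8_t *src, uint32_t len, uint32_t now_ms)
{
    assert(len > 0 && addr < SKETCH_CACHE_SIZE && len <= SKETCH_CACHE_SIZE - addr);
    memcpy(&c->data[addr], src, len);
    for (uint32_t b = addr / SKETCH_BLOCK_SIZE;
         b <= (addr + len - 1) / SKETCH_BLOCK_SIZE; b++) {
        c->dirty[b] = true;
    }
    c->last_write_ms = now_ms;
}

/* Return the next block to write back, or -1 if the last write is too
 * recent or nothing is dirty. One block per call keeps each stall short. */
static int sketch_next_flush(const struct sketch_cache *c, uint32_t now_ms)
{
    if (now_ms - c->last_write_ms < SKETCH_IDLE_MS) {
        return -1;
    }
    for (uint32_t b = 0; b < SKETCH_BLOCK_COUNT; b++) {
        if (c->dirty[b]) {
            return (int)b;
        }
    }
    return -1;
}

/* Call only after the block was stored successfully; a failed write stays dirty. */
static void sketch_mark_clean(struct sketch_cache *c, uint32_t block)
{
    assert(block < SKETCH_BLOCK_COUNT);
    c->dirty[block] = false;
}
```

Returning one block index per call, instead of flushing everything at once, mirrors the idea of writing blocks back individually so that no single filesystem write blocks the CPU for long.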

The Maple protocol and the VMU

The Dreamcast uses a proprietary serial protocol called Maple. For those interested, here’s an early description by Marcus Comstedt, and a more recent description on the Dreamcast wiki. Both of these were extremely useful when I was implementing a bit-banged version of the protocol in an earlier project. BlueRetro had already implemented the entire wire protocol and packet format.
The Maple Bus sends data in packets, as shown in the image below. Each packet consists of a frame word, which contains the payload’s length in words, the sender and recipient addresses, and the command; a command-specific payload of 0-255 words; and a one-byte CRC for error detection.

(Structure of a packet and Frame word, from dreamcast.wiki)

Each word is 32 bits (4 bytes). The bits of each byte are sent most-significant bit first, but the bytes of each word are sent in little-endian order (least-significant byte first). Some sources use the little-endian “wire order” while others, such as Marcus’, use the big-endian “logical order,” and this can get rather confusing. The BlueRetro represents a packet to be sent or received with the maple_pkt struct. Within it, data is stored in “logical” order in the data32[] array.

The 8-bit address consists of, in order of most- to least-significant bit: 2 bits to represent the player/port number, 1 bit to set if addressing a main peripheral (such as a controller or keyboard), and 5 bits to set if addressing one of the sub-peripherals (such as the VMU or the Jump Pack). If none of the peripheral or sub-peripheral bits of the address are set, the address refers to the “host” (the Dreamcast). The addressing system itself provides some information about which devices are on the bus: when a peripheral sends a response to a packet sent by the host, it sets bits in the source address of the packet to indicate which sub-peripherals (1-5) are connected to it.
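
As a small sketch of the address layout just described (2 port bits, 1 main-peripheral bit, 5 sub-peripheral bits), the helpers below pick an address apart. The macro and function names are made up for illustration and are not taken from BlueRetro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ADDR_PORT_SHIFT 6u     /* bits 7-6: player/port number */
#define SKETCH_ADDR_MAIN_BIT   0x20u  /* bit 5: main peripheral */
#define SKETCH_ADDR_SUB_MASK   0x1Fu  /* bits 4-0: sub-peripherals */

/* Port number (0-3) encoded in the two most-significant bits. */
static inline uint8_t sketch_addr_port(uint8_t addr)
{
    return (uint8_t)(addr >> SKETCH_ADDR_PORT_SHIFT);
}

/* True when the main-peripheral bit is set. */
static inline bool sketch_addr_is_main(uint8_t addr)
{
    return (addr & SKETCH_ADDR_MAIN_BIT) != 0;
}

/* With no peripheral or sub-peripheral bits set, the address is the host. */
static inline bool sketch_addr_is_host(uint8_t addr)
{
    return (addr & (SKETCH_ADDR_MAIN_BIT | SKETCH_ADDR_SUB_MASK)) == 0;
}

/* The five sub-peripheral bits, one per slot. */
static inline uint8_t sketch_addr_sub_mask(uint8_t addr)
{
    return (uint8_t)(addr & SKETCH_ADDR_SUB_MASK);
}

/* Advertise a sub-peripheral in the given slot (0-4) in a source address. */
static inline uint8_t sketch_addr_with_sub(uint8_t addr, unsigned slot)
{
    assert(slot < 5u);
    return (uint8_t)(addr | (1u << slot));
}
```

For example, a main peripheral on the second port that has something in its first slot would answer from `sketch_addr_with_sub(0x60, 0)`, which is `0x61`.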

The Dreamcast will send a Device Info Request (0x01) command to connected peripherals. The peripheral will then send a Device Info (0x05) packet containing information about the peripheral itself. In addition to using the information in the payload of the peripheral’s response packet, the Dreamcast also uses information in the source address field of the response header to determine which sub-peripherals to send further Device Info Requests to.

Many payloads make use of a “function code” at the beginning. This can be seen as a kind of extra addressing system on top of the existing one, for addressing different “functions” of a single (sub-)peripheral. For example, the VMU supports the Storage, Screen, and Timer functions. A Block Write command with the Storage function code set will write data to the flash; the same command with the Screen function set will draw to the screen instead.

When a (sub-)peripheral responds to a Device Info Request, it will set up to 3 bits in the first word of the payload to indicate which functions it supports. The next 3 words will contain “function definitions” which provide extra information to the host about the specific parameters of the function. As an example, for a sub-peripheral to indicate it supports the “storage” function, it will set bit 1 in the function code mask. It will then provide parameters that the host will use to read/write to it in its function definition, which is laid out as follows:

After the Dreamcast receives valid Device Info from the VMU, it knows that a memory card is connected. At some point, it may send a Get Memory Information command (0x0A), and the memory card will send back information such as its total size, the block numbers and sizes of the different areas, etc.

With these commands, the Dreamcast has everything it needs to know to start reading and writing data from the memory card. On the host side, there’s more complexity to be had to read the filesystem, detect saves, etc., but the memory card just needs to read and write flash with the Block Read (0x0B) and Block Write (0x0C) commands.

Dmitry Grinberg’s reverse-engineering of the VMU provides descriptions of a few extra commands not mentioned on dreamcast.wiki, primarily the “Complete Write” (0x0D) command which is sent after 4 writes. The dreamcast.wiki describes this as a “Get Last Error” command. In any case, games send it and might expect a response.

Implementing the Maple storage function

So, to get memory cards working on the Dreamcast, we need to, in order:

1. Tell the Dreamcast that the main peripheral has a sub-peripheral through the addressing system.
2. Let the Dreamcast know that the sub-peripheral is a memory card. When the host sends a Device Info Request (0x01) to the sub-peripheral, OR in the function code for the “Storage” function in the supported function codes mask (word 0) and send the corresponding function definition in one of the next words (words 1-3).
3. Respond to Memory Info Request (0x0A) with information indicating a standard VMU or memory card layout. At this point, the Dreamcast should detect that a (possibly corrupted) memory card is inserted.
4. Respond to the remaining commands: Block Read (0x0B), Block Write (0x0C), and Write Complete (0x0D).
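
The steps above boil down to a dispatch on the command byte of each packet addressed to the sub-peripheral. A minimal sketch, using the command codes quoted in this post; the enum and function names are hypothetical and the actual responses are left out:

```c
#include <stdint.h>

/* Command codes as quoted in the text. */
enum sketch_cmd {
    SKETCH_CMD_DEVICE_INFO_REQ = 0x01,
    SKETCH_CMD_MEM_INFO_REQ    = 0x0A,
    SKETCH_CMD_BLOCK_READ      = 0x0B,
    SKETCH_CMD_BLOCK_WRITE     = 0x0C,
    SKETCH_CMD_WRITE_COMPLETE  = 0x0D
};

/* What the emulated memory card should do next (hypothetical names). */
enum sketch_action {
    SKETCH_ACT_SEND_DEVICE_INFO,  /* hard-coded Device Info payload */
    SKETCH_ACT_SEND_MEM_INFO,     /* hard-coded memory layout */
    SKETCH_ACT_READ_BLOCK,        /* reply with one block of data */
    SKETCH_ACT_WRITE_PHASE,       /* store one phase of a block */
    SKETCH_ACT_CONFIRM_WRITE,     /* reply to Write Complete */
    SKETCH_ACT_IGNORE             /* not a storage command */
};

/* Map a received command byte to the action the memory card takes. */
static enum sketch_action sketch_dispatch(uint8_t cmd)
{
    switch (cmd) {
    case SKETCH_CMD_DEVICE_INFO_REQ: return SKETCH_ACT_SEND_DEVICE_INFO;
    case SKETCH_CMD_MEM_INFO_REQ:    return SKETCH_ACT_SEND_MEM_INFO;
    case SKETCH_CMD_BLOCK_READ:      return SKETCH_ACT_READ_BLOCK;
    case SKETCH_CMD_BLOCK_WRITE:     return SKETCH_ACT_WRITE_PHASE;
    case SKETCH_CMD_WRITE_COMPLETE:  return SKETCH_ACT_CONFIRM_WRITE;
    default:                         return SKETCH_ACT_IGNORE;
    }
}
```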

For testing, I mainly used the Dreamcast’s built-in system menu, which launches when no game is inserted, and Sonic Adventure 2 (“SA2”). The system menu has submenus for managing and formatting the VMU, and I expect it to be implemented more to-spec than games might be. When no memory card is inserted, the system menu will show no memory card inserted, and SA2 will show a message reading “Memory card not ready. The game cannot be saved,” and after the opening cutscene shows the message “Not enough free blocks available to save a game file. The game cannot be saved. 18 blocks required to save a game file.”

Making the memory card detectable

Implementing the first step mostly follows from its description. The BlueRetro has configuration variables set through a web interface and stored in non-volatile memory. If the packet is addressed to the main peripheral, the “memory card” option is enabled, and the device to be emulated is the right type (so not a keyboard/mouse), the BlueRetro’s response will set bit 0 in the source address to indicate a sub-peripheral is connected in the first slot.

The second step also follows from the description. When the first sub-peripheral is sent a Device Info Request command, it will send back a Device Info packet (command code 0x05). Much of this payload must be hard-coded, and the rest doesn’t seem to vary much between individual VMUs, so it can be hard-coded too. I used the response one of my VMUs sent when I queried it with my ESP32 VMU reader/writer. Initially, I included the function codes and function definitions for the Screen and Timer in addition to Storage, to make it appear as similar to a VMU as possible.

The third step also involves sending hard-coded data. When it receives a Memory Info Request, the BlueRetro will respond with predefined information about its memory layout.

I implemented these steps all at once. At this point, the BlueRetro responds to read/write commands as if they were issued to the Jump Pack sub-peripheral, leading to junk replies. The system menu shows a VMU, and presumably attempts to use read commands to get filesystem information. The option to format the memory card is there, but after the format is attempted it shows a message indicating failure. SA2 starts up with the same error message that it shows with no memory card, but the second message is replaced with a memory card selection menu. Since the memory card doesn’t respond to read/write commands with valid data yet, the VMU isn’t selectable. But at least it’s being detected.

Getting the memory card working

Now that I know the memory card is at least detectable, I need to implement the Block Read and Block Write commands. The addressing of a read/write command (the “location word”) is fairly straightforward, but depends on the values sent in the function definition section of the Device Info packet.
The memory card is subdivided into partitions and blocks. All documentation I’ve seen implies that devices usually have only one partition, and I set the virtual memory card to only report one. For simplicity when parsing, the byte representing partition number is ignored. The block size on all normal VMUs is 512 bytes. The Block Read and Block Write commands have an extra parameter, phase, which is determined by the “number of read accesses per block” and “number of write accesses per block” in the function definition. I defined these to be 1 and 4 respectively, in keeping with the VMU convention. I’ve heard that many games expect this exact layout, so it might not be safe to change. This means that for reading, the Dreamcast should send one read request (of phase 0) for 512 bytes, and for writing, it should send 4 write requests (with phases 0-3) of 128 bytes.

With reading and writing implemented, the memory cards should work just fine. Testing quickly showed that was not the case. I knew that there was a “write complete” command that I hadn’t yet implemented a response to, so I wrote that up, but in testing the Dreamcast behaved identically to the prior version.

Eventually I spotted a major bug that had likely been messing up my testing this entire time: the maple_tx function took a uint8_t for the len parameter. Each maple frame can be up to 255 4-byte words, plus 5 bytes for the frame header and CRC. But len is the size of the packet in bytes. This means that any packet we sent that was over 255 bytes would be completely broken. The BlueRetro never needed to send Maple frames this large before, but the Block Read command asks for 512 bytes at a time, and so expects a 525-byte response. The buffer that stores a Maple frame is already large enough to store this, so the only change that I needed to make was to the function declaration.
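
The failure mode is easy to reproduce in isolation. The sketch below uses hypothetical helper names and assumes the Block Read response carries 130 payload words (two header words plus 128 data words), which is consistent with the 525 bytes quoted above.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_WORD_BYTES     4u  /* one Maple word */
#define SKETCH_FRAME_OVERHEAD 5u  /* 4-byte frame word + 1-byte CRC */

/* Total size in bytes of a frame carrying payload_words words. */
static size_t sketch_frame_bytes(size_t payload_words)
{
    return payload_words * SKETCH_WORD_BYTES + SKETCH_FRAME_OVERHEAD;
}

/* What the callee actually sees when the length is passed as a uint8_t. */
static uint8_t sketch_len_as_u8(size_t len)
{
    return (uint8_t)len;  /* silently keeps only the low 8 bits */
}
```

With these helpers, `sketch_frame_bytes(130)` is 525, but `sketch_len_as_u8(525)` is 13, so only a small fragment of the frame would be sent. The largest possible frame, `sketch_frame_bytes(255)`, is 1025 bytes, so a 16-bit length type is wide enough.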

This made a huge difference: instead of the VMU showing up as invalid, it shows up as a device with zero blocks free. Still not functional, but progress is progress.

The final problem I needed to fix turned out to be very stupid: when parsing a packet to determine the phase and block, I was using the wire protocol’s little-endian order instead of the logical big-endian order that the pkt.data32 array exposes. With that fixed, the memory card emulation worked perfectly.
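
To show how badly a byte-order mix-up scrambles the request, here is a sketch that extracts the phase and block from a location word in both orders. As before, the field positions (phase in bits 23-16, block in bits 15-0 of the logical-order word) are an assumption, and the function names are hypothetical.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
static uint32_t sketch_swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Block number: low 16 bits of the logical-order word (assumed layout). */
static uint16_t sketch_loc_block(uint32_t loc)
{
    return (uint16_t)(loc & 0xFFFFu);
}

/* Phase: bits 23-16 of the logical-order word (assumed layout). */
static uint8_t sketch_loc_phase(uint32_t loc)
{
    return (uint8_t)((loc >> 16) & 0xFFu);
}
```

Taking `0x00010005` (phase 1, block 5) as the logical word, the byte-swapped value is `0x05000100`, which decodes as phase 0, block 256: one past the last block of a 256-block card.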

There was one last loose end to tie up before I could consider this feature truly complete. With memory card emulation enabled, the BlueRetro would frequently receive a barrage of Block Write commands with phases outside of the expected 0-3. These turned out to be write requests to the Screen function. Initially, I set the BlueRetro to report that it supports the Screen and Timer functions in order to make it seem as similar to a real VMU as possible, but actually supporting these functions would take a lot of additional effort for very little benefit. I removed the relevant function definitions and unset the relevant bits in the Device Info and tested it out with several games. Some games would use a different icon for the device (a memory card with no screen instead of a VMU), but they all still worked. The downside of not implementing the screen function is that there is likely a small percentage of games which require the use of the VMU screen for correct gameplay. These unfortunately won’t work with the BlueRetro.

Next steps

There are a few things that could still be implemented. The foremost among them are:

Multiple Memory Cards

The non-volatile storage (NVS) has more than enough free space for more VMUs. The larger problem is the amount of free RAM. The current storage architecture has a 128KB cache in RAM (exactly the size of one Dreamcast memory card). The ESP32 only has 320KB of data RAM in total, so allocating a second cache for an entire memory card isn’t going to happen. I poked around a little bit with multi-card support by repurposing the 128KB RAM cache as a cache for individual blocks from different virtual memory cards. When the Dreamcast requests a block that isn’t in the RAM cache, the BlueRetro fetches it from NVS. The worst-case scenario in terms of latency is when a fetch must occur but every cached block is dirty, that is, none of them has been written back to NVS yet. If a block is evicted without being written back, the data just written to that block is lost, so a writeback must occur before the new data is fetched. This becomes quite a problem since the filesystem backend (SPIFFS) is incredibly unreliable in terms of latency. Quite often, a writeback to the filesystem will fail. With the existing code, this isn’t a huge problem: if a writeback fails, just try again in a second. In the worst case, however, our writebacks become time-sensitive and the data is not guaranteed to get to the Dreamcast in a reasonable amount of time.
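
The eviction problem can be sketched as a victim-selection function. This is a hypothetical illustration of the idea, not code from my experiments: the struct, the function name, and the simple "first clean slot wins" policy are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One cached block belonging to one of several virtual memory cards. */
struct sketch_slot {
    bool     valid;  /* slot holds a block */
    bool     dirty;  /* block has writes not yet stored in flash */
    uint8_t  card;   /* which virtual memory card */
    uint16_t block;  /* block number within that card */
};

/* Pick a slot to reuse for a newly fetched block. Empty or clean slots can
 * be reused immediately. If every slot is dirty, the chosen slot must be
 * written back first, which is the slow, time-sensitive worst case. */
static size_t sketch_pick_victim(const struct sketch_slot *slots, size_t count,
                                 bool *needs_writeback)
{
    assert(slots != NULL && count > 0 && needs_writeback != NULL);
    for (size_t i = 0; i < count; i++) {
        if (!slots[i].valid || !slots[i].dirty) {
            *needs_writeback = false;
            return i;
        }
    }
    *needs_writeback = true;
    return 0;
}
```

Whenever `needs_writeback` comes back true, the reply to the Dreamcast is waiting on a flash write, which is exactly the situation the questions below try to bound.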

Some lines of inquiry:

Is it possible to stall the Dreamcast after it issues a read/write command, so that there is time to writeback+fetch a block? If so, for how long? Documentation of the Maple protocol shows that it supports several errors, such as “File Error” and “Request Resend.” If it’s not possible to stall, we’ll need to either guarantee the BlueRetro can reply in time, or give up.
What kind of latency is the Dreamcast willing to accept? Accessing the RAM cache is very fast, but any solution that doesn’t cache every possible block in RAM will need to access the much slower NVS in the worst-case. If the acceptable latency is shorter than a writeback-plus-fetch, we may need to stall the Dreamcast in some situations.
How long does it take the BlueRetro to write back and fetch blocks?
If stalling the Dreamcast is not possible, and the BlueRetro can’t reply quickly enough, it may still be possible to support a subset of games provided we know their access patterns – for example, if a game always requests the system area, FAT area, or file information area for a specific memory card before randomly accessing that memory card, we can store those areas for all memory cards in RAM at all times, and use those initial accesses as a cue to load an entire memory card into the cache, guaranteeing no cache misses. This would be very hacky, and it’s not incredibly likely that many (if any) games access memory cards in this fashion.

VMU Screen Emulation

The BlueRetro supports configuration over Bluetooth via a web interface. However, the configuration and controller emulation don’t happen simultaneously. Changing this would require major refactoring for very little benefit. I consider this a non-starter.

Dummy controller support

The BlueRetro usually powers on at the same time as the Dreamcast, and it takes some time to sync to a Bluetooth controller. During this time, it reports that no controller (and therefore no storage device) is connected. This causes problems because some games check for a storage device at startup, and will not check later. It’s possible to work around this by powering on the console with the lid open, waiting until a controller is connected, and launching the game, but it would be better to not need to do this. However, if a controller disconnects from the BlueRetro, it should report to the console that this is so. The hacky solution to this is probably just to report a controller is connected at device startup until a controller is actually connected; any disconnections after that would be treated the same as a real controller disconnect.
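
The hacky solution amounts to a tiny state machine. A sketch, with hypothetical names, of how the "what do we tell the console" decision could look:

```c
#include <stdbool.h>

/* Tracks whether a real Bluetooth controller has ever connected since boot. */
struct sketch_presence {
    bool ever_connected;
};

/* Decide whether to tell the console that a controller is present.
 * Before the first real connection we pretend one is there, so games that
 * only check for a storage device at startup still find it. After that,
 * the real connection state is reported, including disconnects. */
static bool sketch_report_connected(struct sketch_presence *p, bool bt_connected)
{
    if (bt_connected) {
        p->ever_connected = true;
    }
    return bt_connected || !p->ever_connected;
}
```

One consequence of this design is that a console left running with no controller ever paired would see a permanently idle dummy controller, which seems like an acceptable trade-off.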

Improved filesystem

The BlueRetro uses SPIFFS to store configuration and memory card files. However, SPIFFS is not the most consistent or performant. For supporting multiple memory cards, interacting with the filesystem needs to be as quick and consistent as possible. A newer filesystem, LittleFS, promises more consistent writes and quicker read/write times, with the same set of methods, so it might be easy to swap out. If it’s quick and consistent enough, it may make implementing multiple memory cards significantly easier. I was worried that all of these improved metrics would come at the cost of a larger codebase, but this article actually shows a static code size reduction from 36KB to 24KB when compared to SPIFFS. CPU usage is also lower, and it implies dynamic RAM usage may be lower than SPIFFS when “a small number of files” are opened. Overall, this looks promising to explore.

Slight detour: reducing IRAM usage

BlueRetro’s codebase is dangerously close to using up all available IRAM (instruction RAM), which stores the code that runs on core 1. I went over the IRAM limit when exploring the possibility of emulating multiple memory cards. When looking around for anything that might free up a bit of space so I could get it to compile, I found a macro, wait_100ns, that was defined as 11 nops, and then used 84 times. Each nop is either 3 or 2 bytes, so conservatively these copy-pastes were using about 1.8KiB of pure nops. If I could create a macro that caused an equivalent delay with less space used, I could free up a lot of space for code.
Looking at the LX6 ISA, I found a mention of a “zero overhead loop”, which was a single instruction that set the number of loop iterations, start, and end of a loop. After some testing to see how long the original 11 nops took, I found a value for the number of loop iterations that most closely matched the original method, tested it, and it worked like a charm. This allowed me to reduce the macro from 11 instructions to 3: a movi (to set the number of loop iterations), a loop instruction, and a nop. The total savings would be close to 8 instructions/invocation, and with 84 invocations, that’s 672 instructions. Sometimes, wait_100ns was called multiple times in a row. I created separate macros for these cases (the code size of the macro was the same, I just needed to change the number of iterations in the loop instruction), and replaced multiple consecutive invocations of wait_100ns with their 200ns and 500ns equivalents. This ended up saving another 689 bytes.
This is a rough estimate: the movi and nop instructions have 2-byte versions, and with an optimized assembler this likely reduces IRAM use by a third. The address of the loop start must be aligned to the architecture’s fetch width, likely 4 bytes, so each invocation may cost up to 3 extra bytes due to the assembler inserting a nop or extending instruction lengths. But in any case, it put me below the IRAM limit.
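
The back-of-the-envelope numbers can be reproduced with a few lines of arithmetic. This sketch assumes the loop instruction is always 3 bytes, ignores alignment padding, and does not account for the merged 200ns/500ns macros. The names are made up for the example.

```c
#include <assert.h>

#define SKETCH_WAIT_USES     84u  /* invocations of wait_100ns */
#define SKETCH_NOPS_PER_WAIT 11u  /* nops in the original macro */
#define SKETCH_LOOP_BYTES    3u   /* assumption: loop has no 2-byte form */

/* IRAM used by the original macro, given the size of one nop (2 or 3 bytes). */
static unsigned sketch_nop_version_bytes(unsigned nop_bytes)
{
    assert(nop_bytes == 2u || nop_bytes == 3u);
    return SKETCH_WAIT_USES * SKETCH_NOPS_PER_WAIT * nop_bytes;
}

/* IRAM used by the movi + loop + nop version, before alignment padding. */
static unsigned sketch_loop_version_bytes(unsigned movi_bytes, unsigned nop_bytes)
{
    assert(movi_bytes == 2u || movi_bytes == 3u);
    assert(nop_bytes == 2u || nop_bytes == 3u);
    return SKETCH_WAIT_USES * (movi_bytes + SKETCH_LOOP_BYTES + nop_bytes);
}
```

Under these assumptions the original macro costs between 1848 and 2772 bytes in total, and the loop version between 588 and 756 bytes.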

Conclusion

This was my first time working with a codebase as large and complex as BlueRetro, and I’m glad I was able to understand it and make some improvements at the margins. Aside from its practical utility, BlueRetro is an example of a project that has amazing documentation and very maintainable code. Designing the memory card backend to be easily adaptable to the Dreamcast is really impressive forethought.