Ray Casting

The ray casting part of this project’s code is adapted from a tutorial by Lode Vandevenne — and they go into much greater detail there if you’d like to learn more about it!

Despite appearances — and as explained in the introduction — ray casting is really a 2D algorithm that just happens to look 3D. From the observer’s position in a 2D map, a set of vectors or rays, one per column of the screen, is projected outward into the map. Where those rays first intersect a “solid” square of the map, and the distance to that intersection, determines the height of the wall for that one column. And which side of the square determine’s the wall’s color. Each column is then just drawn as three vertical lines: a section above the wall is the “sky,” then the wall itself, then a second below as “ground.” If very close to a wall, there might not be any visible sky or ground for that column. This operation is quickly repeated 128 times — once per column of the Hallowing’s display.

The combined result looks like a 3D labyrinth. And by changing the observer’s position and direction between frames, we’ve got “3D” animation. Sorcery!

Image from Wikipedia by LucasVB, public domain.

Reconfiguring the Display

Ray casting alone is a software technique. Now we pair this up with some hardware tricks to make the whole thing work…

Ray casting works column-by-column, but most bitmapped displays store data row-by-row, and compensating for this difference would normally require buffering two entire screens worth of data (one is being transferred from RAM to screen while the next frame in RAM is being calculated) — that’s 64 kilobytes…but we only have 32 kilobytes available in the Hallowing’s SAMD21 microcontroller.

An interesting property of many TFT displays (including the one used on Hallowing) is that the “mapping” of pixels can be changed. In most situations when redrawing the full screen, as each pixel’s color is sent over the SPI bus, the pixel positions increment from left to right across each row, and then rows proceed from top to bottom…it’s said to be row major,” and most graphic displays work this way…it’s a throwback to how CRT monitors worked.

With a small change, we can configure the display to work in column major order — successive pixels increment top to bottom along each column, with columns proceeding left to right. It’s a peculiar layout, combining rotation and mirroring operations, but it’s perfect for this project’s needs. Each column of graphics can be issued to the display immediately after being processed. No need to buffer a whole screen’s worth, let alone two!

Here’s the code that reconfigures the display. This only needs to be done once, after the display is initialized in the setup() function.

While the ST7735 library does use a similar technique to handle screen rotation in hardware (via the setRotation() function), this combination of rotation and mirroring is peculiar enough that we must work around the library and talk directly to the TFT driver’s device registers…

Download: file
  digitalWrite(TFT_CS, LOW);
  digitalWrite(TFT_DC, LOW);
#ifdef ST77XX_MADCTL
  SPI.transfer(ST77XX_MADCTL); // Current TFT lib
#else
  SPI.transfer(ST7735_MADCTL); // Older TFT lib
#endif
  digitalWrite(TFT_DC, HIGH);
  SPI.transfer(0x28);
  digitalWrite(TFT_CS, HIGH);

DMA Tricks

While the ray casting explanation above talks about “drawing lines,” if you dig through the code you won’t find even a single call to drawLine(). Aside from initializing the display hardware, we have to go around the Adafruit_ST7735 library and do things a lot faster…

Direct memory access (DMA) is a feature of the SAMD21 microcontroller that facilitates an explicit form of multitasking…for example, the chip can transfer a buffer full of data from RAM to the SPI bus (to the TFT display) without requiring the CPU’s intervention for every byte…it can go along calculating other things while the transfer proceeds in the background.

The process is explained a bit in the Hallowing Spirit Board tutorial, which also achieves smooth full-screen animation. Instead of working one drawing operation at a time, we push data out the SPI bus as fast as possible in one long stream. That code uses two buffers in RAM, each enough for one row’s worth of pixels…one line is being calculated while the prior line is concurrently issued over SPI using DMA, and we switch between them at the start of each new line.

The Minotaur Maze code takes this idea to the extreme. It doesn’t buffer even a single scanline in RAM. In fact, all of the “graphics data” as it were — the colors of the walls, sky and floor — occupy six bytes of flash memory!

DMA usually operates on contiguous blocks of memory. Copy this block of data from here to here, transfer this block from RAM to SPI, and so forth. But there’s a configuration bit in the DMA descriptor (the structure that defines a DMA transfer operation) indicating whether the source data pointer should be incremented after each byte. For example, if reading data from SPI, you want the source pointer (aimed at the SPI data register) to stay put. By using this when writing to SPI, the same byte will be transferred repeatedly…and we can use this to fill spans of pixels…our vertical columns of wall and so forth.

Download: file
const uint8_t color = 0x42; // One color byte is stored in RAM or flash

descriptor.SRCADDR.reg       = (uint32_t)&color;
descriptor.DSTADDR.reg       = (uint32_t)SPI.DATA.reg;
descriptor.BTCTRL.bit.SRCINC = 0;
descriptor.BTCTRL.bit.DSTINC = 0;
descriptor.BTCNT.reg         = num_pixels * 2;
...

The downside to this approach is that we don’t have full use of the 16-bit color space that the TFT display can provide. Each pixel expects 16 bits of color data, but because we’re issuing the same byte twice for each pixel…the high and low bytes must be the same…that limits our options to a fixed palette of 256 entries:

projects_dma_colors.png
Click for full-resolution image to better read the numbers

It’s a decent spread, but if you’re looking for just the right nuanced color you may not find it. Also, interactions between the color byte pairs mean you can’t get a 100% pure red, green or blue…green will always have a little blue mixed in, red will always have a little green mixed in, and so forth. Black (0x00 hexadecimal) and white (0xFF) are the only predictable colors.

So are textured walls possible?

My hunch is that the SAMD21 is fast enough to do this, at a slightly reduced (but still interactive) frame rate. Maybe I’ll revisit this code at some point. It would have to use some RAM buffering and we wouldn’t get use of the no-source-increment fill trick described above, which is really the thing I wanted to cover here.

Hallowing’s SAMD21 processor runs at 48 MHz, the SPI bus at 12 MHz, and each pixel requires 16 bits of data. Between the latter two, we can estimate a theoretical peak transfer rate of 750,000 pixels per second. And dividing the former, that allows about 64 processor clock cycles per pixel. Can we stretch a bitmap that quickly? Probably, or at least something in the ballpark. Some overhead is still required for the ray casting calculations and setting up DMA transfers…but enough left over that I suspect there’s still a lot of graphics surprises remaining to be squeezed out of this little board!

This guide was first published on Sep 17, 2018. It was last updated on Sep 17, 2018.
This page (How it Works) was last updated on Oct 08, 2020.