The goal of this project was to make a small-scale light painter using CircuitPython on a mid-range microcontroller board, with the level of image quality we achieved with the DotStar Pi Painter: not blocky NES-like pixel graphics, but subtle smooth colors. This was, putting it lightly, “a challenge.” The CLUE board has just a small fraction of the Raspberry Pi’s speed or RAM…and doing this all in CircuitPython, with no C code as in the Pi project, just seemed impossible.

I’d always viewed Python as “simple” and “maintainable,” but rarely “fast”…and early versions of the painter were indeed awful. Thankfully, Ladyada, Scott Shawcroft and Jeff Epler (without judgment at my repeated failed attempts) all shared some of their “skateboard tricks” for improving parts of the Python code. In the end, it actually needed a slight delay added for best effect!

How it Works

The brightness range from addressable LED strips like DotStars and NeoPixels isn’t perceptually linear. Getting images to “look right” requires a reduction in the mid-range values. But…with only 256 possible values…this results in colors being grouped (“quantized”) into fewer “bins.” It’s all explained in this guide.
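To make the “fewer bins” effect concrete, here is a minimal sketch that gamma-corrects all 256 input levels and counts how many distinct 8-bit output levels remain. The exponent of 2.6 is an assumed, illustrative value and is not taken from this project's code.

```python
# Count how many distinct 8-bit output levels survive gamma correction.
GAMMA = 2.6  # assumption: illustrative exponent only


def gamma_table(gamma=GAMMA):
    """Map each 8-bit input level to a gamma-corrected 8-bit output level."""
    return [int((i / 255) ** gamma * 255 + 0.5) for i in range(256)]


table = gamma_table()
distinct_levels = len(set(table))
print(distinct_levels, "distinct output levels from 256 inputs")
```

Many dark and mid-range inputs land on the same output value, which is the quantization that dithering then tries to hide.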

Dithering is a common workaround for this, using alternating just-slightly brighter and darker values in rapid succession to simulate intermediate tones. This works okay to a point, but the way it’s typically handled, with pixels being square (as on most screens), it still leaves a blocky pixel residue, a la early Macintosh graphics.

An interesting thing about light painting is that there aren’t the traditional fixed two-dimensional “X” and “Y” axes. The pixels along the strip are one axis, sure…but the perpendicular axis is time, and the resolution of that axis is a function of how quickly the light bar moves in physical space, as seen through the camera. Pixels need not be square.

The CLUE light painter, and the DotStar Pi Painter before it, rely on DotStar LEDs being really fast to update…the half-meter strip can refresh about 1,000 times a second. What these programs do then is stretch an image along the time axis. Each second equals about 1,000 rows, and the quantization and dithering are performed in that much higher-resolution space. But, moved slowly across the camera sensor, those 1,000 rows are squeezed back to the original image’s size (or close to it, is the goal), and much of that normally-lost detail is recovered as the dithered bits blend together. The software also interpolates between rows to further reduce pixelation effects. The resulting images don’t look like your typical digital light paintings at all…they’re super buttery!

[Figure: anamorphic dither diagram]
Photo credit: Cary Bass, Wikimedia Commons CC BY-SA 3.0
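The stretching along the time axis, with interpolation between source rows, can be sketched roughly as follows. This is a simplified illustration using plain lists, not the project's actual code; `stretch_rows` is a hypothetical helper name.

```python
def stretch_rows(rows, out_count):
    """Linearly interpolate a list of pixel rows to out_count rows in time."""
    if out_count < 2 or len(rows) < 2:
        raise ValueError("need at least two source rows and two output rows")
    last = len(rows) - 1
    out = []
    for n in range(out_count):
        pos = n * last / (out_count - 1)  # fractional source row position
        lo = int(pos)                     # source row at or before pos
        hi = min(lo + 1, last)            # next source row, clamped
        frac = pos - lo                   # blend weight toward row hi
        out.append([a + (b - a) * frac for a, b in zip(rows[lo], rows[hi])])
    return out


# Two source rows of two pixels each, stretched to five output rows
image = [[0, 100], [10, 200]]
stretched = stretch_rows(image, 5)
for row in stretched:
    print(row)
```

The output rows are fractional values on purpose: quantizing to whole LED levels happens afterward, in the stretched space, where dithering has many more rows to work with.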

As you can imagine, that involves a ton of math, and with only a few grams of microcontroller to work with, things went very poorly at first.

Skateboard Tricks

The first insight came from Ladyada, who was adamant about using DotStar LEDs for this, not NeoPixels. A couple of early attempts used NeoPixels and the images were really blocky. I figured, it’s a short strip, no big deal? But it is a big deal. Partly this has to do with the transfer rate of the LED strips: the speed at which they can receive data. With NeoPixel strips, this is always fixed at 800 kilobits/second. And the CPU is completely tied up during that transfer…everything else stops. DotStars can receive at ten times this rate, 8 MHz, and one perk of the CLUE’s nRF52840 processor is that we get SPI DMA “free,” so that transfer doesn’t take time away from other Python duties. So…while NeoPixels are cheaper and easier to wire up, this is one task where DotStars really shine.
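A back-of-the-envelope comparison of the time each strip type spends on the wire per refresh can be worked out as below. The pixel count (72), the NeoPixel latch pause (50 microseconds) and the DotStar end-frame size are assumptions for illustration; actual values depend on the strip and driver. The figures cover transfer time only, not the Python work done per row.

```python
NUM_PIXELS = 72  # assumption: pixel count of the strip; substitute your own


def neopixel_frame_seconds(num_pixels, bit_rate=800_000, latch_seconds=50e-6):
    """24 bits per RGB pixel at a fixed bit rate, plus an assumed latch pause."""
    return num_pixels * 24 / bit_rate + latch_seconds


def dotstar_frame_seconds(num_pixels, clock_hz=8_000_000):
    """32-bit start frame, 32 bits per pixel, plus an assumed end frame."""
    end_bytes = (num_pixels + 15) // 16  # assumption: one byte per 16 pixels
    total_bits = 32 + num_pixels * 32 + end_bytes * 8
    return total_bits / clock_hz


neo = neopixel_frame_seconds(NUM_PIXELS)
dot = dotstar_frame_seconds(NUM_PIXELS)
print("NeoPixel: %.3f ms per refresh" % (neo * 1000))
print("DotStar:  %.3f ms per refresh" % (dot * 1000))
```

Under these assumptions the DotStar transfer is several times shorter, and with SPI DMA it also runs without holding up the CPU.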

Math Shenanigans

Ladyada was also leaning on me to use the ulab library (pronounced “micro lab”), to showcase its inclusion in recent CircuitPython builds. ulab is a numerical processing tool, performing operations on whole lists and tables of numbers much more quickly than iterating through Python loops. Jeff, who was pivotal in getting ulab into CircuitPython, has written a guide on its use and assisted in this part of the light painter code.

ulab was the secret ingredient to that buttery interpolation and dithering! It seemed like the wrong tool for this at first, especially for the dithering…but really just required a new understanding of the problem, thinking about rows of pixels as a whole thing, not discrete elements. You can see this in the process() function in bmp2led.py. Even with all the comments it’s still kind of “weird code.”
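The “whole row at a time” way of thinking about dithering can be sketched in plain Python as below. This is not the process() function from bmp2led.py, just an illustration of the idea: each pixel's rounding error is carried forward in time to the same pixel in the next row. Plain lists are used so the sketch runs anywhere; with ulab, each of the three per-row steps would be expressed as an operation on a whole array instead. `dither_rows` is a hypothetical name.

```python
def dither_rows(rows):
    """Quantize rows of fractional levels, carrying error forward in time."""
    error = [0.0] * len(rows[0])  # one running error value per pixel
    out = []
    for row in rows:
        # Step 1: add the error left over from the previous row
        want = [v + e for v, e in zip(row, error)]
        # Step 2: round to the nearest level the LEDs can show (0-255)
        sent = [min(255, max(0, int(w + 0.5))) for w in want]
        # Step 3: remember what was lost, for the next row
        error = [w - s for w, s in zip(want, sent)]
        out.append(sent)
    return out


# One pixel held at a level of 10.25 for eight rows
result = dither_rows([[10.25]] * 8)
levels = [row[0] for row in result]
print(levels)
```

Over the eight rows, the emitted values average out to the requested fractional level, which is what the camera sees once the rows blend together.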

RAM Shenanigans

Automatic garbage collection is one of the blessings of a language like Python, allowing for quicker development. There’s no need to track every little object allocation. Unused objects are periodically cleaned up by the system when space is needed, behind the scenes.

In 99% of applications, this is automagic and unseen…for instance, a weather application might poll a website and update a screen a few times a day, and a momentary pause to clean up has no perceptible impact on this.

Light painting is in that peculiar 1% though, because it’s reliant on physical things. A camera’s shutter, the movement of a stick. Pausing for garbage collection in the middle of a photo will just wreck that photo, there’s no recourse. And the way I’d initially written this, it was garbage-collecting a lot. Scott and Jeff had insights here, that with a bit of forethought, sections of code can be written to never need temporary allocations or garbage collection. Light painting is esoteric, but the idea might have implications for other time-critical code, like games.

Here’s the seemingly innocuous, but actually disastrous, inner-most loop as originally written:

while True:
    action_set = {self.button_left.action(),
                  self.button_right.action()}
    if RichButton.TAP in action_set:
        painting = not painting # Toggle paint mode on/off
    elif RichButton.HOLD in action_set:
        return # Exit painting, enter config mode
    # Code continues here...

Here it’s testing for “taps” and “holds” on both the left and right buttons, by putting both button action values into a set. This is cool in that it checks both buttons (or potentially any number) with a single Python “in” test…no if-or-or-or construct is needed.

The fatal flaw is the action_set = line. This allocates a new set object, and any old value is set aside for later garbage collection. It’s a very tiny allocation, just a few bytes…but this loop runs about 1,000 times a second, and very quickly even the CLUE’s capacious memory was filled. Every single photograph was getting glitched as the garbage collector ran; there weren’t even any lucky shots.

The fix was small and simple: avoid the allocation inside the loop. Rather than a set, create a small known-size list before getting into the loop. Once in the loop, set existing elements of the list. Requires one extra line of code, but all that allocation and cleanup goes away, and the timing became super uniform and glitch-free:

action_list = [None, None]
while True:
    action_list[0] = self.button_left.action()
    action_list[1] = self.button_right.action()
    if RichButton.TAP in action_list:
        painting = not painting # Toggle paint mode on/off
    elif RichButton.HOLD in action_list:
        break # End paint loop
    # Code continues here...

Little things like that. And we still get to use the cool “in” syntax. The two distinct assignments would probably get pooh-poohed as “not Pythonic,” but oh well, here we are.
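For anyone wanting to try the pattern off-device, here is a self-contained sketch. `FakeButton` is a hypothetical stand-in for the project's RichButton class, returning canned actions; the loop checks that the same list object is reused on every pass.

```python
class FakeButton:
    """Hypothetical stand-in for RichButton; returns canned actions."""
    TAP = 1
    HOLD = 2

    def __init__(self, actions):
        self._actions = iter(actions)

    def action(self):
        return next(self._actions, None)  # None once the script runs out


def run_loop(left, right, passes):
    action_list = [None, None]  # allocated once, before the loop
    list_id = id(action_list)
    painting = False
    for _ in range(passes):
        action_list[0] = left.action()   # overwrite in place,
        action_list[1] = right.action()  # no new container is created
        if FakeButton.TAP in action_list:
            painting = not painting      # toggle paint mode on/off
        elif FakeButton.HOLD in action_list:
            break                        # end paint loop
        assert id(action_list) == list_id  # still the same list object
    return painting


left = FakeButton([None, FakeButton.TAP, None, None])
right = FakeButton([None, None, None, FakeButton.HOLD])
painting = run_loop(left, right, 10)
print("painting:", painting)
```

On a CircuitPython board, comparing gc.mem_free() before and after many passes is one way to confirm that a loop like this has stopped allocating.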

File Shenanigans

A related problem existed in the code that was reading the BMP images, and then later when reading the temporary (processed) file to feed that data to the DotStar LEDs. The normal file.read() function allocates and returns a buffer each time it’s called:

while (condition):
    led_data = file.read(length_in_bytes)
    spi.write(led_data)

It’s quicker and easier to write this way (the read and write could even be expressed on a single line), but results in a fresh led_data being allocated on every pass of the loop, leading to frequent garbage collection.

The fix is similar to the button situation: move the allocation outside the loop, then use file.readinto() to keep re-using the same memory.

led_data = bytearray(length_in_bytes)
while (condition):
    file.readinto(led_data)
    spi.write(led_data)

Boom! Massive speedup. This isn’t a solution to every problem, but it works in this case because the size of the led_data buffer is consistent from call to call…we have that luxury then of only allocating it once.
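A runnable sketch of the readinto() pattern follows, using an in-memory io.BytesIO object as a stand-in for the LED data file and a list in place of the SPI write. The row size and data are made up for the demonstration. One detail worth knowing: readinto() returns the number of bytes actually read, which gives a convenient end-of-file test.

```python
import io

ROW_BYTES = 4  # hypothetical row size, for demonstration only
source = io.BytesIO(bytes(range(12)))  # stands in for the LED data file

led_data = bytearray(ROW_BYTES)  # allocated once, reused on every pass
rows_sent = []
while True:
    count = source.readinto(led_data)  # fills the existing buffer in place
    if count < ROW_BYTES:              # short or empty read: end of file
        break
    # The real loop would call spi.write(led_data) here. The bytes() copy
    # exists only so this demo can inspect what each pass read.
    rows_sent.append(bytes(led_data))

print(rows_sent)
```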

A different and unrelated problem involved the LED temporary file, where an image is “stretched” and converted to a DotStar-ready format. It turns out that appending to a file can get really, really slow if you do it a lot.

with open(output_filename, 'wb') as led_file:
    while (condition):
        led_file.write(output_buffer)

It was doing this on every single row…on an image that might get stretched to 1,000 rows or more!

Like the readinto() fix, the workaround here exploits the fact that the resulting file will be a known fixed size, something we can calculate ahead of time. We can then use file.seek() and write a single byte at the end to very quickly create a huge file full of nothing…then go back to the start and fill in the data, much faster now because it’s not appending:

with open(output_filename, 'wb') as led_file:
    led_file.seek((output_buffer_size * rows) - 1)
    led_file.write(b'\0')
    led_file.seek(0)
    while (condition):
        led_file.write(output_buffer)

This change alone doubled the speed of the image conversion!
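The pre-sizing trick can be tried on a desktop with the self-contained sketch below. The row size, row count, file name and dummy row contents are placeholders, and `write_preallocated` is a hypothetical helper. It only confirms that the file ends up the expected size with the data in place; any speed difference depends on the filesystem and is not measured here.

```python
import os
import tempfile

ROW_BYTES = 8  # placeholder row size
ROWS = 100     # placeholder row count


def write_preallocated(path, row_bytes, rows):
    """Create the file at full size first, then fill it from the start."""
    row = bytearray(row_bytes)  # one reusable output buffer
    with open(path, 'wb') as led_file:
        led_file.seek(row_bytes * rows - 1)  # jump to the final byte
        led_file.write(b'\0')                # file is now its full size
        led_file.seek(0)                     # back to the start
        for n in range(rows):
            for i in range(row_bytes):
                row[i] = n & 0xFF            # dummy data: the row number
            led_file.write(row)              # overwrites, never appends


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'led.dat')
    write_preallocated(path, ROW_BYTES, ROWS)
    size = os.path.getsize(path)
    with open(path, 'rb') as check_file:
        check_file.seek(ROW_BYTES * (ROWS - 1))
        last_row = check_file.read(ROW_BYTES)

print(size, last_row)
```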

This guide was first published on May 06, 2020. It was last updated on May 06, 2020.
This page (CircuitPython Magic) was last updated on Oct 12, 2020.