Because significant timing and memory challenges were anticipated from the start, the improved thermal camera CircuitPython code was instrumented to measure its own performance. Five functional areas were identified that would show where architectural or speed issues might lie. The performance monitoring helped solve some tricky timing and memory allocation issues, and the resulting code structure also made it easy to adapt the code for testing on other development boards. The five code performance areas were:
Define Display Elements
The one-time display definitions for rectangles, labels, and on-screen status. This process uses large blocks of memory to build the displayio group of display element attributes.
Acquire Sensor Data
The first portion of the repeating primary loop that acquires and conditions the thermal sensor data. The acquisition process uses I2C input/output resources, the AMG88xx sensor library, and creates two large arrays in memory to hold and process the data. Floating point calculations constrain the sensor data to a valid temperature range.
Display Statistics
Updates the on-screen alarm, min, max, and average values. This process manipulates display element attributes in memory, requires floating point calculations to support displayio, utilizes SPI input/output resources, and uses ulab to quickly determine min, max, and average.
Normalize and Interpolate
Normalizes the 8 x 8 sensor data, copies it to the display grid array, and interpolates the values within the 15 x 15 display grid array. ulab is used for all calculations.
Display Image
Scans elements in the display grid array, calculates the iron spectrum color, and uses displayio to update an on-screen rectangle if the color has changed from the previous frame. After updating the image, this code segment checks the operational controls and modifies the display mode as selected. This segment heavily uses floating point, memory, and SPI input/output for the displayio functionality.
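The Normalize and Interpolate step described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: plain Python lists stand in for the ulab arrays so the sketch runs anywhere, the function names are hypothetical, and the 0 to 80 degree Celsius limits are an assumed valid range for the sensor.

```python
# Sketch only: plain lists stand in for the ulab arrays used by the real code.
SENSOR_SIZE = 8                    # the sensor reports an 8 x 8 grid
GRID_SIZE = 2 * SENSOR_SIZE - 1    # 15 x 15 display grid

MIN_TEMP_C = 0.0    # assumed lower limit of the valid temperature range
MAX_TEMP_C = 80.0   # assumed upper limit of the valid temperature range


def constrain(value, low, high):
    """Clamp value so that low <= value <= high."""
    return max(low, min(high, value))


def normalize(sensor, low=MIN_TEMP_C, high=MAX_TEMP_C):
    """Constrain each reading to [low, high] and scale it to 0.0 through 1.0."""
    span = high - low
    return [[(constrain(v, low, high) - low) / span for v in row]
            for row in sensor]


def interpolate(norm):
    """Expand an 8 x 8 grid to 15 x 15 by averaging neighboring cells."""
    grid = [[0.0] * GRID_SIZE for _ in range(GRID_SIZE)]
    # Copy each sensor value to the even row, even column cells.
    for r in range(SENSOR_SIZE):
        for c in range(SENSOR_SIZE):
            grid[2 * r][2 * c] = norm[r][c]
    # Fill the odd columns of the even rows from left and right neighbors.
    for r in range(0, GRID_SIZE, 2):
        for c in range(1, GRID_SIZE, 2):
            grid[r][c] = (grid[r][c - 1] + grid[r][c + 1]) / 2
    # Fill every cell of the odd rows from the rows above and below.
    for r in range(1, GRID_SIZE, 2):
        for c in range(GRID_SIZE):
            grid[r][c] = (grid[r - 1][c] + grid[r + 1][c]) / 2
    return grid


# Example: a synthetic temperature gradient in place of real sensor data.
sample = [[20.0 + r + c for c in range(SENSOR_SIZE)] for r in range(SENSOR_SIZE)]
display_grid = interpolate(normalize(sample))
```

Each normalized value in display_grid could then be mapped to a color before the Display Image step decides whether a rectangle needs updating.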
An elapsed time marker is stored at the beginning of each code segment. At the end of the primary process loop, the markers are analyzed and a report is printed to the serial output, where it can be viewed via the REPL. Here's a screenshot of the performance report:
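The marker approach described above might look something like the following sketch. It assumes time.monotonic() as the clock, and the helper names (run_frame, analyze, frames_per_second, print_report) and the placeholder stage functions are hypothetical, invented for illustration rather than taken from the project.

```python
import time


def run_frame(stages):
    """Run each (name, function) stage, storing a time marker before it starts."""
    markers = []
    for name, func in stages:
        markers.append((name, time.monotonic()))
        func()
    markers.append(("end", time.monotonic()))  # closing marker for the last stage
    return markers


def analyze(markers):
    """Turn consecutive markers into a list of (stage name, elapsed seconds)."""
    return [(name, stop - start)
            for (name, start), (_, stop) in zip(markers, markers[1:])]


def frames_per_second(report):
    """Frame rate implied by the total time of one pass through the loop."""
    total = sum(seconds for _, seconds in report)
    return 1 / total if total > 0 else 0.0


def print_report(report):
    """Print one line per stage with its elapsed time and share of the frame."""
    total = sum(seconds for _, seconds in report)
    for name, seconds in report:
        percent = 100 * seconds / total if total > 0 else 0.0
        print(f"{name:<26}{seconds:8.3f} s {percent:6.1f} %")
    print(f"{'frame total':<26}{total:8.3f} s")


def placeholder():
    """Stand-in for the real work done by a code segment."""
    pass


# The four repeating segments of the primary loop, with placeholder work.
loop_stages = [
    ("Acquire Sensor Data", placeholder),
    ("Display Statistics", placeholder),
    ("Normalize and Interpolate", placeholder),
    ("Display Image", placeholder),
]
print_report(analyze(run_frame(loop_stages)))
```

Keeping the markers in a list and analyzing them only at the end of the loop keeps the measurement overhead inside each segment small.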
After the improvements, the code was ported to run on nine other development boards in the workshop inventory, ranging from SAMD51 (M4) boards to the ESP32-S2, nRF52840, and RP2040. All of the PyGamer code was left intact except where a board had specific display or button interface requirements. For example, since the PyPortal has no hardware buttons, its touch screen was used to implement button-like controls. The Setup helper code was also removed on any development board where memory capacity issues were identified.
The PyGamer platform performed the best in this comparison. Generally, development boards that use the SAMD51 (M4) processor performed well, beating the 2 frames-per-second performance threshold.
Since the thermal camera code uses a unique combination of resources suited for displaying temperature images, the comparison of thermal camera performance on the different platforms should not be construed as revealing intractable flaws of a particular development board or processor architecture. Instead, the comparison helps to point out performance bottlenecks unique to the thermal camera application that could benefit from further code refinement.
Many factors, from processor architecture to the board's TFT display bus, could affect thermal camera performance. For example, the current version of the thermal camera depends heavily on floating point calculations for almost everything, from normalizing and constraining sensor data to the internal calculations CircuitPython displayio performs when it positions objects and justifies on-screen text. Development boards that use the SAMD51 (M4) have a hardware floating point unit that makes these calculations a breeze, so much so that little attention was given to tuning the code to calculate with integers where floating point math really isn't needed. Development boards such as those with the RP2040 processor do not have hardware floating point. Can you see where this is going?
So this comparison wasn't a completely fair test. The thermal camera's code was written to work best with the M4 architecture, not to take advantage of the RP2040's faster clock speed and huge memory capacity (and low cost!). What would it take to modify the code to work better with the RP2040? Are CircuitPython displayio and AMG8833 libraries tuned to take advantage of the RP2040's talents? We're going to have to add that project to the list.
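As one illustration of the integer tuning mentioned above, the sketch below maps a reading to a color palette index twice: once with floating point and once with integers only. It is a hypothetical example, not part of the thermal camera code. The 256-entry palette and the idea that readings are available as integer counts of 0.25 degrees Celsius are both assumptions made for the sake of the example.

```python
PALETTE_SIZE = 256   # hypothetical number of color palette entries


def color_index_float(temp_c, low_c, high_c):
    """Floating point version: constrain, normalize to 0.0-1.0, then scale."""
    temp_c = min(max(temp_c, low_c), high_c)
    norm = (temp_c - low_c) / (high_c - low_c)
    return int(norm * (PALETTE_SIZE - 1))


def color_index_int(raw, low_raw, high_raw):
    """Integer-only version: same mapping using counts and floor division.

    raw, low_raw, and high_raw are assumed to be integer counts in units
    of 0.25 degrees Celsius (so 80 counts represents 20.0 degrees).
    """
    raw = min(max(raw, low_raw), high_raw)
    return (raw - low_raw) * (PALETTE_SIZE - 1) // (high_raw - low_raw)


# 20.0 degrees Celsius within a 0 to 80 degree range, expressed both ways.
float_index = color_index_float(20.0, 0.0, 80.0)
int_index = color_index_int(80, 0, 320)
```

Whether a change like this would actually speed up the RP2040 build depends on how much of the frame time is spent in the application code rather than inside displayio, which is what the performance report is meant to reveal.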
Page last edited March 08, 2024