Example/Demo Code History and Explanation
This project is based on CameraWebServer.ino, an example written by Me-No-Dev for Espressif Systems. The author of this guide modified the example for Adafruit Industries to reduce its flash size overhead (the web server functionality was removed, and the face detection/recognition calls were isolated and brought into main.cpp and ra_filter.h), to add compatibility with the Adafruit MEMENTO development board (camera support, plus "blitting" the camera's raw image to the MEMENTO's TFT instead of to a webpage), and to build an interactive robotics demo around it.
Since this is a larger codebase than a typical Learn project and we only modified the code, this page won't explain everything that CameraWebServer does. Instead, it explains the important and modifiable code segments within main.cpp as they pertain to this project.
Capturing and Detecting a Photo
The loop() function's first call is to performFaceDetection(), a function that captures a frame from the camera and performs face detection on it.
Within performFaceDetection(), a photo (aka a "frame") is captured from the camera and stored into a buffer (fb = esp_camera_fb_get()). Then, two stages of inference are run on this frame to attempt to detect a face.
In the first stage, inference is performed with an object detection model (s1). The s1.infer() function call runs inference on the image stored in the frame buffer (fb). If any objects are detected, they're stored in a list, candidates.
std::list<dl::detect::result_t> &candidates = s1.infer((uint16_t *)fb->buf, {(int)fb->height, (int)fb->width, 3});
The second inference stage runs a different model, s2. This inference attempts to differentiate unique faces from the generic objects detected by s1. The faces are stored in a list, results, which is returned back to the loop() function.
std::list<dl::detect::result_t> performFaceDetection() {
  // Capture a frame from the camera into the frame buffer
  fb = esp_camera_fb_get();
  if (!fb) {
    Serial.printf("ERROR: Camera capture failed\n");
    return std::list<dl::detect::result_t>();
  }
  // Perform face detection
  std::list<dl::detect::result_t> candidates =
      s1.infer((uint16_t *)fb->buf, {(int)fb->height, (int)fb->width, 3});
  std::list<dl::detect::result_t> results = s2.infer(
      (uint16_t *)fb->buf, {(int)fb->height, (int)fb->width, 3}, candidates);
  return results;
}
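For reference, the two detection stages come from Espressif's esp-dl face detection models. In the upstream CameraWebServer example they are declared roughly as shown below; this project's main.cpp may use different thresholds:

#include "human_face_detect_msr01.hpp"
#include "human_face_detect_mnp01.hpp"

// Stage one: a fast multi-scale detector that proposes candidate regions
HumanFaceDetectMSR01 s1(0.1F, 0.5F, 10, 0.2F);
// Stage two: a refinement network that filters the candidates down to faces
HumanFaceDetectMNP01 s2(0.5F, 0.3F, 5);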
Back in loop(), if any faces were detected, the NeoPixel ring is filled with blue to show that the robot is tracking a face:

if (detectionResults.size() > 0) {
  // Fill NeoPixel ring with a blue color while tracking
  pixels.fill(pixels.Color(0, 0, 255), 0, pixels.numPixels());
  pixels.show();
  ...
If a face has not been detected in the past 3 seconds (set by DELAY_SERVO_CENTER), the robot "resets" itself by returning the head servo to its center position, re-centering the stored bounding box location to the middle of the frame, and turning off the blue NeoPixel ring.
} else {
  // We aren't tracking a face anymore
  isTrackingFace = false;
  // No face has been detected for DELAY_SERVO_CENTER seconds, re-center the
  // servo
  if ((millis() - prvDetectTime) > DELAY_SERVO_CENTER) {
    Serial.println("Lost track of face, moving servo to center position!");
    curServoPos = SERVO_CENTER;
    headServo.write(curServoPos);
    // Reset the previous detection time
    prvDetectTime = millis();
    // Re-center the bounding box at the middle of the TFT
    prv_face_box_x_midpoint = 120;
    // Clear the NeoPixels while we're not tracking a face
    pixels.clear();
    pixels.show();
  }
When a face is detected and the robot wasn't already tracking one, tracking begins after a DELAY_DETECTION debounce period:

if ((!isTrackingFace) && ((millis() - prvDetectTime) > DELAY_DETECTION)) {
  Serial.println("Face Detected!\nTracking new face...");
  isTrackingFace = true;
}
While tracking, a status message is written to the TFT and an fb_data_t structure is set up so the detection results can be drawn onto the framebuffer:

// Write to TFT
tft.setCursor(0, 230);
tft.setTextColor(ST77XX_GREEN);
tft.print("TRACKING FACE");
// Draw face detection boxes and landmarks on the framebuffer
fb_data_t rfb;
rfb.width = fb->width;
rfb.height = fb->height;
rfb.data = fb->buf;
rfb.bytes_per_pixel = 2;
rfb.format = FB_RGB565;
draw_face_boxes(&rfb, &detectionResults);
...
Within draw_face_boxes(), a bounding box is drawn around the face to illustrate to the user what the code "sees".
// draw a bounding box around the face
tft.drawFastHLine(x, y, w, color);
tft.drawFastHLine(x, y + h - 1, w, color);
tft.drawFastVLine(x, y, h, color);
tft.drawFastVLine(x + w - 1, y, h, color);
Serial.printf("Bounding box width: %d px\n", w);
Serial.printf("Bounding box height: %d px\n", h);
The center/midpoint of the bounding box is then calculated. The code compares this new center point against the previous center point to measure how much the face moved (and in what direction on the X-axis).
// Calculate the current bounding box's x-midpoint so we can compare it
// against the previous midpoint
cur_face_box_x_midpoint = x + (w / 2);
// Draw a circle at the midpoint of the bounding box
tft.fillCircle(cur_face_box_x_midpoint, y + (h / 2), 5, ST77XX_BLUE);
At the end of loop(), the 240x240px frame buffer is drawn to the TFT display. The two bytes of each RGB565 pixel are swapped first, because the camera returns pixel data in the opposite byte order from what the TFT drawing routine expects. Since the next iteration of loop() requires the frame buffer to take a photo, we then release it using esp_camera_fb_return().
// Blit out the framebuffer to the TFT
// Swap the two bytes of each RGB565 pixel to match the TFT's byte order
uint8_t temp;
for (uint32_t i = 0; i < fb->len; i += 2) {
  temp = fb->buf[i + 0];
  fb->buf[i + 0] = fb->buf[i + 1];
  fb->buf[i + 1] = temp;
}
pyCameraFb->setFB((uint16_t *)fb->buf);
tft.drawRGBBitmap(0, 0, (uint16_t *)pyCameraFb->getBuffer(), 240, 240);
// Release the framebuffer
esp_camera_fb_return(fb);
Tracking the Face's Movement and Moving the Robot's Head
The trackFace() function handles moving the robot's head servo. Specifically, it handles the calculations that answer the following questions:
1) Should we move the servo at all?
2) How much should we move the servo by?
3) In what direction should we move the servo?
trackFace() first checks whether the bounding box has moved by comparing the current bounding box's x-axis midpoint against the previous bounding box's x-axis midpoint.
void trackFace() {
  // Check if the bounding box has moved and if this is the first frame with a
  // face detected, just save the coordinates
  if ((cur_face_box_x_midpoint != prv_face_box_x_midpoint) &&
      (prv_face_box_x_midpoint != 0)) {
    ...
If the face has moved, the code calculates the difference between both bounding box midpoints, in pixels.
Serial.printf("x_midpoint (curr. face): %d px\n", cur_face_box_x_midpoint); Serial.printf("x_midpoint (prv. face): %d px\n", prv_face_box_x_midpoint); // Calculate the difference between the new bounding box midpoint and the // previous bounding box midpoint, in pixels int mp_diff_pixels = abs(cur_face_box_x_midpoint - prv_face_box_x_midpoint); Serial.printf("Difference between midpoints: %d px\n", mp_diff_pixels);
If the servo moved every time a difference between the midpoints was detected, its motion would be very "jittery". To smooth the motion, the code only moves the servo when the difference exceeds an adjustable SERVO_HYSTERESIS value of 2 pixels.
Then, the proportional number of degrees the servo should move is calculated by multiplying the difference between the midpoints, in pixels, by a SERVO_MOVEMENT_FACTOR value. The resulting servoStepAmount, in degrees, is more accurate than moving the servo a fixed number of steps and results in smoother motion.
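For example, with a hypothetical SERVO_MOVEMENT_FACTOR of 0.25, a midpoint that shifts by 12 pixels moves the servo 3 degrees, while a 40 pixel shift moves it 10 degrees: small face movements produce small corrections, and large movements produce proportionally larger ones.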
We've also implemented the concept of "dead zones" to smooth the servo's motion even further. These dead zones are calculated as ±10px from the bounding box's center point. If the bounding box's x-midpoint is past the dead zone on the left, the servo's position is incremented (moving left); if it's past the dead zone on the right, the position is decremented (moving right).
// Only move the servo if the magnitude of the difference between the
// midpoints is greater than SERVO_HYSTERESIS
// NOTE: This smooths the servo's motion to avoid the servo from
// "jittering".
if (mp_diff_pixels > SERVO_HYSTERESIS) {
  // Calculate how many steps, in degrees, the servo should move
  int servoStepAmount = mp_diff_pixels * SERVO_MOVEMENT_FACTOR;
  // Move the servo to the left or right, depending where x_midpoint is
  // located relative to the dead zones
  if (cur_face_box_x_midpoint < deadzoneStart)
    curServoPos += servoStepAmount;
  else if (cur_face_box_x_midpoint > deadzoneEnd)
    curServoPos -= servoStepAmount;
}
The absolute value of curServoPos is then taken (so we don't step to a negative position, past the physical limit of the servo) and the result is written to the servo.
// Move the servo to the new position
if (curServoPos != prvServoPos) {
  curServoPos = abs(curServoPos);
  Serial.printf("Moving servo to new position: %d degrees\n", curServoPos);
  headServo.write(curServoPos);
}
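Note that abs() only guards against negative values. If you find your servo over-rotating at the top of its range, one possible tweak (not part of the shipped code) is to clamp the position to the servo's full physical range with Arduino's constrain() macro before writing it:

// Hypothetical tweak, not in the shipped code: clamp curServoPos to the
// servo's physical range of 0-180 degrees before writing it
curServoPos = constrain(curServoPos, 0, 180);
headServo.write(curServoPos);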
Finally, the values of prv_face_box_x_midpoint and prvServoPos are updated to reflect the current frame and the current position of the servo.
  }
  // Update the previous midpoint coordinates with the new coordinates
  prv_face_box_x_midpoint = cur_face_box_x_midpoint;
  // Save the current servo position for the next comparison
  prvServoPos = curServoPos;
}
Going Further - Tuning and Tweaking
This guide's code was tested in the guide author's office. The ML model was not trained on the guide author, nor was it created by us. So, the accuracy of your robot's face detection may vary due to a large number of factors such as room lighting, distance from the camera, the camera's field of view in your environment, and more. You may find that modifying and fine-tuning the SERVO_HYSTERESIS and SERVO_MOVEMENT_FACTOR values results in smoother overall motion.
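As a sketch, those tuning knobs might be defined like this. The 2px hysteresis value comes from this guide, while the movement factor shown is only a hypothetical starting point (check main.cpp for the values the project actually ships with):

// Tuning knobs for the head servo (illustrative values, not necessarily
// the project's defaults)
#define SERVO_HYSTERESIS      2     // px the midpoint must move before the servo reacts
#define SERVO_MOVEMENT_FACTOR 0.25  // hypothetical: degrees of travel per px of movement

Raising SERVO_HYSTERESIS makes the robot ignore smaller face movements, while lowering SERVO_MOVEMENT_FACTOR makes each correction gentler at the cost of slower tracking.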