Example/Demo Code History and Explanation
This project is based on an example written by Me-No-Dev for Espressif Systems named CameraWebServer.ino. The author of this guide modified the example for Adafruit Industries to reduce its flash size overhead (we removed the web server functionality, isolated the face detection/recognition calls, and moved them into main.cpp and ra_filter.h), added compatibility for the Adafruit MEMENTO development board (camera support, plus "blitting" the camera's raw image to the MEMENTO's TFT instead of to a webpage), and built an interactive robotics demo around it.
Since this is a larger codebase than a typical Learn project and we only modified the code, this page won't explain everything that CameraWebServer does. Instead, it explains the important and modifiable code segments within main.cpp as they pertain to this project.
Capturing and Detecting a Photo
The loop function's first call is to performFaceDetection(), a function that captures a frame from the camera and performs face detection on it.
Within performFaceDetection(), a photo (a "frame") is captured from the camera and stored in a buffer (fb = esp_camera_fb_get()). Then, two stages of inference are run on this frame to attempt to detect a face.
In the first stage, inference is performed with a model, s1, that detects objects. The s1.infer() call runs the model on the image stored in the frame buffer (fb). Any detected objects are stored in a list, candidates.
std::list<dl::detect::result_t> &candidates = s1.infer((uint16_t *)fb->buf, {(int)fb->height, (int)fb->width, 3});
The next line runs inference with a different model, s2. This inference attempts to differentiate unique faces from the generic objects detected by s1. The faces are stored in a list, results, which is returned to the loop() function.
std::list<dl::detect::result_t> performFaceDetection() {
  // Capture a frame from the camera into the frame buffer
  fb = esp_camera_fb_get();
  if (!fb) {
    Serial.printf("ERROR: Camera capture failed\n");
    return std::list<dl::detect::result_t>();
  }
  // Perform face detection
  std::list<dl::detect::result_t> candidates =
      s1.infer((uint16_t *)fb->buf, {(int)fb->height, (int)fb->width, 3});
  std::list<dl::detect::result_t> results = s2.infer(
      (uint16_t *)fb->buf, {(int)fb->height, (int)fb->width, 3}, candidates);
  return results;
}
Back in loop(), if the returned detectionResults list contains at least one face, the NeoPixel ring is filled with blue to show that the robot is tracking a face.

if (detectionResults.size() > 0) {
  // Fill NeoPixel ring with a blue color while tracking
  pixels.fill(pixels.Color(0, 0, 255), 0, pixels.numPixels());
  pixels.show();
  ...
If a face has not been detected in the past 3 seconds (the value of DELAY_SERVO_CENTER, in milliseconds), the robot "resets" itself by returning the head servo to its center position, adjusting the bounding box location to point to the center of the frame, and turning off the blue NeoPixel ring.
} else {
  // We aren't tracking a face anymore
  isTrackingFace = false;
  // No face has been detected for DELAY_SERVO_CENTER milliseconds, re-center
  // the servo
  if ((millis() - prvDetectTime) > DELAY_SERVO_CENTER) {
    Serial.println("Lost track of face, moving servo to center position!");
    curServoPos = SERVO_CENTER;
    headServo.write(curServoPos);
    // Reset the previous detection time
    prvDetectTime = millis();
    // Re-center the bounding box at the middle of the TFT
    prv_face_box_x_midpoint = 120;
    // Clear the NeoPixels while we're not tracking a face
    pixels.clear();
    pixels.show();
  }
  if ((!isTrackingFace) && ((millis() - prvDetectTime) > DELAY_DETECTION)) {
    Serial.println("Face Detected!\nTracking new face...");
    isTrackingFace = true;
  }
  // Write to TFT
  tft.setCursor(0, 230);
  tft.setTextColor(ST77XX_GREEN);
  tft.print("TRACKING FACE");
  // Draw face detection boxes and landmarks on the framebuffer
  fb_data_t rfb;
  rfb.width = fb->width;
  rfb.height = fb->height;
  rfb.data = fb->buf;
  rfb.bytes_per_pixel = 2;
  rfb.format = FB_RGB565;
  draw_face_boxes(&rfb, &detectionResults);
  ...
Within draw_face_boxes(), a bounding box is drawn around the face to illustrate to the user what the code "sees".
// draw a bounding box around the face
tft.drawFastHLine(x, y, w, color);
tft.drawFastHLine(x, y + h - 1, w, color);
tft.drawFastVLine(x, y, h, color);
tft.drawFastVLine(x + w - 1, y, h, color);
Serial.printf("Bounding box width: %d px\n", w);
Serial.printf("Bounding box height: %d px\n", h);
The center/midpoint of the bounding box is then calculated. The code uses this new center point to compare against the previous center point, as a way to measure how much the face moved (and in what direction on the X-axis).
// Calculate the current bounding box's x-midpoint so we can compare it
// against the previous midpoint
cur_face_box_x_midpoint = x + (w / 2);
// Draw a circle at the midpoint of the bounding box
tft.fillCircle(cur_face_box_x_midpoint, y + (h / 2), 5, ST77XX_BLUE);
At the end of the loop(), the 240x240px frame buffer is drawn to the TFT display. Since the next iteration of loop() requires the use of the frame buffer to take a photo, we release it using esp_camera_fb_return().
// Blit out the framebuffer to the TFT
uint8_t temp;
for (uint32_t i = 0; i < fb->len; i += 2) {
  temp = fb->buf[i + 0];
  fb->buf[i + 0] = fb->buf[i + 1];
  fb->buf[i + 1] = temp;
}
pyCameraFb->setFB((uint16_t *)fb->buf);
tft.drawRGBBitmap(0, 0, (uint16_t *)pyCameraFb->getBuffer(), 240, 240);
// Release the framebuffer
esp_camera_fb_return(fb);
Tracking the Face's Movement and Moving the Robot's Head
The trackFace() function handles moving the robot's head servo. Specifically, it handles calculations to answer the following questions:
1) Should we move the servo at all?
2) How much should we move the servo by?
3) In what direction should we move the servo?
The trackFace() function first checks if the bounding box has moved by comparing the current bounding box's midpoint position on the x-axis against the previous bounding box's x-axis midpoint value.
void trackFace() {
  // Check if the bounding box has moved; if this is the first frame with a
  // face detected, just save the coordinates
  if ((cur_face_box_x_midpoint != prv_face_box_x_midpoint) &&
      (prv_face_box_x_midpoint != 0)) {
    ...
If the face has moved, the code calculates the difference between both bounding box midpoints, in pixels.
Serial.printf("x_midpoint (curr. face): %d px\n",
cur_face_box_x_midpoint);
Serial.printf("x_midpoint (prv. face): %d px\n",
prv_face_box_x_midpoint);
// Calculate the difference between the new bounding box midpoint and the
// previous bounding box midpoint, in pixels
int mp_diff_pixels = abs(cur_face_box_x_midpoint - prv_face_box_x_midpoint);
Serial.printf("Difference between midpoints: %d px\n", mp_diff_pixels);
If the servo moved every time a difference between the midpoints was detected, it would be very "jittery". To smooth its motion, the code only moves the servo if the difference exceeds an adjustable SERVO_HYSTERESIS value of 2 pixels.
Then, the proportional number of degrees the servo should move is calculated by multiplying the difference between the midpoints, in pixels, by a SERVO_MOVEMENT_FACTOR value. Moving by the resulting servoStepAmount degrees is more accurate than moving the servo a fixed number of steps and results in smoother motion.
We've also implemented "dead zones" to smooth the servo's motion even further. These dead zones span +/-10 px from the center of the frame. If the bounding box's x-midpoint is past the dead zone on the left or the right, the servo's position is incremented (moving left) or decremented (moving right), respectively.
  // Only move the servo if the magnitude of the difference between the
  // midpoints is greater than SERVO_HYSTERESIS
  // NOTE: This smooths the servo's motion to keep the servo from
  // "jittering".
  if (mp_diff_pixels > SERVO_HYSTERESIS) {
    // Calculate how many steps, in degrees, the servo should move
    int servoStepAmount = mp_diff_pixels * SERVO_MOVEMENT_FACTOR;
    // Move the servo to the left or right, depending where x_midpoint is
    // located relative to the dead zones
    if (cur_face_box_x_midpoint < deadzoneStart)
      curServoPos += servoStepAmount;
    else if (cur_face_box_x_midpoint > deadzoneEnd)
      curServoPos -= servoStepAmount;
  }
The absolute value of curServoPos is then taken (so we don't step to a negative position, past the physical limit of the servo) and the result is written to the servo.
  // Move the servo to the new position
  if (curServoPos != prvServoPos) {
    curServoPos = abs(curServoPos);
    Serial.printf("Moving servo to new position: %d degrees\n", curServoPos);
    headServo.write(curServoPos);
  }
Finally, the values of prv_face_box_x_midpoint and prvServoPos are updated to reflect the current position of the servo and the frame.
  }
  // Update the previous midpoint coordinates with the new coordinates
  prv_face_box_x_midpoint = cur_face_box_x_midpoint;
  // Save the current servo position for the next comparison
  prvServoPos = curServoPos;
}
Going Further - Tuning and Tweaking
This guide's code was tested in the guide author's office. The ML model was not trained on the guide author, nor was it created by us. The accuracy of your robot's face detection may therefore vary due to a number of factors, such as room lighting, distance from the camera, and the camera's field of view in your environment. You may find that fine-tuning the SERVO_HYSTERESIS and SERVO_MOVEMENT_FACTOR values results in smoother overall motion.
Page last edited March 08, 2024