Example/Demo Code History and Explanation
This project uses an example written by Me-No-Dev for Espressif Systems named CameraWebServer.ino. The author of this guide modified the example for Adafruit Industries to reduce its flash usage (we removed the web server functionality, isolated the face detection/recognition calls, and moved that code into main.cpp and ra_filter.h), add compatibility for the Adafruit MEMENTO development board (camera support, plus "blitting" the camera's raw image to the MEMENTO's TFT instead of to a webpage), and build an interactive demo around it.
Because this is a larger codebase than a typical Learn project and we only modified the code, this page won't explain everything that CameraWebServer does. Instead, it explains the important and modifiable code segments within main.cpp.
Capturing a photo
Most applications for a digital camera like the MEMENTO save photos at a large resolution and in a compressed file format (like JPEG) to save space on the SD card. However, we found the facial detection code runs fastest with a smaller frame size (240x240px) and the raw RGB565 bitmap format.
Within main.cpp, the initCamera() function handles initializing the MEMENTO's camera. This code segment configures the camera's frame size to 240x240px and its pixel format to RGB565.
config.grab_mode = CAMERA_GRAB_WHEN_EMPTY;
config.fb_location = CAMERA_FB_IN_PSRAM;
config.frame_size = FRAMESIZE_240X240;
config.pixel_format = PIXFORMAT_RGB565;
config.fb_count = 2;
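For reference, RGB565 stores each pixel in 16 bits (5 bits red, 6 bits green, 5 bits blue), so a raw 240x240 frame occupies 240 x 240 x 2 = 115,200 bytes. The sketch below is a host-side illustration only and is not part of main.cpp; the helper names pack_rgb565 and frame_bytes are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: pack 8-bit R, G, B values into one RGB565 pixel
// (top 5 bits of red, top 6 bits of green, top 5 bits of blue).
constexpr uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b) {
  return static_cast<uint16_t>(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

// Hypothetical helper: bytes needed for a raw RGB565 frame (2 bytes per pixel).
constexpr size_t frame_bytes(size_t width, size_t height) {
  return width * height * 2;
}

// Compile-time checks of the arithmetic described above.
static_assert(frame_bytes(240, 240) == 115200, "240x240 RGB565 frame size");
static_assert(pack_rgb565(0, 255, 0) == 0x07E0, "pure green in RGB565");
```

Keeping the frame raw and small means the detection code can read pixels directly, without a JPEG decode step.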
Within loop(), we don't need to convert the image from RGB565 to another format or resolution. The code tells the camera to take a picture and then stores it in a frame buffer.
// capture from the camera into the frame buffer
Serial.printf("Capturing frame...\n");
fb = esp_camera_fb_get();
if (!fb) {
  Serial.printf("ERROR: Camera capture failed\n");
} else {
  Serial.printf("Frame capture successful!\n");
  ...
}
Face Detection
After a frame (photo) is successfully captured, the code performs face detection. In this example, the code runs two stages of inference.
In the first stage, the code runs inference on a model (s1) to detect objects. The s1.infer() function call performs inference on the image stored in the frame buffer (fb). If any objects are detected, they're stored in candidates.
Serial.printf("Frame capture successful!\n");
// Face detection
std::list<dl::detect::result_t> &candidates = s1.infer((uint16_t *)fb->buf, {(int)fb->height, (int)fb->width, 3});
The second stage runs another inference on a different model, s2. This call takes the objects detected in the first stage and attempts to detect faces among them. Detected faces are stored in the results list.
std::list<dl::detect::result_t> &results = s2.infer((uint16_t *)fb->buf, {(int)fb->height, (int)fb->width, 3}, candidates);
When a face is detected, the results list is non-empty. The code prints that a face has been detected, and the draw_face_boxes function draws the face detection boxes and landmarks onto the frame buffer so that they appear on the TFT.
if (results.size() > 0) {
  Serial.println("Detected face!");
  ...
  // Draw face detection boxes and landmarks on the framebuffer
  draw_face_boxes(&rfb, &results, face_id);
  ...
}
Finally, the 240x240px frame buffer is drawn to the TFT display. Since the next iteration of loop() requires the use of the frame buffer, fb, to take a photo, we release it.
// Blit framebuffer to TFT
uint8_t temp;
for (uint32_t i = 0; i < fb->len; i += 2) {
  temp = fb->buf[i + 0];
  fb->buf[i + 0] = fb->buf[i + 1];
  fb->buf[i + 1] = temp;
}
pyCameraFb->setFB((uint16_t *)fb->buf);
tft.drawRGBBitmap(0, 0, (uint16_t *)pyCameraFb->getBuffer(), 240, 240);
// Release the framebuffer
esp_camera_fb_return(fb);
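The first loop in the snippet above swaps the two bytes of every 16-bit pixel before the buffer is handed to the display. The same operation can be sketched as a standalone function that runs on any machine; swap_pixel_bytes is a hypothetical name, not a function from main.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical helper: swap the two bytes of every 16-bit pixel in place.
// `len` is the buffer length in bytes. If `len` is odd, the final byte has
// no partner and is left untouched.
void swap_pixel_bytes(uint8_t *buf, size_t len) {
  for (size_t i = 0; i + 1 < len; i += 2) {
    std::swap(buf[i], buf[i + 1]);
  }
}
```

The `i + 1 < len` bound is a small defensive choice in this sketch: it never reads past the end of the buffer, even for an odd length. With a 240x240 RGB565 frame the length is always even, so both forms visit the same bytes.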
Face Recognition
To recognize a face, the code takes a picture and performs face detection (explained above) on the frame.
...
Serial.printf("Frame capture successful!\n");
// Face detection
std::list<dl::detect::result_t> &candidates = s1.infer((uint16_t *)fb->buf, {(int)fb->height, (int)fb->width, 3});
std::list<dl::detect::result_t> &results = s2.infer((uint16_t *)fb->buf, {(int)fb->height, (int)fb->width, 3}, candidates);
if (results.size() > 0) {
  Serial.println("Detected face!");
  ...
A structure describing the frame buffer, rfb, is created and filled in from the frame buffer (fb). Its data field points at the same pixel buffer as fb, so the pixel data itself is not copied.
int face_id = 0;
fb_data_t rfb;
rfb.width = fb->width;
rfb.height = fb->height;
rfb.data = fb->buf;
rfb.bytes_per_pixel = 2;
rfb.format = FB_RGB565;
...
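Because rfb.data is assigned the fb->buf pointer, rfb refers to the same pixels as fb. Anything drawn through rfb therefore shows up when fb is later sent to the TFT. The sketch below illustrates this with a hypothetical FrameView struct, a simplified stand-in for fb_data_t rather than its real definition.

```cpp
#include <cstdint>

// Hypothetical stand-in for a frame descriptor such as fb_data_t.
struct FrameView {
  int width;
  int height;
  int bytes_per_pixel;
  uint8_t *data;  // points at existing pixels; nothing is copied
};

// Build a view over an existing RGB565 buffer (2 bytes per pixel).
FrameView make_view(uint8_t *pixels, int width, int height) {
  return FrameView{width, height, 2, pixels};
}
```

Writing through view.data changes the original buffer, which is why no copy back into fb is needed after drawing.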
Since face recognition is a slow operation, the code only attempts it if it detected a face and is either enrolling that face or has previously enrolled at least one face. The run_face_recognition function is called with rfb and the results from the second inference call.
if (recognizer.get_enrolled_id_num() > 0 || is_enrolling) {
  face_id = run_face_recognition(&rfb, &results);
}
...
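The condition above can be read as a small predicate. This is a minimal sketch with a hypothetical function name, should_run_recognition, written so the rule can be checked on its own without the camera or recognizer.

```cpp
// Hypothetical helper: decide whether the (slow) recognition step should run
// for a frame in which a face was already detected.
bool should_run_recognition(int enrolled_count, bool is_enrolling) {
  // Run if there is at least one enrolled face to compare against,
  // or if the user asked to enroll the face that was just detected.
  return enrolled_count > 0 || is_enrolling;
}
```

With no faces enrolled and enroll mode off, the predicate is false and the frame skips recognition entirely.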
Within the run_face_recognition function, a vector of landmarks (discussed on the "Usage" page of this guide) is created from the inference results. Then, a tensor (a multi-dimensional array) is created and shaped to fit the frame buffer's data.
std::vector<int> landmarks = results->front().keypoint;
int id = -1;
(void)id;
Tensor<uint8_t> tensor;
tensor.set_element((uint8_t *)fb->data)
    .set_shape({fb->height, fb->width, 3})
    .set_auto_free(false);
Enrolling a New Face
The function obtains the number of faces that are currently enrolled. Then, it checks whether the MEMENTO is in enroll mode and whether the number of enrolled faces is less than the maximum.
If that condition is true, the code enrolls a new face and assigns it a face identifier number. The new identifier is printed to the serial monitor and the TFT, and enroll mode is disabled (is_enrolling = false).
int enrolled_count = recognizer.get_enrolled_id_num();
if (enrolled_count < FACE_ID_SAVE_NUMBER && is_enrolling) {
  int id = recognizer.enroll_id(tensor, landmarks, "", true);
  Serial.printf("Enrolled ID: %d", id);
  tft.setCursor(0, 230);
  tft.setTextColor(ST77XX_CYAN);
  tft.print("Enrolled a new face with ID #");
  tft.print(id);
  is_enrolling = false;
}
face_info_t recognize = recognizer.recognize(tensor, landmarks);
The similarity value in recognize measures how closely the face in the current image matches a saved face.
The similarity values range from 0.0 to 1.0, where a value of 1.0 is a "confident positive match". To avoid a false positive (the code "recognizes" the wrong face), we added a confidence threshold. A match only counts if its similarity value is at least this threshold, FR_CONFIDENCE_THRESHOLD.
Without a confidence threshold, the code would recognize faces with much lower accuracy and could report false positive or false negative matches.
In the code segment below, if the face is recognized (and its similarity value is at least the FR_CONFIDENCE_THRESHOLD value), the MEMENTO's display shows that a face has been recognized and the NeoPixel ring lights up green.
if (recognize.id >= 0 && recognize.similarity >= FR_CONFIDENCE_THRESHOLD) {
  // Face was recognized, print out to serial and TFT
  Serial.printf("Recognized ID: %d ", recognize.id);
  Serial.printf("with similarity of: %0.2f\n", recognize.similarity);
  tft.setCursor(0, 220);
  tft.setTextColor(ST77XX_CYAN);
  tft.print("Recognized Face ID #");
  // Print the recognized ID (the local id variable is not set in this branch)
  tft.print(recognize.id);
  tft.print("\nSimilarity: ");
  tft.print(recognize.similarity);
  // Set pixel ring to green to indicate a recognized face
  // (valid pixel indices are 0 to NUMPIXELS - 1)
  for (int i = 0; i < NUMPIXELS; i++) {
    pixels.setPixelColor(i, pixels.Color(0, 255, 0));
  }
  pixels.show();
  delay(2500);
}
If the code did not recognize a face, but other faces are enrolled, the TFT prints "Intruder alert" and the NeoPixel ring shows a red color.
} else if (recognizer.get_enrolled_id_num() > 0) {
  // Face was not recognized but we have faces enrolled
  Serial.println("Intruder alert - face not recognized as an enrolled face!");
  Serial.printf("This face has a similarity of: %0.2f\n", recognize.similarity);
  // Set pixel ring to red to indicate an unrecognized face
  // (valid pixel indices are 0 to NUMPIXELS - 1)
  for (int i = 0; i < NUMPIXELS; i++) {
    pixels.setPixelColor(i, pixels.Color(255, 0, 0));
  }
  pixels.show();
  delay(1000);
}
...
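Taken together, the two branches above sort each recognition result into one of three outcomes. The sketch below restates that decision as a pure function so it can be checked without hardware. The names MatchOutcome, classify_match, and kConfidenceThreshold are hypothetical; the constant is assumed to mirror FR_CONFIDENCE_THRESHOLD.

```cpp
// Hypothetical outcome labels for one recognition attempt.
enum class MatchOutcome { Recognized, Intruder, NothingEnrolled };

// Assumed to mirror the FR_CONFIDENCE_THRESHOLD macro in main.cpp.
constexpr float kConfidenceThreshold = 0.7f;

// Hypothetical helper: apply the same checks, in the same order, as the
// if / else-if chain shown above.
MatchOutcome classify_match(int id, float similarity, int enrolled_count) {
  if (id >= 0 && similarity >= kConfidenceThreshold) {
    return MatchOutcome::Recognized;  // green ring in the demo
  }
  if (enrolled_count > 0) {
    return MatchOutcome::Intruder;  // red ring in the demo
  }
  return MatchOutcome::NothingEnrolled;  // neither branch runs
}
```

Note that a valid id alone is not enough: a result with a similarity below the threshold falls through to the intruder branch whenever any face is enrolled.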
Adjust the Confidence Threshold
The confidence threshold, FR_CONFIDENCE_THRESHOLD, is the minimum similarity value a face match must reach to count as a positive match.
If the confidence threshold value is too low, the code may incorrectly recognize a face. If the value is too high, the code may not recognize a face unless there is an exact match between the enrolled face and the image you just took.
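To get a feel for this trade-off, the sketch below counts how many similarity scores would be accepted at a given threshold, using the same >= comparison as the demo. The function name count_accepted is hypothetical, and the scores in the usage note are made-up illustration values, not measurements from the MEMENTO.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: count how many similarity scores would be accepted
// as matches at the given threshold.
size_t count_accepted(const std::vector<float> &scores, float threshold) {
  size_t accepted = 0;
  for (float s : scores) {
    if (s >= threshold) {
      ++accepted;
    }
  }
  return accepted;
}
```

For the made-up scores {0.35, 0.62, 0.71, 0.88, 0.93}, a threshold of 0.5 accepts four of them, 0.7 accepts three, and 0.9 accepts only one. Raising the threshold rejects more borderline matches, at the cost of also rejecting some genuine ones.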
To adjust this value in code, you will need to set FR_CONFIDENCE_THRESHOLD to a number between 0.0 and 1.0, where 1.0 is an exact match:
// Threshold (0.0 - 1.0) to determine whether the face detected is a positive
// match. NOTE - This value is adjustable, you may "tune" it for a more (or
// less) confident match
#define FR_CONFIDENCE_THRESHOLD 0.7
Then, you will need to build and upload the code. This is discussed on the Modify Code using PlatformIO page of this guide.
Save Faces to Flash Memory
By default, this code does not keep the enrolled faces when the MEMENTO is rebooted. If you want the enrolled faces to be saved between device reboots, you will need to change the following definition from false to true.
// True if you want to save faces to flash memory and load them on boot, False
// otherwise
#define SAVE_FACES_TO_FLASH false
Then, you will need to build and upload the code. This is discussed on the Modify Code using PlatformIO page of this guide.
Adjust the Number of Recognized Faces
The demo provided in this guide recognizes a maximum of four faces. If you want to recognize more faces, the following variable will need to be modified.
// The number of faces to save
// NOTE - these faces are saved to the ESP32's flash memory and survive between
// reboots
#define FACE_ID_SAVE_NUMBER 4
Increasing FACE_ID_SAVE_NUMBER increases the time required to recognize a new face. After adjusting this number, you will need to build and upload the code to the MEMENTO. This is discussed on the Modify Code using PlatformIO page of this guide.
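As a rough illustration of why more enrolled faces can mean slower recognition, the sketch below assumes that a face is recognized by comparing it against every enrolled face in turn. This guide does not describe the recognizer's internals, so treat this as an assumption: the Embedding type, the cosine similarity measure, and the best_match function are all hypothetical and are not taken from main.cpp or the recognition library.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical representation of a face as a vector of numbers.
using Embedding = std::vector<float>;

// Cosine similarity of two equal-length, non-zero vectors.
float cosine_similarity(const Embedding &a, const Embedding &b) {
  float dot = 0.0f, norm_a = 0.0f, norm_b = 0.0f;
  for (size_t i = 0; i < a.size(); ++i) {
    dot += a[i] * b[i];
    norm_a += a[i] * a[i];
    norm_b += b[i] * b[i];
  }
  return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

struct Match {
  int id;            // index of the best enrolled face, or -1 if none
  float similarity;  // similarity to that face
};

// Linear scan: one comparison per enrolled face, so the work grows in
// proportion to the number of enrolled faces.
Match best_match(const std::vector<Embedding> &enrolled, const Embedding &probe) {
  Match best{-1, -1.0f};
  for (size_t i = 0; i < enrolled.size(); ++i) {
    float s = cosine_similarity(enrolled[i], probe);
    if (s > best.similarity) {
      best = {static_cast<int>(i), s};
    }
  }
  return best;
}
```

Under this assumption, doubling the number of enrolled faces roughly doubles the number of comparisons made for each recognition attempt.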
Page last edited March 08, 2024