The old adage "a picture is worth a thousand words" asserts that a standalone image has the ability to express complex thoughts, views and feelings concurrently. In this project, you'll put that to the test by uploading the pictures you take with your MEMENTO camera to OpenAI to request a description of the image with various prompts.
The MEMENTO is running CircuitPython code that lets you connect to WiFi, take a photo, send the photo to OpenAI with their API and then save the response as a text file. You can view the response after it is fetched on the MEMENTO display.
Six prompts are included to get you started and they range from utilitarian to silly:
- Alt Text
- Haiku
- Translate Text to English
- Cable Identifier
- Describe the Image as an Alien Visitor
- Is Anything Out of Place?
You can use the directional buttons on the MEMENTO to select your prompt before taking the photo and sending it to OpenAI.
Project Inspiration
The inspiration for this project came from the Descriptive Camera project by Matt Richardson. It was built in 2012 with a BeagleBone that connected to the Amazon Mechanical Turk API. That API would outreach to people on the internet to complete tasks. In this case, it was asking those workers to create metadata manually for each photo submitted. Just over ten years later, and this process can now be automated and run without the need for a Linux OS.
Thoughts on "AI"
Artificial intelligence is one of the most divisive issues of our time. The approach and mindset of this project is that at its core, AI is a tool that is only as good as the human giving input to it. You'll see that the prompts used in this project are worded in a way to try and achieve a result that is both useful and concise; with the keyword being try. Mistakes on the part of the API are entirely possible and responses should always be checked.
However, in general, the responses have been found to be accurate and, as a result, exciting, especially when considering how some of these more utilitarian prompts could be engineered in the future for accessibility applications.
Text editor powered by tinymce.