Since the release of ChatGPT in late 2022, the number of companies and groups creating and releasing large language models (LLMs) and other generative artificial intelligence (AI) tools has grown rapidly. Many of these tools are made to run on high-end GPUs or expensive, power-hungry specialized equipment packed into server rooms across the world. However, there is also a push to make smaller, more efficient models that can run on lower-end hardware like smartphones and other small consumer devices.
The Raspberry Pi 4 & 5 have specs comparable to mid-range consumer smartphones and run on ARM, the same processor family that many phones use. This means they get to tag along and make use of many of the models built for phones and other consumer devices. Being able to run on a Pi lowers the barrier to entry for experimenting with LLMs and incorporating them into projects. It also makes it possible to use their functionality disconnected from the internet, in remote environments with tighter power constraints. Being able to run locally and disconnected (sometimes called "the edge") also helps ease some of the privacy concerns associated with always-connected services.
This guide will document the setup and usage of a handful of models that have been successfully tested on Raspberry Pi 4 and 5 with 8 GB RAM.
Page last edited September 10, 2025