Here we explain some of the geeky background theory stuff. If you just want to get into building the thing, you can skip ahead to the next page.

# Graaains…

Regardless of whether you’re old enough to have played with Dad’s LP turntable, or have dabbled in digital audio programs on the newest modern PC, you’ve likely experienced some version of this phenomenon: take an audio recording that’s normally played back at one specific speed…and then change that speed, either compressing or expanding time…and the pitch of the audio changes along with it. Compress time and the pitch rises. Expand time and the pitch drops. That’s because frequency is inversely proportional to period: squeeze each wave cycle into less time and its pitch goes up.
That’s easy with recordings…but with live audio, we don’t really have that luxury. Realtime is realtime…we can’t compress or expand it…it’s happening as it happens. What’s a would-be voice-changer to do?
There’s a complex technique called a Fourier transform that converts a function (or, say, a stream of audio samples) into its frequency spectrum. The resulting frequency values can be altered and an inverse transform applied to turn this back into audio. This is all mathematically good and proper…but it’s a very demanding process and way beyond what our little Arduino can handle. A fairly potent CPU or DSP is usually required. We’ll need a shortcut or a hack…
In digital music circles, granular synthesis is a technique of joining and layering lots of very short audio samples (or “grains”) — on the order of one to a few milliseconds — to build up more complex sounds or instruments. Now picture just a single “grain,” 10 milliseconds or so…and we continually refresh this one grain from a live microphone. By time-compressing or -stretching this one tiny loop, repeating or dropping short segments to keep up with realtime, we have the basis for a realtime pitch shifter. It really seems like this shouldn’t work…but it does! Speech waveforms tend to repeat over the very short term, and we can drop or repeat some of those waves with only minor degradation in intelligibility.
This approach is well suited to the Arduino’s limited processing power and RAM. The result isn’t going to be Hollywood quality, but it’s still vastly better than the majority of voice-changing toys and masks on store shelves. And you get to make it yourself…how cool is that?

# Sampling Audio

The frequency range of the human voice covers about 300 Hz to 3,500 Hz (though harmonics may extend above this). The Nyquist sampling theorem states that a signal must be sampled at a minimum of twice its highest frequency to be faithfully reconstructed. For human voice, that means 7 kHz sampling…but a little more wouldn’t hurt.
Repeatedly calling the Arduino’s standard analogRead() function in a loop is way, WAY too slow for this. We need to get deeper into the works of the Arduino’s analog-to-digital converter, fiddling directly with special registers and modes. A capability called free-run mode collects analog samples at a fast, fixed interval without repeated polling in our code. An interrupt handler is automatically called each time a new sample is ready, which happens like clockwork. Running full tilt, a 16 MHz Arduino can capture 9,615 10-bit samples per second. More than enough for sampling voice!
The audio samples are stored in a circular buffer, which is just a fancy computer-science term for “when you reach the end of the buffer, roll back around to the beginning and write over it.” But conceptually, it helps to think of it as a literal circle.
The frequency of recorded sound will seldom match the buffer length exactly, and audio samples are stored and read at different rates. This can produce a sharp discontinuity — a popping noise — each time the “in” and “out” points cross. A small extra buffer is used to store some of the prior audio samples, and the code cross-fades the audio over this boundary to reduce the “pop.”
Because our audio “grain” is relatively short (about 10 milliseconds), the RAM requirements should be fairly modest, a few hundred bytes. Problem is, we’d also like to continue doing those things that the Wave Shield was designed for — namely, playing back WAV files. That requires reading files from an SD card, and that in turn consumes lots of RAM. Fortunately the design of the WAV-playing code lets us access that library’s memory and recycle it for our own needs.
The technical details are all well-commented in the source code. So if you’re curious about the specifics of this implementation…use the source, Luke!

# Limitations

When introducing new users to Arduino, I often describe it as “just enough computer to do any one thing really well.” Walking while chewing gum is a challenge. And so it goes with this project as well. Keep the following limitations in mind:

• It can process the voice effect or play back WAVs (and a single sketch can include both capabilities), but it can’t do both simultaneously.
• You can’t read other analog inputs when the voice effect is running (case in point, you can’t alter the pitch continually with a potentiometer). If using analog sensors as sound triggers (e.g. force-sensing resistor pads in shoes), consider work-arounds such as using a carefully-trimmed voltage divider to a digital input, or a second MCU to process analog inputs, forwarding triggers over a serial or I2C connection.
• Although this can change the pitch of one’s voice, it can’t change timbre. It won’t, for instance, make things more metallic or robotic-sounding.

This guide was first published on Oct 10, 2012. It was last updated on Oct 10, 2012.

This page (Principles of Operation) was last updated on May 22, 2022.