When I was a kid in the 1980s, there was something of an urban legend around the Speak & Spell, in at least two variations:
- That there was some magical button combination that would unlock dirty words.
- Or, that if you kept typing dirty words on a Speak & Spell, it would scold you.
Though nobody ever actually heard a Speak & Spell do these things firsthand, such rumors have persisted to the present day*. But…once you understand a bit about how the TMS5100 speech chip works…this myth is totally busted.
* Much like the fabled “chemical that turns purple if someone pees in the pool,” which is also nonsense.
Most speech synthesizers of the day were based on phonemes (the smallest pieces of speech), which could be pieced together to form words and sentences. This gave them an essentially unlimited vocabulary, but the downside was a robotic, monotone voice. Some tools could vary the speed or pitch somewhat, but at best these sounded like the Muppets’ Swedish Chef.
The standout feature of these T.I. speech chips was that they instead used linear predictive coding, a highly compressed, lossy audio format¹. Because it’s based on actual recorded speech, a person’s unique timbre, inflection and even accent can come through (Speak & Spell toys released in a few other countries were voiced by native speakers…the American English model is distinct from the British English one, for example²). But this also means they can’t say anything willy-nilly…if it’s not recorded and stored in the ROM, it’s outside the chip’s vocabulary³. Since folks have picked through every byte of the Speak & Spell ROM with a fine-toothed comb…we now know every word and phrase in there, and there are no swear words, nor any scolding⁴.
¹ And I do mean lossy. About 300 recorded words and phrases, a few minutes’ worth, fit in the Speak & Spell’s tiny 32-kilobyte ROM. The format, called LPC-10, was part of a Federal standard for voice communications over limited bandwidth. A later variant of linear predictive coding is used in GSM cell phones…if you’ve ever heard someone drifting out of range and their voice “sounds like a badly-compressed JPEG looks,” that’s a form of LPC struggling with fewer and fewer bits.
² There are tales of certain words in different Speak & Spell models where the original voice talent was not around to record changes…so you’ll get these one-off words that were instead spoken by an engineer…or, in some cases, created by hand-editing LPC tables through trial and error.
³ Not entirely true. The TI-99/4A Terminal Emulator II module used a set of LPC-encoded phonemes coupled with a text-to-speech algorithm. But this is not how the Speak & Spell operated.
⁴ There is, however, one orphaned word in the Speak & Spell ROM: “mosquito.” The LPC-encoded speech is present, but there’s no spelling for it, and it isn’t linked from any of the spelling word lists. Most likely it was decided that this was outside the target demographic’s vocabulary…but I like to imagine that someone took the task of “debugging” too literally. The word is included in the TalkieTrellis sketch.
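To get a feel for the numbers in the first footnote, here’s a back-of-the-envelope sketch of how many bits each speech frame costs. It assumes the TMS5220 frame layout that the original Talkie library decodes (a 4-bit energy index, a 1-bit repeat flag, a 6-bit pitch index, then up to ten reflection coefficients of 5, 5, 4, 4, 4, 4, 4, 3, 3 and 3 bits) and an assumed 40 Hz frame rate; the TMS5100’s exact field widths may differ slightly. The names `frame_bits` and `rom_seconds` are hypothetical, made up for illustration.

```c
#include <assert.h>

/* Kinds of frame in a TMS5220-style bitstream (assumed layout, see above). */
typedef enum { FRAME_SILENT, FRAME_STOP, FRAME_REPEAT, FRAME_UNVOICED, FRAME_VOICED } FrameKind;

#define ENERGY_BITS 4
#define REPEAT_BITS 1
#define PITCH_BITS 6      /* assumed TMS5220 width; the TMS5100 may differ */
#define FRAME_RATE_HZ 40  /* assumed: 8 kHz output, 200 samples (25 ms) per frame */

/* Assumed bit widths of reflection coefficients K1..K10. */
static const int K_BITS[10] = {5, 5, 4, 4, 4, 4, 4, 3, 3, 3};

/* Sum of the widths of coefficients K(first+1)..K(last+1). */
static int k_bits(int first, int last) {
    int sum = 0;
    for (int i = first; i <= last; i++) sum += K_BITS[i];
    return sum;
}

/* Bits consumed by one frame of the given kind. */
int frame_bits(FrameKind kind) {
    switch (kind) {
    case FRAME_SILENT:   /* energy index 0: nothing else follows */
    case FRAME_STOP:     /* energy index 15: end of the word */
        return ENERGY_BITS;
    case FRAME_REPEAT:   /* new energy and pitch, previous K1..K10 reused */
        return ENERGY_BITS + REPEAT_BITS + PITCH_BITS;
    case FRAME_UNVOICED: /* pitch 0 means noise excitation: only K1..K4 sent */
        return ENERGY_BITS + REPEAT_BITS + PITCH_BITS + k_bits(0, 3);
    case FRAME_VOICED:   /* all ten coefficients sent */
        return ENERGY_BITS + REPEAT_BITS + PITCH_BITS + k_bits(0, 9);
    }
    return 0;
}

/* Seconds of speech a ROM would hold if every frame were of the given kind. */
double rom_seconds(long rom_bytes, FrameKind kind) {
    int bits = frame_bits(kind);
    assert(bits > 0);
    return (double)(rom_bytes * 8) / ((double)bits * FRAME_RATE_HZ);
}
```

Under these assumptions a full voiced frame is 50 bits, so a worst case of 2,000 bits per second, and `rom_seconds(32768L, FRAME_VOICED)` comes out to about 131 seconds. Real words mix in much cheaper unvoiced, repeat and silent frames, which helps explain how roughly 300 words fit into a few minutes’ worth of speech.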
What makes the original Speak & Spell voice so distinctly Speak & Spell is that it’s one specific, actual person’s voice (Dallas, TX radio announcer Mitch Carr for the U.S. model), not a piecing-together of synthetic phonemes. It then picks up an additional thick “technological accent” through the heavy processing of LPC encoding, storage and reconstruction.
T.I. had one (perhaps a few) special machine(s) that they would cart to these recording sessions to perform the encoding and playback on-site. These were probably dismantled long ago.
The Talkie library for Arduino was written by Peter Knight, using insights and data from the MAME emulator (credit goes to the authors and helpers named in MAME’s “tms” source files).
Talkie originally emulated the TMS5220 speech chip, whereas the Speak & Spell used its earlier sibling, the TMS5100. At first I thought it would be easiest to “rearrange” the TMS5100 speech data into the TMS5220 format and use that with Talkie. This is somewhat possible, but the result has noticeable shifts in the sound…it would not be a faithful reproduction, which was the whole point of this exercise. Going back to the MAME source code, I found that the synthesis math is identical; the two chips differ mostly in their coefficient tables. I brought the TMS5100 tables over from MAME into Talkie, and with just a few small changes, the library can now be switched between the two chips. (But it was a very roundabout journey getting there.)
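The “same math, different tables” point can be sketched in a few lines. In LPC synthesis, an excitation signal (a periodic pulse-like signal for voiced sounds, noise for unvoiced ones, not shown here) is pushed through a 10-stage all-pole lattice filter driven by reflection coefficients, the K values. The chip-specific part is the table lookup that turns each frame’s small integer indices into those coefficients; the generic floating-point sketch below starts after that lookup, so nothing in it depends on which chip’s tables were used. This is a textbook lattice, not a transcription of Talkie or MAME, and the names `Lattice` and `lattice_step` are hypothetical. A real-time version on a small microcontroller would more likely use fixed-point integer math.

```c
#include <assert.h>
#include <stddef.h>

#define LPC_ORDER 10 /* both chips use ten reflection coefficients, K1..K10 */

/* State of an all-pole lattice synthesis filter. */
typedef struct {
    double x[LPC_ORDER]; /* backward-path values saved from the previous sample */
} Lattice;

void lattice_reset(Lattice *f) {
    for (int i = 0; i < LPC_ORDER; i++) f->x[i] = 0.0;
}

/* Run one excitation sample through the filter.
   k[0..LPC_ORDER-1] are reflection coefficients, each in (-1, 1)
   for a stable filter. Returns the synthesized output sample. */
double lattice_step(Lattice *f, const double k[LPC_ORDER], double excitation) {
    assert(f != NULL && k != NULL);
    double u[LPC_ORDER + 1];

    /* Forward path, top stage to bottom, using last sample's backward values. */
    u[LPC_ORDER] = excitation;
    for (int m = LPC_ORDER; m >= 1; m--)
        u[m - 1] = u[m] - k[m - 1] * f->x[m - 1];

    /* Backward path, updated top-down so each stage still reads the
       previous sample's value from the stage below it. */
    for (int m = LPC_ORDER - 1; m >= 1; m--)
        f->x[m] = f->x[m - 1] + k[m - 1] * u[m - 1];
    f->x[0] = u[0];

    return u[0];
}
```

With all coefficients at zero the filter passes the excitation straight through; with only the first coefficient set to -0.5, an impulse comes out as the decaying sequence 1, 0.5, 0.25, and so on. Swapping chips would only change where the values in `k` come from.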
Insights into the Speak & Spell ROM format, needed to extract all the words and phrases, came from furrtek.org. Their project aimed to add new words to a Speak & Spell, but the info there was super helpful for getting the old data out.
Encoding new data into Talkie (or a real Speak & Spell) turns out to be quite a challenge. Not computationally (we sometimes have more power than we know what to do with), but because there’s no longer any readily usable software to perform LPC-10 compression with results like those Texas Instruments achieved. That code is just gone. The nearest known thing is QBoxPro, an ancient 16-bit Windows application. A link to the software, and a description of how to use it on modern systems, can also be found on the furrtek.org site.
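The “not computationally” part is easy to illustrate: the core of LPC analysis is a short, well-known recursion. Below is a generic sketch of autocorrelation plus the Levinson-Durbin recursion, which turns one frame of audio samples into reflection coefficients. It is not T.I.’s encoder and not QBoxPro’s method. What’s missing is everything around it that a usable encoder typically needs: pre-emphasis, windowing, pitch and voicing detection, and quantizing the results to a chip’s coefficient tables, which is where the hard-won tuning lived. Sign conventions for the coefficients vary between references (and possibly between chip tables), so the signs here are this sketch’s own. The function names are hypothetical.

```c
#include <assert.h>

/* Autocorrelation r[0..p] of the n samples in x (no windowing applied). */
void autocorrelation(const double *x, int n, int p, double *r) {
    for (int lag = 0; lag <= p; lag++) {
        double sum = 0.0;
        for (int i = lag; i < n; i++) sum += x[i] * x[i - lag];
        r[lag] = sum;
    }
}

/* Levinson-Durbin recursion. From r[0..p], compute reflection coefficients
   k[0..p-1] and predictor coefficients a[0..p], with a[0] = 1 and
   A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.
   Returns the remaining prediction-error energy. */
double levinson_durbin(const double *r, int p, double *a, double *k) {
    assert(p >= 1 && p <= 32 && r[0] > 0.0);
    double tmp[33];
    double err = r[0];

    a[0] = 1.0;
    for (int i = 1; i <= p; i++) a[i] = 0.0;

    for (int i = 1; i <= p; i++) {
        if (err <= 0.0) {      /* signal already perfectly predicted */
            k[i - 1] = 0.0;
            continue;
        }
        double acc = r[i];
        for (int j = 1; j < i; j++) acc += a[j] * r[i - j];
        double ki = -acc / err;
        k[i - 1] = ki;

        /* Update lower-order predictor terms from the old values. */
        for (int j = 1; j < i; j++) tmp[j] = a[j] + ki * a[i - j];
        for (int j = 1; j < i; j++) a[j] = tmp[j];
        a[i] = ki;

        err *= (1.0 - ki * ki);
    }
    return err;
}
```

For example, the autocorrelation {1, 0.5, 0.25} of a simple first-order process yields k = {-0.5, 0} and a leftover error of 0.75: one coefficient captures everything, and the second adds nothing. A TMS-style encoder would run something like this on each 25 ms frame, then still face the harder job of picking table indices that sound right.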