Recording New Voices

To record new voices for the clock, you’ll need a computer with sound input capability (or a USB microphone) and audio software that can edit and save WAV files (such as the free, cross-platform Audacity).

My telephone answering machine speaks timestamps in awkward, stilted English. Maybe you have some talking gadget like this.

Though these gadgets use high-quality voice samples, the words were recorded and are played back with no regard to the intonation we as humans apply to the same words in different parts of a sentence. That’s probably a cost consideration, to use the smallest ROM possible. But we have a whole SD card at our disposal and can cut loose!

For example, when you or I say “It's 12:12 pm,” the first and second “twelve” have a slightly different inflection…and you’d say “20” differently than the 20 at the start of “21.” To make our spoken time slightly less awkward-sounding, a few repetitious bits of speech are recorded, and the sketch reassembles these with some simple rules.

About half the audio you record will be discarded, but reading the complete script helps capture a more believable inflection for each word. Don’t edit the script down; read each sentence in full, with a pause in between. Don’t run words together…you’ll need to “Shatnerize” a bit, with a full stop between each word, but otherwise try to keep the same pitch as you would when speaking normally. And avoid the tendency to be “sing-song” with pairs of lines (where pitch alternates up and down on contrasting words); state each sentence as a standalone thing.

For consistency in tone and volume, read the full script in one pass, then edit later. Don’t record, edit and save as you go. The words in bold are kept. The rightmost column lists the filenames (don’t speak these) to assign to the bold words, in order. For example, read the Shatnerized sentence “It’s one o’clock am.” The “It’s” is just there to help with the hour inflection and is discarded later. The next three words are later copied into new files: h01.wav, m00.wav and am.wav. Trim any silence from the start and end of each word; there’s a small gap during playback anyway, as each file is accessed.

Phrase                                            Filename(s)
------------------------------------------------  ------------
“Hello” (or other startup sound)                  boot
“The time is…” (or other announcement message)    annc
It’s one o’clock am                               h01, m00, am
It’s two ten am                                   h02, m10
It’s three twenty am                              h03, m20
It’s four thirty am                               h04, m30
It’s five forty am                                h05, m40
It’s six fifty am                                 h06, m50
It’s seven oh one am                              h07, m0x, m1
It’s eight twenty two am                          h08, m2x, m2
It’s nine thirty three am                         h09, m3x, m3
It’s ten forty four am                            h10, m4x, m4
It’s eleven fifty five am                         h11, m5x, m5
It’s twelve oh six am                             h12, m6
It’s one twenty seven am                          m7
It’s two thirty eight am                          m8
It’s three forty nine am                          m9
It’s four eleven pm                               m11, pm
It’s five twelve pm                               m12
It’s six thirteen pm                              m13
It’s seven fourteen pm                            m14
It’s eight fifteen pm                             m15
It’s nine sixteen pm                              m16
It’s ten seventeen pm                             m17
It’s eleven eighteen pm                           m18
It’s twelve nineteen pm                           m19
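
The filenames in the table hint at how the words can be stitched back together. The clock sketch itself isn’t shown on this page, so the Python function below is only a guess at the “simple rules” mentioned earlier, inferred from the filenames. The name time_to_files is made up for illustration and is not taken from the sketch.

```python
def time_to_files(hour24, minute):
    """Return the WAV base names to play, in order, for a 24-hour time.

    Assumption: the rules are inferred from the filename table only;
    the real sketch may differ (for example, it may play annc first).
    """
    if not (0 <= hour24 <= 23 and 0 <= minute <= 59):
        raise ValueError("time out of range")

    hour12 = hour24 % 12 or 12            # 0 -> 12, 13 -> 1, and so on
    files = ["h%02d" % hour12]            # h01 .. h12

    tens, ones = divmod(minute, 10)
    if minute == 0:
        files.append("m00")               # "o'clock"
    elif 10 <= minute <= 19:
        files.append("m%d" % minute)      # "ten" .. "nineteen", one word each
    elif ones == 0:
        files.append("m%d0" % tens)       # "twenty", "thirty", ... said alone
    else:
        files.append("m%dx" % tens)       # "oh", "twenty", ... as a lead-in
        files.append("m%d" % ones)        # "one" .. "nine"

    files.append("am" if hour24 < 12 else "pm")
    return files


print(time_to_files(12, 12))   # ['h12', 'm12', 'pm']
print(time_to_files(7, 1))     # ['h07', 'm0x', 'm1', 'am']
```

Note how “twenty” on its own (m20) and “twenty” ahead of another digit (m2x) are separate recordings, which is what lets each one keep its natural inflection.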

I recorded the full session at high resolution (44.1 kHz, 32-bit float) and cleaned up the sound a little (normalize, etc.) before downsampling to a more manageable 22 kHz, 16-bit PCM; this is more than sufficient for voice. Then the essential words were clipped out into their own files…
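
Trimming the silence from the start and end of each clipped word is easy to do by hand in an audio editor, but it can also be scripted. The following is a rough sketch using only Python’s standard library. It assumes the clips are already 16-bit mono PCM, and the threshold of 500 is an arbitrary starting value that will likely need tuning for your recording. The function names are made up for this example.

```python
import array
import sys
import wave


def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples quieter than threshold."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def trim_wav(src_path, dst_path, threshold=500):
    """Copy a 16-bit mono WAV file, minus the silence at both ends."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2 or params.nchannels != 1:
            raise ValueError("this sketch expects 16-bit mono PCM")
        samples = array.array("h", src.readframes(params.nframes))

    # WAV data is little-endian; swap on big-endian machines.
    if sys.byteorder == "big":
        samples.byteswap()
    trimmed = trim_silence(samples, threshold)
    if sys.byteorder == "big":
        trimmed.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)             # frame count is fixed up on close
        dst.writeframes(trimmed.tobytes())
```

A call such as trim_wav("h01_raw.wav", "h01.wav") would then write the tightened clip (both filenames here are placeholders). Listen to the result before trusting it: a threshold set too high will clip soft consonants at the start or end of a word.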

Sound files should be copied to the root folder of the SD card. To minimize delays between words, start with a freshly-formatted card, copy the WAV files and eject (card access gets progressively slower as the filesystem becomes fragmented).
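
Before ejecting the card, it can be worth confirming that no clip was forgotten, since the table above works out to 45 separate files. Here is a small Python sketch that lists any that are missing. The function names are invented for this example, and the assumption that every file carries a .wav extension is based on the h01.wav, m00.wav and am.wav examples above.

```python
import os


def expected_names():
    """Base names of every clip listed in the recording table."""
    names = ["boot", "annc", "am", "pm"]
    names += ["h%02d" % h for h in range(1, 13)]        # h01 .. h12
    names += ["m%d0" % t for t in (0, 1, 2, 3, 4, 5)]   # m00, m10 .. m50
    names += ["m%dx" % t for t in (0, 2, 3, 4, 5)]      # m0x, m2x .. m5x
    names += ["m%d" % n for n in range(1, 10)]          # m1 .. m9
    names += ["m%d" % n for n in range(11, 20)]         # m11 .. m19
    return names


def missing_files(folder):
    """Return the expected .wav files not found in folder (case-insensitive)."""
    present = {f.lower() for f in os.listdir(folder)}
    return [n + ".wav" for n in expected_names() if n + ".wav" not in present]
```

Point missing_files at wherever the card is mounted on your computer; an empty list means every clip from the table is present.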

Last updated on 2015-05-04 at 04.27.56 PM Published on 2014-08-26 at 03.49.34 PM