To record new voices for the clock, you’ll need a computer with sound input capability (or a USB microphone) and audio software that can edit and save WAV files (such as the free, cross-platform Audacity).
My telephone answering machine speaks timestamps in awkward, stilted English. Maybe you have some talking gadget like this.
The voice samples themselves are high quality, but they were recorded and are played back with no regard for the intonation we humans give the same words in different parts of a sentence. That's probably a cost consideration, to fit the smallest ROM possible. But we have a whole SD card at our disposal and can cut loose!
For example, when you or I say “It's 12:12 pm,” the first and second “twelve” have a slightly different inflection…and you’d say “20” differently than the 20 at the start of “21.” To make our spoken time slightly less awkward-sounding, a few repetitious bits of speech are recorded, and the sketch reassembles these with some simple rules.
About half the audio you record will be discarded, but reading the complete script helps capture a more believable inflection for each word. Don’t edit down; read each sentence in full, with a pause in between. Don’t run words together…you’ll need to “Shatnerize” a bit, with a full stop between each word, but otherwise try to keep the same pitch as you would when speaking normally. And avoid the tendency to be “sing-song” with pairs of lines (where pitch alternates up and down on contrasting words); state each sentence as a standalone thing.
For consistency in tone and volume, read the full script in one pass, then edit later. Don’t record, edit and save as you go. The words in bold are kept. The rightmost column lists the corresponding filenames (don’t speak these) that should be assigned to each bold word. For example, read the Shatnerized sentence “It's one o’clock am.” The “It’s” is only there to help with the hour inflection and is discarded later. The next three words are later copied into new files: h01.wav, m00.wav and am.wav. Trim any silence from the start and end of each word; there's a small gap during playback anyway, as each file is accessed.
| Phrase | Filename(s) |
|---|---|
| “Hello” (or other startup sound) | boot |
| “The time is…” (or other announcement message) | annc |
| It’s **one** **o’clock** **am** | h01, m00, am |
| It’s **two** **ten** am | h02, m10 |
| It’s **three** **twenty** am | h03, m20 |
| It’s **four** **thirty** am | h04, m30 |
| It’s **five** **forty** am | h05, m40 |
| It’s **six** **fifty** am | h06, m50 |
| It’s **seven** **oh** **one** am | h07, m0x, m1 |
| It’s **eight** **twenty** **two** am | h08, m2x, m2 |
| It’s **nine** **thirty** **three** am | h09, m3x, m3 |
| It’s **ten** **forty** **four** am | h10, m4x, m4 |
| It’s **eleven** **fifty** **five** am | h11, m5x, m5 |
| It’s **twelve** oh **six** am | h12, m6 |
| It’s one twenty **seven** am | m7 |
| It’s two thirty **eight** am | m8 |
| It’s three forty **nine** am | m9 |
| It’s four **eleven** **pm** | m11, pm |
| It’s five **twelve** pm | m12 |
| It’s six **thirteen** pm | m13 |
| It’s seven **fourteen** pm | m14 |
| It’s eight **fifteen** pm | m15 |
| It’s nine **sixteen** pm | m16 |
| It’s ten **seventeen** pm | m17 |
| It’s eleven **eighteen** pm | m18 |
| It’s twelve **nineteen** pm | m19 |
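The filenames in the table suggest how a spoken time can be pieced back together. The Python sketch below is one plausible reading of those "simple rules," inferred only from the filenames; it is not the clock's actual Arduino code, and the function name `files_for_time` is made up for illustration. It leaves out the optional boot and annc sounds.

```python
def files_for_time(hour24, minute):
    """Return WAV basenames to play, in order, for a 24-hour time.

    Assumed rules (inferred from the filename table, not from the sketch):
      hour          -> h01..h12
      :00           -> m00 ("o'clock")
      :10..:19      -> m10..m19
      :20, :30, ... -> m20, m30, m40, m50
      anything else -> tens word (m0x = "oh", m2x..m5x) + ones word (m1..m9)
      then am or pm
    """
    if not (0 <= hour24 <= 23 and 0 <= minute <= 59):
        raise ValueError("time out of range")

    hour12 = hour24 % 12 or 12            # 0 -> 12, 13 -> 1
    names = ["h%02d" % hour12]

    tens, ones = divmod(minute, 10)
    if minute == 0:
        names.append("m00")
    elif tens == 1 or ones == 0:
        names.append("m%d" % minute)      # teens and round tens are single words
    else:
        names.append("m%dx" % tens)       # "oh", "twenty", "thirty", ...
        names.append("m%d" % ones)        # "one" .. "nine"

    names.append("am" if hour24 < 12 else "pm")
    return names


if __name__ == "__main__":
    print(files_for_time(12, 12))   # the "It's 12:12 pm" example
    print(files_for_time(19, 1))
```

Note that under these rules the "twelve" in 12:12 comes from two different recordings (h12 and m12), which is the whole point of recording hours and minutes separately.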
I recorded the full session at high resolution (44.1 kHz, 32-bit float) and cleaned up the sound a little (normalize, etc.) before downsampling to a more manageable 22 kHz, 16-bit PCM…this is more than sufficient for voice. Then the essential words were clipped out into their own files…
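Trimming silence by hand in an audio editor works fine, but if you'd rather script it, here is a rough Python sketch using only the standard library. It assumes each clip is already a 16-bit mono PCM WAV file (the mono part is my assumption, not something stated above), and the `threshold` value of 500 is an arbitrary guess you would tune by ear. The function name `trim_silence` is just for illustration.

```python
import array
import sys
import wave


def trim_silence(src_path, dst_path, threshold=500):
    """Copy src_path to dst_path without leading/trailing quiet samples.

    Assumes 16-bit mono PCM. A sample counts as "quiet" when its
    absolute value is at or below `threshold` (an arbitrary default).
    Returns the number of samples written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2 or params.nchannels != 1:
            raise ValueError("expected 16-bit mono PCM")
        samples = array.array("h", src.readframes(params.nframes))

    if sys.byteorder == "big":
        samples.byteswap()                # WAV sample data is little-endian

    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        raise ValueError("no samples above threshold in " + src_path)
    trimmed = samples[loud[0]:loud[-1] + 1]

    if sys.byteorder == "big":
        trimmed.byteswap()                # back to little-endian for writing

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(params.framerate)
        dst.writeframes(trimmed.tobytes())
    return len(trimmed)
```

A hard cut at the threshold can clip the soft start of words like "four" or "seven," so listen to the results; a lower threshold or a few milliseconds of padding may sound better.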
Sound files should be copied to the root folder of the SD card. To minimize delays between words, start with a freshly-formatted card, copy the WAV files and eject (card access gets progressively slower as the filesystem becomes fragmented).
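With 45 small files it's easy to miss one. As a sanity check before ejecting the card, something like the following Python sketch could list any expected file that isn't in the card's root folder. The file list is derived from the table above, the `.wav` extension follows the h01.wav example, and the function names and the mount path in the usage note are hypothetical.

```python
import os


def expected_filenames():
    """Build the 45 WAV filenames implied by the recording script table."""
    names = ["boot", "annc", "am", "pm", "m00"]
    names += ["h%02d" % h for h in range(1, 13)]      # h01..h12
    names += ["m%d" % m for m in range(1, 10)]        # m1..m9
    names += ["m%d" % m for m in range(10, 20)]       # m10..m19
    names += ["m%d0" % t for t in range(2, 6)]        # m20..m50
    names += ["m%dx" % t for t in (0, 2, 3, 4, 5)]    # m0x, m2x..m5x
    return [n + ".wav" for n in names]


def missing_files(card_root):
    """Return expected filenames not found in card_root (case-insensitive)."""
    present = {name.lower() for name in os.listdir(card_root)}
    return [n for n in expected_filenames() if n not in present]
```

For example, `missing_files("/Volumes/CLOCK")` (substitute wherever your card is mounted) should return an empty list when everything is in place.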