Download example files
So there I was, with 56 hour-long interviews recorded on my hard drive, urgently needing conversion to text so I could finish writing my book. My first thought was to try speech-recognition software, but one of the leading companies shot down that idea right away. It’s hard enough to train a computer to recognize your own voice, they explained, let alone decipher another person’s—particularly when it’s been savaged by the U.S. telephone system. Some writers get around that challenge by repeating the interviewee’s answers into the computer, but I figured that by the time I’d done that, I could have typed them anyway.
That’s when I hit on the idea of converting my Mac into a virtual Dictaphone, the kind of remote-controlled tape deck that professional transcribers use. Typically, these systems (which cost hundreds of dollars) play analog cassette tapes under foot pedal control. When you tap the left pedal, the tape will rewind slightly and then resume playing. That’s much more convenient than taking your hands off the keyboard and hitting Stop, Rewind, and Play; and the rewind length is consistent. Holding the pedal will usually rewind the tape further, and the right-hand pedal will stop playback, fast-forward the tape, or perform other functions. Newer Dictaphone-style machines use digital signal processing to maintain the speaker’s pitch while playing back faster or slower.
Of course, I’m not the first person to think of transforming a computer into a Dictaphone; there are countless transcription programs out there. NCH Express Scribe (free for Mac and Windows) even comes with instructions for building your own foot pedals. But by designing my own system, I was able to optimize it for the way I wanted to work. Furthermore, adding new features was easy; the technique works on movie files as well as audio, and I learned some cool tricks that I subsequently used in other projects. Here’s how I did it.
If you haven’t bought QuickTime Pro ($29.95), you’ll be hounded by a window that urges you to upgrade. Often, this "nag screen" opens behind other windows, preventing the QuickTime Player from starting until you dredge up the sales pitch and dismiss it with a click. The nag screen seems to come up the first time you open the QuickTime Player in each 24-hour period.
Some people say that substituting the launch command for the activate command when opening QuickTime Player from an AppleScript will prevent the nag screen from opening. That method hasn’t worked for me, but here’s one that has: After starting your Mac, temporarily set the system clock to 20 years or so in the future, and then launch the QuickTime Player. When the nag screen appears, click "Later." Now set your clock back to the correct date. The QuickTime salesman should then leave you alone for two decades.
I eventually upgraded anyway, because QuickTime Pro provides scads of useful editing and exporting features.
The system I devised has three components: the player, the control commands, and the triggers. To play back the audio files, I used Apple’s QuickTime Player, which is included with every Mac. I controlled it with a collection of AppleScripts, which are like miniature programs you can write in a simple language. To trigger the AppleScripts, I used the function keys in Tex-Edit Plus, a brilliant shareware word processor. In fact, I wrote most of my book in that $15 program, using Microsoft Word only to add headers and footers to the final text I sent to the publisher. Tex-Edit is a streamlined program that’s almost infinitely extensible through AppleScript. One enthusiast site offers hundreds of free scripts that do everything from colorizing HTML tags to converting the selected text to pig Latin.
If you’d rather control your virtual Dictaphone from another application, try an app launcher such as One Key, iKey, Keyboard Maestro, or DragThing. (Some launchers may require you to save your AppleScripts as stand-alone applications.)
Figure 1 shows how easy it is to script the QuickTime Player. I opened Apple’s free Script Editor program, typed the text in the bottom pane, and clicked the Check Syntax button. Script Editor then added formatting to show the roles of the various words: language keywords in bold red, application keywords in blue, comments in gray italics (preceding text with two hyphens makes it a comment), and values in plain text. Pressing the Run button causes the front-most QuickTime movie or audio file to play back at 1.4 times its normal speed, producing a Munchkin effect, because the pitch is shifted up as well.
What if there’s no QuickTime file open? The try block that brackets the set command prevents the Mac from throwing an error message. After I got more experienced with AppleScript, I added code that prompts the user to open a file if one is not yet open.
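Because Figure 1 is not reproduced here, the following is a minimal reconstruction of the kind of script described, not the author's exact code. It assumes the QuickTime Player scripting dictionary of that era, in which the front-most open file is addressed as movie 1 and has a settable rate property. Newer versions of QuickTime Player may address open files as documents instead, so check the application's dictionary in Script Editor first.

```applescript
-- Sketch only. Assumptions: the front-most file is "movie 1" and the
-- dictionary exposes a "rate" property (1.0 = normal speed).
tell application "QuickTime Player"
	try
		-- Play back at 1.4 times normal speed.
		set rate of movie 1 to 1.4
	end try
end tell
```

If no file is open, the set command fails inside the try block and the script simply ends without showing an error dialog.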
Note that the file name in Figure 1 ends with two underscore characters and "F10." When you store the AppleScript in Tex-Edit’s Scripts folder, that syntax assigns the script to a function key. A single underscore maps a script to the F10 key, two underscores map it to Command-F10, and three underscores map it to Command-Shift-F10.
To emulate the backspace pedal on a Dictaphone, I sent the QuickTime Player a step backward movie command, as shown in Figure 2. Pressing the associated function key will rewind the file by the amount set in the rewindtime variable (a value of 10 corresponds to roughly 3 seconds), and then start playback. The QuickTime Player responds so fast that you can back up further by repeatedly hitting the function key. In use, I found that I often wanted to back up more or less than three seconds, so I duplicated the script twice, set rewindtime to 2 in one copy and 60 in another, and then assigned the new scripts to adjacent keys. I assigned another script to the F11 key to stop playback.
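Figure 2 is also missing from this copy, so here is a hedged sketch of what the rewind script might look like. The rewindtime variable and the step backward command come from the text; the by parameter, the movie 1 reference, and the play command are assumptions about the QuickTime Player dictionary of the time.

```applescript
-- Sketch only. Assumption: "step backward" accepts a "by" step count.
set rewindtime to 10 -- roughly 3 seconds, according to the text

tell application "QuickTime Player"
	try
		-- Back up by the chosen number of steps, then resume playback.
		step backward movie 1 by rewindtime
		play movie 1
	end try
end tell
```

The stop script for the F11 key could be as small as this, again assuming the same dictionary:

```applescript
-- Sketch only: halt playback of the front-most file.
tell application "QuickTime Player"
	try
		stop movie 1
	end try
end tell
```

If the ratio of 10 steps to about 3 seconds holds linearly (an assumption, not something the text states), a rewindtime of 2 would back up about 0.6 seconds and a value of 60 about 18 seconds.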
As I was transcribing the interviews, it struck me that readers would enjoy hearing what everyone sounded like. (The book comes with a DVD.) The interviewees included producers Don Was (Rolling Stones), Bob Ezrin (Pink Floyd), and Nile Rodgers (Chic, Madonna); artists including the Crystal Method, BT, and Todd Rundgren; and big thinkers including Ray Kurzweil. So when I came across a potential sound bite, I used an AppleScript to grab the current playback position of the QuickTime Player and copy it to the Mac clipboard so I could paste it into the transcript. (See Figure 3.)
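Figure 3 is not available here, so the following sketch only illustrates the idea of copying the playback position to the clipboard. It assumes the older QuickTime Player dictionary, where current time is reported in time-scale units and must be divided by the movie's time scale to get seconds. The minutes:seconds format and all variable names are illustrative choices rather than the author's.

```applescript
-- Sketch only. Assumptions: "movie 1", "current time", and "time scale"
-- behave as in the older QuickTime Player dictionary.
tell application "QuickTime Player"
	try
		set t to current time of movie 1
		set ts to time scale of movie 1
	on error
		return -- no file open: quit quietly
	end try
end tell

-- Convert time-scale units to whole seconds, then to minutes:seconds.
set totalSeconds to t div ts
set mins to totalSeconds div 60
set secs to totalSeconds mod 60

-- Pad the seconds to two digits, e.g. 5 becomes "05".
set stamp to (mins as text) & ":" & text -2 thru -1 of ("0" & secs)
set the clipboard to stamp
```

After the script runs, pasting into the transcript inserts the time stamp at the cursor.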
You’d think that telephone interviews would be rather impersonal, but in some ways, they’re more intimate than live ones. After all, your lips are essentially right against each other’s ears. When recording the interviews for my book, I often found myself working the telephone receiver the way a singer might work a microphone, moving it closer and farther away to adjust the balance between the voices.
I initially recorded the interviews with a $15 Radio Shack telephone tap. (See Figure A.) This little plastic gadget connects between a telephone’s handset and base, feeding both sides of the conversation into a one-eighth-inch plug that you then connect to your recorder’s mic input. I soon replaced the Radio Shack tap with the better-sounding and far sturdier JK Audio QuickTap (Figure B). Ideally, I would have used a hybrid tap like JK Audio’s Inline Patch that outputs both sides of the conversation on separate connectors. When both parties spoke simultaneously, it was sometimes tough to make out what each was saying, and the overlap made preparing sound bites for the book’s bundled DVD more difficult. I didn’t think listeners would want to hear my enthusiastic grunts and snorts, so I worked hard to edit them out.
Note that recording telephone calls without the other party’s consent is illegal in some states.
Figure A. The inexpensive Radio Shack phone tap also lets you inject external audio into a phone line.
Figure B. The JK Audio QuickTap is the black metal box to the left of the phone. Its output is feeding my Korg PXR-4 digital recorder at the right.
One of the things that makes the QuickTime Player so efficient for this application is that it doesn’t display waveforms. I could load a 75-minute audio file in seconds, whereas a standard audio editing program might take several minutes to calculate the graphics. However, subsequently zeroing in on the sound bites with QuickTime Player’s coarse playback-position slider was awkward. So I wrote a complement to the previous script that let me enter the playback position in minutes and seconds, click a button, and jump right to it. That’s shown in Figure 4.
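Since Figure 4 is not reproduced in this copy, here is a rough sketch of a jump-to-position script along the lines described. The dialog wording, the variable names, and the minutes:seconds entry format are illustrative assumptions, and the QuickTime Player terms (movie 1, current time, time scale) assume the older dictionary in which positions are expressed in time-scale units.

```applescript
-- Sketch only: ask for a position such as 12:34 and jump there.
set reply to display dialog "Jump to position (minutes:seconds):" default answer "0:00" buttons {"Cancel", "Jump"} default button "Jump"
set entry to text returned of reply

-- Split the entry at the colon.
set oldDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to ":"
set parts to text items of entry
set AppleScript's text item delimiters to oldDelims

if (count of parts) is not 2 then error "Enter the position as minutes:seconds, for example 12:34."
set totalSeconds to ((item 1 of parts) as integer) * 60 + ((item 2 of parts) as integer)

tell application "QuickTime Player"
	try
		-- Convert seconds to the movie's own time units before jumping.
		set current time of movie 1 to totalSeconds * (time scale of movie 1)
	end try
end tell
```

Clicking Cancel ends the script without moving the playback position.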
Automating my transcription project with AppleScript saved me staggering amounts of time and arm strain. In the end, I still had to farm out about 25 interviews to professional transcribers, but I used my scripts to check every one when they came back. One transcriber (the lone Mac user) even used my scripts to do her batch. That was especially heartening, because she’s uncomfortable with computers. Although I’ve taken you through the bowels of the system here, actually using this technique is as simple as pushing buttons: F9 rewinds, F10 rewinds more, F11 stops, and F8 grabs the playback position. Use the download link at the top of this article to grab these scripts and try them out yourself. And please let me know if you think of other interesting ways to use these techniques. I feel like I’ve only scratched the surface of what AppleScript can do.