Download example files

So there I was, with 56 hour-long interviews recorded on my hard drive, urgently needing conversion to text so I could finish writing my book. My first thought was to try speech-recognition software, but one of the leading companies shot down that idea right away. It’s hard enough to train a computer to recognize your own voice, they explained, let alone decipher another person’s—particularly when it’s been savaged by the U.S. telephone system. Some writers get around that challenge by repeating the interviewee’s answers into the computer, but I figured that by the time I’d done that, I could have typed them anyway.

That’s when I hit on the idea of converting my Mac into a virtual Dictaphone, the kind of remote-controlled tape deck that professional transcribers use. Typically, these systems (which cost hundreds of dollars) play analog cassette tapes under foot pedal control. When you tap the left pedal, the tape will rewind slightly and then resume playing. That’s much more convenient than taking your hands off the keyboard and hitting Stop, Rewind, and Play; and the rewind length is consistent. Holding the pedal will usually rewind the tape further, and the right-hand pedal will stop playback, fast-forward the tape, or perform other functions. Newer Dictaphone-style machines use digital signal processing to maintain the speaker’s pitch while playing back faster or slower.

Sony Cassette Transcriber. Traditional transcribers like this Sony come with a remote-control foot pedal. I emulated its functions by AppleScripting the QuickTime Player.

Of course, I’m not the first person to think of transforming a computer into a Dictaphone; there are countless transcription programs out there. NCH Express Scribe (free for Mac and Windows) even comes with instructions for building your own foot pedals. But by designing my own system, I was able to optimize it for the way I wanted to work. Furthermore, adding new features was easy; the technique works on movie files as well as audio, and I learned some cool tricks that I subsequently used in other projects. Here’s how I did it.

QuickTime, NagTime

If you haven’t bought QuickTime Pro ($29.95), you’ll be hounded by a window that urges you to upgrade. Often, this "nag screen" opens behind other windows, preventing the QuickTime Player from starting until you dredge up the sales pitch and dismiss it with a click. The nag screen seems to come up the first time you open the QuickTime Player in each 24-hour period.

Some people say that substituting the launch command for the activate command when opening QuickTime Player from an AppleScript will prevent the nag screen from opening. That method hasn’t worked for me, but here’s one that has: After starting your Mac, temporarily set the system clock to 20 years or so in the future, and then launch the QuickTime Player. When the nag screen appears, click "Later." Now set your clock back to the correct date. The QuickTime salesman should then leave you alone for two decades. I eventually upgraded anyway, because QuickTime Pro provides scads of useful editing and exporting features.

Transcription Toolkit

The system I devised has three components: the player, the control commands, and the triggers. To play back the audio files, I used Apple’s QuickTime Player, which is included with every Mac. I controlled it with a collection of AppleScripts, which are like miniature programs you can write in a simple language. To trigger the AppleScripts, I used the function keys in Tex-Edit Plus, a brilliant shareware word processor. In fact, I wrote most of my book in that $15 program, using Microsoft Word only to add headers and footers to the final text I sent to the publisher. Tex-Edit is a streamlined program that’s almost infinitely extensible through AppleScript. One enthusiast site offers hundreds of free scripts that do everything from colorizing HTML tags to converting the selected text to pig Latin.

If you’d rather control your virtual Dictaphone from another application, try an app launcher such as One Key, iKey, Keyboard Maestro, or DragThing. (Some launchers may require you to save your AppleScripts as stand-alone applications.)

Figure 1 shows how easy it is to script the QuickTime Player. I opened Apple’s free Script Editor program, typed the text in the bottom pane, and clicked the Check Syntax button. Script Editor then added formatting to show the roles of the various words: language keywords in bold red, application keywords in blue, comments in gray italics (preceding text with two hyphens makes it a comment), and values in plain text. Pressing the Run button causes the front-most QuickTime movie or audio file to play back at 1.4 times its normal speed, producing a Munchkin effect, because the pitch is shifted up as well. What if there’s no QuickTime file open? The try block that brackets the set command prevents the Mac from throwing an error message. After I got more experienced with AppleScript, I added code that prompts the user to open a file if one is not yet open.

Figure 1. Munchkins. This simple script increases the QuickTime Player’s playback rate. For a Darth Vader effect, substitute a value lower than 1. I found that 0.85 worked well for figuring out difficult passages of audio.
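For readers who want to type something in right away, here is a minimal sketch of a rate-changing script along the lines described above. It is not necessarily identical to the script in Figure 1. It assumes the QuickTime Player 7-era scripting dictionary, in which the front-most file is addressed as movie 1 and exposes a rate property; the playbackRate variable is a name chosen for this illustration.

```applescript
-- Sketch only: assumes the QuickTime Player 7-era dictionary,
-- where "movie 1" refers to the front-most open file.
set playbackRate to 1.4 -- hypothetical variable name; values below 1 slow playback down

tell application "QuickTime Player"
	try
		-- The try block swallows the error raised when no movie is open.
		set rate of movie 1 to playbackRate
	end try
end tell
```

Changing playbackRate to 0.85 should give the slowed-down playback mentioned in the caption. Newer versions of QuickTime Player address open files as documents rather than movies, so the reference would need adjusting there.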

Note that the file name in Figure 1 ends with two underscore characters and "F10." When you store the AppleScript in Tex-Edit’s Scripts folder, that syntax assigns the script to a function key. A single underscore maps a script to the F10 key, two underscores map it to Command-F10, and three underscores map it to Command-Shift-F10.

RetroPlay—the Main Event

To emulate the backspace pedal on a Dictaphone, I sent the QuickTime Player a step backward movie command, as shown in Figure 2. Pressing the associated function key will rewind the file by the amount set in the rewindtime variable (a value of 10 corresponds to roughly 3 seconds), and then start playback. The QuickTime Player responds so fast that you can back up further by repeatedly hitting the function key. In use, I found that I often wanted to back up more or less than three seconds, so I duplicated the script twice, set rewindtime to 2 in one copy and 60 in another, and then assigned the new scripts to adjacent keys. I assigned another script to the F11 key to stop playback.

Figure 2. RetroPlay. The heart of the virtual Dictaphone is the "step backward movie" command. The entire script, including the pickfile() subroutine that fires if no file is open, is in the example files.
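The core of the rewind-and-resume behavior can be sketched in a few lines. This is an approximation, not the full RetroPlay script: it leaves out the pickfile() subroutine, and it assumes the QuickTime Player 7-era dictionary, in which step backward accepts a "by" parameter giving the number of steps.

```applescript
-- Sketch only: assumes the QuickTime Player 7-era dictionary.
-- rewindtime is the number of steps to back up; the article equates
-- a value of 10 with roughly 3 seconds.
property rewindtime : 10

tell application "QuickTime Player"
	try
		-- Back up the front-most file, then resume playback.
		step backward movie 1 by rewindtime
		play movie 1
	end try
end tell
```

Saving copies with rewindtime set to 2 and 60 gives the shorter and longer rewinds. Under the same assumptions, a stop script would replace the two commands inside the try block with a single stop movie 1.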

Do You Have the Time?

As I was transcribing the interviews, it struck me that readers would enjoy hearing what everyone sounded like. (The book comes with a DVD.) The interviewees included producers Don Was (Rolling Stones), Bob Ezrin (Pink Floyd), and Nile Rodgers (Chic, Madonna); artists including the Crystal Method, BT, and Todd Rundgren; and big thinkers including Ray Kurzweil. So when I came across a potential sound bite, I used an AppleScript to grab the current playback position of the QuickTime Player and copy it to the Mac clipboard so I could paste it into the transcript. (See Figure 3.)

Figure 3. Determining Playback Position. This script displays the current playback position of the QuickTime file and gives you the option to copy it to the Mac clipboard. The full script is in the example files.
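A rough sketch of the idea follows. It assumes the QuickTime Player 7-era dictionary, in which a movie reports its current time in time-scale units and its time scale as units per second. The formatPosition handler and the dialog wording are inventions for this illustration, not taken from the full script.

```applescript
-- Sketch only: assumes the QuickTime Player 7-era dictionary.

-- Hypothetical helper: turn a number of seconds into "minutes:seconds" text.
on formatPosition(totalSeconds)
	set wholeSeconds to totalSeconds div 1 -- drop any fraction of a second
	set mins to wholeSeconds div 60
	set secs to wholeSeconds mod 60
	set secText to text -2 thru -1 of ("0" & secs) -- pad to two digits
	return (mins as text) & ":" & secText
end formatPosition

on run
	tell application "QuickTime Player"
		set ticks to current time of movie 1
		set ticksPerSecond to time scale of movie 1
	end tell
	set positionText to formatPosition(ticks / ticksPerSecond)
	set reply to display dialog "Playback position: " & positionText buttons {"Cancel", "Copy"} default button "Copy"
	if button returned of reply is "Copy" then set the clipboard to positionText
end run
```

Dividing by the time scale is what turns QuickTime's internal units into seconds. Clicking Cancel simply ends the script without touching the clipboard.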

Telephone Tapping

You’d think that telephone interviews would be rather impersonal, but in some ways, they’re more intimate than live ones. After all, your lips are essentially right against each other’s ears. When recording the interviews for my book, I often found myself working the telephone receiver the way a singer might work a microphone, moving it closer and farther to adjust the balance between the voices.

I initially recorded the interviews with a $15 Radio Shack telephone tap. (See Figure A.) This little plastic gadget connects between a telephone’s handset and base, feeding both sides of the conversation into a one-eighth-inch plug that you then connect to your recorder’s mic input. I soon replaced the Radio Shack tap with the better-sounding and far sturdier JK Audio QuickTap (Figure B). Ideally, I would have used a hybrid tap like JK Audio’s Inline Patch that outputs both sides of the conversation on separate connectors. When both parties spoke simultaneously, it was sometimes tough to make out what each was saying, and the overlap made preparing sound bites for the book’s bundled DVD more difficult. I didn’t think listeners would want to hear my enthusiastic grunts and snorts, so I worked hard to edit them out.

Note that recording telephone calls without the other party’s consent is illegal in some states.

Radio Shack Tap
Figure A. The inexpensive Radio Shack phone tap also lets you inject external audio into a phone line.

QuickTap and PXR4
Figure B. The JK Audio QuickTap is the black metal box to the left of the phone. Its output is feeding my Korg PXR-4 digital recorder at the right.

Jump to It

One of the things that makes the QuickTime Player so efficient for this application is that it doesn’t display waveforms. I could load a 75-minute audio file in seconds, whereas a standard audio editing program might take several minutes to calculate the graphics. However, subsequently zeroing in on the sound bites with QuickTime Player’s coarse playback-position slider was awkward. So I wrote a complement to the previous script that let me enter the playback position in minutes and seconds, click a button, and jump right to it. That’s shown in Figure 4.

Figure 4. Jump to Location. This script excerpt lets you jump to any point in the file with single-second precision. (The full script is in the example files.)
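The reverse conversion can be sketched the same way. This is an approximation rather than the excerpt in Figure 4: it again assumes the QuickTime Player 7-era dictionary, and the secondsFromText handler and the dialog wording are made up for this illustration.

```applescript
-- Sketch only: assumes the QuickTime Player 7-era dictionary.

-- Hypothetical helper: turn "minutes:seconds" (or plain seconds) into a number of seconds.
on secondsFromText(positionText)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ":"
	set parts to text items of positionText
	set AppleScript's text item delimiters to savedDelimiters
	if (count of parts) is 1 then return (item 1 of parts) as integer
	return ((item 1 of parts) as integer) * 60 + ((item 2 of parts) as integer)
end secondsFromText

on run
	set reply to display dialog "Jump to (minutes:seconds):" default answer "0:00"
	set targetSeconds to secondsFromText(text returned of reply)
	tell application "QuickTime Player"
		-- Convert seconds back into the movie's own time-scale units.
		set current time of movie 1 to targetSeconds * (time scale of movie 1)
	end tell
end run
```

Pasting a position captured by the previous script, such as 12:05, into the dialog should move the playhead to that point.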

Transcribe Your Heart Out

Automating my transcription project with AppleScript saved me staggering amounts of time and arm strain. In the end, I still had to farm out about 25 interviews to professional transcribers, but I used my scripts to check every one when they came back. One transcriber (the lone Mac user) even used my scripts to do her batch. That was especially heartening, because she’s uncomfortable with computers. Although I’ve taken you through the bowels of the system here, actually using this technique is as simple as pushing buttons: F9 rewinds, F10 rewinds more, F11 stops, and F8 grabs the playback position. Use the download link at the top of this article to grab these scripts and try them out yourself. And please let me know if you think of other interesting ways to use these techniques. I feel like I’ve only scratched the surface of what AppleScript can do.