Speech Recognition and Speakable Commands

Mac OS X’s built-in speech recognition software lets you execute various system and application commands by speaking them (assuming you have a microphone attached to or built into your Macintosh). The system ships with many predefined commands. Application developers can also define spoken commands that work within their own programs, and users can expand the machine’s speakable repertoire by writing and installing scripts.

The speech recognition system is primarily useful for defining voice-activated macros and shortcuts. It doesn’t let you use your microphone as a complete alternative to the keyboard and mouse. The system can’t, for example, take dictation into a word processor. For that level of functionality, you need a third-party application, such as IBM’s ViaVoice (http://www.ibm.com/software/speech/mac/osx/).

Activating Speech Recognition

You can configure and activate speech recognition through the three tabbed panes of the Speech preferences panel, shown in Figure 4-60.


Figure 4-60. The Speech preference panel, showing the Speech Recognition pane

To turn speech recognition on, set the “Apple Speakable Items is” radio button to On. This causes the speech system’s round “listener” window (Figure 4-61) to appear. The listener floats over all your active windows, but you can drag it anywhere you like. It remains visible until you switch the Apple Speakable Items button back to Off.


Figure 4-61. Speech’s microphone

The system doesn’t listen for commands until you put it into listening mode by pressing the listening key, which you set on the Speech → Speech Recognition → Listening tab. By default, the listening key is Esc (the Escape key). Until you press it, the microphone is grayed out; while the system is listening, the listener looks like Figure 4-61, complete with a sound input indicator (the blue, green, and red bars) and indicator arrows showing that the microphone is receiving sound. If you don’t want Esc as the listening key, click the Change Key button and enter a new key or key combination (for example, Esc combined with a modifier key).

Warning

If you use the Terminal and Speech Recognition together, change your listening key to something other than Esc, because many command-line programs give that key a special meaning (in vi, for example, Esc leaves insert mode).

There are two Listening Methods:

Listen only while key is pressed

This setting is an on-demand mode: the system listens only while you hold down the listening key (Esc by default).

Key toggles listening on and off

Under this setting, pressing and releasing the listening key (Esc) toggles listening mode on or off. Because this means the computer’s microphone will actively receive and analyze sounds over longer stretches of time, you must set two more controls to help it discriminate spoken commands from background noise (or other things you might say while sitting at your computer).
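To make the difference between the two listening methods concrete, here is a minimal Python sketch that models them as a tiny state machine. It is a hypothetical illustration, not Apple’s code: the `Method` enum, the `listening_states` function, and the choice to toggle on key release (rather than on key press) are all assumptions.

```python
# Hypothetical model of the two Listening Methods; not Apple's implementation.
from enum import Enum
from typing import List


class Method(Enum):
    HOLD = "Listen only while key is pressed"
    TOGGLE = "Key toggles listening on and off"


def listening_states(method: Method, key_events: List[str]) -> List[bool]:
    """Return whether the model is listening after each "down"/"up" key event."""
    listening = False
    states: List[bool] = []
    for event in key_events:
        if event not in ("down", "up"):
            raise ValueError(f"unknown key event: {event!r}")
        if method is Method.HOLD:
            # On-demand mode: listening lasts exactly as long as the key is held.
            listening = event == "down"
        elif event == "up":
            # Toggle mode (assumption): each completed press-and-release flips the state.
            listening = not listening
        states.append(listening)
    return states


# Usage: two full presses of the listening key under each method.
presses = ["down", "up", "down", "up"]
print(listening_states(Method.HOLD, presses))    # [True, False, True, False]
print(listening_states(Method.TOGGLE, presses))  # [False, True, True, False]
```

In toggle mode the model stays in the listening state between presses, which is why the extra settings described next matter: the microphone stays live for long stretches.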

If you select the “Key toggles listening on and off” radio button, look at these two additional settings:

Name

By default, the Name is set to the Star Trek-esque “Computer,” which means you must first say “Computer, ...” before issuing a spoken command. You can change the Name, but choose one that’s easy to say and unlikely to come up in any conversation your Mac might overhear.

Name is

This pop-up menu defines how the computer differentiates spoken commands from other sounds. Here are the options in this menu:

Optional before commands

When this setting is selected, the machine doesn’t wait for its Name and tries to interpret everything it hears as a potential command. For example, to check your mail and then switch to iTunes, you need only say, “Get my mail. Switch to iTunes.”

Warning

This setting is risky: while the system is in listening mode, anything it hears you say can be interpreted as a command.

Required before each command (the default)

This setting has the speech system listen for the name defined in the Name field before interpreting every command. For example, to hear a knock-knock joke, you could say “Computer, tell me a joke.” (If you try this, remember that you need to say “Computer, ...” before each step of the joke. For example, “Computer, who’s there?” and “Computer, Thea who?”)

Required 15 seconds after last command, Required 30 seconds after last command

These settings also require that you speak the Name you chose in the Name text field. Once you have said it, however, the machine continues to interpret sounds as possible commands until 15 or 30 seconds (depending on the setting) have elapsed without recognizing a command. Thus, after saying the Name once, you can issue several commands in a row: “Computer, get my mail. Switch to iTunes.”
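The four “Name is” options differ only in when a spoken phrase must begin with the Name. The following Python sketch models that rule under stated assumptions: recognition is assumed to be perfect, each utterance is reduced to a hypothetical `(time, said_name, command)` tuple, and the mode strings `"optional"`, `"required"`, `"15s"`, and `"30s"` are invented labels for the menu items, not identifiers from Apple’s software.

```python
# Hypothetical model of the "Name is" pop-up menu; an illustration only.
from typing import List, Optional, Tuple

# (seconds since start, was the Name spoken first?, command text)
Utterance = Tuple[float, bool, str]

MODES = ("optional", "required", "15s", "30s")


def accepted_commands(mode: str, utterances: List[Utterance]) -> List[str]:
    """Return the commands this model would act on, in order."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Length of the name-free window for the timed modes; None otherwise.
    window: Optional[float] = {"15s": 15.0, "30s": 30.0}.get(mode)
    window_end = float("-inf")  # time until which the Name is not needed
    accepted: List[str] = []
    for t, said_name, command in utterances:
        if mode == "optional":
            ok = True                          # anything heard may be a command
        elif mode == "required":
            ok = said_name                     # Name must precede every command
        else:
            ok = said_name or t <= window_end  # Name opens a timed window
        if ok:
            accepted.append(command)
            if window is not None:
                window_end = t + window        # each recognized command restarts the timer
    return accepted


# Usage: the Name is spoken only before the first command.
heard = [(0.0, True, "Get my mail"),
         (5.0, False, "Switch to iTunes"),
         (40.0, False, "Tell me a joke")]
print(accepted_commands("required", heard))  # ['Get my mail']
print(accepted_commands("15s", heard))       # ['Get my mail', 'Switch to iTunes']
```

In the timed modes the third phrase is ignored because more than 15 (or 30) seconds passed after the last recognized command, so the Name would have to be spoken again.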

The last two items at the bottom of the Listening pane allow you to specify which microphone to use (Line In or Internal microphone) and to set the volume for the microphone’s input. Pressing the Volume button pops open a window that lets you test and adjust the volume level by having you say some sample commands.

Speakable Items

To see which commands are available to you at any time, click the triangle at the bottom of the listener window to open its menu (Figure 4-62), and choose Open Speech Commands window.


Figure 4-62. Clicking on the listener’s triangle opens its context menu

The Speech Commands window, shown in Figure 4-63, has two parts:

  • A log section (at the top of the window), which shows the speech commands you have issued.

  • A Commands section (at the bottom of the window), which lists the available commands, collapsed into categories with disclosure triangles that reveal the speakable commands.


Figure 4-63. The Speech Commands window

By default, there are two categories in the Commands section:

Speakable Items

This is a catch-all category for commands you can invoke throughout the system.

Application Switching

This lists the special commands for switching between (and launching) applications, based on the application icons in the Dock.

Applications such as the Finder, Mail, and Safari define their own speakable items. When one of these applications is active (i.e., front-most), its list of speakable items shows up in the Commands section.
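The Commands section can be thought of as the two fixed categories plus the front-most application’s own items. A minimal Python sketch of that idea follows; the `commands_section` function, its parameters, and the “Switch to <application>” phrasing generated for each Dock application are assumptions for illustration, not a description of how Mac OS X actually builds the list.

```python
# Hypothetical model of how the Commands section is assembled; not Apple's code.
from typing import Dict, List


def commands_section(systemwide: List[str], dock_apps: List[str],
                     app_items: Dict[str, List[str]],
                     frontmost: str) -> Dict[str, List[str]]:
    """Return a mapping of category name to the speakable commands it shows."""
    sections: Dict[str, List[str]] = {
        # Catch-all commands that work throughout the system.
        "Speakable Items": sorted(systemwide),
        # One switching command per application icon in the Dock (phrasing assumed).
        "Application Switching": [f"Switch to {app}" for app in dock_apps],
    }
    # Only the front-most application's own items are added.
    if frontmost in app_items:
        sections[frontmost] = sorted(app_items[frontmost])
    return sections


# Usage with made-up data: the Finder is front-most, so its items appear too.
view = commands_section(
    systemwide=["Get my mail", "Tell me a joke"],
    dock_apps=["Mail", "iTunes"],
    app_items={"Finder": ["Hypothetical Finder command"]},
    frontmost="Finder",
)
print(list(view))  # ['Speakable Items', 'Application Switching', 'Finder']
```

Switching to an application with no items of its own would leave only the two default categories in this model.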

Customizing Speakable Items

If you click the Open Speakable Items Folder button on the Speech → Speech Recognition → On/Off pane, a Finder window opens, listing the speakable items on your system. Speakable items exist as files in ~/Library/Speech/Speakable Items. Files directly inside that folder represent systemwide items; those inside its Application Speakable Items folder are specific to various applications on your system.

Each speakable item can be a property list (.plist) file, an AppleScript, or some other type of Finder object. A property-list item triggers a command predefined by the system or by a speech-aware application; an AppleScript item is run by the system.

In other words, you can make your own voice-activated commands by writing (or finding on the Internet) an AppleScript that performs a particular task or series of tasks, giving the script a pronounceable name, and placing it in the ~/Library/Speech/Speakable Items directory.

Finally, any other Finder object (an arbitrary file, folder, or disk, or an alias to one of these) behaves as if it were double-clicked in the Finder when you speak its name.
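To see how the folder layout maps onto systemwide and application-specific items, the following Python sketch inventories a Speakable Items folder. It rests on several assumptions: items are classified only by file-name extension (older compiled AppleScripts may have no extension and rely on Mac file type codes, which this sketch ignores), Application Speakable Items is assumed to hold one subfolder per application, hidden files are skipped, and the spoken name is taken to be the file name without its extension. The `classify` and `list_speakable_items` functions are hypothetical helpers.

```python
# Hypothetical inventory of a Speakable Items folder; classification is by
# file-name extension only, which is an assumption (older scripts may have none).
from pathlib import Path
from typing import Iterator, Tuple

ITEMS_DIR = Path.home() / "Library" / "Speech" / "Speakable Items"
APP_FOLDER = "Application Speakable Items"


def classify(item: Path) -> str:
    """Guess how the system would handle an item, based on its extension."""
    suffix = item.suffix.lower()
    if suffix == ".plist":
        return "property list"   # triggers a predefined command
    if suffix in (".scpt", ".applescript"):
        return "AppleScript"     # run by the system
    return "Finder object"       # opened as if double-clicked


def visible(folder: Path) -> Iterator[Path]:
    """Yield the non-hidden entries of a folder in name order."""
    return (p for p in sorted(folder.iterdir()) if not p.name.startswith("."))


def list_speakable_items(root: Path = ITEMS_DIR) -> Iterator[Tuple[str, str, str]]:
    """Yield (scope, spoken name, kind) for every item under root."""
    if not root.is_dir():
        return
    for item in visible(root):
        if item.name == APP_FOLDER and item.is_dir():
            # Assumed layout: one subfolder per application.
            for app_dir in visible(item):
                if app_dir.is_dir():
                    for sub in visible(app_dir):
                        yield app_dir.name, sub.stem, classify(sub)
        else:
            yield "systemwide", item.stem, classify(item)


if __name__ == "__main__":
    for scope, name, kind in list_speakable_items():
        print(f"{scope:20} {kind:14} {name}")
```

Because the spoken name comes straight from the file name, renaming an item in the Finder is all it takes to change the phrase that triggers it.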
