The vocoder has been a novelty staple (if that isn't an oxymoron) of pop music for more than 30 years. Even if you've never heard the word vocoder, you've heard the smoothly sexy yet oddly alien sounds that emanate from this device. In the '70s and '80s, the vocoder was used on songs by Kraftwerk, Stevie Wonder, Bon Jovi, Donna Summer, Phil Collins, Laurie Anderson, Pink Floyd, Styx, and many other artists. (See the "Famous Vocoder Songs" sidebar.) In more recent years, Foo Fighters, Metallica, Beck, Britney Spears, Madonna, Marilyn Manson, and other pop icons have dialed up the sound of the vocoder.
A vocoder (short for "voice encoder") is a device that makes ordinary sounds, such as chords played on a synthesizer, sing or speak recognizable words. In this tutorial we'll look at how vocoders perform this magic, explain how to set up your software vocoder, and suggest some unexpected musical uses for vocoding.
The best way to grasp the creative power of vocoding is to try it yourself, so I've prepared some downloadable tutorial files (2.2MB ZIP file) for the advanced vocoder in Propellerhead Reason. If you don't have Reason, you can listen to the audio clips or download the 30-day demo version of the software to experiment with vocoding for yourself.
The Korg MS2000B analog modeling synth comes with a microphone to feed its onboard vocoder. Here's an excerpt of a song I recorded with the MS2000B's predecessor, the MS2000 (same synth, different color). Listen for the vocoding at the end.
A vocoder works by imprinting the constantly changing frequency spectrum of one signal onto the sound energy in another signal. Thus a vocoder always has two separate inputs—the speech and carrier inputs. The speech (also called modulator) input receives, not surprisingly, a signal containing spoken words and phrases. The carrier input receives the signal that will be "vocoded" by having the frequency characteristics of the speech signal imprinted on it.
Other sound sources can be used for the speech input, but spoken words are the most common. Note also that it's not necessary to sing into a vocoder; the pitch information in the output comes from the carrier signal, so speaking in a normal tone works fine.
The carrier signal will be processed and then routed to the vocoder's output. The speech signal will do its job and then be discarded.
To understand how a vocoder works, you need to know that almost all sounds contain energy at a number of different frequencies. If I bow a note on my cello, for instance, the note itself may have a frequency of 100Hz (100 cycles per second). But the tone will also contain vibrations at 200Hz, 300Hz, 400Hz, and so on. These higher-frequency vibrations are called overtones or partials.
What we perceive as the particular tone color of a real-world sound is the combination of these partials—their relative loudness, their precise frequencies, and the way they change over time. If a trumpet, a violin, and a soprano play or sing exactly the same note, normal listeners have no trouble distinguishing one from another. We can do this because each sound source produces different partials. Our ears can very rapidly decode the mix of partials in each tone, usually without the slightest conscious effort.
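The additive picture described above is easy to sketch in code. Here's a minimal Python illustration (the `harmonic_tone` helper and its partial amplitudes are my own, chosen only to mimic the cello example): summing sine waves at whole-number multiples of a fundamental produces a tone whose color depends on the relative strength of the partials.

```python
import math

def harmonic_tone(f0, partial_amps, fs=44100, dur=0.25):
    """Build a tone by summing sine partials at f0, 2*f0, 3*f0, ...
    partial_amps[k] is the loudness of the (k+1)-th partial."""
    n = int(fs * dur)
    return [
        sum(a * math.sin(2 * math.pi * f0 * (k + 1) * t / fs)
            for k, a in enumerate(partial_amps))
        for t in range(n)
    ]

# A 100 Hz tone like the cello example: energy at 100, 200, 300,
# and 400 Hz, fading as the frequency rises (amplitudes are illustrative).
tone = harmonic_tone(100, [1.0, 0.5, 0.3, 0.2])
```

Changing the amplitude list while keeping the fundamental fixed changes the tone color without changing the perceived pitch.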
A basic vocoder contains several elements, as shown in Figure 1. It has two banks of bandpass filters, a bank of envelope followers, a bank of amplifiers, and a mixer. Let's look at each component in turn, and then see how they work together.
Fig. 1. This basic block diagram of a vocoder shows just six frequency bands; some software vocoders have hundreds. Most vocoders also have additional processing to enhance the sound, as shown in this more elaborate design.
A bandpass filter is a device that, when fed a signal, allows only the frequencies within a narrow band to pass. The incoming signal may have partials at both higher and lower frequencies, but those partials are filtered out. For instance, if the center frequency of the filter is 1,000Hz and its bandwidth is 200Hz, it would allow partials between 900Hz and 1,100Hz to pass, while filtering out partials below 900Hz and above 1,100Hz.
That is a slight oversimplification, because the upper and lower boundaries of the pass-band are not rigid (especially with analog filters). To continue the previous example, if the lower boundary of the pass-band is 900Hz, it's not the case that a partial at 901Hz will pass through while one at 899Hz will be stopped dead in its tracks. Rather, partials beyond the upper and lower boundaries are attenuated progressively: the farther they lie from the pass-band, the more their amplitude is reduced, as shown in Figure 2. But this spillage is not especially significant to the design of a vocoder. For practical purposes, we can talk about the bandpass filter as if it were a frequency window with sharp edges, even though the edges are in fact fuzzy.
Fig. 2. This bandpass filter, constructed in Izotope Ozone, is centered at 2kHz. Because it's in "analog" mode, the sides of the curve drop off gradually.
Fig. 3. A simple drum loop before and after being processed with the bandpass filter in Fig. 2. Click to hear the sound (100KB MOV).
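For readers who'd like to experiment, here's a minimal bandpass filter in pure Python, using the widely published RBJ "Audio EQ Cookbook" biquad formulas. The function name and parameters are mine, not part of any product mentioned in this article.

```python
import math

def bandpass(samples, fs, center_hz, bandwidth_hz):
    """Second-order bandpass filter with 0 dB gain at the center
    frequency, built from the standard RBJ biquad coefficients."""
    w0 = 2 * math.pi * center_hz / fs
    q = center_hz / bandwidth_hz          # e.g. 1,000 / 200 gives Q = 5
    alpha = math.sin(w0) / (2 * q)
    b0 = alpha                            # numerator is alpha * (x[n] - x[n-2])
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in samples:
        y = (b0 * (x - x2) - a1 * y1 - a2 * y2) / a0
        x1, x2, y1, y2 = x, x1, y, y1
        out.append(y)
    return out
```

Feed it a sine wave at the center frequency and the signal emerges essentially untouched; a sine well outside the band comes out strongly attenuated, with the fuzzy-edged rolloff described above.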
The bandpass filters in a vocoder split the incoming speech signal into a number of separate signals, each of which contains only the sound energy within the narrow pass-band of that particular filter. For instance, each pass-band might be an octave wide. In this case, the filters would have something like the following lower and upper boundaries:
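Octave-wide boundaries are easy to generate, since each band's upper edge is double its lower edge. The sketch below prints a 10-band layout that is consistent with the 1.6-3.2kHz band used later in this article; the 12.5Hz starting point is an assumption for illustration.

```python
def octave_bands(base_hz=12.5, count=10):
    """Octave-wide pass-bands: each band's upper edge doubles its
    lower edge. The 12.5 Hz base is an assumed starting point."""
    bands, low = [], base_hz
    for _ in range(count):
        bands.append((low, low * 2))
        low *= 2
    return bands

for low, high in octave_bands():
    print(f"{low:g} Hz - {high:g} Hz")
```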
That's a 10-band design. If the vocoder has 16 bands, each will be correspondingly narrower. Typical vocoder designs have 8, 16, 24, or 32 bands. With fewer than 8 bands, the speech input won't be detected accurately enough for us to understand the output. Conversely, using too many bands can reduce the personality of a vocoder by glossing over its characteristic distortion.
An envelope follower is a device that senses the amplitude of an audio signal and outputs a control signal (also known as an envelope) whose level corresponds to the input's amplitude. In other words, if the incoming signal is loud, the control signal output has a high level. If the incoming audio is soft, the control signal has a low level. When there's no incoming audio, the control signal drops to zero.
Envelope followers are used for various effects. A compressor uses a type of envelope follower, as does an auto-wah effect. Most envelope followers have attack and release controls. The attack parameter governs how quickly the envelope rises when the incoming audio increases in amplitude. The release parameter (sometimes called decay) governs how quickly the envelope falls back toward zero when the amplitude of the incoming audio drops.
In a compressor, a long release time may give better-sounding results. In a vocoder, however, both the attack and the release are normally kept fairly short. That ensures that the vocoder will be able to track the incoming speech signal accurately. In the Reason examples, you can try experimenting with the Vocoder module's Attack and Decay knobs. Increasing the Decay causes the sound to smear, while setting it too short makes the sound rather grainy.
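A one-pole envelope follower with separate attack and release coefficients can be sketched as follows. The names and default times are mine, chosen to be "fairly short," as the text recommends for vocoding.

```python
import math

def envelope_follower(samples, fs, attack_ms=5.0, release_ms=50.0):
    """Track the amplitude of a signal. A short attack lets the envelope
    jump up quickly; the release sets how fast it falls back toward zero."""
    atk = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    env, out = 0.0, []
    for x in samples:
        level = abs(x)
        coeff = atk if level > env else rel   # rising: attack; falling: release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out
```

Lengthening `release_ms` here produces exactly the smearing described above, because the envelope can no longer fall fast enough to track gaps between words.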
In a vocoder (again, refer to Figure 1), each bandpass filter in the speech path feeds through a dedicated envelope follower. The result is this: when the speech signal has partials within a given frequency band, the output of that particular envelope follower rises. When there are weak partials or none at all within that band, the output of that envelope follower falls.
If we were to look at the outputs of the vocoder's envelope followers over time, perhaps displaying them as squiggles on the computer screen, we'd have a fairly accurate picture of the frequency content of the speech signal over the course of time. (Indeed, vocoders were initially developed to compress speech for telephony.)
The vocoder in Propellerhead Reason offers up to 512 bands, but for many applications, using fewer bands sounds better.
Meanwhile, an identical set of bandpass filters has been splitting the carrier signal into separate frequency bands. In this part of the vocoder, each filter's output passes through an amplifier. The gain of each amplifier is controlled by the signal coming from an envelope follower. So if the speech signal has significant energy in the 1.6–3.2kHz band, for instance, the envelope follower for that band will output a high control signal, which in turn will boost the gain on the amplifier that's receiving the portion of the carrier signal in the 1.6–3.2kHz band.
That's where the vocoding happens. The combination of bandpass filter, envelope follower, and amplifier causes the partials in the carrier's 1.6–3.2kHz band to take on the same amplitude contour as the partials in the speech signal's 1.6–3.2kHz band. Note, however, that if the carrier has no sound energy in a particular band, the amplifier can't create it. It can only amplify or attenuate whatever is happening in the carrier signal within that band.
All that remains is to mix the bands of the carrier signal back into a single output. That's handled by the vocoder's internal mixer. (Although our example has a mono output, some vocoder designs let you pan the outputs of the individual bands anywhere in the stereo field for a more spacious sound.)
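Putting all of these pieces together, a toy channel vocoder can be sketched in pure Python. This is a simplified illustration, not any product's actual implementation: the function names, the Q value, and the smoothing coefficient are my own choices, and real designs add refinements such as the high-frequency pass-through discussed next.

```python
import math

def bandpass(samples, fs, center_hz, q=5.0):
    """RBJ biquad bandpass, 0 dB gain at the center frequency."""
    w0 = 2 * math.pi * center_hz / fs
    alpha = math.sin(w0) / (2 * q)
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in samples:
        y = (alpha * (x - x2) - a1 * y1 - a2 * y2) / a0
        x1, x2, y1, y2 = x, x1, y, y1
        out.append(y)
    return out

def follow(samples, coeff=0.995):
    """Crude one-pole envelope follower over the rectified signal."""
    env, out = 0.0, []
    for x in samples:
        env = coeff * env + (1.0 - coeff) * abs(x)
        out.append(env)
    return out

def vocode(speech, carrier, fs, centers):
    """For each band, analyse the speech, then use its envelope as the
    gain on the matching band of the carrier; finally mix the bands."""
    out = [0.0] * len(carrier)
    for f0 in centers:
        env = follow(bandpass(speech, fs, f0))   # speech-path analysis
        band = bandpass(carrier, fs, f0)         # carrier-path filter
        for i in range(len(out)):
            out[i] += band[i] * env[i]           # per-band amplifier
    return out
```

Note the key property from the text: if the speech is silent, every envelope sits at zero and the carrier is completely muted, and if the carrier lacks energy in a band, nothing the speech does can create it there.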
One of the characteristics of human speech is that it contains extremely high-frequency sounds caused by sibilants (consonants such as "S") and fricatives (consonants such as "F"). The vocoder design described above is not very good at distinguishing among these sounds. In addition, many of the signals you might want to use as carriers don't have enough sound energy in the extreme high-frequency band to produce good sibilants and fricatives.
To make vocoded speech more understandable, some vocoders have a pass-through circuit that sends the portion of the speech signal above 8kHz or so directly to the output, mixing it with the carrier signal rather than attempting to vocode it. The level of the pass-through is usually adjustable.
The carrier should be a signal that has significant acoustic energy throughout as much of the sound spectrum as possible. A synthesizer sawtooth wave works well, and for this application the synth's own filter should be wide open. Noisy signals, such as white noise and sampled wind sounds, are also good choices for the carrier. A waveform such as a triangle wave, which has weak overtones to begin with, or a synth waveform that has already been filtered by the synth's own lowpass filter, tends not to produce good results when vocoded.
Fig. 4. If you load tutorial file 1 and press the Tab key to spin the rack around, you can see how the sound generators and processors connect. (Click to enlarge.)
Because a vocoder requires two different audio inputs, it can't be set up as an ordinary insert effect in a software-based multitrack. In some digital audio workstations, you place the vocoder plug-in on a stereo aux-send bus; the left side of the stereo signal becomes the speech input, while the right side becomes the carrier. The sends from two tracks can then be panned hard left and hard right to feed the vocoder. In other DAWs, the vocoder plug-in may have a special input bus selector for the speech signal.
I've had good luck using a drum loop as the speech input for a vocoder. Here's an example, a recent composition of mine created in Reason, "Peace in Palestine." For more background on the piece (and to hear a higher-resolution version), visit my site.
The first demo file that accompanies this article, OreillyVocoderDemo1.rns, also has an example of this technique; to replace the vocal speech input with the drum loop "speech," mute and unmute the appropriate tracks in the sequencer. (See Figure 4.) Here's how it sounds:
A drum loop with some rhythmic activity in various frequency regions works best. If there's not much going on in the mids, for instance, you might want to mix a bongo or conga percussion loop with the beat before sending it to the vocoder. Likewise, if you want to hear the rhythm of the kick in the output, your carrier signal needs to have some low notes. The PAiA vocoder has a distortion circuit on the carrier input to increase the level of overtones.
When using ordinary speech for the speech input, you may find it useful to compress and normalize the sample so as to bring its peaks up to a uniform level. Similarly, inserting a hardware compressor between the microphone and the input on a hardware vocoder can give smoother results.
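Peak normalization is simple to sketch; the hypothetical helper below stands in for whatever your sample editor provides. Compression, which evens out level differences over time, is a separate step that would come first.

```python
def normalize(samples, target_peak=0.9):
    """Scale a sample so its loudest moment reaches target_peak.
    (Apply compression first if the peaks vary widely.)"""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)              # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]
```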
The next example uses the same Reason patch, but with vocals instead of drums as the speech input. Here's the raw vocal, a quote from Chapter 3 of Nicholas Nickleby, by Charles Dickens:
In the tutorial file OreillyVocoderDemo1.rns, I've loaded the original WAV file containing the speech into a Reason NN-XT sampler. The carrier signal comes from a Reason SubTractor synth playing a basic sawtooth patch. The Reason Vocoder is set to 32-band mode. (Turn the Band Count knob to compare the other settings.) Here's the result:
Fig. 5. In tutorial file 2, the yellow cable carries the output of vocoder band 10 to the pan-control input on the mixer. As the output level changes, the sound moves left and right in the stereo field. (Click to enlarge.)
Some vocoders let you shift the frequencies of either the speech or carrier bandpass filters up or down. In some cases that will make the output more intelligible. It can also shift the vocal formants (the characteristic resonant frequencies of the human mouth and nasal passages) up or down. The result is perhaps less distinctive than formant-based pitch-shifting, but the technique can be useful. In the next example (which I created in the song file OreillyVocoderDemo2.rns), I played the speech sample down an octave, thus stretching it to twice its original length. (See Figure 5.) I then used the Reason vocoder's Shift knob to move the formants back up into the expected range. If you listen closely, you may also hear that one of the envelope-follower outputs is modulating the stereo pan position of the audio:
Some vocoders give you the option of repatching the envelope followers to arbitrary carrier bands. This will render the words in the speech input unintelligible, but the output can still have an expressive character. This technique is illustrated (perhaps badly) in OreillyVocoderDemo3.rns (See Figure 6). The vocal sample is the same as in the first two examples, though only the beginning of it is played, in a stuttering manner:
Fig. 6. In tutorial file 3, I cross-patched the envelope follower outputs to create a stuttering sound.
Human listeners relate strongly to the sound of the voice—not surprising, really. The vocoder is only one of the tools with which we can create high-tech hybrids of human and machine sounds. (Granular synthesis, in which a spoken phrase is broken up into tiny "sound grains," is another.) Filtered, ring-modulated, and pitch-shifted voices offer further opportunities for creating effects that are emotionally affecting, yet exotic. If you're looking for a way to give your music a new dimension, vocal processing would be a great place to start.
The Roland VP-550 packs vocoding, harmonizing, and other handy vocal effects into a simple interface. (Click to enlarge.)
Pop artists have been coaxing evocative sounds out of vocoders for decades. (Check out this 1947 recording called "Sparky's Magic Piano.") Until recently, Wikipedia had an extensive list of vocoder songs, but then some contributors complained that lists don't belong in an encyclopedia and deleted it. Fortunately, you can read Google's cached version here.
Following are links to audio clips of more interesting vocoder songs. All go to the iTunes Music Store unless noted.
Some of the classic sounds people think are vocoded, such as Peter Frampton's "Do You Feel Like We Do," were actually made with a talkbox, a small speaker connected to a hose that the performer sticks in his mouth. Here are a few other talkbox hits:
Of course, powerful tools can be abused as well. Neil Young's Trans album was so swamped in vocoded vocals that his record company sued him. Here's an even more abrasive example: