Chapter 4. Audio and Hearables

Sound enables us to be present in our environment, or it can transport us somewhere else. By listening, we can generate “a theatre of the mind,” where we use our imagination to construct visual elements, and even time travel. In many ways, we are already augmenting the sounds of our environment: the man on the busy train who hears nothing but the hum of his noise-cancelling headphones, or the woman who listens to her favorite rock music during take-off and landing. The future of augmented audio will take us beyond blocking out life’s “noise” and into more purposeful and impactful experiences with sound.

“When people talk about Augmented Reality (AR), they usually think it’s about visuals that are put on top of the camera image. It’s always a visual thing. But not many people think that the same can be done with sound,” said1 Michael Breidenbruecker, founder of RjDj (Reality Jockey Ltd.). The misconception that AR is purely visual persists today, but audio can create an experience that is highly transportive and transformative, without the need for augmented visuals.

Sound can be integrated with other senses in AR, or used on its own. It can help you to navigate and receive information, immerse you in an experience, engage your imagination through new modes of play, and customize your environment. This chapter explores each of these areas as well as “hearable” consumer devices (wireless technology worn on the ear), a new category gaining popularity in the rising wearable technology market. In addition to gathering physiological data from the ear for health monitoring and fitness activity, hearables make possible new forms of interaction and communication including intelligent digital assistants that listen and respond to voice commands.

Location-Aware Augmented Audio Walks

Navigation to a location, an informative museum tour, and a guided meditation are all examples of being led on a journey by a voice, recorded or live. Audio, often accompanied by sound effects or music, can illuminate your surroundings by guiding your focus and concentration along a path, even pointing out things that otherwise might not have been apparent. It can unlock a new awareness and appreciation of the world.

Canadian artist Janet Cardiff is internationally renowned for evocative audio walks, which she’s been creating since 1991. Cardiff discusses her work:2

The format of the audio walks is similar to that of an audioguide. You are given a CD player or iPod and told to stand or sit in a particular spot and press play. On the CD, you hear my voice giving directions, like “turn left here” or “go through this gateway,” layered on a background of sounds: the sound of my footsteps, traffic, birds, and miscellaneous sound effects that have been prerecorded on the same site as they are being heard. This is the important part of the recording. The virtual recorded soundscape has to mimic the real physical one in order to create a new world as a seamless combination of the two. My voice gives directions but also relates thoughts and narrative elements, which instills in the listener a desire to continue and finish the walk.

Cardiff creates a blended reality with audio as the driving element, superimposing an intimate narrative in a public space, to which you become privy. She even says3 you could describe her walks as a form of “time travel.” Cardiff’s work serves as an excellent precursor to the types of augmented audio experiences made possible today with advanced technologies.

Andrew Mason is the founder and CEO of Detour, a startup based in San Francisco that offers a collection of location-aware augmented audio walks called Detours. Mason acknowledges Cardiff among others as a huge inspiration. “The first thing I did when I started exploring the idea was to do a trip around the world to sample different takes on location-based audio experiences,” he explains. These included Cardiff’s walk through Central Park, the Hasidic Soundwalk in Brooklyn, and nonlinear walks in London produced by Fran Panetta. “And those experiences woke me up to the potential for location-based audio to create a cinematic experience, transport you into another world, and ultimately get closer than is possible through any other medium to the experience of walking in someone else’s shoes,” says Mason.

As you walk, wearing your own headphones with the Detour smartphone app running, the voice of a local narrator automatically guides you. Mason distinguishes Detour from other audio tour apps that have you constantly fidgeting with your phone, or clicking a map with pins to play content. “We wanted something where people could just have an experience that feels like you’re there with a member of the community or whoever it is and the technology just melts out of the way,” says Mason.4 With audio as the interface, and the device tucked away in your pocket with no screen to look at, you’re free to focus on your physical surroundings and the adventure led by your personal guide.

What sets Detours apart from ordinary audio guides is that the technology is always aware of your location throughout the experience, making for a dynamic tour that adapts the story to you and your pace, as well as to the time of day, and even the weather. There’s no need to press “play” when you arrive at each point of interest; the tour is responsive to your movements and already knows you’re there. To do this, Detour uses GPS, iBeacons, and other sensors in your smartphone for location precision. iBeacon is Apple’s implementation of Bluetooth Low Energy (BLE) wireless technology for providing location-based information and services to iPhones and other iOS devices. It can compensate for potentially unreliable GPS signals. The beacons themselves are small, inexpensive Bluetooth transmitters that serve as proximity sensors. iOS apps listen for the signal transmitted by these beacons and respond accordingly when your phone or tablet comes into range. Detour uses iBeacons to automatically trigger a specific narration when you’ve reached a point of interest. In addition to iBeacons, Detour uses your phone’s accelerometer to detect movements and steps, and the magnetometer to know which direction you’re facing.
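The trigger logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Detour's actual implementation: it assumes position fixes arrive as latitude/longitude pairs, and the stop record, its coordinates, its 25-meter radius, and the function names are all hypothetical. A beacon-based trigger would presumably swap the distance check for a signal-strength check.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def next_narration(position, stops, already_played):
    """Return the name of the first unplayed stop whose trigger radius
    contains the listener, or None if nothing should start playing."""
    lat, lon = position
    for stop in stops:
        if stop["name"] in already_played:
            continue  # each narration plays only once per tour
        if haversine_m(lat, lon, stop["lat"], stop["lon"]) <= stop["radius_m"]:
            already_played.add(stop["name"])
            return stop["name"]
    return None


# Hypothetical stop: the coordinates and radius are made up for illustration.
stops = [{"name": "stop_1", "lat": 37.7820, "lon": -122.4110, "radius_m": 25}]
```

Calling next_narration with each new position fix and a shared set of played stops means the listener never has to press “play”; the narration starts once, the first time the listener enters the radius.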

Augmented Audio to Build Empathy and Understanding

Detour’s goal is to “help people cut through the often-impenetrable veneer of a place and really feel what it’s all about,” says5 company spokesman Haris Butt. Detours take you on intimate walks, like through The Tenderloin, one of San Francisco’s most misunderstood and rapidly changing neighborhoods, to see a side of it that not many see, and most ignore altogether. While on the audio walk, you’re not only listening to the stories of the people who live and work there, you’re walking into the church where they’ve slept, or the Single Room Occupancy (SRO) hotels they live in. “There’s something about going through the motions that others have gone through that, I think, can open people even more to understanding, or even feeling, what others feel,” says Marianne McCune, a producer at Detour.

McCune has worked as a reporter in radio for 15 years. She started a youth radio program at WNYC New York Public Radio called Radio Rookies in which teenagers growing up in difficult neighborhoods tell stories about themselves and their worlds. McCune describes part of the goal of the program as getting “them to tell stories about what they think is important to an audience that has little in common with them, so that the generally better-educated and better-off listeners of WNYC can understand where they’re coming from.” She references one of the teenage Radio Rookies, Shirley “Star” Diaz, who addresses topics that generally shut people down. “But when you listen to her, you feel like you’re inside her life enough to understand her perspective,” says McCune. “She leads you through her world in a way that, I believe, allows people to see through her eyes a little bit. So, they’re not outside of her, as they would be if they were looking at her life, they’re seeing through her.”

“I think Detour has the potential to add one more level to seeing through other people’s eyes: it allows you to literally walk in their shoes,” says McCune. She points to a letter she received from a person who experienced The Tenderloin Detour. “A couple of my coworkers and I were intrigued because our offices are right next to The Tenderloin, and it is a neighborhood that we obviously hadn’t done much exploring in,” the letter read. “The most impactful moment came when we walked into St. Boniface church, and saw all of the homeless sleeping in the pews. We brought in some cough drops to donate and chatted with one of the volunteers, who was a recovering addict himself.” The audio walk sparked a conversation that might not have otherwise occurred, and created a transformative effect on the listeners. The letter ended, “It’s weird how something as simple as making a small donation and chatting with this man could completely change your perspective on an entire neighborhood.” McCune comments, “I think both the audio and the walking in other people’s shoes are pushing people past boundaries they don’t usually cross.”

Augmented audio has the power to permeate boundaries and, in doing so, help foster empathy and understanding. It can encourage you to not only see the world through someone else’s perspective, but to experience it in situ (in the same physical place). It can inspire you to take action and effect change in the real world that we live in, even if that means simply engaging in a conversation with someone you normally wouldn’t speak to.

Using technology to provide a gateway to the development of empathy is also being explored in Virtual Reality (VR). Clouds Over Sidra (2014) is a VR film in collaboration with the United Nations (UN) and film director Chris Milk that tells the story of a 12-year-old girl named Sidra who lives in a Syrian refugee camp in Jordan. The film was screened at the World Economic Forum in Davos in January 2015 to a group of leaders whose decisions affect the lives of millions. As Milk notes, these are people who might not otherwise be sitting in a tent in a refugee camp in Jordan.

When you’re inside of the VR headset and watching Clouds Over Sidra, you’re looking around her world, seeing it in full 360 degrees, in all directions. You’re not watching her through a television screen; you’re sitting right there in her room with her listening to her voice, as if you were there. It becomes your world, too. When you look down, you see that you’re sitting on the same ground that she is. “Because of that,” Milk says,6 “you feel her humanity in a deeper way. You empathize with her in a deeper way.”

Although VR is especially suited to transporting you to a location or environment that might be difficult to access physically or otherwise, AR can take empathy a step further, as Detour demonstrates, by encouraging you to interact with your actual surroundings in the real world when possible. One of Mason’s goals with Detour is to get people out and moving, exploring the physical world around them. “So many companies seem to have this shared endgame of getting you sitting on your couch in your living room, having your food delivered, having your laundry brought out and sent back, talking to your friends on your Oculus Rift,” Mason says.7 “Maybe I’m being a Luddite, but I kind of like life’s rough edges. I want Detour to be a company that helps take people out there and celebrate them.”

Most of the tours are journalistic in that they tell true stories about history, people, and neighborhoods, but Detour is also using its augmented audio walks to tell the story of an issue: San Francisco’s war on garbage. The city’s plan is called Zero Waste, with the goal of sending nothing to a landfill or incineration by the year 2020. It means reusing or recycling every single thing San Franciscans throw away. Rather than just showing you a landfill or a dump, the narration helps you think about how much trash we make every day by leading you through the nooks and crannies of daily life in San Francisco. Detour changes the way you see the world with audio, and hopefully inspires a transformation that continues long after the tour is complete.

Helping the Blind Navigate Urban Spaces

Augmented audio walks like Detour’s and Cardiff’s enable you to delve deeper into your physical surroundings, discovering another world that you might not have accessed otherwise. But what about people who might need to rely on such a technology to help them navigate daily life? Microsoft’s Cities Unlocked is a new sound technology developed for people with sight loss to help them navigate urban spaces.

“The project was inspired when my daughter was born,” says8 Microsoft’s Amos Miller, who is visually impaired. “I wanted to be able to take her out for a day, or just go to the cinema, and I thought, ‘how can I make that something that I would not hesitate to do?’” He describes how the navigation system is like “painting a picture of the world through sound, similarly to how a lighthouse guides with light” and how it can remove the fear of new journeys.

Wearing a bone-conducting headset that is connected to your smartphone, you hear a voice that guides you and describes your surroundings. The bone-conducting headset sits above your jawbone, and transmits audio through your jawbone to your inner ear using vibrations. This allows you to still hear noises around you as you normally would, rather than blocking your ears with headphones. A small box located on the back of the headset contains an accelerometer, a gyroscope, a compass, and a GPS chip to track your position. The system is connected to your smartphone, with location and navigation data from GPS and Microsoft Bing Maps to help guide you, as well as a network of Bluetooth-enabled beacons (similar to Detour) placed in urban locations.

Directional audio technology is used to create a 3-D soundscape, making navigation directions and the description of landmarks sound like they’re coming from where they are actually located. If a point of interest is 10 meters ahead and on the right, that’s where the voice will sound like it’s speaking from. In addition to turn-by-turn voice directions, audio cues are integrated throughout the navigation, such as a galloping sound to indicate that you’re heading the right way, or sonar pings to warn you if you’re nearing a curb. You can even use your voice or a physical remote to ask the system for additional information on local landmarks, like opening hours, all drawn from the Bing database. Microsoft has additionally developed an integrated application called “CityScribe” that makes it possible for people to tag obstacles in their city that most mapping services do not pick up, such as park benches, low jutting corners, bins, or street furniture.
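To make the idea of a voice that “speaks from” a landmark’s direction concrete, here is a minimal Python sketch. It is an assumption-laden simplification, not Microsoft’s method: it reduces the 3-D soundscape to a plain left/right stereo balance using constant-power panning, and the function names are hypothetical. It takes the listener’s compass heading and the compass bearing to the landmark, and returns how loud the cue should be in each ear.

```python
import math


def relative_bearing(heading_deg, bearing_deg):
    """Angle of the target relative to where the listener is facing,
    in the range (-180, 180]. Positive values are to the listener's right."""
    diff = (bearing_deg - heading_deg) % 360
    return diff - 360 if diff > 180 else diff


def stereo_gains(relative_deg):
    """Constant-power pan: map -90..90 degrees onto (left, right) gains.

    Angles beyond +/-90 degrees are clamped to hard left or hard right,
    so this sketch cannot distinguish front from back.
    """
    clamped = max(-90.0, min(90.0, relative_deg))
    angle = (clamped + 90.0) / 180.0 * (math.pi / 2)  # 0 = hard left
    return math.cos(angle), math.sin(angle)
```

A landmark at compass bearing 20 degrees, heard by a listener facing 350 degrees, sits 30 degrees to the right, so the right channel plays louder than the left. A production system would more likely use head-related filtering to convey front, back, and elevation as well.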

Kate Riddle is one of the people putting Microsoft’s headset through its trials. Riddle is severely visually impaired and says the technology helps her go to new places and not feel anxious or out of control, whereas before she would follow the same routes to the same places out of habit because she memorized them. “It takes out so much of the stress of being somewhere new,” she says.9 “That is massively empowering, and it makes the journey pleasurable rather than a chore. Rather than going out because you have to, this is a ‘going out because I can’.” For people like Riddle, this technology can truly be life-changing.

Use cases for Cities Unlocked can also be extended to the general population. In a Microsoft video, the narrator points out how it’s not difficult to imagine everyone using this technology in the near future for all sorts of daily challenges, like trying to find the nearest bathroom in a large shopping mall, or exploring a new city where you don’t speak the language. The technology not only can help direct you to a place of interest, it can inspire you to explore with confidence by being better attuned to your surroundings with a newfound immersion in the physical world.

Designing Something Good for Everyone

Audio is one aspect of the overall design of Cities Unlocked. Human–Computer Interaction (HCI) pioneer and Microsoft Principal Researcher Bill Buxton says his favorite part of the project isn’t the technology, but that there will be a moment for wearers when the technology disappears altogether. “The best technology is invisible. It just lets me get on with my life,” says Buxton.10 “When an interaction with technology is as it should be, the user is not there as an equipment operator, but as a human being. She is not walking down the street operating technology, but with the intent to go to work, or get fresh air, or exercise.”

The key to great design for Buxton is simple: if you understand and design for the needs of highly specialized users, you’ll often end up making something good for everyone. He explains:11

For me as a designer of interaction whose focus is always about the quality of the human experience, I found out very early on that if you want to understand something, you go to the extreme cases and try to understand things at the edges. In nearly all cases, what you learn people need while you’re there will also apply to the general population.

I reached out to Buxton12 asking him to elaborate on the idea of designing for “extreme cases” to benefit everyone. He replied, “Cities Unlocked is an example of an edge case pushing the general case of acoustic AR forward.” Buxton pointed me to his “favorite reference,” an article from 1987, “Making Computers Accessible to Disabled People” by Frank Bowe.13

Bowe wrote, “If options for different users were incorporated into the design of all computers, the lives of millions of disabled individuals could be greatly enhanced.” He referenced the design of buildings as the most familiar concept of accessibility, citing accessible architecture in the form of automatic doors and entrances level with exterior landscaping. “These designs seem natural to us: they do not look as though they were created specifically for individuals who are handicapped,” noted Bowe. He used the example of a ramp, observing how for each wheelchair user, there are ten people without disabilities who take advantage of it: parents with baby carriages, bicycle riders, furniture movers, and pedestrians who simply find it easier to walk up a ramp than climb stairs.

Bowe pointed to examples in which the industry began to realize that technology developed to meet the special needs of disabled consumers was practical for everyone. He highlighted computers that were being developed to understand and recognize human speech (in 1987), benefiting people like executives who were reluctant to use keyboards, and workers who used their hands for other tasks, such as quality inspectors on factory assembly lines. I asked Buxton if he thought designing new technologies has changed since the article was written three decades ago. His response was, “With design, solutions to deal with complexity introduced to all by diverse new technologies can aid accessibility, and vice versa.” It’s an important point to consider as we design the future of AR technologies and experiences to be accessible and helpful to as many people as possible.

Surrounding You with Sound

Sound is being used in VR to heighten the believability of virtual places and make them feel real. We will see sound applied more in AR to increase immersion, ranging from sound effects to voice interactions. Cardiff, Detour, and Cities Unlocked each blend the reality of your physical world with a virtual one to guide you on a journey led by audio. In addition to navigation, sound will contribute to storytelling and entertainment in AR.

Joel Susal, Dolby’s director of VR and AR, says,14 “Audio in virtual reality is not a luxury, it is a necessity.” He points out that in real life there is no screen that frames reality; we experience an awareness of our surroundings in all directions. In VR, we need to be similarly stimulated. Unlike traditional films for which our focus can be drawn to certain points on the screen, 3-D environments require more than just visual stimuli. “Your ears move your head,” says Susal, which is why directional sound is so important. Sound adds to immersion in VR, allowing filmmakers to guide you through a story. In the way that Cities Unlocked used spatial audio cues to guide the wearer by simulating sound coming from a distinct location, this same effect can be applied to VR and AR to help steer the user’s attention in a story narrative or in a game.

With AR eyewear like Microsoft’s HoloLens (equipped with two small speakers resting near your ears), the sound of a virtual object moves as you turn your head and body: if you turn your back to the source of a sound, it now comes from behind you. Similarly, when you move closer to a virtual object, the audio becomes louder. This helps to make virtual objects seem more real. For example, while playing an AR game, you’ll be able to hear a virtual dragon thumping its way toward you, and roaring in your left ear.
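The two effects described above, a sound that swings behind you when you turn away and grows louder as you approach, can be approximated with a little geometry. The following Python sketch is illustrative only and makes no claim about how HoloLens works: it assumes a flat floor plan measured in meters (x pointing east, y pointing north), a compass heading for the direction the wearer faces, and a simple inverse-distance loudness rule. All names are hypothetical.

```python
import math


def locate_source(listener_xy, facing_deg, source_xy):
    """Return (distance_m, azimuth_deg) of a sound source as heard by the listener.

    Azimuth 0 is straight ahead, positive is to the right, and values
    near +/-180 mean the source is behind the listener.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dx, dy))  # compass bearing to the source
    azimuth = (bearing - facing_deg + 180) % 360 - 180
    return distance, azimuth


def distance_gain(distance_m, reference_m=1.0):
    """Inverse-distance rule: gain halves each time the distance doubles.
    Inside the reference distance the gain is capped at 1.0."""
    return reference_m / max(distance_m, reference_m)
```

With a virtual dragon two meters due north, a wearer facing north gets an azimuth of 0 (dead ahead); after turning to face south, the same dragon sits at 180 degrees, directly behind. Walking from four meters to two meters away doubles the gain.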

HyperSound’s audio technology provides another way to create immersive effects. Just as a flashlight directs a ray of light, HyperSound directs sound in a narrow beam using ultrasound waves and confines it to a specific location to create a precise audio zone. Listeners outside of the audio zone are not able to hear it, whereas for those inside it, the effect is similar to listening to audio with headphones. It is possible to create a private listening zone in a public place as well as project a sound beam to a targeted location. For example, McDonald’s is using HyperSound in a pilot program to direct television sound to specific tables in their restaurants, allowing diners to listen to the television without disrupting others. Uses range from retail displays, to museum installations, to gaming.

The BoomRoom15 is a prototype by Jörg Müller, an associate professor of computer science at Aarhus University, which presents a novel way to directly interact with virtual sound sources in midair. Müller and his team have created a small room (three meters in diameter) in which a ring of 56 loudspeakers is hidden behind curtains. Using computer vision and gesture tracking, sounds can be assigned stationary or mobile positions.

A spatial music mixing room was built as an application of the system. A music track can be assigned to an object in the room, like a vase. To play the track, you pick up the vase and “pour out” the track in midair. Gestures such as moving your hands apart or bringing them together can manipulate volume, treble, and bass.

Müller writes, “We believe that the ability to ‘touch’ sound sources in midair and to make objects ‘speak’ opens many new opportunities for human–computer interaction.” As an example, he describes a marble answering machine that consists of an ordinary bowl filled with marbles:

When a marble is taken out of the bowl, the marble itself could play the recorded message, while being carried through the room. If the user wants to delete the message, she could simply pull it out of the marble and drop it into the bin. She could even speak a reply into the marble that would be returned to the caller. If she wants to keep the message, she could simply drop the marble into another bowl.

Now, of course, the marble itself isn’t playing the sound; it is an illusion. The marble is an ordinary marble; the loudspeakers hidden in the walls play the sound so that it appears to be coming from the marble. Müller says, “The idea is that all objects themselves are completely normal and uninstrumented.”

In another example, Müller describes how unread emails could be symbolized by a flock of birds that sit or fly somewhere in the room, with new emails flying in and around the user, and urgent mails flying over the user, all depicted with directional audio. Different senders could be recognized by the chirp of the birds. If the user wants to read a message, she could walk over to the bird, “touch” it in mid-flight, and the message would be read aloud. She could reply to or forward emails by grabbing and manipulating the chirp.

Augmented audio can create new ways of interacting with our everyday world. It’s not as much about a marble answering machine, or a flock of birds delivering your emails, as it is about designing new interaction paradigms for this emerging medium that were not possible before. Such imaginative and artistic explorations of typically mundane activities inspire rethinking how we interact with information, and what new types of experiences we can invent that are unique to the medium.

For the BoomRoom to work in everyday life, the hope is that loudspeaker panels will be cheap enough to be integrated into walls in your home. For now, the prototype provides a fascinating way to think about how we can apply augmented sound to communicate and interact with our surroundings in physical ways using our body movements.

Imagination and Playing with Sound

Japanese wearable-technology startup Moff Band has designed a wearable device for children (available for purchase on Amazon) that combines gesture with sound to support imaginative play and storytelling. The slap-on bracelet connects to an app on your smartphone or tablet via Bluetooth and uses the band’s built-in accelerometer and gyroscope sensors to detect what movements the child is making. The selected sound effects are played back in real time to match the child’s movements, and include sounds like air drums and guitar, ninja swords, and sports. The Moff Band can detect two separate movements: moving your arm left or right, and up or down. You can use it up to 30 feet away from the smartphone or tablet while still making sounds.
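As a rough illustration of how two arm movements might be told apart from accelerometer readings, consider the Python sketch below. It is not Moff’s algorithm: it assumes gravity has already been removed from the readings, that the x axis runs left/right and the y axis up/down, and that a fixed threshold separates a deliberate swing from small jitters. The threshold value, gesture labels, and sound names are hypothetical.

```python
def classify_swing(samples, threshold=1.5):
    """Classify a short window of (x, y) acceleration samples.

    Returns "left_right", "up_down", or None when no axis exceeds the
    threshold (the arm is roughly still).
    """
    if not samples:
        return None
    peak_x = max(abs(x) for x, _ in samples)
    peak_y = max(abs(y) for _, y in samples)
    if max(peak_x, peak_y) < threshold:
        return None
    return "left_right" if peak_x >= peak_y else "up_down"


def sound_for(gesture, sound_set):
    """Look up the sound effect mapped to a gesture in the chosen sound set."""
    return sound_set.get(gesture)


# Hypothetical sound set for an air-drum mode.
drums = {"left_right": "cymbal", "up_down": "snare"}
```

Feeding each new window of samples through classify_swing and then sound_for gives the real-time feel described above: a sharp downward motion in the drum set would come back as the snare, while a still arm produces no sound at all.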

Two children (or adults) can wear Moff Bands to play together. For example, by pretending to play a game of tennis, you hear the sounds of a tennis ball bouncing back and forth as you swing an imaginary racket through the air. You also can hear the sound of a cheering virtual crowd. The Moff Band blends technology with physical activity; it makes kids jump up and down, and move around to trigger the sounds.

Moff Band has teamed up with PBS KIDS in the United States to launch PBS KIDS Party App. Activities (for 5–8 year olds) are designed to foster learning through imaginative play and movement while wearing the bracelet. The app includes a game of Freeze Dance, a Piñata Party game, a counting game, and the ability to record your own sounds.

Konstruct is an AR experience by James Alliban from 2011, powered by London company String’s technology, that allows you to build virtual sculptures with the sound of your voice and a smartphone. It’s a different way to think about augmented audio: rather than sound being used to support a visual, and make an experience more realistic, Konstruct generates abstract visuals with your voice. Like the Moff Band, it responds to you: in this case by speaking, whistling, or blowing into the device’s microphone. You can combine a variety of 3-D shapes, color palettes, and settings to build an endless collection of virtual structures. The volume of the noise also influences the size of the shapes.

Both Moff Band and Konstruct create playful scenarios to experiment with sound in imaginative ways, resulting in an experience that can be unique each time, configurable, and personalized. One reason each is immersive is that the experience is directed and defined by the user.

Augmented Audio and Personalization

RjDj, the London-based startup founded in 2008 by Michael Breidenbruecker, developed a nonlinear form of music called reactive music, which reacted to the listener and his environment in real time using a smartphone app. While the listener wore headphones, the sounds of his physical surroundings were picked up by the built-in microphone on his smartphone and remixed in real time, creating a personalized audio track that was unique to him and his context. Although RjDj closed its website and removed its apps from circulation in 2013, its innovative work is still relevant to the development of augmented audio today.

We are all familiar with the idea of personalized playlists, but RjDj went beyond playlists to actually personalize the music and the song itself so that it was responsive to you and your surroundings. “It took audio technology almost 10 years after my initial idea to get into shape for RjDj,” said Breidenbruecker.16

RjDj’s apps explored the new interaction methods the iPhone introduced, taking advantage of the integrated components, sensors, and inherent portability of the device to create a musical experience not possible before. RjDj’s chief creative officer, Robert Thomas, explains the technology behind their apps:

RjDj’s apps were all based on the open source software Pure Data. RjDj developed its own port for the iPhone. We had our own library of tools for use in making augmented sound experiences. From a sensor point of view, we used almost every sensor possible on the iPhone, including movement, time, weather, location, and obviously the microphone through which we analyzed loudness and audio frequencies.
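Thomas mentions analyzing loudness and audio frequencies from the microphone. RjDj did this in Pure Data; the Python sketch below is only a stand-in to show what those two measurements mean, and every name in it is hypothetical. It assumes a short buffer of samples scaled between -1 and 1, and it uses a deliberately naive discrete Fourier transform that is fine for a few hundred samples but far too slow for production use.

```python
import cmath
import math


def loudness_db(samples):
    """RMS level in decibels relative to full scale (0 dB = a signal whose
    RMS is 1.0). Silence or an empty buffer returns negative infinity."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest bin of a naive DFT (DC excluded)."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n
```

A reactive track could, for instance, raise its tempo as loudness_db climbs or retune a synthesizer toward dominant_frequency, so that a passing siren or a quiet room each shapes what the listener hears.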

Augmented audio apps today, like Detour, similarly take advantage of smartphone sensors to deliver a contextual experience, knowing how far along you are on the tour, and cueing content specific to your location. We will see sensors applied to other forms of wearable technology beyond smartphones, including AR glasses and across the human senses. Sensors will play an important part in the coming future of AR experiences to present personalized content, enabling you to be more in tune with your environment in new ways.

Here One, from wearable-technology company Doppler Labs, is a more recent product. Consisting of a pair of wireless earbuds that connect to a smartphone app via Bluetooth, Here One manipulates environmental sounds in real time to create a personalized audio experience.

In its first iteration, Here One is aimed at musicians and audiophiles. Using controls within the app for trebles, mids, lows, and effects like reverb, echo, and flange, you can tune the audio around you to your preferences and even remix live audio. “Here One does not stream or play recorded music,” explains Noah Kraft, CEO of Doppler Labs.17 “Instead, the Digital Signal Processor inside Here One acts as a studio in your ears by providing you with a volume knob, equalizer, and effects to transform real world audio.”
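To give a feel for what a “volume knob” and a crude tone control do to incoming audio, here is a small Python sketch. It is not Doppler Labs’ signal chain: it assumes audio arrives as a list of samples between -1 and 1, expresses volume in decibels, and stands in for an equalizer band with the simplest possible filter, a one-pole low-pass that turns down the treble. Function names and parameter choices are hypothetical.

```python
import math


def db_to_gain(db):
    """Convert a decibel change into a linear amplitude multiplier."""
    return 10 ** (db / 20)


def apply_volume(samples, db):
    """Turn the incoming audio up (positive dB) or down (negative dB)."""
    gain = db_to_gain(db)
    return [s * gain for s in samples]


def low_pass(samples, cutoff_hz, sample_rate):
    """One-pole low-pass filter: passes content below the cutoff and
    progressively attenuates content above it."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)  # smoothing factor between 0 and 1
    out, y = [], 0.0
    for s in samples:
        y += alpha * (s - y)  # move a fraction of the way toward the input
        out.append(y)
    return out
```

Chaining the two, for example low_pass(apply_volume(mic, -6), 1000, 44100), would roughly halve the level of the room and dull its high frequencies. A real equalizer would use several band filters, and effects such as reverb or flange need delay lines, which this sketch leaves out.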

Kraft envisions the next version of the Here One system providing the ability to single out certain frequencies and tones to tune out noises in the real world, such as a baby crying, or a screeching train. In this regard, we can think of Here One as a form of Mediated Reality (Chapter 2), “a self-created personal space,” as Steve Mann described it, yet the emphasis here is on audio as opposed to visual. Rather than Mediated Reality separating us from the world and one another, examples like Here One can be used to provide focus, tuning down distractions, and tuning up sounds we are interested in, such as the voice of the person you are dining with in a noisy restaurant.

For Kraft, the future of Here One is contextual; he explains how he envisions this working:18

We see this as being a quotidian device that you put in your ear and you leave in to optimize different environments. The machine algorithms we’re working on will actually be intuitive. So, imagine walking into a restaurant and through using geo-positioning, and heuristics, and learning you as an individual, we can say, hey Bill, we know first thing your noise preference. Most times you walk into a restaurant, you reduce it fifteen percent. But we also know this room and we know because you’re in the back-left corner, there’s going to be reverb off the back two walls. And since you’ve put it into conversation mode, we’re going to use our directional mics and we’re going to bring down the ambient and make it so that conversation in the back corner that you want to have intimately is really perfect and right in that environment.

Kraft describes VR as isolating and “taking you out of reality”; like Mason of Detour, he wants you to be more immersed in reality. Kraft believes in a future in which “you don’t have your face in a screen like we all do all the time right now,” and where we use our innate senses in an enhanced way, to their full potential, whether that means removing noise or tapping into a smart assistant with voice commands.

Hearables That Are Always Listening

Hearables, or ear-based computing devices like Here One, create a new way to hear and interact with your surroundings. As Kraft began to hint at, hearables also make it possible for your environment to listen to you through voice interactions. Voice is the most common way we communicate, and we are already used to wearing earbuds and wireless Bluetooth headsets; both could help hearable devices enter the mainstream faster. Noel Lee, founder and CEO of headphone maker Monster, refers19 to headphones as “the first mass-accepted wearable.” A current trend in wearables is to hide the technology, and the ear provides a good place to do that.

The goal for Motorola’s wearable earbud, the Moto Hint, first introduced in 2014, is for it to be invisible for the wearer, omnipresent, and ready for voice input. The Hint is a single earbud that fits inside your ear and is compatible with any Bluetooth-enabled smartphone or tablet. It has a speaker, a touch-sensitive panel, dual noise-canceling microphones, a rechargeable battery, and an IR proximity detector, which allows the device to turn on automatically when you’ve inserted it into your ear.

You can use the Hint to make and answer phone calls, or listen to podcasts or music, within a 150-foot range. But the most powerful feature of the wireless earbud is that you can speak commands to interact with intelligent personal assistants like Moto Voice, Google Now, or Siri (the future of augmented personal assistants is discussed in Chapter 7). You can ask and get answers to questions like “do I need an umbrella today?” “what’s my next appointment?” or “how far am I from home?” without reaching for your phone. When the Hint is paired with a Motorola smartphone such as the Moto X, it enters a mode in which it is always listening, and you can interact with the earpiece by saying the phone’s custom voice prompt (configured by you). However, when it is connected to an iPhone or another Android device, you can’t just speak to it: you need to tap the Hint to activate Google Now or Siri each time. Although the Hint isn’t perfect, it does encourage a hands-free experience out in the world, rather than having your head down in a screen.

In 2016, Apple announced AirPods, a pair of wireless earbuds. You can access Siri by double-tapping an AirPod, without taking your iPhone out of your pocket, and the AirPods connect automatically with your Apple devices, such as your iPhone and Apple Watch, with the sound switching instantly between the devices. In the Apple AirPods film, Jony Ive, Apple’s chief design officer, says, “We’re just at the beginning of a truly wireless future we’ve been working toward for many years where technology enables the seamless and automatic connection between you and your devices.”

Future hearables could have extended capabilities to create more personalized experiences that not only listen to your voice, but also listen to your body. Devices worn in the ear can be used to gather biometric information including blood pressure, heart rate, ECG, and core body temperature. Valencell, a company in the United States, is developing biometric sensor technology for wearable devices, including the PerformTek earbud sensor module to collect physiological data from the ear.

The technology uses photoplethysmography (PPG), a noninvasive optical technique to measure blood flow and activity. Using the PPG method, light is shined on the surface of the skin and an optical detector measures changes in scattered light from the skin and blood vessels (this is often done in hospitals with a device that fits over your fingertip).
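
As a rough illustration of how a pulse can be read from such an optical signal, the sketch below estimates heart rate by timing the peaks in a series of detector readings. It is not Valencell's PerformTek algorithm; the function name `estimate_heart_rate_bpm`, the simple peak rule, and the synthetic sine-wave signal are all assumptions made for the example. Real PPG signals are noisy and need filtering and motion compensation first.

```python
import math


def estimate_heart_rate_bpm(samples, sample_rate_hz):
    """Estimate heart rate from a PPG-like signal by timing its peaks."""
    mean = sum(samples) / len(samples)
    # Treat a sample as a pulse peak if it is a local maximum above the mean.
    peaks = [
        i for i in range(1, len(samples) - 1)
        if samples[i] > mean
        and samples[i] > samples[i - 1]
        and samples[i] >= samples[i + 1]
    ]
    if len(peaks) < 2:
        raise ValueError("need at least two pulse peaks")
    # Average spacing between consecutive peaks, converted to seconds.
    mean_interval_s = (peaks[-1] - peaks[0]) / (len(peaks) - 1) / sample_rate_hz
    return 60.0 / mean_interval_s


# Synthetic stand-in for detector readings: a clean 1.2 Hz pulse
# (72 beats per minute) sampled 50 times per second for 10 seconds.
SAMPLE_RATE_HZ = 50
signal = [math.sin(2 * math.pi * 1.2 * n / SAMPLE_RATE_HZ) for n in range(500)]
bpm = estimate_heart_rate_bpm(signal, SAMPLE_RATE_HZ)
```

On this clean synthetic signal the estimate lands within one beat per minute of the true 72.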

Valencell licenses its PerformTek sensor technology to consumer electronics manufacturers, mobile device and accessory makers, sports and fitness brands, and gaming companies for integration into their products. Such ear-worn devices include LG’s Heart Rate Monitor Earphone and iRiver’s iRiverON Heart Rate Monitoring Bluetooth Headset. iRiverON is designed to help you “exercise smarter” while you listen to music by tracking your biometrics, including heart rate, calories burned, and speed and distance travelled. For example, before going on a run, you place the device into your ears and connect it to your smartphone. During the run, the device captures your biometrics through the headphones. A voice-feedback system on the device speaks into your ears to notify you of your heart rate zone and whether calorie goals have been reached. The data is sent to a smartphone app in real time, where you can review it later.
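
The kind of spoken feedback described above could be sketched as follows. This is not iRiverON's actual logic: the zone boundaries (fractions of a maximum heart rate), the zone names, and the functions `heart_rate_zone` and `feedback_message` are hypothetical values and names chosen for illustration.

```python
def heart_rate_zone(bpm, max_bpm):
    """Map a heart rate to a named training zone (illustrative thresholds)."""
    fraction = bpm / max_bpm
    # Hypothetical zone floors, checked from the most intense zone down.
    zones = [(0.9, "maximum"), (0.8, "hard"), (0.7, "moderate"), (0.6, "light")]
    for floor, name in zones:
        if fraction >= floor:
            return name
    return "very light"


def feedback_message(bpm, max_bpm, calories_burned, calorie_goal):
    """Build the sentence a voice-feedback system might speak."""
    zone = heart_rate_zone(bpm, max_bpm)
    goal = "reached" if calories_burned >= calorie_goal else "not yet reached"
    return f"Heart rate {bpm}, {zone} zone. Calorie goal {goal}."


# Example reading partway through a run (all numbers are made up).
message = feedback_message(bpm=150, max_bpm=190,
                           calories_burned=320, calorie_goal=300)
```

With these made-up numbers, the runner would hear that they are in the moderate zone and have reached their calorie goal.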

The opportunities for hearables are not limited to the health and fitness industry. With headsets commonly used in the gaming industry, earbuds with biometric sensor technology could change the way we play games. Steven LeBoeuf, Valencell’s CEO and cofounder, believes biometrics could be the future of more immersive gaming experiences. Possibilities include fitness games that use your heart rate as a key control measure, action games that require you to physically hold your breath while your character is swimming, and games that switch between modes depending on your mood or stress state. “By taking a player on a biometric journey of emotional states via heart-rate variability (HRV) monitoring, a game may teach stress management without making the gamer consciously aware of it,” says LeBoeuf.20 “For example, players could tap the mind-body connection to transform themselves from Bruce Banner into The Incredible Hulk simply by changing their emotional state.”
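
One way a game could turn HRV into a control signal is sketched below. It computes RMSSD, a standard time-domain HRV measure (the root mean square of successive differences between heartbeat intervals), and switches character when the value drops. The 30-millisecond threshold and the function `choose_character` are hypothetical; LeBoeuf does not specify how such a game would be built.

```python
import math


def rmssd_ms(beat_intervals_ms):
    """Root mean square of successive differences between beat intervals."""
    diffs = [b - a for a, b in zip(beat_intervals_ms, beat_intervals_ms[1:])]
    if not diffs:
        raise ValueError("need at least two beat intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def choose_character(rmssd, calm_threshold_ms=30.0):
    """Pick a character from HRV; the threshold is an illustrative guess."""
    # Lower variability is generally associated with a more stressed state.
    return "Bruce Banner" if rmssd >= calm_threshold_ms else "The Incredible Hulk"


# Made-up intervals between heartbeats, in milliseconds.
intervals = [800, 810, 790, 805]
variability = rmssd_ms(intervals)
character = choose_character(variability)
```

Here the successive differences are small, so the variability falls below the threshold and the player becomes the Hulk.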

As we can see from the examples in this chapter, augmented audio isn’t just about sound; it’s about creating visceral experiences that move us, either literally, through navigation, play, or fitness, or emotionally, through the power of empathy, storytelling, and gaming. Each example immerses the user, whether through listening or being listened to, resulting in a deeper connection to a place, event, or person through contextual understanding and personalization. Augmented audio brings our attention to our surroundings, where we can choose to be tuned in or out. The future of sound in AR will not only use audio to supplement visuals and enhance believability; sound will also be explored in its own right as a method of interaction, with unique characteristics distinct from the other augmented senses.

1 Leigh Alexander, “Dimensions Augments Reality Purely Through Sound,” Gamasutra, November 23, 2011.

2 Janet Cardiff, “Introduction to the Audio Walks.”

3 John Wray, “Janet Cardiff, George Bures Miller and the Power of Sound,” The New York Times, July 26, 2012.

4 Rachel Metz, “First Groupon Founder, Now Tour Guide,” MIT Technology Review, March 6, 2015.

5 Luke Whelan, “This New App Will Change the Way You See Your Neighborhood,” Mother Jones, November 13, 2015.

6 Chris Milk, “How virtual reality can create the ultimate empathy machine,” TED, March 2015.

7 Casey Newton, “Groupon’s ousted founder is making gorgeous audio tours of San Francisco,” The Verge, July 30, 2014.

8 3D SoundScape Demonstrator Video

9 Asha McLean, “Microsoft updates smart headsets for visually impaired,” ZDNet, November 27, 2015.

10 Jennifer Warnick, “Independence Day,” Microsoft Story Labs.

11 Ibid.

12 I was lucky to work with Buxton in 2013 in Toronto on a project called, “Massive Change: The Future of Global Design.” Buxton was our Chief Scientist at Bruce Mau Design, sharing invaluable HCI insights, and even contributing his personal collection of interaction devices spanning a period of 30 years to the project’s exhibition component.

13 Frank Bowe, “Making Computers Accessible to Disabled People.” Technology Review, 90 no. 1 (1987): 52-59.

14 Katie Collins, “Dolby’s stereoscopic virtual reality proves utterly terrifying,” Wired, March 5, 2015.

15 Jörg Müller, Matthias Geier, Christina Dicke, Sascha Spors, “The BoomRoom: Mid-air Direct Interaction with Virtual Sound Sources,” CHI ’14 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2014): 247-256.

16 David Barnard et al., iPhone User Interface Design Projects (New York: Apress, 2009), 236.

17 Kickstarter, “Here Active Listening.”

18 Holding the Internet to Ransom, BBC.

19 David Z. Morris, “Forget the iWatch. Headphones are the original wearable tech,” Fortune, June 24, 2014.

20 Steven F. LeBoeuf, “How Biometrics Could Change Gaming in 2014,” Consumer Technology Association, January 14, 2014.
