DESIGNING INTERACTIVE GESTURES: THE BASICS

The design of any product or service should start with the needs of those who will use it, tempered by the constraints of the environment, technology, resources, and organizational goals, such as business objectives. The needs of users can range from simple (I want to turn on a light) to very complex (I want to fall in love). (Most human experience lies between those two poles, I think.) However natural, interesting, amusing, novel, or innovative an interactive gesture is, if the users' needs aren't met, the design is a failure.

The first question that anyone designing a gestural interface should ask is: should this even be a gestural interface? Simply because we can now do interactive gestures doesn't mean they are appropriate for every situation. As Bill Buxton notes,[14] when it comes to technology, everything is best for something and worse for something else, and interactive gestures are no exception.

There are several reasons to not have a gestural interface:

Heavy data input

Although some users adapt to touchscreen keyboards easily, a physical keyboard is decidedly faster for most people to use when they are entering text or numbers.

Reliance on the visual

Many gestural interfaces use visual feedback alone to indicate that an action has taken place (such as a button being pressed). In addition, most touchscreens and many gestural systems in general rely entirely on visual displays with little to no haptic affordances or feedback. There is often no physical feeling that a button has been pressed, for instance. If your users are visually impaired (as most adults over a certain age are) a gestural interface may not be appropriate.

Reliance on the physical

Likewise, gestural interfaces can be more physically demanding than a keyboard/screen. The broader and more physical the gesture is (such as a kick, for instance), the more likely that some people won't be able to perform the gesture due to age, infirmity, or simply environmental conditions; pressing touchscreen buttons in winter gloves is difficult, for instance. The inverse is also true: the subtler and smaller the movement, the less likely everyone will be able to perform it. The keyboard on the iPhone, for instance, is entirely too small and delicate to be used by anyone whose fingers are large or otherwise not nimble.

Inappropriate for context

The environment can be nonconducive to a gestural interface in any number of situations, either due to privacy reasons or simply to avoid embarrassing the system's users. Designers need to take into account the probable environment of use and determine what, if any, kind of gesture will work in that environment.

There are, of course, many reasons to use a gestural interface. Everything that a noninteractive gesture can be used for—communication, manipulating objects, using a tool, making music, and so on—can also be done using an interactive gesture. Gestural interfaces are particularly good for:

More natural interactions

Human beings are physical creatures; we like to interact directly with objects. We're simply wired this way. Interactive gestures allow users to interact naturally with digital objects in a physical way, like we do with physical objects.

Less cumbersome or visible hardware

With many gestural systems, the usual hardware of a keyboard and a mouse isn't necessary: a touchscreen or other sensors allow users to perform actions without this hardware. This benefit allows for gestural interfaces to be put in places where a traditional computer configuration would be impractical or out of place, such as in retail stores, museums, airports, and other public spaces.

Figure 1-15. New York City in late 2006 installed touchscreens in the back seats of taxicabs. Although clunky, they allow for the display of interactive maps and contextual information that passengers might find useful, such as a Zagat restaurant guide. Courtesy New York City Taxi and Limousine Commission.

More flexibility

As opposed to fixed, physical buttons, a touchscreen, like all digital displays, can change at will, allowing for many different configurations depending on functionality requirements. Thus, a very small screen (such as those on most consumer electronics devices or appliances) can change buttons as needed. This can have usability issues (see later in this chapter), but the ability to have many controls in a small space can be a huge asset for designers. And with nontouchscreen gestures, the sky is the limit, space-wise. One small sensor, which can be nearly invisible, can detect enough input to control the system. No physical controls or even a screen are required.

More nuance

Keyboards, mice, trackballs, styli, and other input devices, although excellent for many situations, simply cannot convey as much subtlety as the human body. A raised eyebrow, a wagging finger, or crossed arms can deliver a wealth of meaning in addition to controlling a tool. Gestural systems have not yet begun to tap fully the wide emotional palette of humans, which they can, and likely eventually will, exploit.

More fun

You can design a game in which users press a button and an on-screen avatar swings a tennis racket. But it is simply more entertaining—for both players and observers—to mimic swinging a tennis racket physically and see the action mirrored on-screen. Gestural systems encourage play and exploration of a system by providing a more hands-on (sometimes literally hands-on) experience.

Once the decision has been made to have a gestural interface, the next question to answer is what kind of gestural interface it will be: direct, indirect, or hybrid. As I write this, particularly with devices and appliances, the answer will be fairly easy: direct-manipulation touchscreen is the most frequently employed gestural interface currently. In the future, as an increasing variety of sensors are built into devices and environments, this may change, but for now touchscreens are the new standard for gestural interfaces.

THE CHARACTERISTICS OF GOOD GESTURAL INTERFACES

Although particular aspects of gestural systems require more and different kinds of consideration, the characteristics of a good gestural interface don't differ much from the characteristics of any other well-designed interactive system.[15] Designers often use Liz Sanders' phrase "useful, usable, and desirable"[16] to describe well-designed products, or they say that products should be "intuitive" or "innovative." All of that really means gestural interfaces should be:

Discoverable

Being discoverable can be a major issue for gestural interfaces. How can you tell whether a screen is touchable? How can you tell whether an environment is interactive? Before we can interact with a gestural system, we have to know one is there and how to begin to interact with it, which is where affordances come into play. An affordance is one or more properties of an object that give some indication of how to interact with that object or a feature on that object. A button, because of how it moves, has an affordance of pushing. Appearance and texture are the major sources of what psychologist James Gibson called affordances,[17] popularized in the design community by Don Norman in his seminal 1988 book The Psychology of Everyday Things (later renamed The Design of Everyday Things).

Figure 1-16. Without the tiny diagrams on the dispenser, there would be no affordances to let you know how to get the toilet paper out. Gestural interfaces need to be discoverable so that they can be used. Courtesy Yu Wei Products Company.

Trustworthy

Unless they are desperate, before users will engage with a device, the interface needs to look as though it isn't going to steal their money, misuse their personal data, or break down. Gestural interfaces have to appear competent and safe, and they must respect users' privacy (see THE ETHICS OF GESTURES in Chapter 8). Users are also now suspicious of gestural interfaces, so an attraction affordance often needs to be employed (see Chapter 7).

Responsive

We're used to instant reaction to physical manipulation of objects. After all, we're usually touching things that don't have a microprocessor and sensor that need to figure out what's going on. Thus, responsiveness is incredibly important. When engaged with a gestural interface, users want to know that the system has heard and understood any commands given to it. This is where feedback comes in. Every action by a human directed toward a gestural interface, no matter how slight, should be accompanied by some acknowledgment of the action whenever possible and as rapidly as possible (100 ms or less is ideal as it will feel instantaneous). This can be tricky, as the responsiveness of the system is tied directly to the responsiveness of the system's sensors, and sensors that are too responsive can be even more irksome than those that are dull. Imagine if The Clapper picked up every slight sound and turned the lights on and off, on and off, over and over again! But not having near-immediate feedback can cause errors, some of them potentially serious. Without any response, users will often repeat an action they just performed, such as pushing a button again. Obviously, this can cause problems, such as accidentally buying an item twice or, if the button was connected to dangerous machinery, injury or death. If a response to an action is going to take significant time (more than one second), feedback is required that lets the user know the system has heard the request and is doing something about it. Progress bars are an excellent example of responsive feedback: they don't decrease waiting time, but they make it seem as though they do. They're responsive.
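The timing guidance above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the 100 ms and one-second thresholds come from the text, while the function name `feedback_kind`, the middle "acknowledge" tier, and the `GuardedButton` class are assumptions made for the example. The guard shows one possible way to keep a repeated press from triggering the same action twice while the first is still in progress.

```python
INSTANT_MS = 100     # from the text: 100 ms or less feels instantaneous
LONG_WAIT_MS = 1000  # from the text: beyond one second, show ongoing progress


def feedback_kind(expected_ms: float) -> str:
    """Pick a feedback style from how long the response is expected to take."""
    if expected_ms <= INSTANT_MS:
        return "instant"      # just show the result
    if expected_ms <= LONG_WAIT_MS:
        return "acknowledge"  # assumed middle tier: confirm the action was sensed
    return "progress"         # e.g., a progress bar


class GuardedButton:
    """Hypothetical button that ignores repeat presses while work is pending."""

    def __init__(self) -> None:
        self.busy = False
        self.accepted = 0

    def press(self) -> bool:
        """Return True if the press was accepted, False if it was ignored."""
        if self.busy:
            return False
        self.busy = True
        self.accepted += 1
        return True

    def finish(self) -> None:
        """Call when the triggered action has completed."""
        self.busy = False
```

With this guard, an impatient second press during a slow purchase is dropped rather than buying the item twice; the interface would still need to show the user that the first press was registered.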

Appropriate

Gestural systems need to be appropriate to the culture, situation, and context they are in. Certain gestures are offensive in certain cultures. An "okay" gesture, commonplace in North America and Western Europe, is insulting in Greece, Turkey, the Middle East, and Russia, for instance.[18] An overly complicated gestural system that involves waving arms and dancing around in a public place is not likely to be an appropriate system unless it is in a nightclub or other performance space.

Meaningful

The coolest interactive gesture in the world is empty unless it has meaning for the person performing it; which is to say, unless the gestural system meets the needs of those who use it, it is not a good system.

Smart

The devices we use have to do for us the things that we as humans have trouble doing—rapid computation, having infallible memories, detecting complicated patterns, and so forth. They need to remember the things we don't remember and do the work we can't easily do alone. They have to be smart.

Clever

Likewise, the best products predict the needs of their users and then fulfill those needs in unexpectedly pleasing ways. Adaptive targets are one way to do this with gestural interfaces. Another way to be clever is through interactive gestures that match well the action the user is trying to perform.

Playful

One area in which interactive gestures excel is being playful. Through play, users will not only start to engage with your interface—by trying it out to see how it works—but they will also explore new features and variations on their gestures. Users need to feel relaxed to engage in play. Errors need to be difficult to make so that there is no need to put warning messages all over the interface. The ability to undo mistakes is also crucial for fostering the environment for play. Play stops if users feel trapped, powerless, or lost.

Pleasurable

"Have nothing in your house," said William Morris, "that you do not know to be useful, or believe to be beautiful." Gestural interfaces should be both aesthetically and functionally pleasing. Humans are more forgiving of mistakes in beautiful things.[19] The parts of the gestural system—the visual interface; the input devices; the visual, aural, and haptic feedback—should be agreeable to the senses. They should be pleasurable to use. This engenders good feelings in their users.

Good

Gestural interfaces should have respect and compassion for those who will use them. It is very easy to remove human dignity with interactive gestures—for instance, by making people perform a gesture that makes them appear foolish in public, or by making it so difficult to perform a gesture that only the young and healthy can ever perform it. Designers and developers need to be responsible for the choices they make in their designs and ask themselves whether it is good for users, good for those indirectly affected, good for the culture, and good for the environment. The choices that are made with gestural interfaces need to be deliberate and forward-thinking. Every time users perform an interactive gesture, in an indirect way they are placing their trust in those who created it to have done their job ethically.

THE ATTRIBUTES OF GESTURES

Although touchscreen gestural interfaces differ slightly from free-form gestural interfaces, most gestures have similar characteristics that can be detected and thus designed for. The more sophisticated the interface (and the more sensors it employs), the more of these attributes can be engaged:

Presence

This is the most basic of all attributes. Something must be present to make a gesture in order to trigger an interaction. For some systems, especially in environments, a human being simply being present is enough to cause a reaction. For the simplest of touchscreens, the presence of a fingertip creates a touch event.

Duration

All gestures take place over time and can be done quickly or slowly. Is the user tapping a button or holding it down for a long period? Flicking the screen or sliding along it? For some interfaces, especially those that are simple, duration is less important. Interfaces using proximity sensors, for instance, care little for duration and only whether a human being is in the area. But for games and other types of interfaces, the ability to determine duration is crucial. Duration is measured by calculating the time of first impact or sensed movement compared to the end of the gesture.
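As a rough sketch of the duration calculation just described, the Python below subtracts the start time from the end time and uses the result to tell a tap from a hold. The function names and the 500 ms hold threshold are assumptions for illustration only; a real system would tune the threshold through testing.

```python
def gesture_duration_ms(start_ms: int, end_ms: int) -> int:
    """Duration is the end of the gesture minus its first sensed contact."""
    if end_ms < start_ms:
        raise ValueError("a gesture cannot end before it starts")
    return end_ms - start_ms


def classify_by_duration(start_ms: int, end_ms: int,
                         hold_threshold_ms: int = 500) -> str:
    """Label a touch as a quick 'tap' or a longer 'hold'.

    hold_threshold_ms is an assumed value, not one given in the text.
    """
    if gesture_duration_ms(start_ms, end_ms) >= hold_threshold_ms:
        return "hold"
    return "tap"
```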

Position

Where is the gesture being made? From a development standpoint, position is often determined by establishing an x/y location within a coordinate space (such as the entire screen) and then calculating any changes. Some gestures also employ the z-axis of depth. Note that because of human beings' varying heights, position can be relational (related to the relative size of the person) or exact (adjusted to the parameters of the room). For instance, a designer may want to put some gestures high in an environment so that children cannot engage in them.
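The difference between tracking changes in position and treating position as relational or exact can be shown with a short, hedged Python sketch. Both function names and the centimeter units are assumptions for this example rather than part of any particular system.

```python
from typing import Tuple

Point = Tuple[float, float]


def displacement(start: Point, end: Point) -> Point:
    """Change in x and y between two sensed positions."""
    return (end[0] - start[0], end[1] - start[1])


def relational_height(gesture_height_cm: float, person_height_cm: float) -> float:
    """Express a gesture's height as a fraction of the person's own height.

    An exact position would use gesture_height_cm directly, measured
    against the room; a relational one scales it to the person.
    """
    if person_height_cm <= 0:
        raise ValueError("person height must be positive")
    return gesture_height_cm / person_height_cm
```

A gesture made at 90 cm is at half height for a 180 cm adult but well above the midpoint for a small child, which is why the two approaches can lead to different designs.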

Motion

Is the user moving from position to position or striking a pose in one place? Is the motion fast or slow? Up and down, or side to side? For some systems, any motion is enough to trigger a response; position is unnecessary to determine.

Pressure

Is the user pressing hard or gently on a touchscreen or pressure-sensitive device? This too has a wide range of sensitivity. You may want every slight touch to register, or only the firmest, or only an adult weight (or only that of a child or pet). Note that some pressure can be "faked" by duration; the longer the press/movement, the more "pressure" it has. Pressure can also be faked by trying to detect an increasing spread of a finger pad: as we press down, the pad of our finger widens slightly as it presses against a surface.
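Both ways of "faking" pressure can be sketched in Python. The sketch assumes a sensor that reports how long a touch has lasted and the contact area of the finger pad; the function names, the one-second ramp, and the 50% maximum spread are illustrative assumptions, not measured values.

```python
def pressure_from_duration(held_ms: float, full_pressure_ms: float = 1000.0) -> float:
    """Map press duration to a 0.0-1.0 pseudo-pressure.

    The longer the press, the more "pressure"; it saturates at
    full_pressure_ms (an assumed value).
    """
    return min(max(held_ms, 0.0) / full_pressure_ms, 1.0)


def pressure_from_spread(initial_area: float, current_area: float,
                         max_growth: float = 0.5) -> float:
    """Map the widening of the finger pad to a 0.0-1.0 pseudo-pressure.

    max_growth is the assumed fractional increase in contact area
    that counts as a full-strength press.
    """
    if initial_area <= 0:
        raise ValueError("initial contact area must be positive")
    growth = (current_area - initial_area) / initial_area
    return min(max(growth / max_growth, 0.0), 1.0)
```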

Size

Width and height can also be combined to measure size. For example, touchscreens can determine whether a user is employing a stylus or a finger based on size (the tip of a stylus will be finer) and adjust themselves accordingly.
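A minimal Python sketch of the stylus-versus-finger decision follows. It assumes the touchscreen reports the width and height of each contact in millimeters; the function names and the 9 mm² cutoff are hypothetical and would need calibrating for real hardware.

```python
def contact_area_mm2(width_mm: float, height_mm: float) -> float:
    """Combine width and height into a single size measure."""
    return width_mm * height_mm


def classify_contact(width_mm: float, height_mm: float,
                     stylus_max_mm2: float = 9.0) -> str:
    """Guess the input tool from contact size: a stylus tip is finer."""
    if contact_area_mm2(width_mm, height_mm) <= stylus_max_mm2:
        return "stylus"
    return "finger"
```

An interface could then adjust itself accordingly, for example by offering smaller targets when a stylus is detected.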

Orientation

What direction is the user (or the device) facing while the gesture is being made? For games and environments, this attribute is extremely important. Orientation has to be determined using fixed points (such as the angle of the user to the object itself).

Including objects

Some gestural interfaces allow users to employ physical objects alongside their bodies to enhance or engage the system. Simple systems will treat these other objects as an extension of the human body, but more sophisticated ones will recognize objects and allow users to employ them in context.

For instance, a system could see a piece of paper a user is holding as being simply part of the user's hand, whereas another system, such as the Digital Desk system (see Figure 1-12, earlier in this chapter), might see it as a piece of paper that can have text or images projected onto it.

Number of touch points/combination

More and more gestural interfaces have multitouch capability, allowing users to use more than one finger or hand simultaneously to control them. They may also allow combinations of gestures to occur at the same time. One common example is using two hands to enlarge an image by dragging on two opposite corners, seemingly stretching the image.

Figure 1-17. Designers experimenting with a multitouch system to play Starcraft with two hands. Courtesy Harry van der Veen and Natural User Interface.
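The two-handed stretch described above comes down to comparing the distance between the two touch points before and after the drag. The Python below is one possible sketch; the function name `stretch_scale` and the tuple-based points are assumptions for illustration.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def stretch_scale(start_a: Point, start_b: Point,
                  end_a: Point, end_b: Point) -> float:
    """Scale factor implied by two touch points moving apart or together.

    A result above 1.0 enlarges the image; below 1.0 shrinks it.
    """
    start_distance = math.dist(start_a, start_b)
    if start_distance == 0:
        raise ValueError("the two touch points must start apart")
    return math.dist(end_a, end_b) / start_distance
```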

Sequence

Interactive gestures don't necessarily have to be singular. A wave followed by a fist can trigger a different action than both of those gestures done separately. Of course, this means a very sophisticated system that remembers states. This is also more difficult for users (see STATES AND MODES, later in this chapter).
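To make the idea of a system that "remembers states" concrete, here is a small Python sketch of a sequence recognizer. Everything in it is an assumption for illustration: the class name, the gesture and action names, and the 1,500 ms window after which earlier gestures are forgotten. It also takes a shortcut: a single gesture fires immediately, even if it later turns out to be the first half of a combination, whereas a production system might wait briefly before committing.

```python
from typing import Dict, List, Optional, Tuple


class SequenceRecognizer:
    """Maps single gestures and gesture sequences to actions."""

    def __init__(self, actions: Dict[Tuple[str, ...], str],
                 timeout_ms: int = 1500) -> None:
        self.actions = actions
        self.timeout_ms = timeout_ms  # assumed pause that breaks a sequence
        self.longest = max(len(sequence) for sequence in actions)
        self.history: List[str] = []
        self.last_ms: Optional[int] = None

    def feed(self, gesture: str, now_ms: int) -> Optional[str]:
        """Record a gesture and return the action it triggers, if any."""
        # Forget earlier gestures if the user paused too long.
        if self.last_ms is not None and now_ms - self.last_ms > self.timeout_ms:
            self.history = []
        self.last_ms = now_ms
        self.history = (self.history + [gesture])[-self.longest:]
        # Prefer the longest remembered sequence that ends with this gesture.
        for length in range(len(self.history), 0, -1):
            sequence = tuple(self.history[-length:])
            if sequence in self.actions:
                if length > 1:
                    self.history = []  # a completed combination starts afresh
                return self.actions[sequence]
        return None
```

Here a wave followed quickly by a fist can map to one action, while a fist on its own maps to another.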

Number of participants

It can be worthwhile with some devices—such as Microsoft's Surface, which is meant to be used socially for activities such as gaming or collaborative work—to detect multiple users. Two people operating a system using one hand each is very different from one person operating a system using both hands.

When designing a particular interactive gesture, these attributes, plus the range of physical movement (see Chapter 2), should be considered. Of course, simple gestural interfaces, such as most touchscreens, will use only one or two of these characteristics (presence and duration being the most common), so designers and developers may need to dwell less on the attributes of the gesture than on the ergonomics and usability of interactive gestures (also covered in Chapter 2).

INTERFACE CONVENTIONS

Many of the traditional interface conventions work well in gestural interfaces: selecting, drag-and-drop, scrolling, and so on. There are several notable exceptions to this:

Cursors

With gestural interfaces, a cursor is often unnecessary since a user isn't consistently pointing to something; likewise, a user's fingers rarely trail over the touchscreen where a cursor would be useful to indicate position. Users don't often lose track of their fingers! Of course, for gaming, a cursor is often absolutely essential to play, but this is usually on free-form gestural interfaces, not touchscreens.

Hovers and mouse-over events

For the same reason that cursors aren't often employed, hovers and mouse-over events are also seldom used, except in some free-form games and in certain capacitive systems. Nintendo's Wii, for instance, often includes a slight haptic buzz as the user rolls over selectable items. Some sensitive capacitive touchscreens can detect a hand hovering over the screen, but hover designs need to account for screen coverage (see Chapter 2).

Double-click

Although a double-click can be done with a gestural interface, it should be used with caution. A threshold has to be set (e.g., 200 ms) during which two touch events in the same location are counted as a double-click. The touchscreen has to be sensitive and responsive enough to register touch-rest-touch. Single taps to click are safer to use (see TAP TO OPEN/ACTIVATE in Chapter 3).
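The threshold test can be sketched as follows in Python. The 200 ms gap is the example figure from the text; the class name, the 20-pixel tolerance for "the same location," and the return values are assumptions made for this illustration.

```python
import math
from typing import Optional, Tuple


class DoubleTapDetector:
    """Counts two taps as a double tap if they are close in time and space."""

    def __init__(self, max_gap_ms: int = 200, max_distance_px: float = 20.0) -> None:
        self.max_gap_ms = max_gap_ms            # threshold from the text
        self.max_distance_px = max_distance_px  # assumed location tolerance
        self._last: Optional[Tuple[float, float, int]] = None

    def tap(self, x: float, y: float, t_ms: int) -> str:
        """Return 'double' if this tap completes a double tap, else 'single'."""
        if self._last is not None:
            last_x, last_y, last_t = self._last
            close_in_time = t_ms - last_t <= self.max_gap_ms
            close_in_space = math.hypot(x - last_x, y - last_y) <= self.max_distance_px
            if close_in_time and close_in_space:
                self._last = None  # a third tap starts over
                return "double"
        self._last = (x, y, t_ms)
        return "single"
```

Fingers rarely land on exactly the same pixel twice, which is why the sketch allows a small distance tolerance rather than requiring identical coordinates.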

Right-click

Most gestural interfaces don't have the ability to bring up an alternative menu for objects. The direct-manipulation nature of most gestural interfaces tends to go against this philosophically. This is not to say that digital objects could not display a menu when selected, just that they frequently avoid this traditional paradigm.

Drop-down menus

These generally don't work very well for the same reasons as right-click menus, combined with the limitations of hover.

Cut-and-paste

As of this writing, cut-and-paste is only partially implemented or theorized on most gestural interfaces. It will likely be implemented shortly, but as of summer 2008, it has not appeared on most common gestural interfaces.

Multiselect

As humans, we're limited by the number of limbs and fingers we have to select multiple items on a screen or a list. There are ways around this, such as a select mode that could be turned on so that everything on-screen, once selected, remains selected; alternatively, an area could be "drawn" that selects multiple items.
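Both workarounds can be sketched briefly in Python. The names `StickySelection` and `items_in_area`, the rectangular shape of the drawn area, and the representation of items as named points are all assumptions for the sake of the example.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


class StickySelection:
    """A select mode in which tapped items stay selected until tapped again."""

    def __init__(self) -> None:
        self.selected: Set[str] = set()

    def tap(self, item: str) -> None:
        """Toggle an item in or out of the selection."""
        if item in self.selected:
            self.selected.discard(item)
        else:
            self.selected.add(item)


def items_in_area(items: Dict[str, Point], corner_a: Point, corner_b: Point) -> List[str]:
    """Names of the items that fall inside a rectangle drawn between two corners."""
    left, right = sorted((corner_a[0], corner_b[0]))
    top, bottom = sorted((corner_a[1], corner_b[1]))
    return [name for name, (x, y) in items.items()
            if left <= x <= right and top <= y <= bottom]
```

Sorting the corner coordinates lets the user draw the area in any direction, for instance from bottom right to top left.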

Selected default buttons

Since pressing a return key (and thus pushing a selected button) isn't typically part of a gestural system, all a selected default button can do is highlight probable behavior. Users will have to make an interactive gesture (e.g., pushing a button) no matter what to trigger an action.

Undo

It's hard to undo a gesture; once a gesture is done, typically the system has executed the command, and there is no obvious way, especially in environments, to undo that action. It is better to design an easy way to cancel or otherwise directly undo an action (e.g., dragging a moved item back) than it is to rely on undo.

Assuredly, there are exceptions to all of these interface constraints, and clever designers and developers will find ways to work around them. For the new conventions that have been established with gestural interfaces, see Chapter 3 and Chapter 4.

STATES AND MODES

Most gestural interfaces are stateless or modeless, which is to say that there is only one major function or task path for the system to accomplish at any given time. An airport kiosk assists users in checking in, for instance. The Clapper turns lights on or off. Users don't switch between "modes" (as, say, between writing and editing modes), or if they do, they do so only in complex devices such as mobile phones.

The reason for this is both contextual and related to the nature of interactive gestures. Gestural interfaces are often found in public spaces, where attention is limited and simplicity and straightforwardness are appreciated; this is combined with the fact that—especially with free-form interfaces—there might be no visual indicator (i.e., no screen or display) to convey what mode the user is in. Thus, doing a gesture to change to another mode may accomplish the task, but how does the user know it was accomplished? And how does the user return to the previous mode? By performing the same gesture? Switching between states is a difficult interaction design problem for gestural interfaces.

Thus, it is considerably easier for users (although not for the designers) to have either clear paths through the gestural system or a single set of choices to execute. For example, a retail kiosk might be designed to help with the following tasks: searching for an item, finding an item in the store, and purchasing an item. It is better to have these activities in a clear path (search to find to buy) than to require users to switch to different modes to execute an action, such as buying.
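The retail kiosk's clear path can be modeled as an ordered list of steps rather than a set of modes to switch between. The Python below is only a sketch under that assumption; the class name `LinearFlow` and its methods are hypothetical.

```python
from typing import List, Sequence


class LinearFlow:
    """A single path through a task: users move forward or back, never sideways."""

    def __init__(self, steps: Sequence[str]) -> None:
        if not steps:
            raise ValueError("a flow needs at least one step")
        self.steps: List[str] = list(steps)
        self.index = 0

    @property
    def current(self) -> str:
        """The step the user is on right now."""
        return self.steps[self.index]

    def advance(self) -> str:
        """Move to the next step, staying put at the end of the path."""
        if self.index < len(self.steps) - 1:
            self.index += 1
        return self.current

    def back(self) -> str:
        """Return to the previous step, staying put at the start."""
        if self.index > 0:
            self.index -= 1
        return self.current
```

Because there is only one position in the path at any time, the interface never has to tell users which mode they are in, only which step.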

As users become more sophisticated and gestural interfaces more ubiquitous, this may change, but for now, a stateless design is usually the better design.



[14] See Bill Buxton's multitouch overview at http://www.billbuxton.com/multitouchOverview.html.

[15] For a longer discussion, see Designing for Interaction by Dan Saffer (Peachpit Press): 60–68.

[16] See "Converging Perspectives: Product Development Research for the 1990s," by Liz Sanders, in Design Management Journal, 1992.

[17] Gibson, J.J. "The theory of affordances," in Perceiving, Acting, and Knowing: Toward an Ecological Psychology, R. Shaw and J. Bransford (Eds.) (Lawrence Erlbaum): 67–82.

[18] See Field Guide to Gestures, by Nancy Armstrong and Melissa Wagner (Quirk Books): 45–48.

[19] See Don Norman's book, Emotional Design (Basic Books), for a detailed discussion of this topic.
