Chapter 9. Person Detection: Building an Application

If you asked people which of their senses has the biggest impact on their day-to-day lives, many would answer vision.

Vision is a profoundly useful sense. It allows countless natural organisms to navigate their environments, find sources of food, and avoid running into danger. As humans, vision helps us recognize our friends, interpret symbolic information, and understand the world around us—without having to get too close.

Until quite recently, the power of vision was not available to machines. Most of our robots merely poked around the world with touch and proximity sensors, gleaning knowledge of its structure from a series of collisions. At a glance, a person can describe to you the shape, properties, and purpose of an object, without having to interact with it at all. A robot would have no such luck. Visual information was just too messy, unstructured, and difficult to interpret.

With the advent of convolutional neural networks (CNNs), it has become practical to build programs that can see. Inspired by the structure of the mammalian visual cortex, CNNs learn to make sense of our visual world, filtering an overwhelmingly complex input into a map of known patterns and shapes. The precise combination of these pieces can tell us which entities are present in a given digital image.
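
To make this a little more concrete, here is a minimal sketch, not taken from this chapter's person detection application, of what a tiny CNN looks like when defined in Keras. The layer sizes and the 96×96 grayscale input shape are illustrative assumptions; the point is simply that stacked convolution and pooling layers distill raw pixels into feature maps, which a final dense layer turns into class scores (for example, "person" versus "no person").

```python
import tensorflow as tf

# Illustrative only: a very small CNN for a 96x96 grayscale image.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(96, 96, 1)),           # small grayscale input
    tf.keras.layers.Conv2D(8, 3, activation="relu"),    # learn simple edge-like filters
    tf.keras.layers.MaxPooling2D(),                      # shrink the feature maps
    tf.keras.layers.Conv2D(16, 3, activation="relu"),   # combine edges into larger shapes
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),                           # flatten feature maps to a vector
    tf.keras.layers.Dense(2, activation="softmax"),     # scores for two classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Each convolutional layer slides a set of learned filters across the image; early layers respond to simple patterns such as edges, while deeper layers respond to combinations of those patterns, which is what lets the network recognize whole objects rather than individual pixels.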

Today, vision models are used for many different tasks. Autonomous vehicles use vision to spot hazards on the road. Factory robots use cameras to catch defective ...
