Chapter 1. What in the AI? How Did We Get Here?

The idea for this book came to us on a crisp fall day as we pondered on lunch in the hustle-and-bustle streets of New York City. Rob pulled up Foursquare to find a good place to eat (a restaurant recommendation validated by thousands of people we don’t know) and we settled on a spot that was a bit of a walk, tucked away on a city block we’d never heard of before. Paul checked the weather forecast (bespoke to about a mile and updated every 15 minutes) to see if we needed jackets, while Rob pulled up the destination using Google Maps in augmented reality mode. As we trekked the city streets, Paul used the Starbucks app to order teas that were piping hot, ready the moment we arrived at the halfway mark. We had devices that monitored our steps and heart rates, and just before we arrived at the restaurant, Rob Shazam’ed a song we heard blasting out of a café that we thought was cool. And just like that, we generated a heck of a lot of decorated data. By decorated, we mean that it’s not just data, it’s contextualized data. It’s data superimposed on a map of Manhattan, data that can be correlated with other data, data that gives a picture of what we were doing, and what other visitors to New York might want to do, are doing, or have done. Where we walked, what we listened to, what we ate and drank—that’s all decorated data, data in context.

Over lunch we reflected on all that data and declared ourselves digital city cartographers, mapping our lives, events, and interactions with the city in an everlasting digital footprint. And oh, what a digital footprint we made. After all, we generated more data in our hour-long walk than most people in ancient civilizations would have come across in a lifetime!

Toward the end of our lunch, we agreed on this simple fact: today, most organizations’ data acumen isn’t where it needs to be. What’s more, most organizations don’t have an information architecture (IA) to accelerate AI outcomes. You’ll often hear us declare, “You just can’t have AI without an IA!” When it comes to data, it’s simply not a level playing field and this book was written to give you a leg up.

On the walk back to the office, we talked about the struggles we’ve seen organizations face with AI, and the struggles we will likely see in the future (this AI thing is just getting started). We witnessed the Hadoop craze and its once-dominating hype; for the last few years we’ve been hearing about inconsistent business outcomes (as defined by the business) but those same projects are also called successful IT projects (as defined by IT). We know lots of organizations have “big data projects,” but big data without analytics is, well, just a bunch of data. The key takeaway is that data is at your doorstep, but most organizations aren’t ready to welcome it and turn it into insights. The playing field might not be level, due to differences in resources like access to data, funding, and staff, but we think these handicaps are largely self-imposed—and we know how to help organizations remove those handicaps.

With this in mind, we decided that we needed to write a book. This book would give organizations some guidance about how to use their data productively. It would also give organizations a process to get started with AI. That guidance, that process, is what we call the AI Ladder. We introduce the ladder formally in Chapter 4, and we discuss each of the rungs in detail in Chapters 5 through 9.

Here’s a very quick summary of the AI Ladder:

Collect data
Find the data that your organization has access to, regardless of where it is or how it is stored. This includes data from external sources and data that’s currently “falling on the floor.”
Organize the data
Data is just a “seething mass of bits” if it isn’t organized. Data needs to be trustworthy if you’re going to have trustworthy results. It needs to be cataloged so others can use it; it needs to be governed, and access needs to be controlled, for regulatory compliance; and it needs to be cleaned so you know it is accurate.
Analyze the data using machine learning
This is the fun part; it’s where you build and deploy AI models developed from your data.
Infuse AI throughout the organization
AI can transform your organization—but it won’t if it’s limited to a few projects in a few departments. The most exciting part of the AI Ladder isn’t the first few successes, it’s finding out how to make your entire organization more effective.

The foundation on which the AI Ladder rests is a modern information architecture (IA): it’s the utopian lift of democratizing AI across an enterprise. We say repeatedly in this book that “there is no AI without IA.” At the same time, if you try to create a modern information architecture, collect your data, and organize your data before you can start any analysis, you’re not likely to get anywhere, at least not this decade. That’s a problem we address specifically. Although it’s a ladder, there are ways to take shortcuts, start with some successful projects, and get on the road to AI without going rung by rung. Indeed, those first successes will help you get the buy-in and support you need for everything else.

Collecting Data in Real Time, but Understanding It in Stale Time

We have yet to meet an organization that has told us it has a serious data collection problem, but we’ve heard from countless organizations that they can’t understand the mountains of data they collect. If you listen closely to most organizations’ data challenges, they will admit they are data-rich and information-poor. In other words, they have a real-time data collection strategy, but they only understand their data in stale time.

Tip

With this in mind, we propose this simple equation as a guiding principle when reading this book:

data collection − data understanding = the price of not knowing

Take a look at Figure 1-1. A typical organization’s ability to collect data is illustrated by the steeply sloped thick line. Over time, lots and lots of data is collected, and the speed at which data is generated increases, often exponentially. Meanwhile, the organization’s data understanding capabilities grow more slowly, as illustrated by the thinner, flatter line.

Now take all the space between the thick line (data collection) and the thin line (data understanding), and you have what we call the price of not knowing. In this gap, the organization is guilty of not knowing what it could already know (or may have known in the past). The consequence? You name it: money lost, opportunities squandered, vacation flights missed, cars damaged, fraud enabled, lives lost, and more. We sometimes jokingly refer to this as Enterprise Amnesia.
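To make the gap concrete, here is a minimal sketch in Python that adds up the space between the two lines, period by period. The function name and the yearly volumes are hypothetical and chosen purely for illustration; they are not measurements from any organization.

```python
def price_of_not_knowing(collected, understood):
    """Sum the per-period gap between data collected and data understood."""
    if len(collected) != len(understood):
        raise ValueError("both series must cover the same periods")
    # An organization cannot understand more than it has collected,
    # so the gap in any period is clamped at zero.
    return sum(max(c - u, 0) for c, u in zip(collected, understood))


# Hypothetical yearly volumes in arbitrary units (assumed for illustration):
# collection doubles each year, understanding grows by a fixed amount.
collected = [10, 20, 40, 80, 160]
understood = [5, 10, 15, 20, 25]

gap_per_year = [c - u for c, u in zip(collected, understood)]
total_gap = price_of_not_knowing(collected, understood)

print(gap_per_year)  # [5, 10, 25, 60, 135]
print(total_gap)     # 235
```

Even in this toy series the yearly gap widens every period, which is the shape the two lines in the figure describe.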

Figure 1-1. Graphical representation of a typical organization’s data collection and data understanding capabilities

While you can argue that organizations simply don’t have the ability to understand the impact of the data coming in (this book will change that), what about the “things” they used to know but have forgotten? That’s context. We’ve all experienced this in our personal lives. For example, when your favorite airline delays your plane for the third time in a month, but the customer service agent who is rebooking your flight has no idea what you’ve been through as they work on your case, that’s a lack of context. Imagine if the agent proactively apologized and upgraded you on your return flight. Here is a missed opportunity for a great client interaction because the airline, and the agent, lack context. They fail to provide empathy because the agent literally has no idea this is the third time the airline’s operations have altered your flight plans. The airline does hold that information somewhere, but it isn’t surfaced well enough for anyone to act on it (the context is forgotten).

Consider all the events any company already knows about, or could know about. Do they apply this knowledge to a 24/7 decisioning environment? For example, an electrical power company knows what a compromised power tower looks like and understands its characteristics: perhaps a blown transformer, rusting bolts that support the infrastructure, or encroaching brush and trees that might catch fire. This same provider has also likely recorded the impact that rainfall and salinity have on its infrastructure. But has it turned that recorded knowledge into simulations, to predict when a tower needs proactive maintenance? Does this company have a static time-based maintenance protocol, or is it using condition-based monitoring to trigger maintenance routines? Does it use drones with computer vision to rapidly and more safely inspect those power towers? If the answer is “no,” these are signs of Enterprise Amnesia.
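The difference between a static, time-based protocol and condition-based monitoring can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `TowerReading` fields, the corrosion scale, the thresholds, and the sample readings are hypothetical, not values from any utility.

```python
from dataclasses import dataclass


@dataclass
class TowerReading:
    tower_id: str
    days_since_service: int
    corrosion_index: float   # hypothetical scale: 0.0 (clean) to 1.0 (severe)
    brush_distance_m: float  # distance to the nearest vegetation, in meters


def time_based_due(reading, interval_days=365):
    """Static protocol: service a tower only when the calendar says so."""
    return reading.days_since_service >= interval_days


def condition_based_due(reading, max_corrosion=0.6, min_brush_distance_m=5.0):
    """Condition-based protocol: service a tower when its readings look risky."""
    return (
        reading.corrosion_index >= max_corrosion
        or reading.brush_distance_m < min_brush_distance_m
    )


# Hypothetical readings (assumed for illustration).
readings = [
    TowerReading("T-001", 400, 0.10, 30.0),  # overdue by calendar, healthy
    TowerReading("T-002", 90, 0.75, 25.0),   # recently serviced, corroding
    TowerReading("T-003", 120, 0.20, 2.5),   # recently serviced, brush too close
]

calendar_flagged = [r.tower_id for r in readings if time_based_due(r)]
condition_flagged = [r.tower_id for r in readings if condition_based_due(r)]

print(calendar_flagged)   # ['T-001']
print(condition_flagged)  # ['T-002', 'T-003']
```

In this toy data the calendar flags the one healthy tower and misses the two at risk, which is the kind of blind spot the questions above are probing. A real system would likely replace the fixed thresholds with a model trained on the recorded rainfall, salinity, and failure history.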

If organizations start applying data acumen (a term we’ll use to include AI, machine learning, and deep learning, as well as other approaches we discuss in the book), they can generate a new data collection curve (the dashed line in Figure 1-2). This curve will capture some of the value hidden in the amnesia abyss (the areas between the thick and thin lines) and open up new opportunities for top-line growth, better service, and better outcomes.

Figure 1-2. Organizations that apply data acumen will notice a greater opportunity to correlate data collection and data understanding

Notice that while this new dashed data collection curve is sloped more steeply, it’s not a straight line: the curve has humps, lumps, and bumps. Modernizing your approach to data, and thus generating a new data collection curve, is a highly agile process that will encounter failures, successes, and restarts. Culture matters here too (more on that later in this book).

The Modality of Everything and the Data Collection Curve

It’s important to understand that the ways we interact with our environment, both physical and virtual—the “modality of everything”—are changing. These changes set new expectations for the talent you will be recruiting and the way you engage the value chain (from material sourcing through to your customers). Most of all, these changes to how we interact with the world make the data collection curve steeper—and that makes it all the more important for the data understanding curve to keep pace.

We have daughters the same age. Neither of them had any idea what a 3.5-inch disk was when we first showed it to them. The modality of storage has changed so much in the past few decades that all our kids truly know is “the cloud” (if they see a thumb drive in our hands, they remark “OK Boomer…”). The future of storage feels like “anything you want and as much as you want.”

The modality of expressions has changed, too. Expressions that used to be textual are now visual. Our kids don’t communicate via email; they use visual-first expression platforms like TikTok, Instagram, and Snapchat, fully equipped with virtual bunny ears and alien eyes, along with other filters that we “Boomers” (neither of us is a Boomer, but it’s Generation Z talk for anyone over 30) see no reason for.

Watch how people interact with their technology today. How many mouse clicks and scroll wheels do you hear in an office these days? The modality of interaction has changed from scrolls and clicks to touchpad swipes and gestures, where the breadth of a pinch or intensity of a touch means something. We call this “digital body language.” The amount of digital body language that companies have collected has exploded in recent years. What hasn’t exploded is the ability to make sense of it and turn it into actionable insights.

Today, we live in a world where everything can be measured. As the Internet of Things becomes the Internet of Everything, edge devices bring more data to an organization’s doorstep than ever before. The ultimate goal would be to morph data from the Internet of Everything into the Intelligence of Everything. But that isn’t going to happen with the current slope of the data understanding curve.

Soon our primary interaction with technology will be through voice. Anyone who uses a voice-driven modality knows we’re not there yet, but that is changing quickly. Newer deep learning techniques will make voice interaction more trustworthy in the coming years, and that means more data. As you can likely deduce, the modality of everything will further steepen the data collection curve.

Even Steeper: The Future of the Data Collection Curve

Here’s the very real and unfortunate-for-many news: the data collection curve is about to get much steeper. Let’s discuss some examples:

Blockchain
First, stop thinking about Bitcoin and cryptocurrency, and force yourself to think of blockchain as a distributed trust protocol with the potential to redefine business models. Blockchains are certainly used in supporting cryptocurrencies, but there are many other applications where trust is costly and needed. From income-share agreements (which we think will disrupt the student loan market) to food supply chains (still struggling with traceability, and now being pressured to move toward transparency), remittance payments, the settling of financial transactions, getting control of the opioid crisis, document exchange, portable medical records, trusting a stranger to drive you somewhere or deliver you what you ordered, mitigating fraud for fast-moving aid payments during a crisis, and everything in between, blockchain technology has the potential to create some of the biggest data sets we’ve ever seen.
Bots and assistants

Sophisticated bots generate enormous amounts of interaction data. That data will enable developers to train algorithms to determine the next best action, analyze tone and intent, cross-sell, up-sell, substitute-sell, spot identity fraud, and more. Those algorithms will also be run on new interactions, and so the “learn-from→apply-to” cycle begins.

Bots are set to redefine baseline business-to-consumer interactions. For example, chatbots allow brands to personalize their marketing. They can sit natively in messaging apps (where people “hang” and interact), they can be iterated and deployed quickly, and more. But one study in particular speaks volumes to us: it shows that using intelligent bots results in a spike in customer engagement. Servion’s study estimates that by 2025, AI-powered bots will sit behind 95% of all customer service interactions. How much will these agents steepen the data collection curve?

Weather

We’re not suggesting that businesses will collect their own weather data, but they will decorate their data with it and apply it to their business to create what we like to call “the moment of Wow!” For example, as 2017’s Hurricane Irma bore down on Florida, Tesla pushed a software update to some of its models, giving extra mileage to help owners get further away from danger and closer to safety. Wow!

Consider how weather relates to insurance. Auto insurance is one heck of a tension-filled business. We all know it well: you pay premiums and have a great record, and then some event happens that isn’t your fault; you fight over the impact and cost of that event and watch your rates climb. You know the irony of this situation? Both parties want the same thing: not to have experienced this event. A single hailstorm in Phoenix once caused $10 million of claim damage. What would happen if your insurance company provided you with an asset registration app that warned you of an incoming weather event, allowed you to temporarily register your asset’s location, and gave you tips and suggestions for avoiding (or minimizing) a claim? This isn’t the stuff of fantasy. One insurance company built such an app, and found that its registered policyholders in a certain region acted on 50% of its alerts. Of those who acted on alerts, only 6.1% filed a claim. What’s more, weather events gave this company an average of 10 opportunities to communicate with its clients as “partners,” outside of the regular premium renewal process.

5G
5G cell technology won’t just mean the ability to download a movie in seconds or charge your phone once a month. It will catapult the world into an augmented reality (AR) modality, for everything from insurance inspections to changing the oil in your lawn mower to the experience of buying a boxed item (check out the LEGO store in New York City). This bigger “pipe” means more data. 5G is the battle for data supremacy, and it needs to be considered in any AI or data strategy.
Wearables
Wearables and other smart technologies will bring unfathomable amounts of data to our doorsteps, all increasing the price of not knowing: imagine toilets that screen urine for diseases, or floors that can not only identify the individual walking on them with more accuracy than a fingerprint, but can also predict hip or knee deterioration. (You don’t need to imagine any of this; it’s here and happening today.)

Where We Are Now—Haystacks, Needles, and More Data

Faced with the challenge of catching up, are you feeling that overwhelming desire to give up right now? Fear not. Everyone has had it (or will get it) before they start climbing the AI Ladder.

Together, we have about 50 years of data experience across thousands of client interactions. We were around when data was about finding a “needle in the haystack” (data warehousing). We saw many declare the end of warehouses and watched organizations go head over heels for Hadoop, expecting to glean unlimited insights from data lakes that morphed into data swamps. Essentially, many thought they could find the needle if they added more hay to the haystack! (Don’t get us wrong—there are successful Hadoop projects, mostly in data preparation and online query archives, but widespread analytical insights never materialized, for reasons outside the scope of this book.) We’ve seen mistakes made, and we’ve made our own mistakes (we have the scar tissue to prove it), but we’ve learned quite a lot in our collective half-century of data strategy observations. That’s where the information architecture we talk about throughout this book comes from.

In today’s reality, we are looking for needles in stacks of needles. That data collection curve is becoming too steep to keep up. We as humans are going to need some help, and this help comes in the form of AI—AI that can watch and observe, feel, listen, understand, annotate, categorize, transcribe, sense, translate, compose, perhaps even smell! This AI isn’t meant to replace humans; it’s meant to help us because we simply can’t keep pace.

AI will change every job out there today, and if you bury your head in the sand on AI, you’re likely to miss out on the data understanding curve altogether. But stop and consider the potential for collaboration between humans and AI. Humans are capable of compassion, intuition, design, value judgment, and common sense (we’re tempted to toss in a joke here...but we did say capable). When we think of computers, we think instant recall, discovery, large-scale math fact checking, immunity to mind-numbing work, never taking a break. We’re good at things computers will never be good at, and computers are good at things we’ll never be good at. Collaboration—joining our strengths—is just common sense. And humans are supposed to be good at that.

There’s no question that Robotic Process Automation (RPA) is coming. It will help to eliminate mundane tasks associated with many business processes: the tasks most employees would gladly relinquish. The goal is for employees to view RPA technologies as “teammates” of a sort, willing and able to perform repetitive tasks without complaint, and unmatched in terms of speed and accuracy.

AI will certainly cause some displacement. That’s what happens with any technological revolution. After all, the invention of the automobile put many hostlers (people who take care of horses) out of work. When the Romans invented aqueducts, they probably put thousands of water carriers out of work. How many people wanted to spend their lives carrying water? We firmly believe that most people will end up declaring, “I can’t do my job without this AI technology!”

We are in the early days of a promising new technology, and of the new era to which it is giving birth. This technology is as radically different from the programmable systems that the IT industry has produced for half a century as those systems were from the tabulators that preceded them. World-changing technology carries major implications and responsibilities, but that’s outside the focus of this book (though we do touch on bias and ethics).

How to Displace Today’s Disruptors

Think about the different ways your business can wrangle competitive advantage. Economies of scale? A time-tested classic for sure! If you’re FedEx, P&G, or Walmart, you certainly enjoy cost advantages obtained from scale. These companies are optimized for economies of scale. Because of that, they can innovate and take business from competitors that don’t have the same economies. Now stop to consider how many organizations get to experience this benefit.

Network effects are another form of competitive advantage. Be it Facebook (the world’s largest media producer, yet it produces no content), Alibaba, or some other, these companies have all built business models from network effects. Facebook works because nearly everyone is on Facebook. Let’s face it, most organizations will never see network effects at this scale.

The problem with these two forms of competitive advantage—economies of scale and network effects—is that there are nearly insurmountable barriers to taking advantage of them. They simply aren’t available to everyone! Not every company can be the lowest-cost producer, and not every company can build network effects at scale.

Is there a third way to gain competitive advantage, and if so, what is it? Let’s start with some observations. Uber isn’t really a taxi company, though it is the largest taxi company in the world. Airbnb isn’t an accommodation provider, though it offers the most places to stay in the world. Consider the last decade’s disruptors—we didn’t list them all here, but you know who they are. How did they change things up? There is a new basis for competitive advantage: data. That’s what’s behind Uber, and that’s what’s behind Airbnb, among others.

What else do all of the last decade’s disruptors have in common? They all seized a moment with data. They went to conduct business (sometimes new business) in places they didn’t belong. (This is something you can do with data: for example, Best Buy’s most profitable division is home healthcare.) They created network effects, took business away from incumbents large and small, and created new business models not imaginable a few years ago.

These disruptors know your searching preferences, your viewing preferences, when you post, where you go, who you talk to, how you change your behavior when your phone’s battery drops below 10%, what you buy, and more. They know these things because they understand a lot of the data they collect. Yesterday’s disruptors are pushing even harder to collect more data so they can disrupt more broadly and deeply in this decade than the last.

But here’s the thing about data: most of the data in this world can’t even be “Googled,” which means it’s not readily available to the last decade’s disruptors. That’s why they are offering so much “free” stuff: so they can collect more data about you.

Think about it. A bank can recognize someone who is likely to default on their loan; it has seen hundreds of thousands of those cases. An experienced insurance adjuster knows the cost of damages associated with a low-speed collision just by looking at it; they have adjudicated thousands of car claims across all levels of damage. AI is an excellent opportunity for companies to capture this kind of knowledge, which is especially important as the most experienced part of many organizations’ workforce moves into retirement.

We’ve mentioned the lack of a level playing field. Data is the one area where every company has an equal opportunity to be great. Knowing your own data is a competitive advantage available to everyone. Climbing to the top of the AI Ladder and understanding your data will be your competitive differentiator. But you need the acumen (and a plan to put to good use the superpowers that come with it) to become the disruptor and not the disruptee.

You may have noticed that we use the word “acumen” a lot when we talk about data. We want you to think of data in terms of your acumen. This is because even if you have lots of data (hint: you do), it’s not much use to you unless you know how to put it to work (that’s the acumen part). That’s exactly what this book is here to help you do. Be forewarned: the landscape of data acumen is ever changing. We like to tell people to think of their data and data acumen like a gym membership, for two reasons. First, if you don’t use it, you’ll get nothing out of it. Second, if you stop using it, you start to lose whatever gains you built while using it.

Let’s Get Ready for a Climb!

We wrote this book not just to advance your AI business skills, but also to give you acumen on how to start and complete effective AI projects.

AI might seem overhyped right now. People are going to overestimate AI’s potential impact on the next year or so. But we’re certain they will underestimate its effect on the next 5 or 10. AI is here to stay.

The challenges and hype aside, we’re at a key inflection point as a society. Make no mistake about this. Our world is rapidly moving from one where most processes are curated and run by humans supported by technology to one where processes are run by AI technologies supported by humans.

Apple aficionados may recall Steve Jobs’s famous quote: to him, a computer is “a bicycle for our minds.” During the brief time Jobs spent in college, he was fascinated by two things: fonts (which is why Apple is so focused on beautiful design and emotional attachment) and the locomotive efficiency of living organisms. As he learned, humans are about a third of the way down the list in terms of the most mechanically efficient moving creatures. Number one is the condor (it caught us by surprise too). However, when a human is on a bike, their locomotive efficiency blows the condor off the charts. It’s like a superpower.

If a computer is a bicycle for our minds, then AI is a bicycle for analytics. It’s a solution to the quandary we’re living in: we store everything but have no way to apply intelligence to it all. The analytics world has always been about programming for insights. Computers are programmed with rules and instructions to perform various tasks very quickly. But the idea behind AI is that it learns on its own, through observations. After all, computers are excellent at finding patterns, so let them do that.

AI (including subdisciplines like machine learning and deep learning) will do for the 21st century what the industrial revolution did for the 18th century. As we said, we are at an inflection point. Data acumen will give some companies (and individuals) a lift and shift, while it will hand others a rift and cliff.

Today, different companies are at different stages in their AI journey. Despite all the coverage and claims, most companies are just starting the journey. Companies still brag about their customer service, their merchandising strategies, their mass marketing campaigns, and more. Underpinning those business “brags” are perhaps some algorithms here and there. As these companies venture further into their journeys, they will privately (and perhaps publicly) brag about how they use AI. But the companies that are really transformed by AI won’t be limited to a few algorithms assisting the business strategy and value propositions. They will have thousands of algorithms. And those algorithms won’t be updated monthly, quarterly, or yearly; they will be automatically updated as needed: daily, hourly, and perhaps even faster. There won’t be algorithms designed for cohorts and segments, but a plethora of hyperpersonalized ones for individuals. You’ll often hear us say “death of the average and personalized to the one.”

We’ll make a bold declaration: in the years ahead, we’ll stop talking about artificial intelligence; instead, we’ll talk about ambient intelligence. Why ambient? The definition of the word includes phrases such as “existing or present on all sides” or “an encompassing atmosphere.” Much like unobtrusive lighting or background music, it’s there but it fades into the unnoticed as you go about your day. AI will become so intertwined in our day-to-day personal and professional routines, it will become ambient intelligence.

As a final suggestion before you start reading this book, spend a moment or two asking yourself, “What is the cost of my company’s not knowing?” Now get on your bike and let’s start the ride.
