Chapter 1. The AI Organization Defined

Thirty years ago, every business was looking at software as a way to redefine how it ran its operations. New systems of record were able to manage every core process in the enterprise, from accounting to payroll, resource planning, and customer management. This change was the foundation of the digital transformation—but as big a change as it was, the digitization of the core processes didn’t alter the primary business of a company; it just made it more efficient.

In the past decade, however, systems of record have been extended with so-called systems of engagement. These systems redefine how companies engage with customers, how customers use and buy their products, and even what those products are. Along the way, software evolved from being focused on efficiencies to being a core aspect of business. Actually, it quickly became the part of business where the differentiation battle was fought. Companies using software cleverly were able to differentiate very quickly from more traditional companies: think Netflix, Airbnb, Uber, and Amazon. Even referring to these as media, real estate, transportation, and retail companies sounds weird; they are, in a sense, software companies, because they understand that software is a primary function inside the organization.

Modern companies look at software as something that is infused through every aspect of their business, a critical component of their operational efficiencies as well as their products and business models. Nowadays, every company is a software company.

The Emergence of AI

The full digitalization of companies has also had a secondary effect—the proliferation of data. Systems of record transformed what traditionally were paper files into digital stores containing all the business data behind the company’s core processes. Systems of engagement added a vast amount of data about product usage and customer interactions. This created the perfect environment for the eruption of systems of intelligence.

Systems of intelligence leverage the vast amount of data generated in an enterprise to create expert systems. These systems are able to provide insights, optimize, and even predict future outcomes to help the business make better decisions. In the past decade, organizations all over the world began using techniques such as reporting, analytics, and data mining to create these systems of intelligence—but nobody was calling them artificial intelligence yet.

Traditionally, the term artificial intelligence has been reserved for those rare occasions when a machine is able to perform tasks normally associated with human intelligence. As powerful as reporting or analytics can be, they’re definitely not at that level. So, what happened? Why is everybody using the term AI now?

The primary reason for this change is the growing sophistication of the techniques available to build these systems, in particular in the area of machine learning. Machine learning techniques have long been used to build systems of intelligence: they construct a mathematical model from sample data (also known as training data) that can then make predictions without being explicitly programmed. This difference is illustrated in Figure 1-1.

Figure 1-1. Programming versus machine learning
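
To make the contrast in Figure 1-1 concrete, here is a minimal Python sketch (assuming scikit-learn is available; the temperature example is purely illustrative). The first function encodes the rule explicitly; the model below learns an equivalent rule from sample inputs and outputs alone.

    # Classic programming: we write the rule ourselves.
    def fahrenheit_from_rules(celsius):
        return celsius * 9 / 5 + 32

    # Machine learning: we provide sample inputs and desired outputs and let an
    # algorithm infer the rule (here, a linear model from scikit-learn).
    from sklearn.linear_model import LinearRegression

    celsius = [[-10], [0], [20], [37], [100]]   # sample inputs (training data)
    fahrenheit = [14, 32, 68, 98.6, 212]        # desired outputs

    model = LinearRegression().fit(celsius, fahrenheit)
    print(fahrenheit_from_rules(25), model.predict([[25]]))   # both close to 77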

Multiple algorithms can be used for machine learning, but one in particular is behind the explosion in use of the term AI: artificial neural networks.

Artificial neural networks are loosely based on how real neurons work in our brains, as shown in Figure 1-2. Each neuron is a simple activation function that is linked with other neurons via weighted connections. In recent years we’ve seen a huge amount of innovation related to this technique, with researchers creating more and more complex architectures of artificial neurons. This approach, with networks consisting of thousands of neurons and dozens of layers, is also referred to as deep learning.

Figure 1-2. A simple artificial neural network
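
The following toy sketch, written in Python with NumPy and purely illustrative, shows what a single artificial neuron in Figure 1-2 computes: a weighted sum of its inputs passed through an activation function.

    import numpy as np

    def neuron(inputs, weights, bias):
        # Weighted sum of the inputs, squashed by a sigmoid activation function.
        z = np.dot(inputs, weights) + bias
        return 1 / (1 + np.exp(-z))

    # Three inputs connected to a single neuron through weighted connections.
    print(neuron(np.array([0.5, 0.1, 0.9]),
                 np.array([0.8, -0.4, 0.3]),
                 bias=0.1))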

Deep learning is behind most of the AI breakthroughs reported increasingly frequently in the press. Using this technique, machines can now recognize objects in images, transcribe speech, answer questions about a text, or translate text from one language to another—all with accuracy that matches or exceeds what a human can achieve, without question deserving the label of artificial intelligence. You will learn more about artificial neural networks and even create one yourself in Appendix A of this book.

AI Capabilities

Human intelligence is broad and complex. Some human achievements are definitely not in the realm of what machines can do now. It may be a long time before they get there, if they ever do. Abstract problem solving, concept generalization, emotional knowledge, creativity, and even self-awareness are all areas where even the most powerful deep learning algorithms cannot come close to human intelligence. The combination of all these cognitive abilities in a machine that can be generalized to any scenario is referred to as artificial general intelligence; for now, it’s just a theoretical exercise.

However, current techniques are showing great success at performing narrower tasks traditionally reserved to human intelligence. We call this narrow AI or weak AI, and it refers primarily to three capabilities: learning, perception, and cognition.

Learning

The primary characteristic of machine learning is the ability to learn over time, without the need for explicit programming. Machine learning algorithms learn by exploring and doing, just like humans (though I’m sure those of you who are parents of young kids sometimes wish that wasn’t the case!), and not by following a set of step-by-step instructions.

Machine learning algorithms are categorized depending on how they perform that learning. The most popular technique, and the one you will probably use 90% of the time in your enterprise, is supervised learning.

Supervised learning uses a set of data that contains both the input and the desired output. Through iterative optimization, the learning algorithm will find a function that can model how the inputs are transformed into the outputs. This model can then be applied to new inputs that are not part of the training set, predicting the associated outputs.

Finding the right algorithm and its parameters is part science, part creativity and gut feeling. Applying machine learning to that selection process itself is a research topic of its own—the technique is called automated machine learning, or AutoML. If you want to experiment with machine learning and learn more about it, check out the “AI crash course” in Appendix A.

Supervised algorithms all have the same flaw: they require a lot of data. And not just any kind of data; they require training data that includes both the inputs and the associated outputs, also referred to as labeled data.

Sometimes we will have historical data stored by our systems of record or systems of engagement that is already labeled. Think, for example, of a customer churn model: we can use our historical data on which customers churned as the outputs for the training data, and those customers’ history of interactions as the inputs. Using the right algorithm, we will be able to predict customer churn in the future just by looking at a new set of interactions.
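
As a minimal sketch of that idea (using scikit-learn; the feature names and numbers below are made up, not taken from a real dataset), a churn model could look like this:

    from sklearn.linear_model import LogisticRegression

    # Each row describes a customer's past interactions:
    # [support_tickets, months_as_customer, logins_last_month]
    interactions = [
        [5,  3,  1],
        [0, 24, 30],
        [3,  6,  4],
        [1, 36, 22],
        [4,  2,  0],
        [0, 18, 15],
    ]
    churned = [1, 0, 1, 0, 1, 0]   # labels taken from historical records

    model = LogisticRegression().fit(interactions, churned)

    # Predict churn risk for a brand-new customer not in the training set.
    print(model.predict_proba([[2, 4, 2]])[0][1])   # probability of churn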

Sometimes, however, we won’t be that lucky, and the data won’t be labeled. Unsupervised algorithms take a set of unlabeled data and find structure in it. Clustering is the most popular type of unsupervised algorithm: it uses different techniques to find groups in data based on commonalities. You may use these algorithms to identify customer segments in your customer base or among your website visitors. Other commonly used unsupervised techniques are association rules (which identify associations in the data, such as people who bought a particular product also tending to buy certain others) and anomaly detection (finding rare or suspicious elements that differ from the majority of the data).
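
A minimal clustering sketch, again assuming scikit-learn and using invented visitor data, might look like this:

    from sklearn.cluster import KMeans

    # [pages_per_visit, minutes_on_site] for a handful of website visitors.
    visitors = [
        [2,  1], [3,  2], [2,  2],      # quick browsers
        [15, 30], [18, 25], [20, 35],   # engaged readers
    ]

    # Group the visitors into two segments without any labels.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(visitors)
    print(kmeans.labels_)   # cluster (segment) assigned to each visitor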

In other cases, we don’t use any training data at all. Think of how humans learn to play a video game. A supervised approach to this problem would be to watch thousands of games to learn from them. That’s the business model for many YouTubers that my kids watch, but I find that approach tremendously boring. A more interesting way to learn is to actually play the game. As we play, we get positive reinforcement when we do something well (e.g., we get points) and negative reinforcement when we do something wrong (e.g., we get killed). Reinforcement learning algorithms do exactly that: they learn by exploring the environment and reinforcing the behaviors that lead to a reward.
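
To make the trial-and-error idea concrete, here is a tiny tabular Q-learning sketch in plain Python; the “corridor” environment and its parameters are invented for illustration.

    import random

    N_STATES, ACTIONS = 5, [-1, +1]          # positions 0..4; move left or right
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    alpha, gamma, epsilon = 0.5, 0.9, 0.3    # learning rate, discount, exploration

    for episode in range(200):
        state = 0
        while state != N_STATES - 1:
            # Explore sometimes; otherwise exploit what has been learned so far.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state = min(max(state + action, 0), N_STATES - 1)
            reward = 1.0 if next_state == N_STATES - 1 else 0.0   # goal reached
            # Reinforce actions that lead to reward, now or later (Q-learning update).
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state

    # The learned policy: the best action in every non-terminal state (+1 = right).
    print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)])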

Reinforcement learning is an amazingly promising area of machine learning in business because it doesn’t require a labeled training dataset. It is especially suited to autonomous systems—both mobile, such as cars or drones, and static, such as HVAC or power systems—but it can also be used for complex business processes. Reinforcement learning is usually identified as the most difficult discipline in AI, but the crash course in Appendix A will teach you the basic concepts and even how to create your first reinforcement learning algorithm.

Perception

If there’s one area that has traditionally been exclusive to humans, it is perception. For decades we’ve been trying to mimic humankind’s ability to perceive the world around us, with limited success. The complexity of understanding an image or converting speech to text made it extremely difficult for this to be done programmatically—just imagine defining the step-by-step instructions required to identify a horse in a picture!

Machine learning algorithms are a much better fit for this kind of problem. However, the accuracy of traditional machine learning algorithms when applied to perception tasks hasn’t even come close to what a human can achieve. (I still remember demoing the speech recognition feature in Windows Vista for developers…it definitely made me a stronger person!)

Take image classification, for example. The ImageNet challenge is the most popular challenge for image classification. Since 2010, participants all over the world have submitted their algorithms in a race to build the most accurate model. At the beginning of the competition, in 2010, a good error rate was around 25%. For comparison, the human error rate on the same dataset is around 5.1%. In 2012 Alex Krizhevsky, a student at the University of Toronto, submitted an eight-layer neural network called AlexNet as his solution. It crushed the competition, achieving an error rate of 15.3%—more than 10 points lower than the next contender. During the following years the technique he introduced was improved and more layers were added, with GoogLeNet, a 22-layer neural network, achieving an error rate of 6.7% in 2014. The following year, a team at Microsoft Research submitted an entry that used a new neural network technique: its residual neural network (ResNet), which had a depth of a whopping 152 layers, achieved an error rate of only 3.57%, surpassing human performance for the first time.
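
Today, pretrained residual networks like these are only a few lines of code away. The following sketch is illustrative, not the competition code; it assumes TensorFlow/Keras is installed and that a local file named photo.jpg exists.

    import numpy as np
    from tensorflow.keras.applications.resnet50 import (
        ResNet50, preprocess_input, decode_predictions)
    from tensorflow.keras.preprocessing import image

    model = ResNet50(weights="imagenet")     # a 50-layer residual network

    img = image.load_img("photo.jpg", target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

    # Top ImageNet classes predicted for the photo, with probabilities.
    print(decode_predictions(model.predict(x), top=3)[0])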

Deep learning changed computer vision forever. Today, this technique is used for virtually every scenario in computer vision with high accuracy, making it one of the most popular use cases in the enterprise. Here are some examples of tasks that computer vision is used for today:

  • Classify the content of an image (image classification)

  • Recognize multiple objects in an image, and identify the bounds for each (object detection)

  • Recognize scenes or activities in images (e.g., unsafe situations in the workplace, or restocking needs in retail stores)

  • Detect faces, recognize them, and even identify emotions for each

  • Recognize written text, including handwritten text (optical character recognition)

  • Identify offensive content in images and videos

In their book Telling Ain’t Training (ASTD Press), researchers Harold Stolovitch and Erica Keeps assert that 83% of the information we receive comes from our sense of sight. Hearing is next, providing 11% of our sensory input; together, the two account for 94% of all the information we receive from the external world. There’s no doubt that audio processing is the other big area of focus for AI, right after computer vision. 

Similar deep learning techniques can be applied to audio signals, helping computers identify sounds. You can use this ability to identify birds by their songs, or predict failures in wind turbines by the sounds they make.

But the most exciting use of AI in audio processing is definitely speech recognition. The reference dataset used for speech recognition is called Switchboard: it contains approximately 260 hours of two-sided telephone conversations. The measured transcription error rate for humans is 5.9%; it was equaled by a neural network designed by Microsoft Research in 2016 and beaten a year later with a 5.1% error rate. For the first time, a machine was able to transcribe human conversations more accurately than humans themselves.

These breakthroughs are not only enabling machines to understand us, but also to communicate back to us in natural ways. In 2018 the text-to-speech service available in Azure, also developed with deep learning techniques, was able to synthesize speech at a level of quality virtually indistinguishable from a real human voice.

The combination of these capabilities can also enable the holy grail of computer science: fully natural user interfaces (NUIs). With machines that can not only see and understand humans, but can communicate back to us using natural speech, it seems we’ve accomplished the dream of every sci-fi movie ever. Have we really, though? To truly have a meaningful interaction with a computer, it should not only be able to transcribe what we say, but also understand what we mean.

Natural language processing (NLP) is the field of AI that analyzes, understands, and derives meaning from human language. One of the most common scenarios for NLP is language understanding, the foundation for modern conversational AI experiences such as digital assistants. When you ask Siri, Alexa, or Cortana about the weather, the system first transforms your speech audio into text, then applies a natural language understanding model to extract your intent. The intent (e.g., “get weather”) is then mapped to an output (in this case, providing information on the local weather).
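
As a toy sketch of that intent-mapping step (using scikit-learn with a handful of invented utterances; real assistants use far more sophisticated models), language understanding might look like this:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical training utterances labeled with the intent they express.
    utterances = [
        "what's the weather today",
        "will it rain tomorrow",
        "set an alarm for 7 am",
        "wake me up at six",
    ]
    intents = ["get_weather", "get_weather", "set_alarm", "set_alarm"]

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(utterances, intents)

    # A new sentence with different wording still maps to the right intent.
    print(model.predict(["is it going to rain tomorrow"]))   # ['get_weather']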

NLP techniques have exploded in the past few years. Some of them are useful only for basic tasks such as sentiment analysis, keyword extraction, or entity identification, but others can be used for more complex tasks like text summarization or translation. In 2018, the machine translation team at Microsoft was able to achieve human parity on automatic translation—an extremely complex task, and a goal previously considered unachievable—for the first time.

One of the most exciting uses of natural language understanding is machine reading comprehension. In January 2018, a team at Microsoft Research Asia was able to achieve human parity using the Stanford Question Answering Dataset (SQuAD), a machine reading dataset that is made up of questions about a set of Wikipedia articles. In fact, the system was able to perform better than a human at providing answers to open questions related to those articles. Many companies have continued to contribute to this challenge, taking it even further.

Still, these systems don’t use the same level of abstraction as humans. At its core, a question-answering algorithm will search the text for clues that can point to the right answer. For every question, the system will search the entire text for a match. Humans do that too (especially if we are in a hurry), but when we truly want to understand a piece of text we extract knowledge from it, to generalize it and make it more consumable.

Imagine a text describing the state of California. Humans would generalize the entity “California” from that text and add attributes to it (e.g., population, size), and even relationships with other entities (e.g., neighbor states, governor). After that generalization, we don’t need the text any more to answer questions about California; we have generalized the knowledge about it.

The equivalent in artificial intelligence for this process is called knowledge extraction, and it has profound implications in the enterprise. Using these techniques, we can extract high-level concepts from chaotic, unstructured, and even confusing information. The resulting knowledge graph can be used not only to answer broad-ranging questions across our entire data estate, but also to navigate and understand that information.
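
As a hand-built illustration of the idea, here is what a tiny knowledge graph for the California example might look like in Python; the structure and helper function are hypothetical, and the figures are placeholders.

    # The numbers below are placeholders used to illustrate the structure.
    knowledge_graph = {
        "California": {
            "type": "US state",
            "attributes": {"population": 39_000_000, "area_km2": 423_970},
            "relations": {
                "neighbor_of": ["Oregon", "Nevada", "Arizona"],
                "governed_by": "Governor of California",
            },
        }
    }

    def answer(entity, attribute):
        # Once the knowledge is extracted, questions can be answered from the
        # graph without going back to the original text.
        return knowledge_graph[entity]["attributes"][attribute]

    print(answer("California", "population"))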

That level of abstraction goes way beyond the traditional capabilities of NLP, taking it closer to what we know as cognition.

Cognition

Strictly speaking, cognition is the ability to acquire and process knowledge. It involves high-level constructs that our minds use for reasoning, understanding, problem solving, planning, and decision making.

The techniques we have explored so far involve some level of cognition, although it is not always apparent. Take image classification as an example. If we closely examine a deep neural network used for image classification, we can actually see how the neural network is decomposing the problem into smaller steps in every layer. Without human intervention, the neural network automatically shows some level of generalization: the first layers detect simple characteristics such as edges or textures. As we get deeper into the neural network, layers are able to extract more complex features such as shapes, patterns, or object parts. In a sense, the neural network has been able to acquire some knowledge and do some basic reasoning with it.

Natural language processing shows similar intrinsic abstractions. At their core, most modern NLP techniques use a concept called word embedding. With word embedding, every word in a text is transformed into a vector that represents the meaning of the word. In this new space, words that are semantically similar (for example, “weather” and “forecast”) are closer to each other. Using that approach, the system will match the sentences “What’s the weather today?” and “Get the forecast for the next 24 hours” to the same intent. Even if the words are different, their embeddings are similar because they are semantically close to each other. Translation works the same way: translation techniques use word embeddings to abstract the input text and turn it into a language-independent “idea” that can then be translated into any language using the reverse process.
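
The following toy sketch shows that property. The three-dimensional vectors are invented (real embeddings typically have hundreds of dimensions), but the cosine similarity computed here is the standard way to measure how close two embeddings are.

    import numpy as np

    # Made-up embeddings; real ones are learned from huge text corpora.
    embeddings = {
        "weather":  np.array([0.9, 0.8, 0.1]),
        "forecast": np.array([0.8, 0.9, 0.2]),
        "banana":   np.array([0.1, 0.0, 0.9]),
    }

    def cosine_similarity(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine_similarity(embeddings["weather"], embeddings["forecast"]))  # high
    print(cosine_similarity(embeddings["weather"], embeddings["banana"]))    # low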

In all these cases, cognition is intrinsic to the perception. However, many AI scenarios are purely cognitive. They are not focused on perceiving the world around us, but rather aim to abstract it and reason on top of that abstraction. Some of the most foundational supervised learning approaches are like that. Regression is the ability to predict a numerical value based on some available information; for example, estimating a house’s value based on its features and location, or forecasting sales based on historical data. Classification is the ability to identify the class or category of an item based on its features; for example, identifying whether or not a house is likely to be sold to a particular buyer. Optimization algorithms reason on top of a process to maximize a particular outcome, like allocating the resources in a hospital.
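
As a small illustration of the optimization capability, here is a hedged sketch using linear programming with SciPy; the hospital wards, hours, and benefit scores are made-up numbers.

    from scipy.optimize import linprog

    # Benefit score per admitted patient in each of two wards (invented values).
    # linprog minimizes, so we negate the objective to maximize it.
    objective = [-40, -50]
    # Constraints: nursing hours (3h vs. 5h per patient, 300h available)
    #              and beds (1 per patient, 80 beds available).
    constraints = [[3, 5], [1, 1]]
    limits = [300, 80]

    result = linprog(objective, A_ub=constraints, b_ub=limits,
                     bounds=[(0, None), (0, None)])
    print(result.x)   # optimal number of patients to admit in each ward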

Recommendation systems are able to find similarities between items like movies, books, or songs not apparent to humans, just by looking at ratings or purchasing behavior. Other techniques, like clustering, can find patterns in data and group items in an unsupervised way, as mentioned earlier.

We see the same cognition capabilities in reinforcement learning techniques as well. In 2017, the Microsoft Research Lab in Montreal (formerly Maluuba) set a new record for Ms. Pac-Man, reaching the game’s maximum score of 999,990 for the first time. The system was trained by playing thousands of games by itself. Similarly, OpenAI Five—a team of five neural networks—began to beat human teams at Dota 2 in 2018; OpenAI Five was trained by playing 180 years’ worth of games against itself every day. The most famous example is probably the accomplishment achieved by Google DeepMind: its system, AlphaGo, was the first to beat a 9-dan professional player of Go, a game that is considered much more difficult for computers than others, like chess. Closely watching games played by any of these AI systems will give you the impression that they’re exhibiting another characteristic of cognition—planning. The system is able to “think” in advance about the best approach to maximize its score in the long term.

AI Capabilities Cheat Sheet

Since I moved to the United States 10 years ago, every year in the summer I’ve gone back to Spain with my family for a vacation. The trip from Redmond, WA, to my hometown in Cádiz takes about 24 hours, with three different flights. As any parent with three kids will know, there’s a question you will hear approximately a million times in such a journey (the first time probably just as you’re leaving your house, on the way to the airport): Are we there yet?

I have a similar feeling when it comes to AI. For the past few years, customers have been asking “Are we there yet?” In a world of AI over-hype, it’s difficult to separate reality from fiction, and true capabilities from marketing stunts.

The good news is that we are there. AI is real today, and thousands of companies are using it to transform their business. You should definitely be conscious of the future possibilities of AI, but it’s more important that you understand what AI can do today.

The following cheat sheet (Figure 1-3) can be handy for that purpose: it provides a summary of the core capabilities introduced in this chapter. All of them are real today and we will use them extensively in the rest of the book, applying them to real business scenarios.

Figure 1-3. The AI capabilities cheat sheet

Tables 1-1 through 1-3 provide additional information about each of these core capabilities.

Table 1-1. Perception—interpreting the world around us
AI capability            Use case
Vision                   Extract information from or understand images and videos—for example, performing image classification, scene identification, or face recognition.
Audio processing         Perform audio processing tasks such as sound recognition or audio pattern identification—for example, identifying machinery failures based on sound.
Speech                   Interact with humans using speech—for example, performing natural text-to-speech and speech-to-text conversions.
Natural language         Understand and generate text language—for example, identifying intent, extracting concepts, analyzing sentiment, or answering questions.

Table 1-2. Cognition—reasoning on top of data
AI capability            Use case
Regression               Estimate a numerical value based on other variables or their values over time—for example, predicting house values or forecasting sales.
Classification           Identify the category (or categories) of a given instance—for example, fraud detection or medical diagnosis.
Recommendation           Predict a user’s preference for a particular item given similarities with other items or other users’ preferences—for example, movie recommendations or experience personalization.
Planning                 Find the best sequential approach toward a goal—for example, identifying a path for an autonomous vehicle or the steps in a business process.
Optimization             Maximize a given outcome by finding the right parameters in a process—for example, resource allocation or dynamic pricing.
Decision support         Augment the decision-making process by providing relevant insights on data—for example, clustering or key factor identification.

Table 1-3. Learning—learning without being explicitly programmed
AI capability            Use case
Supervised learning      Learn by iterating over training datasets containing labeled data (pairs of inputs and outputs)—for example, using data on previous customer interactions to predict churn.
Unsupervised learning    Infer hidden structures in an unlabeled dataset, such as relationships, categories, patterns, or features—for example, identifying different usage patterns or user segments in a website.
Reinforcement learning   Learn by experimenting in an environment, trying to maximize a reward provided during training—for example, operating a vehicle autonomously or optimizing the energy consumption in a datacenter.

The AI Organization

The addition of these new capabilities represents a profound transformation in software—and therefore business—as we know it. AI changes the way software is created, from providing step-by-step instructions to learning through data and experiences. It enables new ways for machines to interact with users and the world around them, through perception and natural interfaces, and cognitive capabilities make it possible to reason on top of the acquired information.

Think of every other evolution of software in the past three decades. The client/server paradigm born with the PC radically transformed the software experience in the enterprise. The internet, which arguably changed how software is consumed and distributed, caused a technology disruption that affected every industry. Mobile computing created entirely new customer engagements and business models.

All these disruptions may pale in comparison with AI. Learning, perception, and cognition provide a whole new toolbox that will enable a new generation of software. And with every company being a software company nowadays, a new generation of software also means a new generation of companies. I like to refer to that new breed of companies as AI organizations.

Just like the transition to software companies, the transition to AI organizations involves an end-to-end rethinking of how organizations run their operations, engage their customers, empower their employees, and even define their products. Each function in the organization can be redefined with the new AI toolbox.

That makes starting difficult. With so many new tools and so many use cases to apply them to, where do we begin? Business leaders often ask about the typical use cases of AI. The approach in many organizations is to identify as many use cases as possible, prioritize them with the framework of their choice, and execute a few, hoping some of them will achieve promising results.

Instead, I prefer to think of the AI transformation as a journey. Each tactical use case that I execute is a step that’s part of a broader strategy. The culmination of that strategy is to redefine the entire organization with AI.

You can visualize the stages in this journey as a set of concentric circles, as shown in Figure 1-4. At the core are the technical departments in your organization, where the transformation usually begins. This may be your IT department, your development department, or a distributed function in your organization that has driven your software transformation over the past years. In the next ring are your business units. These are the different areas in your organization that represent business functions beyond technology: both horizontal business units (sales, marketing, finance, HR) and vertical business units specific to your industry. The last ring contains the employees in the organization—those supporting the business (back-office employees) and those directly involved in production and customer engagement (frontline employees).

Figure 1-4. The journey of an AI organization

These rings are interconnected. You cannot transform your business units without a transformation in your technical departments. You cannot expect every employee to use AI if the business units have not embraced it. In a sense, these concentric rings are similar to a platform stack: instead of isolated use cases, the AI organization has a comprehensive approach to AI in which the layers build on top of one another.

In the next three chapters, we cover what it means to transform each of these rings. You will see how to apply the learning, perception, and cognition capabilities discussed here to each of the rings, identifying the use cases that will help you get started. You will also learn how to evaluate different use cases, not only in terms of their direct results but also their overall impact and contribution to the AI organization journey.

At the end of this journey, AI should be a primary component of your organization and not just an ingredient for isolated use cases:

  • AI should be a first-class citizen in your technical departments. In Chapter 2, you will see how to bring AI into all your software applications and create new ones that are only possible with AI.

  • Business units should partner with the technical departments to jointly redefine every business process with AI. In Chapter 3, you will learn how to identify and prioritize these business processes, and you will explore many of the use cases for your industry.

  • Finally, every employee should be part of the AI transformation. In Chapter 4, you will see how to ensure that the technical departments and business units in your organization provide the right platform to effectively empower employees to apply AI to everything they do.
