From Insight-as-a-Service to insightful applications

Applications that combine machine learning, AI, and domain knowledge have strong potential for industry and investors.

By Evangelos Simoudis
April 26, 2016
ABC Museo Madrid. (source: Paul Thompson on Flickr)

We started this series with the premise that insightful applications, essentially the next generation of big data applications, are the key to effectively addressing important problems, such as autonomous driving. In the first post, we examined how data analytics has evolved over the past 25 years. In the second post, we established how the early success with big data infrastructures has given rise to applications, and we defined a taxonomy of such applications. In this third post, we will discuss insightful applications in more detail and conclude by outlining the future for such applications, along with the sector’s investment potential.

Insight generation leads to insightful applications

A discussion of insightful applications must begin by defining what an insight is. An insight is a novel, interesting, plausible, and understandable relation, or set of associated relations, that is selected from a larger set of relations derived from a data set. The selected relation, or relations, must lead to the formation of an action plan. This action plan, when applied, results in a change that can be measured through a set of Key Performance Indicators (KPIs). The ability to measure KPIs enables such systems to learn and improve over time. Insights and their associated action plans are generated by complex systems, which were described in an earlier post.

Insightful applications combine big data management with an insight generation system, which incorporates a variety of machine intelligence techniques (e.g., machine learning, planning and reasoning, natural language processing, vision, and other AI techniques) and appropriate domain knowledge for those techniques to operate effectively. These applications must not only be able to generate an action plan from insights, but also be able to estimate the resources (e.g., time, money) that executing that plan will consume.

A good example of an insightful application is found in autonomous cars, such as Google’s self-driving car. In order for an autonomous car to function properly, its software must be able to deal with very large volumes of diverse data that are generated in real time from sensors like lidar; make a variety of predictions using numerous sophisticated models; select the most interesting and plausible of these predictions (e.g., within 50 yards, a bicyclist riding ahead of the car will likely enter the car’s lane); and generate an action plan based on these insights (e.g., slow the car by applying the brakes) while constantly evaluating the results of these actions (e.g., make sure the car does not hit the bicyclist). Based on the data that is collected from this incident and the evaluation of the plan that was executed, there is an opportunity for the insightful application to learn.
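The predict, select, act, and evaluate cycle described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real self-driving system works: the `Prediction` structure, the function names, the probability threshold, and the sample predictions are all hypothetical, and only the 50-yard horizon and the bicyclist scenario come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One model output about the scene (hypothetical structure)."""
    description: str
    probability: float      # estimated likelihood that the event occurs
    distance_yards: float   # how far ahead of the car the event would occur


def select_insights(predictions, min_probability=0.5, max_distance_yards=50.0):
    """Keep only predictions that are plausible and close enough to matter."""
    return [
        p for p in predictions
        if p.probability >= min_probability
        and p.distance_yards <= max_distance_yards
    ]


def plan_action(insights):
    """Map the selected insights to a (toy) action plan."""
    return "apply brakes" if insights else "maintain speed"


def evaluate(action, collision_occurred):
    """Toy KPI: the plan succeeded if no collision followed it."""
    return {"action": action, "success": not collision_occurred}


# Made-up predictions standing in for the output of many models.
predictions = [
    Prediction("bicyclist enters the car's lane", 0.8, 40.0),
    Prediction("parked car pulls out", 0.1, 30.0),
    Prediction("traffic light turns red", 0.9, 200.0),
]

insights = select_insights(predictions)
action = plan_action(insights)
outcome = evaluate(action, collision_occurred=False)
print(action, outcome["success"])
```

In a real system, the recorded outcome would feed back into the models and thresholds, which is where the opportunity to learn comes from.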

Insight-as-a-Service: A precursor to insightful applications

Because of their complexity, insight generation systems—and therefore insightful applications—have been hard and expensive to develop, and few insightful applications have been introduced to date. At best, corporations have been utilizing what we have termed “Insight-as-a-Service.” As shown in Figure 1, insights and their associated action plans are generated periodically as part of a service offered by specialized personnel, called “connectors,” in collaboration with data scientists.

Figure 1. Insight-as-a-Service process. Figure courtesy of Evangelos Simoudis.

At a high level, a connector works with the business user to understand the problem to which insights and actions will be applied. In addition to defining the business problem, the connector gains an understanding of the data that is available and can be used to address the problem. The business problem definition combined with the data description is then “translated” to a data problem, which can be addressed by data scientists. Data scientists, with the help of data stewards, work on the available data to extract patterns and other relations. The connector evaluates the extracted patterns and relations to identify those that constitute insights. The connector then proceeds to associate one or more appropriate action plans with each identified insight, and presents the resulting pairs to the business user. After applying the prescribed actions, the business user communicates to the connector the results of these actions, along with the effectiveness of the actions to address the problem being solved (as determined through the use of a previously defined set of KPIs).

Because of their role in providing Insight-as-a-Service, connectors must have strong knowledge of the processes, technologies, and issues in the industries in which they work. They must be able to “translate” business problems into data problems, and effectively communicate this translation to the data scientists. Lastly, connectors must be able to fuse the relations extracted by the data scientists with their own domain knowledge to generate insights and action plans, which will be communicated to business users.

Today, Insight-as-a-Service is offered primarily by management consulting firms like IBM, PwC, McKinsey, and Accenture. Because of the skills required and the critical role they play in the insight generation process, connectors tend to be some of the most expensive resources a management consulting firm employs. There are too few such individuals in management consulting firms today. As a result, too few corporations are generating insights from their data, and those that do focus on areas with potential for a high return on investment (ROI), such as cybersecurity, supply chain optimization, and marketing budget optimization.

An abbreviated example of Insight-as-a-Service

Consider the process a cable company might go through in order to optimize its marketing budget as it tries to determine what percentage of its budget to allocate toward reducing customer churn. The marketing department must create a set of actions that are within the limits of the marketing budget. Using his specialized knowledge, the connector must understand the nature of the churn problem and the data that is available to address this problem. He may determine that, as a first step, the company’s subscribers should be scored in terms of their probability of abandoning their service, and maybe even their probability of upgrading to a higher level of service. He must then present this as a scoring problem to the data scientist, along with the available data. The data scientist can then proceed to develop the appropriate scoring model(s).

Once the models are developed and the scores have been established, the connector uses his domain knowledge to do the following:

  1. Organize the scored customers into segments.
  2. Select the segments of at-risk customers that are worth paying attention to (this is the insight). For example, at-risk customers may be the segment of heavy cable Internet users that spend 10-20% of their time watching cable programming.
  3. Establish the actions to take and the percentage of the marketing budget to allocate to each selected segment. For example, offer the at-risk segment a free premium channel upgrade for a period of six months. The cost of this action to the cable company may be $100/customer, but the Lifetime Value (LTV) of each salvaged customer may be $1,000, and the expected target response to the offer may be 30%.
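To see why such an offer can be worth funding, it helps to work through the arithmetic behind the figures in step 3. The sketch below rests on assumptions the example does not state: the $100 cost is paid for every customer who receives the offer, only the 30% who respond are retained, and those responders would otherwise have left. The function names and the $50,000 segment budget are hypothetical.

```python
def expected_net_value(cost_per_customer, lifetime_value, response_rate):
    """Expected net gain per targeted customer.

    Assumes the cost is incurred for every targeted customer and that
    only responders are retained (assumptions made for illustration).
    """
    return response_rate * lifetime_value - cost_per_customer


def break_even_response_rate(cost_per_customer, lifetime_value):
    """Response rate at which the offer exactly pays for itself."""
    return cost_per_customer / lifetime_value


COST = 100.0       # cost of the offer per customer (from the example)
LTV = 1000.0       # lifetime value of a salvaged customer (from the example)
RESPONSE = 0.30    # expected response rate (from the example)
BUDGET = 50_000.0  # hypothetical share of the marketing budget for this segment

net_per_customer = expected_net_value(COST, LTV, RESPONSE)
break_even = break_even_response_rate(COST, LTV)
customers_reached = int(BUDGET // COST)
expected_total = customers_reached * net_per_customer

print(f"Expected net value per targeted customer: ${net_per_customer:,.2f}")
print(f"Break-even response rate: {break_even:.0%}")
print(f"Customers reached with the budget: {customers_reached}")
print(f"Expected net value for the segment: ${expected_total:,.2f}")
```

Under these assumptions, each offer is expected to return about $200 net, and the offer remains worthwhile as long as the response rate stays above 10%. Comparing this figure across segments is one simple way to decide how to split the budget.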

There is increasing interest in developing, and funding, insightful applications. We’re seeing this increased interest because we have a greater understanding about what is involved in providing Insight-as-a-Service and the strong ROI that such solutions already provide. We’re also aware of the early successes with other types of big data applications, as described in the second post of this series, and the ever-increasing availability of big data. Finally, we acknowledge recent advances in the technologies that are used by insight generation systems and, more importantly, the strong demand for more automated, accurate, and faster decision-making.

Areas where we are seeing great opportunity for insightful applications, as well as entrants starting to offer such applications, include:

  1. Cybersecurity in the area of threat intelligence.
  2. Online marketing in the area of multichannel programmatic advertising.
  3. Automotive in the area of autonomous driving.
  4. Health care in the area of personalized medicine.
  5. Financial services in the area of wealth advisory services.
  6. Manufacturing and logistics (across several industries such as retail and consumer packaged goods) in the area of dark warehouses and fully robotic assembly lines. For example, Foxconn is adding 30,000 robots per year to its assembly lines.
  7. Customer experience (across several industries such as retail, financial services, travel, and health care) in the area of automated customer support.
  8. Various industries, such as manufacturing, agriculture, and oil exploration and production, in areas where IoT data can be used for applications like machine diagnosis, crop yield optimization, and oil field optimization.

These areas share the following characteristics, which make them particularly well suited to insightful applications:

  1. Not enough professionals to provide remedies to problems (e.g., security).
  2. Need for quick response in the execution of actions (e.g., autonomous cars, security, online marketing).
  3. Big data at a scale that makes it impractical or even impossible to sift through for the right insights, but where more data improves the opportunity to identify better insights (e.g., health care, security, marketing).
  4. Need to provide higher quality service that is consistent across all channels at a lower cost (e.g., financial services, manufacturing, logistics).

In general, my thesis is that, for a fixed level of accuracy, insightful applications allow us to use lower-cost resources, including less computation, because more data is available.

An interesting early example of an insightful application is IBM’s Watson Oncology Assistant, which IBM is developing in collaboration with various medical centers such as the MD Anderson Cancer Center and the Memorial Sloan Kettering Cancer Center. This insightful application optimizes the decision-making process that determines which patient therapy to recommend based on a) patient genetic data that is generated from DNA sequencing, b) MRI data, c) other documents describing the patient’s health history, and d) scientific publications. For example, while DNA sequencing enables us to identify mutations linked to cancers, the data that is generated in the process is voluminous. Many physicians cannot easily make use of all available data because of its volume and variety, and often because they may not be aware of publications describing relevant research results. The Watson Oncology Assistant will be able to address all these problems and make the appropriate therapy recommendations to the attending physician.

Insightful applications and artificial intelligence

To achieve their objectives, insightful applications combine big data management with artificial intelligence concepts and systems. In particular, insightful applications include:

  1. Flexible data management systems that store the big data on which they operate and learn from.
  2. Rich knowledge representation capabilities in order to encode domain knowledge as well as learned knowledge.
  3. Reasoning capabilities through which they reason over the represented knowledge.
  4. Machine learning, including deep learning, systems to automatically extract patterns and relations.
  5. Planning systems that can synthesize sets of related actions to generate an outcome that is associated with an insight.

The complexity and cost of developing insightful applications have been decreasing significantly, and this trend is expected to continue for the foreseeable future. This is because artificial intelligence systems have become more readily available and better understood, particularly with the release of open source packages from Google, Facebook, Microsoft, and other companies. Additionally, cheaper storage, abundant and cheap processing power and networking bandwidth, and cloud-enabled separation of storage and computing have helped drive the development of insightful applications.

Despite our improving understanding of insight generation, the technology advances we have made, and the growing number of insightful applications currently under development, we are not completely out of the woods. There are four major areas in particular where we need to make progress:

  1. Domain knowledge acquisition, representation, enhancement, and maintenance (the so-called “ontology development”) remains a big issue, particularly when the knowledge must be drawn from data sources such as images, video, and spoken language. For example, IBM is facing this issue as it continues to develop the Watson Oncology Assistant, and it has tried to address it in a variety of ways, including the acquisition of other companies. Corporations such as Google, Apple, Facebook, Amazon, and Microsoft are aggressively acquiring startups with the right know-how and IP in this area.
  2. Sensor technology (e.g., size, energy usage, local processing, amount and type of sensing each sensor can accomplish) is important for data acquisition in many areas, such as autonomous driving and health care. For example, see the recent acquisition of Cruise Automation by GM, which was driven by the development of lower cost sensors and corresponding algorithms that enable the conversion of existing cars into self-driving vehicles.
  3. Extracting valuable relations and associated actions in hyper-dimensional domains (e.g., areas where each event may be characterized by millions of features, such as cancer or the many situations an autonomous vehicle has to address) remains difficult. Deep learning approaches may prove particularly useful in this area, but we are still in the very early stages of applying these approaches to large, real-world problems.
  4. Self-learning systems that are able to automatically improve their performance based on previously established KPIs without relying on data scientists are still in their infancy.

Venture investment in insightful applications

For the reasons we have previously described, while insightful applications present an exciting investment opportunity, they are complex and difficult to develop. As a result, we don’t foresee the development of such applications leading to the complete elimination of connectors and data scientists.

In the second post of this series, we noted that over the past few years, venture investors have been investing in three types of big data applications: shallow applications that use general-purpose analytic tools, applications that process big data but that do not use predictive or prescriptive analytics, and applications that use embedded predictive analytics. More recently, as they have recognized the importance and economic value of successful insightful applications, a few venture investors are starting to invest in startups that develop insightful applications. Because I have come to recognize how critical it will be for corporations to utilize big data through insightful applications in their effort to innovate, particularly using my startup-driven innovation methodology, I am focusing my new venture fund on startups that develop such applications.

We anticipate that insightful applications will be developed over a few different generations, with the ultimate goal of the application completing 70% of the process and humans (including data scientists, connectors, and business users) completing the remaining 30%. Today’s first-generation insightful applications are able to assist connectors and data scientists. The next generation of applications will be better able to understand situations automatically. IBM, for example, is in the process of equipping Watson with sophisticated natural language processing technologies that can automatically understand and encode domain knowledge from a variety of sources, such as journal articles as well as spoken problem descriptions. The generation after that will be able to make decisions more autonomously, matching libraries of insights and action plans to descriptions of new problems. The final generation of insightful applications will be able to discover new insights and action plans on their own, with limited guidance from expert users.


Insightful applications are the key to effectively providing big data-driven solutions to many important problems while simultaneously controlling the costs of such solutions and dealing with the shortage of the necessary specialized personnel. Because of their complexity, the development of such applications will be neither simple nor quick. Patience will be necessary, as we anticipate that the promise of insightful applications will be realized in several generations of increasingly sophisticated and increasingly automated applications. Recognizing the opportunity afforded by such applications, a few corporations and venture investors have started aggressively investing in their development—the initial results are already impressive and fill us with excitement about what will be possible in the near future.
