Chapter 1. The Advent of the Smart Data Era

The data we collect has grown exponentially, whether it comes from PCs, mobile devices, and the IoT or from ecommerce and social networking tools. According to an IDC report, global data volume reached 8 ZB (or 8 billion TB) in 2015 and is expected to reach 35 ZB in 2020, with an annual increase of nearly 40%. And according to TalkingData, in 2016 China was home to 1.3 billion smartphone users, along with tens of millions of wearable devices such as smart watches and over 8 billion sensors of different kinds. Smart devices can be seen nearly everywhere and generate data of various dimensions anytime, anywhere.

Data accumulation has created favorable conditions for the development of artificial intelligence (AI). Training machines on huge amounts of data can produce more powerful AI. For example, the game of Go (or “Weiqi” in Chinese) has traditionally been viewed as one of the most challenging games because of its complicated tactics. In 2016, Google’s program AlphaGo, which drew on improved algorithms and 30 million data points accumulated by players over hundreds of thousands of games, defeated world Go champion Lee Sedol (Li Shishi in Chinese), demonstrating its top-level Go-playing ability. Over the past two years, AI has also seen explosive growth and application in finance, transport, medicine, education, industry, and more. It’s clear that the data accumulated by mankind is being used to produce new intelligence, which can aid our work, reduce costs, and improve efficiency. According to a CB Insights report, funding for AI startups worldwide also grew exponentially from 2010 to 2015.

Figure 1-1. Artificial intelligence global yearly financing history, 2010–2015, in millions of dollars (source: CB Insights)

Data accumulation and the development of AI promote and complement each other. Andrew Ng, AI expert and VP and Chief Scientist of Baidu, said in a Wired article, “To draw an analogy, data is like the fuel for a rocket. We need both a big engine (algorithm) and plenty of fuel (data) in order to enable the rocket (AI) to be launched.” AI has in turn brought us new application contexts, such as chatbots and autonomous vehicles, which generate new data.

And now data is becoming not only bigger but also smarter and more useful. We have entered the smart data era.

Three Elements of the Smart Data Era: Data, AI, and Human Wisdom

Data accumulation enables deeper insights and helps us gain more experience and wisdom. For example, by analyzing the behavior of mobile phone users in more depth, enterprises can better understand their clients, including their preferences and consumption habits, and so find more marketing opportunities. AI itself also requires human wisdom to guide its direction and increase its efficiency. For example, AlphaGo needs to play against professional Go players to keep improving its Go-playing ability with the aid of human wisdom.

Without the continuous intervention of human wisdom, applying AI to data loses some of its value and can even become ineffective. Conversely, without AI, it is a challenge for humans alone to deal with such complicated and rapidly changing data. And without data, AI could not exist, and the accumulation of human wisdom would slow down. Data, AI, and human wisdom reinforce each other and form a positive feedback loop.

For example, in the field of context awareness, the movements and activities of mobile phone users (including walking, riding, driving, etc.) can be inferred by applying AI algorithms to the phones’ sensor data. If a judgment is not accurate enough, the data should be curated and enriched through human intervention, and the algorithms optimized, until the result is acceptable. Phones capable of context awareness can also offer application developers more contexts and experiences, such as fitness (where movements must be captured and the step frequency, step count, or even location determined to obtain more accurate data on the user’s status), financial risk control, logistics management, and entertainment. These applications in turn generate more data, which may allow human wisdom to grow quickly and AI to become more powerful. For example, context-awareness data shows that most users hold their phones in their hands while using apps. Does a non-handheld context, such as fraudulent app ratings submitted from phones that are not being held, therefore indicate greater financial risk?
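The loop described above, in which an algorithm judges a user’s activity from sensor data and a person corrects the judgments that are not accurate enough, can be sketched in a few lines of Python. This is only an illustrative sketch, not TalkingData’s method: the nearest-centroid classifier, the two features (mean and standard deviation of accelerometer magnitude), the `min_margin` threshold, and the `ask_human` callback are all assumptions chosen to keep the example small.

```python
import math
from statistics import mean, pstdev


def features(window):
    """Summarize a window of (x, y, z) accelerometer samples
    as (mean, standard deviation) of the acceleration magnitude."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in window]
    return (mean(mags), pstdev(mags))


class NearestCentroid:
    """Toy classifier (an assumption for illustration): predict the
    label whose average feature vector is closest to the input."""

    def __init__(self):
        self.samples = {}  # label -> list of feature tuples

    def add(self, label, feat):
        self.samples.setdefault(label, []).append(feat)

    def predict(self, feat):
        """Return (label, margin). A small margin means the two closest
        labels are nearly tied, i.e. the judgment is uncertain."""
        dists = sorted(
            (math.dist(feat, tuple(map(mean, zip(*feats)))), label)
            for label, feats in self.samples.items()
        )
        best_dist, best_label = dists[0]
        margin = dists[1][0] - best_dist if len(dists) > 1 else float("inf")
        return best_label, margin


def classify_with_review(model, window, ask_human, min_margin=0.5):
    """Classify one window; send uncertain cases to a person and
    add the corrected example back into the model's data."""
    feat = features(window)
    label, margin = model.predict(feat)
    if margin < min_margin:
        label = ask_human(window)  # human judgment overrides the model
        model.add(label, feat)     # the corrected example enriches the data
    return label
```

A caller would seed the model with a few labeled examples, for instance `model.add("still", (9.8, 0.05))` and `model.add("walking", (10.2, 2.0))` (hypothetical values), and then pass each new window through `classify_with_review`. Each human correction shifts the centroids, which is the simplest form of the data, AI, and human wisdom loop the text describes.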

The three elements of the smart data era have generated incredible value, both in combination and independently. Enterprises that adapt to the new era will be able to restructure their infrastructure around data, AI, and human wisdom and accelerate the process of exploring and realizing commercial value, so as to stand out in fierce competition. Enterprises that are slow to act will be at a loss when faced with scattered and complicated data, will gradually lose their competitiveness, and will have no way to share in the greatest benefit (i.e., value). The shock of the new era, moreover, is independent of enterprise scale or industry.

In this report, we list the challenges enterprises face in the smart data era and analyze their causes. Over more than five years of serving industry, TalkingData has helped enterprises find solutions to the challenges of data and efficiently explore the business value of data. We introduce the concept of SmartDP along with the three basic capabilities that SmartDP should possess: data management, data science, and data engineering. We also introduce the SmartDP referential framework and detail the functions of each layer. Finally, we look at how SmartDP is adopted in real scenarios to enhance our understanding of smart data.
