Chapter 2. Digital Transformation

The amount of data being created is growing explosively. As we move deeper into the era of digital transformation, there are many more sources of data, of many different types. The majority of the data (more than 90 percent) is unstructured (e.g., email, text documents, images, audio, and video). These data types are a poor fit for relational databases and traditional storage methods. There is tremendous value in the data, but taking advantage of all of it for business insight is challenging at both the technical and semantic levels.

Technical Challenges

From a technical perspective, to make use of the data, we must collect, transfer, store, and process it. The sheer volume of data presents significant challenges. Unprocessed data is useless, so compute requirements are growing in direct response to the data growth. Thus, more powerful servers (or, as it turns out, more servers working in parallel) are needed.

New data management software frameworks (for example, Hadoop and NoSQL systems) have been developed that can process massive amounts of data in parallel across a large cluster of servers. These frameworks have storage services built in, such as replication, self-healing, rebalancing, and scaling out by adding servers. Traditional systems have to rely on underlying storage arrays to provide these services. Tighter integration of storage services with the data management layer provides a lot of flexibility. For example, you can increase or decrease the number of replicas, ...
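To make the replication and self-healing services described above more concrete, here is a minimal sketch in Python. It is a toy model, not how Hadoop or any particular NoSQL system actually implements these services: the function names (`place_replicas`, `heal`), the round-robin placement, and the least-loaded repair policy are all assumptions chosen for illustration.

```python
from collections import defaultdict


def place_replicas(blocks, nodes, replication_factor):
    """Assign each block to `replication_factor` distinct nodes, round-robin.

    Hypothetical placement policy, used here only for illustration.
    """
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = {
            nodes[(i + k) % len(nodes)] for k in range(replication_factor)
        }
    return placement


def heal(placement, live_nodes, replication_factor):
    """Drop replicas held by failed nodes, then copy under-replicated
    blocks onto the least-loaded live nodes (a simple self-healing pass)."""
    live = set(live_nodes)
    load = defaultdict(int)
    healed = {}
    for block, holders in placement.items():
        healed[block] = holders & live  # keep only surviving replicas
        for node in healed[block]:
            load[node] += 1
    for block, holders in healed.items():
        if not holders:
            continue  # every replica was lost; nothing left to copy from
        while len(holders) < replication_factor:
            candidates = sorted(live - holders, key=lambda n: (load[n], n))
            if not candidates:
                break  # not enough live nodes to restore the target count
            holders.add(candidates[0])
            load[candidates[0]] += 1
    return healed


# Example: four servers, three replicas per block, then server "n2" fails.
nodes = ["n1", "n2", "n3", "n4"]
blocks = ["b0", "b1", "b2", "b3"]
placement = place_replicas(blocks, nodes, replication_factor=3)
survivors = ["n1", "n3", "n4"]
healed = heal(placement, survivors, replication_factor=3)
```

Because the data management layer itself tracks where each replica lives, the replica count becomes a parameter of the software rather than a property of a storage array, which is the flexibility the paragraph above points to.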

