Chapter 1. The Evolution of Data Analytics
Data processing has evolved continuously and considerably since its origins in mainframe computers. Figure 1-1 shows four distinct stages in the evolution of data analytics since 1990.
In the 1990s, data warehouse and relational database management system (RDBMS) technologies enabled organizations to store and analyze data on servers cost-effectively with satisfactory performance. Storage area networks (SANs) and network-attached storage (NAS) were common in these applications. But as data volumes continued to grow, scaling this architecture became prohibitively expensive.
Circa 2005, distributed server clusters using direct-attached storage (DAS) for better I/O performance offered a more affordable way to scale data analytics applications. Hadoop and MapReduce, which were designed specifically to exploit the parallel processing power of server clusters, became increasingly popular. Although this architecture remains cost-effective for batch-oriented data analytics applications, it lacks the performance needed to process data streams in real time.
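To make the MapReduce model concrete, here is a minimal, single-process sketch of its two phases using the classic word-count example. This is illustrative only, not Hadoop itself; in a real cluster the mapped pairs would be partitioned and processed across many servers in parallel.

```python
# Minimal sketch of the MapReduce programming model (word count).
# In Hadoop, the map and reduce phases run distributed across a cluster.
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["the quick brown fox", "the lazy dog"]
print(reduce_phase(map_phase(lines)))
# → {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

The appeal of the model is that both phases parallelize naturally: mapping is independent per input record, and reducing is independent per key, which is what let clusters of commodity servers with DAS scale batch workloads affordably.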
By 2010, the in-memory database became affordable owing to the ability to configure servers with terabytes of low-cost random-access memory (RAM). Because RAM offers far faster read/write access (roughly 100 nanoseconds versus about 10 milliseconds for DAS), the performance improvement was dramatic. But as with virtually all advances in performance, the bottleneck shifted, this time from I/O to compute for a growing number of applications.
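To put those latency figures in perspective, a quick back-of-the-envelope calculation shows the magnitude of the gap (the numbers are the approximate figures cited above, not benchmarks):

```python
# Illustrative arithmetic only, using the approximate latencies from the text:
# ~100 nanoseconds per RAM access vs. ~10 milliseconds per DAS access.
RAM_LATENCY_NS = 100             # ~100 ns
DAS_LATENCY_NS = 10_000_000      # ~10 ms expressed in nanoseconds

speedup = DAS_LATENCY_NS / RAM_LATENCY_NS
print(f"RAM access is roughly {speedup:,.0f}x faster than DAS")
# → RAM access is roughly 100,000x faster than DAS
```

A roughly five-orders-of-magnitude difference in access latency explains why, once working sets fit in RAM, I/O stopped being the limiting factor and compute became the new bottleneck.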
This compute bottleneck has been overcome with the recent advent of GPU-accelerated computing. As Chapter 2 explains, GPUs provide massively parallel processing power that can be scaled both up and out to achieve unprecedented levels of performance and major improvements in price/performance for most database and data analytics applications.