Chapter 1. The Evolution of Data Analytics
Data processing has evolved continuously and considerably since its origins in mainframe computers. Figure 1-1 shows four distinct stages in the evolution of data analytics since 1990.
In the 1990s, data warehouse and relational database management system (RDBMS) technologies enabled organizations to store and analyze data on servers cost-effectively with satisfactory performance. Storage area networks (SANs) and network-attached storage (NAS) were common in these applications. But as data volumes continued to grow, scaling this architecture became prohibitively expensive.
Circa 2005, distributed server clusters using direct-attached storage (DAS) for better I/O performance offered a more affordable way to scale data analytics applications. Hadoop and MapReduce, which were designed specifically to exploit the parallel processing power of server clusters, became increasingly popular. Although this architecture remains cost-effective for batch-oriented data analytics applications, it lacks the performance needed to process data streams in real time.
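To make the MapReduce model concrete, here is a minimal, single-process sketch of its two phases using the classic word-count example. This is illustrative only, not Hadoop itself; in a real cluster the mapped pairs would be partitioned and processed across many servers in parallel.

```python
# Minimal sketch of the MapReduce programming model (word count).
# In Hadoop, the map and reduce phases run distributed across a cluster.
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["the quick brown fox", "the lazy dog"]
print(reduce_phase(map_phase(lines)))
# → {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

The appeal of the model is that both phases parallelize naturally: mapping is independent per input record, and reducing is independent per key, which is what let clusters of commodity servers with DAS scale batch workloads affordably.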
By 2010, the in-memory database became affordable owing to the ability to configure servers with terabytes of low-cost random-access memory (RAM). Because RAM offers far faster read/write access (roughly 100 nanoseconds versus about 10 milliseconds for DAS), the performance improvement was dramatic. But as with virtually all advances in performance, the bottleneck shifted, this time from I/O to compute for a growing number of applications.
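To put those latency figures in perspective, a quick back-of-the-envelope calculation shows the magnitude of the gap (the numbers are the approximate figures cited above, not benchmarks):

```python
# Illustrative arithmetic only, using the approximate latencies from the text:
# ~100 nanoseconds per RAM access vs. ~10 milliseconds per DAS access.
RAM_LATENCY_NS = 100             # ~100 ns
DAS_LATENCY_NS = 10_000_000      # ~10 ms expressed in nanoseconds

speedup = DAS_LATENCY_NS / RAM_LATENCY_NS
print(f"RAM access is roughly {speedup:,.0f}x faster than DAS")
# → RAM access is roughly 100,000x faster than DAS
```

A roughly five-orders-of-magnitude difference in access latency explains why, once working sets fit in RAM, I/O stopped being the limiting factor and compute became the new bottleneck.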
This compute bottleneck has been overcome with the recent advent of GPU-accelerated computing. As Chapter 2 explains, GPUs provide massively parallel processing power that can be scaled both up and out to achieve unprecedented levels of performance and major improvements in price/performance for most database and data analytics applications.