Fifty Years of Data Management and Beyond

by Paco Nathan

Released April 2019

Publisher(s): O'Reilly Media, Inc.

ISBN: 9781492057505

Book description

Every decade since the 1960s, researchers at companies such as IBM and Amazon have introduced major new frameworks and techniques to handle growing data management problems. This concise ebook explains how these new systems helped data science evolve quickly, from hierarchical and relational databases to big data and cloud computing to streaming and graph data.

Computer scientist Paco Nathan shows members of your data science team how major companies created each of these data management systems not just to deal with new data types but also to take full advantage of the opportunities the data presented. Their efforts over the years have propelled an entire industry.

This report covers the historical progression of data management topics including:

Hierarchical databases—1960s mainframe batch systems are still used in finance, healthcare, manufacturing, energy, and other industries.
Relational databases—these enabled faster transactions, mathematical optimization, and budgeting guarantees for many businesses.
Big data—this includes relatively cheap horizontal scale-out systems for collecting huge amounts of customer data.
Cloud computing—large companies began managing reliable, scalable, cost-effective data centers; Amazon turned the concept into a business.
Cluster schedulers—managing horizontal clusters was difficult before schedulers such as Apache Mesos appeared.
Streaming data—data continuously generated by different sources requires responses in "real time"—generally milliseconds.