17.1 Introduction

In Section 1.13, we introduced big data. In this capstone chapter, we discuss popular hardware and software infrastructure for working with big data, and we develop complete applications on several desktop and cloud-based big-data platforms.

Databases

Databases are critical big-data infrastructure for storing and manipulating the massive amounts of data we’re creating. They’re also critical for maintaining that data securely and confidentially, especially in the context of ever-stricter privacy laws such as HIPAA (Health Insurance Portability and Accountability Act) in the United States and GDPR (General Data Protection Regulation) in the European Union.

First, we’ll present relational databases, which store structured data ...
