Chapter 7. Big Data
7.0 Introduction
Data is sometimes referred to as “the new gold.” Many companies are leveraging data in new and exciting ways every day as available data science tools continue to improve. You can now mine troves of historical data quickly for insights and patterns by using modern analytics tools. You might not yet know the queries and analysis you need to run against the data, but tomorrow you might be faced with a challenge that could be supported by historical data analysis using new and emerging techniques. With the advent of cheaper data storage, many organizations and individuals opt to keep data rather than discard it so that they can run historical analysis to gain business insights, discover trends, train AI/ML models, and be ready to implement future technologies that can use the data.
In addition to the amount of data you might collect over time, you are also collecting a wider variety of data types and structures at an increasingly faster velocity. Imagine that you might deploy IoT devices to collect sensor data, and as you continue to deploy these over time, you need a way to capture and store the data in a scalable way. This can be structured, semistructured, and unstructured data with schemas that might be difficult to predict as new data sources are ingested. You need tools to be able to transform and analyze your diverse data.
An informative and succinct AWS re:Invent 2020 presentation by Francis Jayakumar, “An Introduction to Data Lakes and ...
Get AWS Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.