© Thomas Mailund 2017

Thomas Mailund, Beginning Data Science in R, 10.1007/978-1-4842-2671-1_5

5. Working with Large Datasets

Thomas Mailund

(1) Aarhus, Denmark

The concept of Big Data refers to very large datasets: sets of sizes where you need data warehouses to store the data, where you typically need sophisticated algorithms to handle it, and where you need distributed computations to get anywhere with it. At the very least, we are talking about many gigabytes of data, but often we are dealing with terabytes or exabytes.

Dealing with Big Data is also part of data science, but it is beyond the scope of this book. This chapter is on large datasets and how to deal with data that slows down your analysis, but it is not about datasets so large that you cannot analyze them ...
