Spark for Big Data

The amount of data stored in the world is growing almost exponentially. Nowadays, for a data scientist, having to process a few terabytes of data a day is no longer an unusual request and, to make things even more complex, this often means dealing with data that comes from many heterogeneous systems. In addition, regardless of the size of the data you have to deal with, the business constantly expects you to produce a model within a short time, as if you were simply operating on a toy dataset.

To conclude our journey through the essentials of data science, we cannot ignore such a key necessity. Therefore, we are going to introduce you to a new way of processing large amounts ...
