Learning on massive click logs with Spark

Normally, to take full advantage of Spark, data is stored in the Hadoop Distributed File System (HDFS), a distributed filesystem designed to hold large volumes of data, and computation runs on multiple nodes in a cluster. For demonstration purposes, we keep the data on a local machine and run Spark locally; the code is no different from what we would run on a distributed computing cluster.
