book

Python: Real World Machine Learning

by Prateek Joshi, John Hearty, Bastiaan Sjardin, Luca Massaron, Alberto Boschetti

November 2016

Beginner to intermediate

941 pages

21h 55m

English

Packt Publishing

Read now

Unlock full access

Content preview from Python: Real World Machine Learning

Data preprocessing in Spark

So far, we've seen how to load text data from the local filesystem and HDFS. Text files can contain either unstructured data (like a text document) or structured data (like a CSV file). As for semi-structured data, just like files containing JSON objects, Spark has special routines able to transform a file into a DataFrame, similar to the DataFrame in R and Python pandas. DataFrames are very similar to RDBMS tables, where a schema is set.

JSON files and Spark DataFrames

In order to import JSON-compliant files, we should first create a SQL context, creating a SQLContext object from the local Spark Context:

In:from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

Now, let's see the content of a small JSON file (it's ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

Start your free trial

Interpretable Machine Learning with Python

Publisher Resources

ISBN: 9781787123212Supplemental Content Purchase Link

Python: Real World Machine Learning

by Prateek Joshi, John Hearty, Bastiaan Sjardin, Luca Massaron, Alberto Boschetti

Data preprocessing in Spark

JSON files and Spark DataFrames

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

You might also like

Interpretable Machine Learning with Python

Large Scale Machine Learning with Python

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

Python Machine Learning Cookbook - Second Edition

Publisher Resources