September 2016
Beginner to intermediate
344 pages
7h 44m
English
Spark DataFrame creation is similar to RDD creation. To get access to the DataFrame API, you need SQLContext or HiveContext as an entry point. In this section, we are going to demonstrate how to create DataFrames from various data sources, starting from basic code examples with in-memory collections:

The following code creates an RDD from a list of colors followed by a collection of tuples containing the color name and its length. It creates a DataFrame using the toDF method to convert the RDD into a DataFrame. The toDF method takes a list of column labels as an optional argument:
Python:
//Create ...
Read now
Unlock full access