July 2018
Intermediate to advanced
474 pages
13h 37m
English
Although we performed our joins using Spark dataframe functions in PySpark, we could also have registered the dataframes as temporary views and then joined them using sqlContext.sql():
movies.createOrReplaceTempView('movies_')
links.createOrReplaceTempView('links_')
ratings.createOrReplaceTempView('ratings_')
mainDF_SQL = \
sqlContext.sql("""
    select
    r.userId_1
    ,r.movieId_1
    ,r.rating_1
    ,m.title
    ,m.genres
    ,l.imdbId
    ,l.tmdbId
    ...