July 2018
Intermediate to advanced
474 pages
13h 37m
English
This section will walk through the following steps for joining dataframes in PySpark:
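# Append '_1' to every column name in ratings so the names stay distinct after joining
for i in ratings.columns:
    ratings = ratings.withColumnRenamed(i, i + '_1')
# Inner join ratings to movies, then join that result to links, matching on the movie ID
temp1 = ratings.join(movies, ratings.movieId_1 == movies.movieId, how='inner')
temp2 = temp1.join(links, temp1.movieId_1 == links.movieId, how='inner')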
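To make the effect of the rename and the two inner joins easier to see, the following is a minimal plain-Python sketch of the same logic with no Spark dependency. The rows, the `title` and `imdbId` fields, and the helper functions `rename_with_suffix` and `inner_join` are hypothetical and exist only for illustration; they are not part of the PySpark API or of the datasets used in this section.

```python
# Toy stand-ins for the three DataFrames; every row here is made up.
ratings = [
    {"userId": 1, "movieId": 10, "rating": 4.0},
    {"userId": 2, "movieId": 20, "rating": 3.5},
    {"userId": 3, "movieId": 99, "rating": 5.0},  # no matching movie
]
movies = [
    {"movieId": 10, "title": "Movie A"},
    {"movieId": 20, "title": "Movie B"},
]
links = [
    {"movieId": 10, "imdbId": "0000010"},  # only one movie has a link
]


def rename_with_suffix(rows, suffix):
    """Mimic the withColumnRenamed loop: append suffix to every column name."""
    return [{key + suffix: value for key, value in row.items()} for row in rows]


def inner_join(left, right, left_key, right_key):
    """Keep only the left/right row pairs whose key values are equal."""
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in right
        if left_row[left_key] == right_row[right_key]
    ]


ratings_1 = rename_with_suffix(ratings, "_1")
temp1 = inner_join(ratings_1, movies, "movieId_1", "movieId")
temp2 = inner_join(temp1, links, "movieId_1", "movieId")

print(temp1)
print(temp2)
```

In this sketch the rating for movie 99 is dropped by the first join because no movie matches it, and the second join keeps only the movie that also has a link. One difference from Spark is worth noting: merging dictionaries collapses the two `movieId` keys into one, whereas a Spark join on an expression keeps both `movieId` columns in `temp2`.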