July 2017
Intermediate to advanced
796 pages
18h 55m
English
A Spark DataFrame is a distributed collection of rows organized under named columns. Less formally, it can be thought of as a table in a relational database, complete with column headers. A PySpark DataFrame also resembles a pandas DataFrame in Python, but it shares some characteristics with RDDs as well:
Just like Java/Scala's DataFrames, PySpark DataFrames are designed for ...