Manipulating DataFrames
In the previous recipe, we saw how to create a DataFrame. The natural next step is to work with the data inside it. Besides the numerous functions that help us do that, there are other useful functions that let us sample the data, print its schema, and so on. We'll look at them one by one in this recipe.
Note
The code and the sample file for this recipe can be found at https://github.com/arunma/ScalaDataAnalysisCookbook/blob/master/chapter1-spark-csv/src/main/scala/com/packt/scaladata/spark/csv/DataFrameCSV.scala.
How to do it...
Now, let's see how we can manipulate DataFrames using the following subrecipes:
- Printing the schema of the DataFrame
- Sampling data in ...