Next, we will create a summary of the dataframe. Instead of just printing the summary, as we have done in the past, we will also save the results to an object, because we will use the aggregate information later to join it back to the original data. Again, if you are manipulating large datasets, there is no need to run summary() more than once on the same data:
- Set up some global options, such as the number of significant digits, plot width and height, etc.
- Assign the summary output to a dataframe.
- Use the Databricks display() function (or the head() function) to print a portion of the summary:
options(digits=3)
options(repr.plot.width ...
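
The steps listed above can be sketched in plain R as follows. This is only an illustration under stated assumptions: it uses the built-in mtcars data as a stand-in for the chapter's dataset, builds the summary with base R functions instead of a Spark summary, and prints with head() because display() is only available inside a Databricks notebook. The names df and summary_df are hypothetical.

```r
# Hypothetical stand-in data: three columns of the built-in mtcars data frame,
# not the dataset used in the text.
df <- mtcars[, c("mpg", "hp", "wt")]

# Global option: print numbers with 3 significant digits.
options(digits = 3)

# Compute the summary once and keep it in a data frame so it can be reused
# later (for example, joined back to the original data) without recomputing.
summary_df <- data.frame(
  variable  = names(df),
  mean      = sapply(df, mean),
  sd        = sapply(df, sd),
  min       = sapply(df, min),
  max       = sapply(df, max),
  row.names = NULL
)

# Print a portion of the saved summary; head() stands in for display().
print(head(summary_df))
```

Saving the result to summary_df means later steps can read the aggregates from that object rather than scanning the data again, which is the point of the advice about large datasets.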