June 2017
Beginner to intermediate
576 pages
English
Now that we have some aggregate counts, we have a rough idea of what the data contains. Next, we read the entire dataset into a Spark DataFrame with a SQL query and print the first record (you can print more if you like).
Just by looking at the first record(s), you can see that there is a mix of binary yes/no flags and numeric variables:
df <- sql("SELECT * FROM stopfrisk")  # read the full table into a SparkDataFrame
head(df, 1)  # return the first record as a local R data.frame
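Once a few rows have been pulled back as a local R data.frame (SparkR's head() returns one), you can sort the columns into yes/no flags and numeric variables programmatically instead of by eye. The sketch below is illustrative only. It assumes that the flags are coded as the strings "Y" and "N", and it runs on a small hypothetical data frame (sample_rows, with invented columns frisked, searched and age) rather than on the real stopfrisk data. The helper classify_column() is likewise hypothetical.

```r
# Hypothetical stand-in for the local data.frame returned by head(df, n);
# column names and values are invented for illustration.
sample_rows <- data.frame(
  frisked  = c("Y", "N", "Y"),
  searched = c("N", "N", "Y"),
  age      = c(23, 41, 35),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label a column "numeric" if R stores it as numeric,
# "flag" if every non-missing value is "Y" or "N" (assumed flag coding),
# and "other" otherwise.
classify_column <- function(x) {
  vals <- x[!is.na(x)]
  if (is.numeric(x)) {
    "numeric"
  } else if (length(vals) > 0 && all(vals %in% c("Y", "N"))) {
    "flag"
  } else {
    "other"
  }
}

# vapply() walks the columns of the data.frame and returns a named
# character vector with one label per column.
column_kinds <- vapply(sample_rows, classify_column, character(1))
print(column_kinds)
```

Because this only inspects a handful of collected rows, a column that happens to show only "Y" and "N" in the sample could still hold other codes elsewhere in the full table, so treat the result as a first pass rather than a definitive typing of the data.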
