To illustrate this, let's first take a 50% sample of the stop and frisk dataframe. We also want to make sure that the amount of data we extract can be processed easily by base R, which holds its data in memory and is therefore limited by the RAM available on the local machine.
- The code below first extracts a 50% sample from Spark and stores it in a local R dataframe named dflocal.
- It then calls str() on dflocal to verify the row count and the metadata:
dflocal <- collect(sample(df, withReplacement = FALSE, fraction = 0.50, seed = 123))
str(dflocal)
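Once the sample is local, it can be worth confirming how much memory it actually occupies. The following is a minimal base R sketch under stated assumptions: `local_footprint` is a hypothetical helper (not part of SparkR), and `standin` is a small made-up dataframe used only so the snippet runs on its own. In the real workflow you would pass dflocal instead. Note also that Spark's sampling is probabilistic, so the collected row count will be close to, but not exactly, 50% of the original.

```r
# Hypothetical helper (not part of SparkR): report the dimensions and the
# approximate in-memory size of a local data frame.
local_footprint <- function(d) {
  list(
    rows = nrow(d),
    cols = ncol(d),
    # object.size() reports bytes; convert to megabytes
    megabytes = as.numeric(object.size(d)) / 1024^2
  )
}

# Stand-in for the collected sample (assumption for illustration only);
# in the real workflow this would be dflocal.
standin <- data.frame(id = 1:1000, flag = rep(c("Y", "N"), 500))

fp <- local_footprint(standin)
str(fp)
```

If the reported size is a large share of the machine's free memory, a smaller sampling fraction is the simplest adjustment.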