March 2019
Beginner to intermediate
182 pages
4h 6m
English
We can follow these steps to download the dataset and load it in PySpark:

You can see that there's kddcup.data.gz, and that 10% of that data is also available in kddcup.data_10_percent.gz. We will be working with the full dataset. To work with it, right-click on kddcup.data.gz, select Copy link address, and then go back to the PySpark console and import the data.
Let's take a look at how this works using the following steps:
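Here is a minimal sketch of these steps in Python. It assumes the copied link is the UCI KDD archive address stored in DATA_URL, so substitute the link you actually copied if it differs. The helper names local_name, download and load_rdd are hypothetical and chosen for illustration. Spark's SparkContext.textFile reads gzip-compressed files directly, so the archive does not need to be unpacked first.

```python
import os
import urllib.request

# Assumption: the address copied via "Copy link address" on the download page.
# Replace it with the link you copied if it differs.
DATA_URL = "http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data.gz"


def local_name(url: str) -> str:
    """Return the file name at the end of a download URL (hypothetical helper)."""
    return url.rstrip("/").rsplit("/", 1)[-1]


def download(url: str = DATA_URL, dest_dir: str = ".") -> str:
    """Download the archive unless it is already present; return its local path."""
    path = os.path.join(dest_dir, local_name(url))
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path


def load_rdd(sc, path: str):
    """Load the file as an RDD of text lines.

    sc is an existing SparkContext (the PySpark console defines one as sc).
    textFile() decompresses .gz input transparently, one element per line.
    """
    return sc.textFile(path)
```

In the PySpark console, where sc is already defined, you could then run raw_data = load_rdd(sc, download()) and inspect a few records with raw_data.take(5). Passing the kddcup.data_10_percent.gz link instead gives a smaller file that is quicker to experiment with.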