- Start a new project in IntelliJ or in the IDE of your choice. Make sure the necessary Spark JAR files are included in the project.
- We will use a JSON data file named cars.json, which has been created for this example:
{"make": "Tesla", "model": "Model S", "price": 71000.00, "style": "sedan", "kind": "electric"}
{"make": "Audi", "model": "A3 E-Tron", "price": 37900.00, "style": "luxury", "kind": "hybrid"}
{"make": "BMW", "model": "330e", "price": 43700.00, "style": "sedan", "kind": "hybrid"}
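By default, Spark's JSON data source (`spark.read.json`) expects JSON Lines: each line of the file holds one complete, self-contained JSON object. The sketch below shows one way to produce cars.json in that layout using only the Scala and Java standard libraries. The object name `WriteCarsJson` and the default output path are hypothetical, and the code assumes Scala 2.13 or later for `scala.jdk.CollectionConverters`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}
import scala.jdk.CollectionConverters._

// Hypothetical helper that writes the example records as JSON Lines.
object WriteCarsJson {
  // One complete JSON object per string; each becomes its own line in the file.
  val records: Seq[String] = Seq(
    """{"make": "Tesla", "model": "Model S", "price": 71000.00, "style": "sedan", "kind": "electric"}""",
    """{"make": "Audi", "model": "A3 E-Tron", "price": 37900.00, "style": "luxury", "kind": "hybrid"}""",
    """{"make": "BMW", "model": "330e", "price": 43700.00, "style": "sedan", "kind": "hybrid"}"""
  )

  // Files.write creates the file (or truncates an existing one) and writes
  // each record followed by the platform line separator.
  def write(path: Path): Path =
    Files.write(path, records.asJava, StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    // Assumed default location; pass a different path as the first argument.
    val out = Paths.get(if (args.nonEmpty) args(0) else "cars.json")
    write(out)
    println(s"Wrote ${records.size} records to $out")
  }
}
```

If the data were instead a single JSON document spread across several lines, Spark's reader would need `.option("multiLine", true)` to parse it.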
- Set up the package location where the program will reside.
package spark.ml.cookbook.chapter3
- Import the necessary packages: SparkSession, which gives the program access to the cluster, and log4j's Logger, which we use to reduce the amount of output Spark produces.
import org.apache.log4j.{Level, ...