Working with CSV data

CSV files can be treated as plain text files with a comma as the delimiter, and this approach generally works as expected. Suppose we have movie data in the following format:

movieId, title, genre
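
To make the "plain text with a comma delimiter" idea concrete, here is a minimal sketch in plain Java of how one line in this format might be split into fields. The class name CsvLineSketch, the method parseLine, and the sample row are hypothetical and used only for illustration. The sketch assumes that no field contains a comma or quotation marks.

```java
// Hypothetical helper, not from the book: splits one line of the
// "movieId, title, genre" format into trimmed fields.
class CsvLineSketch {

    // Naive split on commas; assumes no field contains a comma or quotes.
    static String[] parseLine(String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();      // drop the space after each comma
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample row invented for illustration.
        String[] fields = parseLine("1, Toy Story, Animation");
        int movieId = Integer.parseInt(fields[0]);
        System.out.println(movieId + " | " + fields[1] + " | " + fields[2]);
    }
}
```

In a Spark job, a function like this would typically be applied to each line of the JavaRDD<String> returned by JavaSparkContext.textFile, inside a map transformation. Data with quoted or embedded commas would need a proper CSV parser instead of a simple split.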

Let's load this file as an RDD of Movie objects, as follows:

Movie POJO:

public class Movie implements Serializable {
    private Integer movieId;
    private String title;
    private String genre;

    public Movie() {}

    public Movie(Integer movieId, String title, String genre) {
        super();
        this.movieId = movieId;
        this.title = title;
        this.genre = genre;
    }

    public Integer getMovieId() {
        return movieId;
    }

    public void setMovieId(Integer movieId) {
        this.movieId = movieId;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String ...
