Apache Spark is a data analytics platform that has made big data accessible and brought large-scale data processing within the reach of every developer. With Apache Spark, it is as easy to read from a million CSV files in a data lake as it is to read from a single CSV file on your local machine.
An Example
Let us look at an example. The code in Listing 1-1 (C#) and Listing 1-2 (F#) reads a set of CSV files and counts how many records match a specific condition. Because the code reads every CSV file in a given path, the number of files we can read from is practically limitless.
Although the ...