© Ed Elliott 2021
E. Elliott, Introducing .NET for Apache Spark, https://doi.org/10.1007/978-1-4842-6992-3_1

1. Understanding Apache Spark

Ed Elliott, Sussex, UK

Apache Spark is a data analytics platform that has made big data accessible and brought large-scale data processing within the reach of every developer. With Apache Spark, reading from a million CSV files in a data lake is as easy as reading from a single CSV file on your local machine.

An Example

Let us look at an example. The code in Listings 1-1 (C#) and 1-2 (F#) reads from a set of CSV files and counts how many records match a specific condition. Because the code reads every CSV file under a given path, the number of files it can read is practically limitless.
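As a rough illustration of the pattern just described, and not a reproduction of the book's listings, a minimal C# sketch might look like the following. It assumes the Microsoft.Spark NuGet package and a working Spark installation. The input path, the `status` column, and the filter condition are hypothetical placeholders chosen only for this sketch.

```csharp
using System;
using Microsoft.Spark.Sql;

namespace CsvCountSketch
{
    class Program
    {
        static void Main(string[] args)
        {
            // Hypothetical input directory: every CSV file found under it is read.
            string path = args.Length > 0 ? args[0] : "/data/csv";

            // Create a Spark session, or reuse one that already exists.
            SparkSession spark = SparkSession
                .Builder()
                .AppName("csv-count-sketch")
                .GetOrCreate();

            // Treat the first row of each file as the column names.
            DataFrame rows = spark.Read()
                .Option("header", true)
                .Csv(path);

            // Hypothetical condition: keep rows whose "status" column is "active",
            // then count them. Count() triggers the distributed computation.
            long matches = rows.Filter("status = 'active'").Count();

            Console.WriteLine($"Matching records: {matches}");
        }
    }
}
```

The same program runs unchanged whether the path holds one file or many, because Spark resolves the path to the set of files and distributes the read across whatever executors are available.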

Although the ...
