In this chapter, we will look at the DataFrame API, the core API that we will use with .NET for Apache Spark. Apache Spark offers a couple of different APIs for processing data: the Resilient Distributed Dataset (RDD) API and the DataFrame API. We will cover what each API is, why the RDD API is not available in .NET, and why that is fine: the DataFrame API gives us everything we need.
The RDD API vs. the DataFrame API
The Resilient Distributed Dataset (RDD) API provides access to RDDs. RDDs are an abstraction over what could be massive data files by partitioning the files and spreading the ...