© Ed Elliott 2021
E. Elliott, Introducing .NET for Apache Spark, https://doi.org/10.1007/978-1-4842-6992-3_5

5. The DataFrame API

Ed Elliott, Sussex, UK

In this chapter, we look at the DataFrame API, the core API that we will use with .NET for Apache Spark. Apache Spark has a couple of different APIs for processing data: the Resilient Distributed Dataset (RDD) API and the DataFrame API. We will cover what each API is, why the RDD API is not available in .NET, and why that is fine: the DataFrame API gives us everything we need.

The RDD API vs. the DataFrame API

The Resilient Distributed Dataset (RDD) API provides access to RDDs. RDDs are an abstraction over what could be massive data files by partitioning the files and spreading the ...
