Data Sources API

The Data Sources API provides a single interface for loading and storing data using Spark SQL. In addition to the built-in sources, this API provides an easy way for developers to add support for custom data sources. All available external packages are listed at http://spark-packages.org/. Let's learn how to use built-in sources and external sources in this section.

Read and write functions

The Data Sources API provides generic read and write functions that can be used with any kind of data source. These generic functions provide two capabilities:

  • Parses text records, JSON records, and other formats and deserializes data stored in binary
  • Converts Java objects to rows of Avro, JSON, Parquet, and HBase ...
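As a rough illustration of the generic read and write functions, the following PySpark sketch loads data in one format and stores it in another. It assumes Spark 2.0 or later (for SparkSession). The helper name convert, the file paths, and the application name are hypothetical and chosen only for this example; format, load, mode, and save are the generic DataFrameReader and DataFrameWriter calls.

```python
def convert(spark, source_format, source_path, target_format, target_path,
            mode="overwrite"):
    """Load a dataset with the generic reader and store it with the generic writer.

    `convert` is a hypothetical helper for illustration, not part of Spark.
    """
    # Generic read: the format name selects the data source implementation.
    df = spark.read.format(source_format).load(source_path)
    # Generic write: the same pattern works for any supported target format.
    df.write.format(target_format).mode(mode).save(target_path)
    return df


def demo():
    # Imported here so the helper above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[*]")
             .appName("datasources-sketch")  # hypothetical application name
             .getOrCreate())
    # Hypothetical paths: read JSON records and store them as Parquet.
    convert(spark, "json", "people.json", "parquet", "people.parquet")
    spark.stop()
```

Calling demo() on a machine with PySpark installed and a people.json file in the working directory should produce a people.parquet directory. Only the format strings change when a different built-in or external source is used.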
