© Robert Ilijason  2020
R. Ilijason, Beginning Apache Spark Using Azure Databricks, https://doi.org/10.1007/978-1-4842-5781-4_5

5. Getting Data into Databricks

Robert Ilijason, Viken, Sweden

All the processing power in the world is of no use unless you have data to work with. In this chapter, we’ll look at different techniques to get your data into Databricks. We’ll also take a closer look at file types that you are likely to come across in your data work.

To better understand how data is stored in Databricks, we’ll investigate its own file system, the Databricks File System, or DBFS for short. With this knowledge, we’ll look at how to pull data from the Web, from files, and from data lakes.

Getting data is easier if you have continuous ...
