Chapter 2. Getting Data into Azure
In this chapter, we focus on the approaches for transferring data from the data source to Azure. We divide the discussion into approaches that typically transfer large quantities of data in a single effort (bulk data loading) and approaches that transfer individual data items (stream loading), and we investigate the protocols and tools relevant to each.
Using our Azure analytics pipeline as a guide, this chapter focuses on the items highlighted by the red, dashed borders in Figure 2-1.
Ingest Loading Layer
To perform analytics in Azure, you first need to get data into Azure. This is the point of the ingest phase. Ultimately, the goal is to get data from a source location (e.g., on premises or another cloud) into either file- or queue-based storage within Azure. In this context, we will look at the client tooling, processes, and protocols used to get the data to the destination in Azure.
To help put this layer in context, let’s refer back to the Blue Yonder Airlines scenario. They have historical flight delay data, historical weather data, and smart building telemetry upon which they wish to perform analytics. The first two data sets are candidates for bulk loading, which we will discuss next. The last data set, the smart building telemetry, is a candidate for streaming ingest, which we will examine later in the chapter.
The next chapter will dive into details of how data is stored once it lands in Azure, while ...