Chapter 7. Interactive Querying in Azure

In this chapter, we look at various techniques that are useful for achieving interactive query performance (Figure 7-1). For our purposes, this means querying batch data at “human” and “humane” (pun intended) interactive speeds, which with the current generation of technologies means results are ready in seconds to minutes.

The fundamental concept introduced in this chapter, universal to all the data stores we cover, is how to prune large data sets during query processing to achieve faster query execution. The concept seems fairly obvious: if you reduce the amount of data that the query engine has to read through, then your queries will be faster. Exactly how you reduce the data set size depends on the techniques these data stores utilize, including:

Indexes

Creating indexes over the data can help the query engine identify which data files to include and which to skip by consulting a separate set of data that represents the index, which is presumably significantly smaller than the source data set. In some cases, such as for data stored using ORC, indexes are automatically created and stored with the data files. They help identify complete files to exclude from processing, as well as large segments of a file that can be ignored because they do not contain the values of interest.

Partitions

All of the data stores in this chapter have a notion of a table ...
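
The index-based pruning described under Indexes can be illustrated with a minimal sketch. It assumes a small index that records the minimum and maximum value of one column for each data file; the `FileStats` type, the `files_to_scan` function, and the file names are hypothetical and do not represent the on-disk format or API of ORC or any particular data store.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical index entry: min and max of one column in one data file."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(index, low, high):
    """Return the paths of files whose [min, max] range overlaps [low, high].

    Files whose range cannot contain a matching value are skipped
    without ever being opened.
    """
    return [f.path for f in index if f.max_value >= low and f.min_value <= high]


# Illustrative index over three made-up data files.
index = [
    FileStats("part-0000", 1, 100),
    FileStats("part-0001", 101, 200),
    FileStats("part-0002", 201, 300),
]

# A query for values between 150 and 180 only needs to read one file.
selected = files_to_scan(index, 150, 180)
print(selected)
```

The engine consults only the small index to decide which files to read, so the cost of a selective query depends on the files that survive pruning rather than on the size of the whole data set.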
