Chapter 2. What's In a Table? Getting Started with Data Exploration

The previous chapter introduced the SQL language from the perspective of data analysis. This chapter demonstrates the use of SQL for exploring data, the first step in any analysis project. The emphasis shifts from databases in general to data; understanding data — and the underlying customers — is a theme common to this chapter and the rest of the book.

The most common data analysis tool, by far, is the spreadsheet, particularly Microsoft Excel. Spreadsheets show users data in a tabular format. More importantly, spreadsheets give users power over their data, with the ability to add columns and rows, apply functions, create charts, make pivot tables, and color, highlight, and change fonts to get just the right look. This functionality and the what‐you‐see‐is‐what‐you‐get interface make spreadsheets a natural choice for analysis and presentation. Spreadsheets, however, are inherently less powerful than databases because they run on a single user's machine. Even without the historical limits in Excel (through Excel 2003) on the number of rows (a maximum of 65,536 rows) and the number of columns (a maximum of 256 columns), the power of users' local machines limits the performance of spreadsheet applications.

This book assumes a basic understanding of Excel, particularly familiarity with the row‐column‐worksheet format used for laying out data. There are many examples of using Excel for basic calculations and charting. Because charts ...
