SQL for Data Scientists

Book description

Jump-start your career as a data scientist—learn to develop datasets for exploration, analysis, and machine learning

SQL for Data Scientists:A Beginner's Guide for Building Datasets for Analysis is a resource that’s dedicated to the Structured Query Language (SQL) and dataset design skills that data scientists use most. Aspiring data scientists will learn how to how to construct datasets for exploration, analysis, and machine learning. You can also discover how to approach query design and develop SQL code to extract data insights while avoiding common pitfalls.

You may be one of many people who are entering the field of Data Science from a range of professions and educational backgrounds, such as business analytics, social science, physics, economics, and computer science. Like many of them, you may have conducted analyses using spreadsheets as data sources, but never retrieved and engineered datasets from a relational database using SQL, which is a programming language designed for managing databases and extracting data.

This guide for data scientists differs from other instructional guides on the subject. It doesn’t cover SQL broadly. Instead, you’ll learn the subset of SQL skills that data analysts and data scientists use frequently. You’ll also gain practical advice and direction on "how to think about constructing your dataset."

  • Gain an understanding of relational database structure, query design, and SQL syntax
  • Develop queries to construct datasets for use in applications like interactive reports and machine learning algorithms
  • Review strategies and approaches so you can design analytical datasets
  • Practice your techniques with the provided database and SQL code

In this book, author Renee Teate shares knowledge gained during a 15-year career working with data, in roles ranging from database developer to data analyst to data scientist. She guides you through SQL code and dataset design concepts from an industry practitioner’s perspective, moving your data scientist career forward!

 

 

 

 

Table of contents

  1. Cover
  2. Title Page
  3. Introduction
    1. Who I Am and Why I'm Writing About This Topic
    2. Who This Book Is For
    3. Why You Should Learn SQL if You Want to Be a Data Scientist
    4. What I Hope You Gain from This Book
    5. Conventions
    6. Reader Support for This Book
  4. CHAPTER 1: Data Sources
    1. Data Sources
    2. Tools for Connecting to Data Sources and Editing SQL
    3. Relational Databases
    4. Dimensional Data Warehouses
    5. Asking Questions About the Data Source
    6. Introduction to the Farmer's Market Database
    7. A Note on Machine Learning Dataset Terminology
    8. Exercises
  5. CHAPTER 2: The SELECT Statement
    1. The SELECT Statement
    2. The Fundamental Syntax Structure of a SELECT Query
    3. Selecting Columns and Limiting the Number of Rows Returned
    4. The ORDER BY Clause: Sorting Results
    5. Introduction to Simple Inline Calculations
    6. More Inline Calculation Examples: Rounding
    7. More Inline Calculation Examples: Concatenating Strings
    8. Evaluating Query Output
    9. SELECT Statement Summary
    10. Exercises Using the Included Database
  6. CHAPTER 3: The WHERE Clause
    1. The WHERE Clause
    2. Filtering SELECT Statement Results
    3. Filtering on Multiple Conditions
    4. Multi-Column Conditional Filtering
    5. More Ways to Filter
    6. Filtering Using Subqueries
    7. Exercises Using the Included Database
  7. CHAPTER 4: CASE Statements
    1. CASE Statement Syntax
    2. Creating Binary Flags Using CASE
    3. Grouping or Binning Continuous Values Using CASE
    4. Categorical Encoding Using CASE
    5. CASE Statement Summary
    6. Exercises Using the Included Database
  8. CHAPTER 5: SQL JOINs
    1. Database Relationships and SQL JOINs
    2. A Common Pitfall when Filtering Joined Data
    3. JOINs with More than Two Tables
    4. Exercises Using the Included Database
  9. CHAPTER 6: Aggregating Results for Analysis
    1. GROUP BY Syntax
    2. Displaying Group Summaries
    3. Performing Calculations Inside Aggregate Functions
    4. MIN and MAX
    5. COUNT and COUNT DISTINCT
    6. Average
    7. Filtering with HAVING
    8. CASE Statements Inside Aggregate Functions
    9. Exercises Using the Included Database
  10. CHAPTER 7: Window Functions and Subqueries
    1. ROW NUMBER
    2. RANK and DENSE RANK
    3. NTILE
    4. Aggregate Window Functions
    5. LAG and LEAD
    6. Exercises Using the Included Database
  11. CHAPTER 8: Date and Time Functions
    1. Setting datetime Field Values
    2. EXTRACT and DATE_PART
    3. DATE_ADD and DATE_SUB
    4. DATEDIFF
    5. TIMESTAMPDIFF
    6. Date Functions in Aggregate Summaries and Window Functions
    7. Exercises
  12. CHAPTER 9: Exploratory Data Analysis with SQL
    1. Demonstrating Exploratory Data Analysis with SQL
    2. Exploring the Products Table
    3. Exploring Possible Column Values
    4. Exploring Changes Over Time
    5. Exploring Multiple Tables Simultaneously
    6. Exploring Inventory vs. Sales
    7. Exercises
  13. CHAPTER 10: Building SQL Datasets for Analytical Reporting
    1. Thinking Through Analytical Dataset Requirements
    2. Using Custom Analytical Datasets in SQL: CTEs and Views
    3. Taking SQL Reporting Further
    4. Exercises
  14. CHAPTER 11: More Advanced Query Structures
    1. UNIONs
    2. Self-Join to Determine To-Date Maximum
    3. Counting New vs. Returning Customers by Week
    4. Summary
    5. Exercises
  15. CHAPTER 12: Creating Machine Learning Datasets Using SQL
    1. Datasets for Time Series Models
    2. Datasets for Binary Classification
    3. Taking Things to the Next Level
    4. Exercises
  16. CHAPTER 13: Analytical Dataset Development Examples
    1. What Factors Correlate with Fresh Produce Sales?
    2. How Do Sales Vary by Customer Zip Code, Market Distance, and Demographic Data?
    3. How Does Product Price Distribution Affect Market Sales?
  17. CHAPTER 14: Storing and Modifying Data
    1. Storing SQL Datasets as Tables and Views
    2. Adding a Timestamp Column
    3. Inserting Rows and Updating Values in Database Tables
    4. Using SQL Inside Scripts
    5. In Closing
    6. Exercises
  18. APPENDIX: Answers to Exercises
    1. Chapter 1: Data Sources
    2. Chapter 2: The SELECT Statement
    3. Chapter 3: The WHERE Clause
    4. Chapter 4: CASE Statements
    5. Chapter 5: SQL JOINs
    6. Chapter 6: Aggregating Results for Analysis
    7. Chapter 7: Window Functions and Subqueries
    8. Chapter 8: Date and Time Functions
    9. Chapter 9: Exploratory Data Analysis with SQL
    10. Chapter 10: Building SQL Datasets for Analytical Reporting
    11. Chapter 11: More Advanced Query Structures
    12. Chapter 12: Creating Machine Learning Datasets Using SQL
    13. Chapter 14: Storing and Modifying Data
  19. Index
  20. Copyright
  21. Dedication
  22. About the Author
  23. About the Technical Editor
  24. Acknowledgments
  25. End User License Agreement

Product information

  • Title: SQL for Data Scientists
  • Author(s): Renee M. P. Teate
  • Release date: September 2021
  • Publisher(s): Wiley
  • ISBN: 9781119669364