Hands-On Data Science with SQL Server 2017

Book description

Find, explore, and extract big data to transform it into actionable insights

Key Features

  • Perform end-to-end data analysis - from exploration to visualization
  • Real-world examples, tasks, and interview queries to help you become a proficient data scientist
  • Understand how SQL is used for big data processing with HiveQL and SparkSQL

Book Description

SQL Server is a relational database management system that enables you to cover the end-to-end data science process using a range of built-in services and features.

Hands-On Data Science with SQL Server 2017 starts with an overview of data science with SQL, helping you understand the core tasks in data science. You will learn intermediate-to-advanced concepts for performing analytical tasks on data using SQL Server. The book takes a unique approach, covering best practices, tasks, and challenges to test your abilities at the end of each chapter. You will explore the ins and outs of key tasks such as data collection, cleaning, manipulation, aggregation, and filtering. As you make your way through the chapters, you will turn raw data into actionable insights by wrangling and extracting it from databases using T-SQL. You will get to grips with preparing and presenting data in a meaningful way, using Power BI to reveal hidden patterns. In the concluding chapters, you will work with SQL Server Integration Services to transform data into a useful format, and delve into advanced examples covering machine learning concepts such as predictive analytics, using real-world examples.

By the end of this book, you will be able to handle growing volumes of data and perform the everyday activities of a data science professional.

What you will learn

  • Understand what data science is and how SQL Server is used for big data processing
  • Analyze incoming data with SQL queries and visualizations
  • Create, train, and evaluate predictive models
  • Make predictions using trained models and establish regular retraining routines
  • Incorporate external data source querying into SQL Server
  • Enhance built-in T-SQL capabilities using SQLCLR
  • Visualize data with Reporting Services, Power View, and Power BI
  • Transform data with R, Python, and Azure

Who this book is for

Hands-On Data Science with SQL Server 2017 is intended for data scientists, data analysts, and big data professionals who want to master their SQL skills and apply them to data science. The book will also be helpful for beginners who want to build their career as data science professionals using the power of SQL Server 2017. Basic familiarity with the SQL language will help you understand the concepts covered in this book.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Data Science with SQL Server 2017
  3. About Packt
    1. Why subscribe?
    2. Packt.com
  4. Contributors
    1. About the authors
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Conventions used
    4. Get in touch
      1. Reviews
  6. Data Science Overview
    1. Introducing data science
      1. Data science project life cycle
        1. Business understanding
        2. Getting data
        3. Modelling and analysis
        4. Deployment and visualization 
        5. Final acceptance
    2. Data science domains
      1. Math and statistics
        1. Visualizing the types of data
        2. Statistics 101 
          1. Central tendency
          2. Skewness
          3. Variability
      2. Machine learning
        1. SQL Server and machine learning
          1. Choosing the right algorithm
      3. Big data
        1. SQL Server and big data
    3. Summary
  7. SQL Server 2017 as a Data Science Platform
    1. Technical requirements
    2. SQL Server evolution
      1. What's available in the pack?
      2. History of SQL Server
      3. SQL Server in the cloud
        1. Azure SQL Database
        2. Azure SQL Data Warehouse
    3. SQL Server Services and their use with data science
      1. SQL Server Integration Services
      2. SQL Server Analysis Services
        1. Tabular mode
        2. Multidimensional mode
        3. PowerPivot mode
        4. Querying languages
      3. Reporting Services
        1. Development tools for Reporting Services
      4. Power BI Report Server
      5. Machine Learning Services
    4. Summary
  8. Data Sources for Analytics
    1. Technical requirements
    2. Getting data from databases
      1. Importing data from SQL Server
      2. Importing data from other database systems
    3. Importing flat files
    4. Working with XML data
    5. Working with JSON 
      1. Retrieving data as JSON
      2. Processing stored JSON data
    6. External data with PolyBase
      1. Installing and configuring
    7. Summary
  9. Data Transforming and Cleaning with T-SQL
    1. Technical requirements
    2. The need for data transformation
    3. Database architectures for data transformations
      1. Direct source for data analysis
      2. Staging–target scenario
      3. Landing–staging–target scenario
      4. Tools eligible for data movement
        1. Distributed queries
        2. SQL Server Integration Services
          1. Why should we use SSIS?
          2. What is needed to develop an SSIS solution?
          3. Where should SSIS be used?
          4. Is there an alternative to SSIS?
    4. Transforming data
      1. Full data load
      2. Incremental data load
        1. The MERGE statement
        2. CHECKSUM
        3. Temporal tables
    5. Denormalizing data
      1. Relational normalization
        1. First normal form
        2. Second normal form
        3. Third normal form
      2. Need for denormalization
      3. Ways of denormalization
        1. Computed columns
        2. Denormalization using joins
    6. Using views and stored procedures
      1. Database applications
      2. Using views
      3. Using stored procedures
    7. Performance considerations
      1. Writing correct code
      2. Using indexes
        1. B-tree indexes
      3. COLUMNSTORE INDEX
    8. Summary
    9. Questions
  10. Data Exploration and Statistics with T-SQL
    1. Technical requirements
    2. T-SQL aggregate queries
      1. Common properties of aggregate functions
      2. Aggregate functions
        1. COUNT, COUNT(*), and COUNT_BIG
        2. MIN and MAX
        3. SUM
        4. AVG
        5. VAR and VARP
        6. STDEV and STDEVP
      3. Using groups
      4. Using the HAVING clause
    3. Ranking, framing, and windowing
      1. Ranking functions
        1. ROW_NUMBER
        2. RANK
        3. DENSE_RANK
        4. NTILE
    4. Running aggregates
      1. Using aggregate functions in running aggregates
        1. Using aggregate functions
        2. Using the LEAD and LAG functions
      2. Calculating with percentiles
        1. The PERCENT_RANK and CUME_DIST functions
        2. The PERCENTILE_CONT and PERCENTILE_DISC functions
    5. Summary
    6. Questions
  11. Custom Aggregations on SQL Server
    1. Technical requirements
    2. Overview of SQLCLR
      1. Use cases of using SQLCLR
      2. How to work with SQLCLR
      3. Instance and database configurations to use with SQLCLR
    3. Creating CLR aggregations
      1. Example goal and assignment
      2. Skeleton of CLR aggregation
      3. Implementing methods
        1. Implementing custom serialization
        2. Implementing the Init method
        3. Implementing the Accumulate method
        4. Implementing the Merge method
        5. Implementing the Terminate method
      4. Deployment and testing
    4. Limitations and performance considerations
      1. Development issues and risks
      2. Maintenance issues and risks
      3. Performance issues and risks
    5. Summary
    6. Questions
  12. Data Visualization
    1. Technical requirements
    2. Data visualization – preparation phase
    3. Power BI Report Server
      1. Starting with Power BI Desktop
      2. Defining the data source
      3. Adding visualizations to the report
      4. Visual interactions
      5. Publishing reports
    4. SQL Server Reporting Services
      1. Adding charts to reports
      2. Using SQL Server Data Tools
    5. Summary
  13. Data Transformations with Other Tools
    1. Technical requirements
    2. Categorization, missing values, and normalization
      1. Categorization
      2. Missing values
      3. Normalization
        1. Z-score
        2. Feature-scaling
    3. Using Integration Services for data transformation
      1. Setting up an SSIS project
      2. Categorizing the products
    4. Using R for data transformation
      1. Preparing the client R environment
      2. First steps with R syntax
      3. A working example of the Z-score computed in R
    5. Using Data Factory for data transformation
      1. Creating Azure Data Factory
      2. Creating a simple data copy with ADF
    6. Summary
    7. Questions
  14. Predictive Model Training and Evaluation
    1. Technical requirements
    2. Preparing SQL Server
      1. Setting up and configuring ML services
      2. Preparing to install our own R packages
    3. Creating data structures
      1. The concept of machine learning in databases
      2. Creating physical data structures
        1. Creating common objects
        2. Creating objects using filestreams
        3. Creating objects using temporal tables
    4. Deploying, training, and evaluating a predictive model
      1. Saving our machine learning model to filestreams
      2. Saving a machine learning model to temporal tables
    5. Summary
    6. Questions
  15. Making Predictions
    1. Technical requirements
    2. Reading models from a database
      1. Reading the model from a common table
      2. Reading the model from a temporal table
    3. Submitting values to an external script
      1. Submitting values into the external script
      2. Deserializing a predictive model
      3. Making the prediction
    4. Using the PREDICT keyword
    5. Making the predictive model self-training
      1. Re-calculating a predictive model regularly
      2. Re-calculating a predictive model asynchronously
        1. Creating a message type
        2. Creating a contract
        3. Creating queues and services
        4. Sending a request to train a new model
        5. Consuming requests and sending responses
        6. Testing the asynchronous solution
    6. Summary
    7. Questions
  16. Getting It All Together - A Real-World Example
    1. Technical requirements
    2. Assignment and preparation
      1. SQL Server
      2. Data description
    3. Data exploration
      1. Exploring data using T-SQL
      2. Exploring data using the SSIS Data Profiling Task
        1. Exploring the SourceData.Actions table
      3. Exploring data using R
    4. Data transformation
    5. Training and using predictive models for estimations
      1. Preparing the schema for the model
      2. Training the model
        1. Using the rxLinMod function and finishing the model
      3. Using the model in predictions
    6. Summary
    7. Questions
  17. Next Steps with Data Science and SQL
    1. Data science next steps
    2. Next steps with SQL Server
      1. Big data clusters
      2. Machine learning
        1. Machine learning services on Linux
        2. Machine learning high availability
    3. Data science in the cloud
    4. Summary
  18. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Hands-On Data Science with SQL Server 2017
  • Author(s): Marek Chmel, Vladimir Muzny
  • Release date: November 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781788996341