Data Science for Decision Makers

Bridge the gap between business and data science by learning how to interpret machine learning and AI models, manage data teams, and achieve impactful results

Key Features

  • Master the concepts of statistics and ML to interpret models and guide decisions
  • Identify valuable AI use cases and manage data science projects from start to finish
  • Empower top data science teams to solve complex problems and build AI products
  • Purchase of the print or Kindle book includes a free PDF eBook

Book Description

As data science and artificial intelligence (AI) become prevalent across industries, executives without formal education in statistics and machine learning, as well as data scientists moving into leadership roles, must learn how to make informed decisions about complex models and manage data teams. This book will elevate your leadership skills by guiding you through the core concepts of data science and AI.

This comprehensive guide is designed to bridge the gap between business needs and technical solutions, empowering you to make informed decisions and drive measurable value within your organization. Through practical examples and clear explanations, you'll learn how to collect and analyze structured and unstructured data, build a strong foundation in statistics and machine learning, and evaluate models confidently. By recognizing common pitfalls and valuable use cases, you'll plan data science projects effectively from start to finish. Beyond technical aspects, this book provides tools to recruit top talent, manage high-performing teams, and stay up to date with industry advancements.

By the end of this book, you’ll be able to characterize the data within your organization and frame business problems as data science problems.

What you will learn

  • Discover how to interpret common statistical quantities and make data-driven decisions
  • Explore ML concepts as well as techniques in supervised, unsupervised, and reinforcement learning
  • Find out how to evaluate statistical and machine learning models
  • Understand the data science lifecycle, from development to monitoring of models in production
  • Know when to use ML, statistical modeling, or traditional BI methods
  • Manage data teams and data science projects effectively

Who this book is for

This book is designed for executives who want to understand and apply data science methods to enhance decision-making. It is also for individuals who work with or manage data scientists and machine learning engineers, such as chief data officers (CDOs), data science managers, and technical project managers.

Table of contents

  1. Data Science for Decision Makers
  2. Contributors
  3. About the author
  4. About the reviewer
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. Conventions used
    4. Get in touch
    5. Share Your Thoughts
    6. Download a free PDF copy of this book
  6. Part 1: Understanding Data Science and Its Foundations
  7. Chapter 1: Introducing Data Science
    1. Data science, AI, and ML – what’s the difference?
      1. The mathematical and statistical underpinnings of data science
    2. Statistics and data science
      1. What is statistics?
    3. Descriptive and inferential statistics
      1. Sampling strategies
    4. Probability
      1. Probability distribution
      2. Conditional probability
    5. Describing our samples
      1. Measures of central tendency
      2. Measures of dispersion
      3. Degrees of freedom
      4. Correlation, causation, and covariance
      5. The shape of data
    6. Probability distributions
      1. Discrete probability distributions
      2. Continuous probability distributions
    7. Summary
  8. Chapter 2: Characterizing and Collecting Data
    1. What are the key criteria to consider when evaluating datasets?
      1. Data quantity
      2. Data velocity
      3. Data variety
      4. Data quality
    2. First-, second-, and third-party data
      1. First-party data – the treasure trove within
      2. Second-party data – building bridges through collaboration
      3. Third-party data – broadening horizons with external expertise
    3. Structured, unstructured, and semi-structured data
      1. Structured data
      2. Unstructured data
      3. Semi-structured data
    4. Methods for collecting data
    5. Storing and processing data
    6. Cloud, on-premises, and hybrid solutions – navigating the data storage and analysis landscape
      1. Cloud computing – scalable services in the cloud
      2. On-premises – maintaining control within your walls
      3. Hybrid – the best of both worlds?
    7. Data processing
    8. Summary
  9. Chapter 3: Exploratory Data Analysis
    1. Getting started with Google Colab
      1. What is Google Colab?
      2. A step-by-step guide to setting up Google Colab
    2. Understanding the data you have
    3. EDA techniques and tools
      1. Descriptive statistics
      2. Data visualization
      3. Histograms
      4. Density curves
      5. Boxplots
      6. Heatmaps
      7. Dimensionality reduction
      8. Correlation analysis
      9. Outlier detection
    4. Summary
  10. Chapter 4: The Significance of Significance
    1. The idea of testing hypotheses
      1. What is a hypothesis?
      2. How does hypothesis testing work?
      3. Formulating null and alternative hypotheses
      4. Determining the significance level
      5. Understanding errors
      6. Getting to grips with p-values
    2. Significance tests for a population proportion – making informed decisions about proportions
      1. The z-test – comparing a sample proportion to a population proportion
      2. Z-test example made easy
    3. Significance tests for a population average (mean)
      1. Writing hypotheses for a significance test about a mean
      2. Conditions for a t-test about a mean
      3. When to use z or t statistics in significance tests
      4. Example – calculating the t-statistic for a test about a mean
      5. Using a table to estimate the p-value from the t-statistic
      6. Comparing the p-value from the t-statistic to the significance level
      7. One-tailed and two-tailed tests
    4. Walking through a case study
    5. Summary
  11. Chapter 5: Understanding Regression
    1. How can I benefit from understanding regression?
    2. Introduction to trend lines
    3. Fitting a trend line to data
    4. Estimating the line of best fit
    5. Calculating the equations of the lines of best fit
    6. Interpreting the slope of a regression line
    7. Interpreting the intercept of a regression line
    8. Understanding residuals
    9. Evaluating the goodness of fit in least-squares regression
    10. Summary
  12. Part 2: Machine Learning – Concepts, Applications, and Pitfalls
  13. Chapter 6: Introducing Machine Learning
    1. From statistics to machine learning
      1. What is machine learning?
      2. How does machine learning relate to statistics?
    2. Why is machine learning important?
      1. Customer personalization and segmentation
      2. Fraud detection and security
      3. Supply chain and inventory optimization
      4. Predictive maintenance
      5. Healthcare diagnostics and treatment
    3. The different types of machine learning
      1. Supervised learning
      2. Unsupervised learning
      3. Semi-supervised learning
      4. Reinforcement learning
      5. Transfer learning
    4. Popular machine learning algorithms
      1. Linear regression
      2. Logistic regression
      3. Decision trees
      4. Random forests
      5. Support vector machines
      6. k-nearest neighbors
      7. Neural networks
    5. The machine learning process
      1. Training a supervised machine learning model
      2. Validation of a supervised machine learning model
      3. Testing a supervised machine learning model
      4. Evaluating machine learning models
    6. Risks and limitations of machine learning
      1. Overfitting and underfitting
      2. Bias and variance
      3. Balanced dataset
      4. Models are approximations of reality
    7. Machine learning on unstructured data
      1. Natural language processing (NLP)
      2. Computer vision
    8. Deep learning and artificial intelligence
      1. Artificial intelligence
      2. Deep learning
    9. Summary
  14. Chapter 7: Supervised Machine Learning
    1. Defining supervised learning
      1. Applications of supervised learning
      2. The two types of supervised learning
      3. Key factors in supervised learning
    2. Steps within supervised learning
      1. Data preparation – laying the foundation
      2. Algorithm selection – choosing the right tool
      3. Model training – learning from data
      4. Model evaluation – assessing performance
      5. Prediction and deployment – putting the model to work
    3. Characteristics of regression and classification algorithms
      1. Regression algorithms
      2. Classification algorithms
      3. Key considerations in supervised learning
      4. Evaluation metrics
    4. Applications of supervised learning
      1. Consumer goods
      2. Retail
      3. Manufacturing
    5. Summary
  15. Chapter 8: Unsupervised Machine Learning
    1. Defining UL
      1. Practical examples of UL
    2. Steps in UL
      1. Step 1 – Data collection
      2. Step 2 – Data preprocessing
      3. Step 3 – Choosing the right model
      4. Step 4 – Training the model
      5. Step 5 – Interpretation and evaluation
      6. In summary
    3. Clustering – unveiling hidden patterns in your data
      1. What is clustering?
      2. How does clustering work?
      3. k-means clustering
      4. Practical applications of clustering
      5. Evaluation metrics for clustering
      6. In summary
    4. Association rule learning
      1. What is association rule learning?
      2. The Apriori algorithm – a practical example
      3. Evaluation metrics
      4. In summary
    5. Applications of UL
      1. Market segmentation
      2. Anomaly detection
      3. Feature extraction
    6. Summary
  16. Chapter 9: Interpreting and Evaluating Machine Learning Models
    1. How do I know whether this model will be accurate?
      1. Evaluating on test (holdout) data
    2. Understanding evaluation metrics
      1. Evaluating regression models
      2. R-squared
      3. Root mean squared error
      4. Mean absolute error
      5. When and how to use each metric
      6. Practical evaluation strategies
      7. Summarizing the evaluation of regression models
    3. Evaluating classification models
      1. Classification model evaluation metrics
      2. Precision, recall, and F1-Score
      3. Recall
      4. F1-score
    4. Methods for explaining machine learning models
      1. Making sense of regression models – the power of coefficients
      2. Decoding classification models – unveiling feature importance
      3. Beyond specific models – universal insights using SHAP values
    5. Summary
  17. Chapter 10: Common Pitfalls in Machine Learning
    1. Understanding the complexity
    2. Dirty data, damaged models – how data quantity and quality impact ML
      1. The importance of adequate training data
      2. Dealing with poor data quality
      3. Conclusion
    3. Overcoming overfitting and underfitting
      1. Navigating training-serving skew and model drift
      2. Ensuring fairness
    4. Mastering overfitting and underfitting for optimal model performance
      1. Overfitting – when your model is too specific
      2. Underfitting – when your model is too simplistic
      3. Spotting the problem
      4. Conclusion
    5. Training-serving skew and model drift
      1. Training-serving skew
      2. Model drift
      3. Key takeaways
    6. Bias and fairness
      1. Understanding bias
      2. Understanding fairness
      3. Mitigating bias and ensuring fairness
      4. Key takeaways
    7. Summary
  18. Part 3: Leading Successful Data Science Projects and Teams
  19. Chapter 11: The Structure of a Data Science Project
    1. The various types of data science projects
      1. Data products
      2. Reports and analytics
      3. Research and methodology
    2. The stages of a data product
      1. Identifying use cases
      2. Evaluating use cases
      3. Planning the data product
    3. Developing a data product
      1. Data preparation and exploratory analysis
      2. Model design and development
      3. Evaluation and testing
    4. Deploying and monitoring a data product
    5. General best practices for data product development
    6. Evaluating impact
      1. Predictive maintenance in manufacturing
      2. Fraud detection in banking
      3. Customer churn prediction in telecom
      4. Demand forecasting in retail
      5. Personalized recommendations in e-commerce
      6. Predictive maintenance in energy
      7. Workforce optimization in quick service restaurants
      8. Chatbot-assisted customer support
    7. Summary
  20. Chapter 12: The Data Science Team
    1. Assembling your data science team – key roles and considerations
      1. Data scientists
      2. Machine learning engineers
      3. Data engineers
      4. MLOps engineers
      5. Analytics engineers
      6. Software engineers (full stack, frontend, backend)
      7. Product managers
      8. Business analysts
      9. Data storytellers/visualization experts
      10. Considerations when assembling your team
      11. Data science teams within larger organizations
    2. The hub and spoke model
      1. What is the hub and spoke model?
      2. Practical applications of the hub and spoke model
      3. Building a hub and spoke model
    3. The art of recruitment
      1. Where to find technical talent
    4. How high-performing data science teams operate
      1. Cross-functional collaboration is essential
      2. Diversity of perspectives drives innovation
      3. Start with the right problem to solve
      4. Invest in tooling, infrastructure, and workflow
      5. Continuous adaption and learning are a must
      6. Focus ruthlessly on outcomes over activity
    5. Summary
  21. Chapter 13: Managing the Data Science Team
    1. Day-to-day management of a data science team
      1. Enabling rapid experimentation and innovation
      2. Managing inherent uncertainty
      3. Balancing research and application
      4. Communicating effectively in data science and artificial intelligence
      5. Fostering a culture of curiosity and continuous learning
      6. Embracing peer review and collaboration
    2. Common challenges in managing a data science team
      1. Challenge 1 – recruiting and retaining top talent
      2. Challenge 2 – aligning projects with business goals
      3. Challenge 3 – managing inherent uncertainty
      4. Challenge 4 – scaling and operationalizing models
      5. Challenge 5 – deploying robust, reliable, fair models ethically
    3. Empowering and motivating your data science team
      1. Working with other teams and external stakeholders and empowering them to use data
    4. Summary
  22. Chapter 14: Continuing Your Journey as a Data Science Leader
    1. Navigating the landscape of emerging technologies
    2. Specializing in an industry
    3. Specializing in a field
    4. Embracing continuous learning
      1. Online courses
      2. Cloud certifications
      3. Technical tutorials and documentation
      4. Learning plan framework
    5. Staying up to date with current DS/ML/AI news and trends
    6. Promoting data-driven thinking within your organization
      1. Host internal learning sessions
      2. Collaborate on cross-functional projects
      3. Share success stories and lessons learned
      4. Mentor and upskill colleagues
      5. Establish a data science community of practice
    7. Networking beyond your organization
      1. Attend industry conferences and events
      2. Join online communities and forums
      3. Engage with local meetups and user groups
      4. Collaborate on side projects or research
      5. Offer mentorship or seek mentors
    8. Summary
  23. Index
    1. Why subscribe?
  24. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts
    3. Download a free PDF copy of this book

Product information

  • Title: Data Science for Decision Makers
  • Author(s): Jon Howells
  • Release date: July 2024
  • Publisher(s): Packt Publishing
  • ISBN: 9781837637294