Managing Data Science

Book description

Understand data science concepts and methodologies to manage and deliver top-notch solutions for your organization

Key Features

  • Learn the basics of data science and explore its possibilities and limitations
  • Manage data science projects and assemble teams effectively even in the most challenging situations
  • Understand management principles and approaches for data science projects to streamline the innovation process

Book Description

Data science and machine learning can transform any organization and unlock new opportunities. However, employing the right management strategies is crucial to guide a solution from prototype to production. Traditional approaches often fail because they don't fully meet the conditions and requirements of current data science projects. In this book, you'll explore the right approach to data science project management, along with useful tips and best practices to guide you along the way.

After understanding the practical applications of data science and artificial intelligence, you'll see how to incorporate them into your solutions. Next, you will go through the data science project life cycle, explore the common pitfalls encountered at each step, and learn how to avoid them. Any data science project requires a skilled team, and this book will offer the right advice for hiring and growing a data science team for your organization. Later, you'll be shown how to efficiently manage and improve your data science projects through the use of DevOps and ModelOps.

By the end of this book, you will be well versed in various data science solutions and will have gained practical insights into tackling the different challenges that you'll encounter on a daily basis.

What you will learn

  • Understand the underlying problems of building a strong data science pipeline
  • Explore the different tools for building and deploying data science solutions
  • Hire, grow, and sustain a data science team
  • Manage data science projects through all stages, from prototype to production
  • Learn how to use ModelOps to improve your data science pipelines
  • Get up to speed with the model testing techniques used in both development and production stages

Who this book is for

This book is for data scientists, analysts, and program managers who want to improve business productivity by incorporating data science workflows efficiently. Some understanding of basic data science concepts will help you get the most out of this book.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Managing Data Science
  3. Dedication
  4. About Packt
    1. Why subscribe?
  5. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the color images
      2. Conventions used
    4. Get in touch
      1. Reviews
  7. Section 1: What is Data Science?
  8. What You Can Do with Data Science
    1. Defining AI
      1. Defining data science
      2. The influence of data science
      3. Limitations of data science
    2. Introduction to machine learning
      1. Decisions and insights provided by a machine learning model
      2. Data for machine learning models
      3. Origins of machine learning
      4. Anatomy of machine learning
      5. Main types of tasks you can solve with machine learning
    3. Introduction to deep learning
      1. Diving into natural language processing
      2. Exploring computer vision
    4. Deep learning use case
    5. Introduction to causal inference
    6. Summary
  9. Testing Your Models
    1. Offline model testing
      1. Understanding model errors
      2. Decomposing errors
        1. Understanding overfitting
      3. Using technical metrics
        1. More about imbalanced classes
      4. Applying business metrics
    2. Online model testing
      1. Online data testing
    3. Summary
  10. Understanding AI
    1. Understanding mathematical optimization
    2. Thinking with statistics
      1. Frequentist probabilities
      2. Conditional probabilities
        1. Dependent and independent events
      3. Bayesian view on probability
      4. Distributions
      5. Calculating statistics from data samples
      6. Statistical modeling
    3. How do machines learn?
    4. Exploring machine learning
      1. Defining goals of machine learning
        1. Using a life cycle to build machine learning models
      2. Linear models
      3. Classification and regression trees
      4. Ensemble models
        1. Tree-based ensembles
      5. Clustering models
    5. Exploring deep learning
      1. Building neural networks
      2. Introduction to computer vision
      3. Introduction to natural language processing
    6. Summary
  11. Section 2: Building and Sustaining a Team
  12. An Ideal Data Science Team
    1. Defining data science team roles
    2. Exploring data science team roles and their responsibilities
      1. Case study 1 – Applying machine learning to prevent fraud in banks
      2. Case study 2 – Finding a home for machine learning in a retail company
      3. Key skills of a data scientist
      4. Key skills of a data engineer
      5. Key skills of a data science manager
      6. Getting help from the development team
    3. Summary
  13. Conducting Data Science Interviews
    1. Common flaws of technical interviews
      1. Searching for candidates you don't need
      2. Discovering the purpose of the interview process
    2. Introducing values and ethics into the interview
    3. Designing good interviews
      1. Designing test assignments
      2. Interviewing for different data science roles
        1. General guidance
        2. Interviewing data scientists
        3. Interviewing data engineers
    4. Summary
  14. Building Your Data Science Team
    1. Achieving team Zen
    2. Leadership and people management
      1. Leading by example
      2. Using situational leadership
      3. Defining tasks in a clear way
      4. Developing empathy
    3. Facilitating a growth mindset
      1. Growing the expertise of your team as a whole
      2. Applying continuous learning for personal growth
      3. Giving more opportunities for learning
      4. Helping employees to grow with performance reviews
    4. Case study – creating a data science department
    5. Summary
  15. Section 3: Managing Various Data Science Projects
  16. Managing Innovation
    1. Understanding innovations
    2. Why do big organizations fail so often?
      1. Game of markets
      2. Creating new markets
    3. Exploring innovation management
      1. Case study – following the innovation cycle at MedVision
      2. Integrating innovations
    4. Balancing sales, marketing, team leadership, and technology
    5. Managing innovations in a big company
      1. Case study – bringing data science to a retail business
    6. Managing innovations in a start-up company
    7. Finding project ideas
      1. Finding ideas in business processes
      2. Finding ideas in data
        1. Case study – finding data science project ideas in an insurance company
    8. Summary
  17. Managing Data Science Projects
    1. Understanding data science project failure
      1. Understanding data science management approaches
    2. Exploring the data science project life cycle
      1. Business understanding
      2. Data understanding
      3. Data preparation
        1. Optimizing data preparation
      4. Modeling
      5. Evaluation
      6. Deployment
    3. Choosing a project management methodology
      1. Waterfall
      2. Agile
      3. Kanban
      4. Scrum
    4. Choosing a methodology that suits your project
      1. Creating disruptive innovation
      2. Providing a tested solution
      3. Developing a custom project for a customer
    5. Estimating data science projects
      1. Learning to make time and cost estimates
    6. Discovering the goals of the estimation process
    7. Summary
  18. Common Pitfalls of Data Science Projects
    1. Avoiding the common risks of data science projects
    2. Approaching research projects
    3. Dealing with prototypes and MVP projects
      1. Case study – creating an MVP in a consulting company
    4. Mitigating risks in production-oriented data science systems
      1. Case study – bringing a sales forecasting system into production
    5. Summary
  19. Creating Products and Improving Reusability
    1. Thinking of projects as products
    2. Determining the stage of your project
      1. Case study – building a service desk routing system
    3. Improving reusability
    4. Seeking and building products
      1. Privacy concerns
    5. Summary
  20. Section 4: Creating a Development Infrastructure
  21. Implementing ModelOps
    1. Understanding ModelOps
    2. Looking into DevOps
      1. Exploring the special needs of data science project infrastructure
      2. The data science delivery pipeline
    3. Managing code versions and quality
    4. Storing data along with the code
      1. Tracking and versioning data
      2. Storing data in practice
    5. Managing environments
    6. Tracking experiments
    7. The importance of automated testing
    8. Packaging code
    9. Continuous model training
    10. Case study – building ModelOps for a predictive maintenance system
    11. A power pack for your projects
    12. Summary
  22. Building Your Technology Stack
    1. Defining the elements of a technology stack
    2. Choosing between core- and project-specific technologies
    3. Comparing tools and products
      1. Case study – forecasting demand for a logistics company
    4. Summary
  23. Conclusion
    1. Advancing your knowledge
    2. Summary
  24. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Managing Data Science
  • Author(s): Kirill Dubovikov
  • Release date: November 2019
  • Publisher(s): Packt Publishing
  • ISBN: 9781838826321