Data Science Bookcamp, video edition

Video description

In Video Editions, the narrator reads the book while the content, figures, code listings, diagrams, and text appear on the screen. It's like an audiobook that you can also watch as a video.

Valuable and accessible… a solid foundation for anyone aspiring to be a data scientist.
Amaresh Rajasekharan, IBM Corporation

Learn data science with Python by building five real-world projects! Experiment with card game predictions, tracking disease outbreaks, and more, as you build a flexible and intuitive understanding of data science.

In Data Science Bookcamp you will find:
  • Techniques for computing and plotting probabilities
  • Statistical analysis using SciPy
  • How to organize datasets with clustering algorithms
  • How to visualize complex multi-variable datasets
  • How to train a decision tree machine learning algorithm

In Data Science Bookcamp you’ll test and build your knowledge of Python with the kind of open-ended problems that professional data scientists work on every day. Downloadable datasets and thoroughly explained solutions help you lock in what you’ve learned, building your confidence and making you ready for an exciting new data science career.

about the technology

A data science project has a lot of moving parts, and it takes practice and skill to get all the code, algorithms, datasets, formats, and visualizations working together harmoniously. This unique book guides you through five realistic projects, including tracking disease outbreaks from news headlines, analyzing social networks, and finding relevant patterns in ad click data.

about the book

Data Science Bookcamp doesn’t stop with surface-level theory and toy examples. As you work through each project, you’ll learn how to troubleshoot common problems like missing data, messy data, and algorithms that don’t quite fit the model you’re building. You’ll appreciate the detailed setup instructions and the fully explained solutions that highlight common failure points. In the end, you’ll be confident in your skills because you can see the results.

about the audience

For readers who know the basics of Python. No prior data science or machine learning skills required.

about the author

Leonard Apeltsin is the Head of Data Science at Anomaly, where his team applies advanced analytics to uncover healthcare fraud, waste, and abuse.

Really good introduction of statistical data science concepts. A must-have for every beginner!
Simone Sguazza, University of Applied Sciences and Arts of Southern Switzerland

A full-fledged tutorial in data science including common Python libraries and language tricks!
Jean-François Morin, Laval University

This book is a complete package for understanding how the data science process works end to end.
Ayon Roy, Internshala

NARRATED BY JULIE BRIERLEY

Table of contents

  1. Case study 1: Finding the winning strategy in a card game
  2. Chapter 1. Computing probabilities using Python
  3. Chapter 1. Problem 2: Analyzing multiple die rolls
  4. Chapter 2. Plotting probabilities using Matplotlib
  5. Chapter 2. Comparing multiple coin-flip probability distributions
  6. Chapter 3. Running random simulations in NumPy
  7. Chapter 3. Computing confidence intervals using histograms and NumPy arrays
  8. Chapter 3. Deriving probabilities from histograms
  9. Chapter 3. Computing histograms in NumPy
  10. Chapter 3. Using permutations to shuffle cards
  11. Chapter 4. Case study 1 solution
  12. Chapter 4. Optimizing strategies using the sample space for a 10-card deck
  13. Case study 2: Assessing online ad clicks for significance
  14. Chapter 5. Basic probability and statistical analysis using SciPy
  15. Chapter 5. Mean as a measure of centrality
  16. Chapter 5. Variance as a measure of dispersion
  17. Chapter 6. Making predictions using the central limit theorem and SciPy
  18. Chapter 6. Comparing two sampled normal curves
  19. Chapter 6. Determining the mean and variance of a population through random sampling
  20. Chapter 6. Computing the area beneath a normal curve
  21. Chapter 7. Statistical hypothesis testing
  22. Chapter 7. Assessing the divergence between sample mean and population mean
  23. Chapter 7. Data dredging: Coming to false conclusions through oversampling
  24. Chapter 7. Bootstrapping with replacement: Testing a hypothesis when the population variance is unknown, Part 1
  25. Chapter 7. Bootstrapping with replacement: Testing a hypothesis when the population variance is unknown, Part 2
  26. Chapter 7. Permutation testing: Comparing means of samples when the population parameters are unknown
  27. Chapter 8. Analyzing tables using Pandas
  28. Chapter 8. Retrieving table rows
  29. Chapter 8. Saving and loading table data
  30. Chapter 9. Case study 2 solution
  31. Chapter 9. Determining statistical significance
  32. Case study 3: Tracking disease outbreaks using news headlines
  33. Chapter 10. Clustering data into groups
  34. Chapter 10. K-means: A clustering algorithm for grouping data into K central groups
  35. Chapter 10. Using density to discover clusters
  36. Chapter 10. Clustering based on non-Euclidean distance
  37. Chapter 10. Analyzing clusters using Pandas
  38. Chapter 11. Geographic location visualization and analysis
  39. Chapter 11. Plotting maps using Cartopy
  40. Chapter 11. Visualizing maps
  41. Chapter 11. Location tracking using GeoNamesCache
  42. Chapter 11. Limitations of the GeoNamesCache library
  43. Chapter 12. Case study 3 solution
  44. Chapter 12. Visualizing and clustering the extracted location data
  45. Case study 4: Using online job postings to improve your data science resume
  46. Chapter 13. Measuring text similarities
  47. Chapter 13. Simple text comparison
  48. Chapter 13. Replacing words with numeric values
  49. Chapter 13. Vectorizing texts using word counts
  50. Chapter 13. Using normalization to improve TF vector similarity
  51. Chapter 13. Using unit vector dot products to convert between relevance metrics
  52. Chapter 13. Basic matrix operations, Part 1
  53. Chapter 13. Basic matrix operations, Part 2
  54. Chapter 13. Computational limits of matrix multiplication
  55. Chapter 14. Dimension reduction of matrix data
  56. Chapter 14. Reducing dimensions using rotation, Part 1
  57. Chapter 14. Reducing dimensions using rotation, Part 2
  58. Chapter 14. Dimension reduction using PCA and scikit-learn
  59. Chapter 14. Clustering 4D data in two dimensions
  60. Chapter 14. Limitations of PCA
  61. Chapter 14. Computing principal components without rotation
  62. Chapter 14. Extracting eigenvectors using power iteration, Part 1
  63. Chapter 14. Extracting eigenvectors using power iteration, Part 2
  64. Chapter 14. Efficient dimension reduction using SVD and scikit-learn
  65. Chapter 15. NLP analysis of large text datasets
  66. Chapter 15. Vectorizing documents using scikit-learn
  67. Chapter 15. Ranking words by both post frequency and count, Part 1
  68. Chapter 15. Ranking words by both post frequency and count, Part 2
  69. Chapter 15. Computing similarities across large document datasets
  70. Chapter 15. Clustering texts by topic, Part 1
  71. Chapter 15. Clustering texts by topic, Part 2
  72. Chapter 15. Visualizing text clusters
  73. Chapter 15. Using subplots to display multiple word clouds, Part 1
  74. Chapter 15. Using subplots to display multiple word clouds, Part 2
  75. Chapter 16. Extracting text from web pages
  76. Chapter 16. The structure of HTML documents
  77. Chapter 16. Parsing HTML using Beautiful Soup, Part 1
  78. Chapter 16. Parsing HTML using Beautiful Soup, Part 2
  79. Chapter 17. Case study 4 solution
  80. Chapter 17. Exploring the HTML for skill descriptions
  81. Chapter 17. Filtering jobs by relevance
  82. Chapter 17. Clustering skills in relevant job postings
  83. Chapter 17. Investigating the technical skill clusters
  84. Chapter 17. Exploring clusters at alternative values of K
  85. Chapter 17. Analyzing the 700 most relevant postings
  86. Case study 5: Predicting future friendships from social network data
  87. Chapter 18. An introduction to graph theory and network analysis
  88. Chapter 18. Analyzing web networks using NetworkX, Part 1
  89. Chapter 18. Analyzing web networks using NetworkX, Part 2
  90. Chapter 18. Utilizing undirected graphs to optimize the travel time between towns
  91. Chapter 18. Computing the fastest travel time between nodes, Part 1
  92. Chapter 18. Computing the fastest travel time between nodes, Part 2
  93. Chapter 19. Dynamic graph theory techniques for node ranking and social network analysis
  94. Chapter 19. Computing travel probabilities using matrix multiplication
  95. Chapter 19. Deriving PageRank centrality from probability theory
  96. Chapter 19. Computing PageRank centrality using NetworkX
  97. Chapter 19. Community detection using Markov clustering, Part 1
  98. Chapter 19. Community detection using Markov clustering, Part 2
  99. Chapter 19. Uncovering friend groups in social networks
  100. Chapter 20. Network-driven supervised machine learning
  101. Chapter 20. The basics of supervised machine learning
  102. Chapter 20. Measuring predicted label accuracy, Part 1
  103. Chapter 20. Measuring predicted label accuracy, Part 2
  104. Chapter 20. Optimizing KNN performance
  105. Chapter 20. Running a grid search using scikit-learn
  106. Chapter 20. Limitations of the KNN algorithm
  107. Chapter 21. Training linear classifiers with logistic regression
  108. Chapter 21. Training a linear classifier, Part 1
  109. Chapter 21. Training a linear classifier, Part 2
  110. Chapter 21. Improving linear classification with logistic regression, Part 1
  111. Chapter 21. Improving linear classification with logistic regression, Part 2
  112. Chapter 21. Training linear classifiers using scikit-learn
  113. Chapter 21. Measuring feature importance with coefficients
  114. Chapter 22. Training nonlinear classifiers with decision tree techniques
  115. Chapter 22. Training a nested if/else model using two features
  116. Chapter 22. Deciding which feature to split on
  117. Chapter 22. Training if/else models with more than two features
  118. Chapter 22. Training decision tree classifiers using scikit-learn
  119. Chapter 22. Studying cancerous cells using feature importance
  120. Chapter 22. Improving performance using random forest classification
  121. Chapter 22. Training random forest classifiers using scikit-learn
  122. Chapter 23. Case study 5 solution
  123. Chapter 23. Exploring the experimental observations
  124. Chapter 23. Training a predictive model using network features, Part 1
  125. Chapter 23. Training a predictive model using network features, Part 2
  126. Chapter 23. Adding profile features to the model
  127. Chapter 23. Optimizing performance across a steady set of features
  128. Chapter 23. Interpreting the trained model

Product information

  • Title: Data Science Bookcamp, video edition
  • Author(s): Leonard Apeltsin
  • Release date: November 2021
  • Publisher(s): Manning Publications
  • ISBN: None