Data Science Algorithms in a Week - Second Edition

Book description

Build a strong foundation of machine learning algorithms in 7 days

Key Features

  • Use Python and its wide array of machine learning libraries to build predictive models
  • Learn the basics of the 7 most widely used machine learning algorithms within a week
  • Know when and where to apply data science algorithms using this guide

Book Description

Machine learning applications are highly automated and self-modifying, and continue to improve over time with minimal human intervention, as they learn from training data. To address the complex nature of various real-world data problems, specialized machine learning algorithms have been developed. Through algorithmic and statistical analysis, these models can be leveraged to gain new knowledge from existing data as well.

Data Science Algorithms in a Week addresses problems related to accurate and efficient data classification and prediction. Over the course of seven days, you will be introduced to seven algorithms, along with exercises that will help you understand different aspects of machine learning. You will see how to pre-cluster your data so that large datasets can be classified more efficiently. The book also guides you in predicting data based on existing trends in your dataset. It covers algorithms such as k-nearest neighbors, Naive Bayes, decision trees, random forest, k-means, regression, and time-series analysis.

By the end of this book, you will understand how to choose machine learning algorithms for clustering, classification, and regression, and know which is best suited for your problem.

What you will learn

  • Understand how to identify a data science problem correctly
  • Implement well-known machine learning algorithms efficiently using Python
  • Classify your datasets accurately using Naive Bayes, decision trees, and random forest
  • Devise an appropriate prediction solution using regression
  • Work with time series data to identify relevant data events and trends
  • Cluster your data using the k-means algorithm

Who this book is for

This book is for aspiring data science professionals who are familiar with Python and have a little background in statistics. You'll also find this book useful if you're currently working with data science algorithms in some capacity and want to expand your skill set.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Data Science Algorithms in a Week Second Edition
  3. Packt Upsell
    1. Why subscribe?
    2. Packt.com
  4. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. Classification Using K-Nearest Neighbors
    1. Mary and her temperature preferences
    2. Implementation of the k-nearest neighbors algorithm
    3. Map of Italy example – choosing the value of k
      1. Analysis
    4. House ownership – data rescaling
      1. Analysis
    5. Text classification – using non-Euclidean distances
      1. Analysis
    6. Text classification – k-NN in higher dimensions
      1. Analysis
    7. Summary
    8. Problems
      1. Mary and her temperature preference problems
      2. Map of Italy – choosing the value of k
      3. House ownership
      4. Analysis
  7. Naive Bayes
    1. Medical tests – basic application of Bayes' theorem
      1. Analysis
    2. Bayes' theorem and its extension
      1. Bayes' theorem
        1. Proof
      2. Extended Bayes' theorem
        1. Proof
    3. Playing chess – independent events
      1. Analysis
    4. Implementation of a Naive Bayes classifier
    5. Playing chess – dependent events
      1. Analysis
    6. Gender classification – Bayes for continuous random variables
      1. Analysis
    7. Summary
    8. Problems
      1. Analysis
  8. Decision Trees
    1. Swim preference – representing data using a decision tree
    2. Information theory
      1. Information entropy
        1. Coin flipping
        2. Definition of information entropy
      2. Information gain
      3. Swim preference – information gain calculation
    3. ID3 algorithm – decision tree construction
      1. Swim preference – decision tree construction by the ID3 algorithm
      2. Implementation
    4. Classifying with a decision tree
      1. Classifying a data sample with the swimming preference decision tree
    5. Playing chess – analysis with a decision tree
      1. Analysis
        1. Classification
    6. Going shopping – dealing with data inconsistencies
      1. Analysis
    7. Summary
    8. Problems
      1. Analysis
  9. Random Forests
    1. Introduction to the random forest algorithm
      1. Overview of random forest construction
    2. Swim preference – analysis involving a random forest
      1. Analysis
        1. Random forest construction
          1. Construction of random decision tree number 0
          2. Construction of random decision tree number 1
          3. Constructed random forest
        2. Classification using random forest
    3. Implementation of the random forest algorithm
    4. Playing chess example
      1. Analysis
        1. Random forest construction
        2. Classification
    5. Going shopping – overcoming data inconsistencies with randomness and measuring the level of confidence
      1. Analysis
    6. Summary
    7. Problems
      1. Analysis
  10. Clustering into K Clusters
    1. Household incomes – clustering into k clusters
      1. K-means clustering algorithm
        1. Picking the initial k-centroids
        2. Computing a centroid of a given cluster
      2. Using the k-means clustering algorithm on the household income example
    2. Gender classification – clustering to classify
      1. Analysis
    3. Implementation of the k-means clustering algorithm
      1. Input data from gender classification
      2. Program output for gender classification data
    4. House ownership – choosing the number of clusters
      1. Analysis
    5. Document clustering – understanding the number of k clusters in a semantic context
      1. Analysis
    6. Summary
    7. Problems
      1. Analysis
  11. Regression
    1. Fahrenheit and Celsius conversion – linear regression on perfect data
      1. Analysis from first principles
      2. Least squares method for linear regression
      3. Analysis using the least squares method in Python
      4. Visualization
    2. Weight prediction from height – linear regression on real-world data
      1. Analysis
    3. Gradient descent algorithm and its implementation
      1. Gradient descent algorithm
      2. Implementation
      3. Visualization – comparison of the least squares method and the gradient descent algorithm
    4. Flight time duration prediction based on distance
      1. Analysis
    5. Ballistic flight analysis – non-linear model
      1. Analysis
        1. Analysis by using the least squares method in Python
    6. Summary
    7. Problems
      1. Analysis
  12. Time Series Analysis
    1. Business profits – analyzing trends
      1. Analysis
        1. Analyzing trends using the least squares method in Python
        2. Visualization
        3. Conclusion
    2. Electronics shop's sales – analyzing seasonality
      1. Analysis
        1. Analyzing trends using the least squares method in Python
        2. Visualization
        3. Analyzing seasonality
        4. Conclusion
    3. Summary
    4. Problems
      1. Analysis
  13. Python Reference
    1. Introduction
      1. Python Hello World example
      2. Comments
    2. Data types
      1. int
      2. float
      3. String
      4. Tuple
      5. List
      6. Set
      7. Dictionary
    3. Flow control
      1. Conditionals
      2. For loop
        1. For loop on range
        2. For loop on list
        3. Break and continue
      3. Functions
    4. Input and output
      1. Program arguments
      2. Reading and writing a file
  14. Statistics
    1. Basic concepts
    2. Bayesian inference
    3. Distributions
      1. Normal distribution
    4. Cross-validation
      1. K-fold cross-validation
    5. A/B testing
  15. Glossary of Algorithms and Methods in Data Science
  16. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Data Science Algorithms in a Week - Second Edition
  • Author(s): David Natingga
  • Release date: October 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781789806076