Data Science Algorithms in a Week

Book Description

Build a strong foundation in machine learning algorithms in 7 days.

About This Book

  • Get to know seven algorithms for your data science needs in this concise, insightful guide
  • Ensure you’re confident in the basics by learning when and where to use various data science algorithms
  • Learn to use machine learning algorithms in a period of just 7 days

Who This Book Is For

This book is for aspiring data science professionals who are familiar with Python and have a statistics background. It is ideal for developers who are currently implementing one or two data science algorithms and want to learn more to expand their skill set.

What You Will Learn

  • Find out how to classify data accurately using Naive Bayes, Decision Trees, and Random Forest to solve complex problems
  • Identify a data science problem correctly and devise an appropriate prediction solution using Regression and Time-series
  • See how to cluster data using the k-Means algorithm
  • Get to know how to implement the algorithms efficiently in the Python and R languages

In Detail

Machine learning applications are highly automated and self-modifying, and they continue to improve over time with minimal human intervention as they learn from more data. To address the complex nature of various real-world data problems, specialized machine learning algorithms have been developed to solve these problems effectively. Data science helps you gain new knowledge from existing data through algorithmic and statistical analysis.

This book will address the problems related to accurate and efficient data classification and prediction. Over the course of 7 days, you will be introduced to seven algorithms, along with exercises that will help you learn different aspects of machine learning. You will see how to pre-cluster your data to optimize and classify it for large datasets. You will then find out how to predict data based on the existing trends in your datasets.

This book covers algorithms such as k-Nearest Neighbors, Naive Bayes, Decision Trees, Random Forest, k-Means, Regression, and Time-series. On completion of the book, you will understand which machine learning algorithm to pick for clustering, classification, or regression and which is best suited for your problem.

Style and approach

Machine learning applications are highly automated and self-modifying, and they continue to improve over time with minimal human intervention as they learn from more data. To address the complex nature of various real-world data problems, specialized machine learning algorithms have been developed to solve these problems effectively.

Downloading the example code for this book: you can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to receive the code files.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Downloading the color images of this book
      3. Errata
      4. Piracy
      5. Questions
  2. Classification Using K Nearest Neighbors
    1. Mary and her temperature preferences
    2. Implementation of k-nearest neighbors algorithm
    3. Map of Italy example - choosing the value of k
    4. House ownership - data rescaling
    5. Text classification - using non-Euclidean distances
    6. Text classification - k-NN in higher-dimensions
    7. Summary
    8. Problems
  3. Naive Bayes
    1. Medical test - basic application of Bayes' theorem
    2. Proof of Bayes' theorem and its extension
      1. Extended Bayes' theorem
    3. Playing chess - independent events
    4. Implementation of naive Bayes classifier
    5. Playing chess - dependent events
    6. Gender classification - Bayes for continuous random variables
    7. Summary
    8. Problems
  4. Decision Trees
    1. Swim preference - representing data with decision tree
    2. Information theory
      1. Information entropy
        1. Coin flipping
        2. Definition of information entropy
      2. Information gain
      3. Swim preference - information gain calculation
    3. ID3 algorithm - decision tree construction
      1. Swim preference - decision tree construction by ID3 algorithm
      2. Implementation
    4. Classifying with a decision tree
      1. Classifying a data sample with the swimming preference decision tree
    5. Playing chess - analysis with decision tree
    6. Going shopping - dealing with data inconsistency
    7. Summary
    8. Problems
  5. Random Forest
    1. Overview of random forest algorithm
      1. Overview of random forest construction
    2. Swim preference - analysis with random forest
      1. Random forest construction
        1. Construction of random decision tree number 0
        2. Construction of random decision tree number 1
      2. Classification with random forest
    3. Implementation of random forest algorithm
    4. Playing chess example
      1. Random forest construction
        1. Construction of a random decision tree number 0
        2. Construction of a random decision tree number 1, 2, 3
    5. Going shopping - overcoming data inconsistency with randomness and measuring the level of confidence
    6. Summary
    7. Problems
  6. Clustering into K Clusters
    1. Household incomes - clustering into k clusters
      1. K-means clustering algorithm
        1. Picking the initial k-centroids
        2. Computing a centroid of a given cluster
      2. k-means clustering algorithm on household income example
    2. Gender classification - clustering to classify
    3. Implementation of the k-means clustering algorithm
      1. Input data from gender classification
      2. Program output for gender classification data
    4. House ownership – choosing the number of clusters
    5. Document clustering – understanding the number of clusters k in a semantic context
    6. Summary
    7. Problems
  7. Regression
    1. Fahrenheit and Celsius conversion - linear regression on perfect data
    2. Weight prediction from height - linear regression on real-world data
    3. Gradient descent algorithm and its implementation
      1. Gradient descent algorithm
      2. Visualization - comparison of models by R and gradient descent algorithm
    4. Flight time duration prediction from distance
    5. Ballistic flight analysis – non-linear model
    6. Summary
    7. Problems
  8. Time Series Analysis
    1. Business profit - analysis of the trend
    2. Electronics shop's sales - analysis of seasonality
      1. Analyzing trends using R
      2. Analyzing seasonality
        1. Conclusion
    3. Summary
    4. Problems
  9. Statistics
    1. Basic concepts
    2. Bayesian Inference
    3. Distributions
      1. Normal distribution
    4. Cross-validation
      1. K-fold cross-validation
    5. A/B Testing
  10. R Reference
    1. Introduction
      1. R Hello World example
    2. Data types
      1. Integer
      2. Numeric
      3. String
      4. List and vector
      5. Data frame
    3. Linear regression
  11. Python Reference
    1. Introduction
      1. Python Hello World example
    2. Data types
      1. Int
      2. Float
      3. String
      4. Tuple
      5. List
      6. Set
      7. Dictionary
    3. Flow control
      1. For loop
        1. For loop on range
        2. For loop on list
        3. Break and continue
      2. Functions
      3. Program arguments
      4. Reading and writing the file
  12. Glossary of Algorithms and Methods in Data Science