Learning Apache Mahout Classification

Book Description

Build and personalize your own classifiers using Apache Mahout

In Detail

This book is a practical guide that explains the classification algorithms provided in Apache Mahout through worked examples. It starts with an introduction to classification and model evaluation techniques, then explores Apache Mahout and explains why it is a good choice for classification.

Next, you will learn about different classification algorithms and models, including logistic regression, the Naïve Bayes algorithm, the Hidden Markov Model, random forest, and the multilayer perceptron.

Finally, alongside the examples that help you create models, this book shows you how to build a mail classification system that can be put into production as soon as it is developed. After reading this book, you will understand the concept of classification and the various algorithms, and you will be able to build your own classifiers.

What You Will Learn

  • Apply machine learning techniques in the area of classification
  • Categorize unknown items using the classification model in Apache Mahout
  • Use the classifier to classify text documents
  • Implement a multilayer perceptron to map sets of input to appropriate output sets
  • Develop a Hidden Markov Model for a system with hidden states
  • Build and deploy an e-mail classifier that can predict the delivery of incoming mail

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Learning Apache Mahout Classification
    1. Table of Contents
    2. Learning Apache Mahout Classification
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Downloading the color images of this book
        3. Errata
        4. Piracy
        5. Questions
    8. 1. Classification in Data Analysis
      1. Introducing the classification
        1. Application of the classification system
        2. Working of the classification system
      2. Classification algorithms
      3. Model evaluation techniques
        1. The confusion matrix
        2. The Receiver Operating Characteristics (ROC) graph
        3. Area under the ROC curve
        4. The entropy matrix
      4. Summary
    9. 2. Apache Mahout
      1. Introducing Apache Mahout
      2. Algorithms supported in Mahout
      3. Reasons for Mahout being a good choice for classification
      4. Installing Mahout
        1. Building Mahout from source using Maven
          1. Installing Maven
          2. Building Mahout code
        2. Setting up a development environment using Eclipse
        3. Setting up Mahout for a Windows user
      5. Summary
    10. 3. Learning Logistic Regression / SGD Using Mahout
      1. Introducing regression
        1. Understanding linear regression
          1. Cost function
          2. Gradient descent
        2. Logistic regression
        3. Stochastic Gradient Descent
        4. Using Mahout for logistic regression
      2. Summary
    11. 4. Learning the Naïve Bayes Classification Using Mahout
      1. Introducing conditional probability and the Bayes rule
      2. Understanding the Naïve Bayes algorithm
      3. Understanding the terms used in text classification
      4. Using the Naïve Bayes algorithm in Apache Mahout
      5. Summary
    12. 5. Learning the Hidden Markov Model Using Mahout
      1. Deterministic and nondeterministic patterns
      2. The Markov process
      3. Introducing the Hidden Markov Model
      4. Using Mahout for the Hidden Markov Model
      5. Summary
    13. 6. Learning Random Forest Using Mahout
      1. Decision tree
      2. Random forest
      3. Using Mahout for Random forest
        1. Steps to use the Random forest algorithm in Mahout
      4. Summary
    14. 7. Learning Multilayer Perceptron Using Mahout
      1. Neural network and neurons
      2. Multilayer Perceptron
      3. MLP implementation in Mahout
      4. Using Mahout for MLP
        1. Steps to use the MLP algorithm in Mahout
      5. Summary
    15. 8. Mahout Changes in the Upcoming Release
      1. Mahout new changes
        1. Mahout Scala and Spark bindings
      2. Apache Spark
        1. Using Mahout's Spark shell
      3. H2O platform integration
      4. Summary
    16. 9. Building an E-mail Classification System Using Apache Mahout
      1. Spam e-mail dataset
      2. Creating the model using the Assassin dataset
      3. Program to use a classifier model
      4. Testing the program
      5. Second use case as an exercise
        1. The ASF e-mail dataset
      6. Classifiers tuning
      7. Summary
    17. Index