Machine Learning in Java - Second Edition

Leverage the power of Java and its associated machine learning libraries to build powerful predictive models

Key Features

  • Solve predictive modeling problems using the most popular machine learning Java libraries
  • Explore data processing, machine learning, and NLP concepts using the JavaML, WEKA, and MALLET libraries
  • Practical examples, tips, and tricks to help you understand applied machine learning in Java

Book Description

As the amount of data in the world continues to grow at an almost incomprehensible rate, being able to understand and process data is becoming a key differentiator for competitive organizations. Machine learning applications are everywhere, from self-driving cars, spam detection, document search, and trading strategies, to speech recognition. This makes machine learning well-suited to the present-day era of big data and Data Science. The main challenge is how to transform data into actionable knowledge.

Machine Learning in Java will provide you with the techniques and tools you need. You will start by learning how to apply machine learning methods to a variety of common tasks including classification, prediction, forecasting, market basket analysis, and clustering. The code in this book works with JDK 8 and above and has been tested on JDK 11.

Moving on, you will discover how to detect anomalies and fraud, and ways to perform activity recognition, image recognition, and text analysis. By the end of the book, you will have explored related web resources and technologies that will help you take your learning to the next level.

By applying the most effective machine learning methods to real-world problems, you will gain hands-on experience that will transform the way you think about data.

What you will learn

  • Discover key Java machine learning libraries
  • Implement concepts such as classification, regression, and clustering
  • Develop a customer retention strategy by predicting likely churn candidates
  • Build a scalable recommendation engine with Apache Mahout
  • Apply machine learning to fraud, anomaly, and outlier detection
  • Experiment with deep learning concepts and algorithms
  • Write your own activity recognition model for eHealth applications

Who this book is for

If you want to learn how to use Java's machine learning libraries to gain insight from your data, this book is for you. It will get you up and running quickly and provide you with the skills you need to successfully create, customize, and deploy machine learning applications with ease. You should be familiar with Java programming and some basic data mining concepts to make the most of this book, but no prior experience with machine learning is required.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Machine Learning in Java Second Edition
  3. Contributors
    1. About the authors
    2. About the reviewer
    3. Packt is searching for authors like you
  4. About Packt
    1. Why subscribe?
    2. Packt.com
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. Applied Machine Learning Quick Start
    1. Machine learning and data science
      1. Solving problems with machine learning
      2. Applied machine learning workflow
    2. Data and problem definition
      1. Measurement scales
    3. Data collection
      1. Finding or observing data
      2. Generating data
      3. Sampling traps
    4. Data preprocessing
      1. Data cleaning
      2. Filling missing values
      3. Remove outliers
      4. Data transformation
      5. Data reduction
    5. Unsupervised learning
      1. Finding similar items
        1. Euclidean distances
        2. Non-Euclidean distances
        3. The curse of dimensionality
      2. Clustering
    6. Supervised learning
      1. Classification
        1. Decision tree learning
        2. Probabilistic classifiers
        3. Kernel methods
        4. Artificial neural networks
        5. Ensemble learning
        6. Evaluating classification
          1. Precision and recall
          2. ROC curves
      2. Regression
        1. Linear regression
        2. Logistic regression
        3. Evaluating regression
          1. Mean squared error
          2. Mean absolute error
          3. Correlation coefficient
    7. Generalization and evaluation
      1. Underfitting and overfitting
        1. Train and test sets
        2. Cross-validation
        3. Leave-one-out validation
        4. Stratification
    8. Summary
  7. Java Libraries and Platforms for Machine Learning
    1. The need for Java
    2. Machine learning libraries
      1. Weka
      2. Java machine learning
      3. Apache Mahout
      4. Apache Spark
      5. Deeplearning4j
      6. MALLET
      7. The Encog Machine Learning Framework
      8. ELKI
      9. MOA
      10. Comparing libraries
    3. Building a machine learning application
      1. Traditional machine learning architecture
      2. Dealing with big data
        1. Big data application architecture
    4. Summary
  8. Basic Algorithms - Classification, Regression, and Clustering
    1. Before you start
    2. Classification
      1. Data
      2. Loading data
      3. Feature selection
      4. Learning algorithms
      5. Classifying new data
      6. Evaluation and prediction error metrics
      7. The confusion matrix
      8. Choosing a classification algorithm
      9. Classification using Encog
      10. Classification using massive online analysis
        1. Evaluation
        2. Baseline classifiers
        3. Decision tree
        4. Lazy learning
        5. Active learning
    3. Regression
      1. Loading the data
      2. Analyzing attributes
      3. Building and evaluating the regression model
        1. Linear regression
          1. Linear regression using Encog
          2. Regression using MOA
        2. Regression trees
      4. Tips to avoid common regression problems
    4. Clustering
      1. Clustering algorithms
      2. Evaluation
      3. Clustering using Encog
      4. Clustering using ELKI
    5. Summary
  9. Customer Relationship Prediction with Ensembles
    1. The customer relationship database
      1. Challenge
      2. Dataset
      3. Evaluation
    2. Basic Naive Bayes classifier baseline
      1. Getting the data
      2. Loading the data
    3. Basic modeling
      1. Evaluating models
      2. Implementing the Naive Bayes baseline
    4. Advanced modeling with ensembles
      1. Before we start
      2. Data preprocessing
      3. Attribute selection
      4. Model selection
      5. Performance evaluation
      6. Ensemble methods – MOA
    5. Summary
  10. Affinity Analysis
    1. Market basket analysis
      1. Affinity analysis
    2. Association rule learning
      1. Basic concepts
        1. Database of transactions
        2. Itemset and rule
        3. Support
        4. Lift
        5. Confidence
      2. Apriori algorithm
      3. FP-Growth algorithm
    3. The supermarket dataset
    4. Discover patterns
      1. Apriori
      2. FP-Growth
    5. Other applications in various areas
      1. Medical diagnosis
      2. Protein sequences
      3. Census data
      4. Customer relationship management
      5. IT operations analytics
    6. Summary
  11. Recommendation Engines with Apache Mahout
    1. Basic concepts
      1. Key concepts
      2. User-based and item-based analysis
      3. Calculating similarity
        1. Collaborative filtering
        2. Content-based filtering
        3. Hybrid approach
      4. Exploitation versus exploration
    2. Getting Apache Mahout
      1. Configuring Mahout in Eclipse with the Maven plugin
    3. Building a recommendation engine
      1. Book ratings dataset
      2. Loading the data
        1. Loading data from a file
        2. Loading data from a database
        3. In-memory databases
      3. Collaborative filtering
        1. User-based filtering
        2. Item-based filtering
        3. Adding custom rules to recommendations
        4. Evaluation
        5. Online learning engine
    4. Content-based filtering
    5. Summary
  12. Fraud and Anomaly Detection
    1. Suspicious and anomalous behavior detection
      1. Unknown unknowns
    2. Suspicious pattern detection
    3. Anomalous pattern detection
      1. Analysis types
        1. Pattern analysis
      2. Transaction analysis
      3. Plan recognition
    4. Outlier detection using ELKI
      1. An example using ELKI
    5. Fraud detection in insurance claims
      1. Dataset
      2. Modeling suspicious patterns
        1. The vanilla approach
        2. Dataset rebalancing
    6. Anomaly detection in website traffic
      1. Dataset
      2. Anomaly detection in time series data
        1. Using Encog for time series
        2. Histogram-based anomaly detection
        3. Loading the data
        4. Creating histograms
        5. Density-based k-nearest neighbors
    7. Summary
  13. Image Recognition with Deeplearning4j
    1. Introducing image recognition
      1. Neural networks
        1. Perceptron
        2. Feedforward neural networks
        3. Autoencoder
        4. Restricted Boltzmann machine
        5. Deep convolutional networks
    2. Image classification
      1. Deeplearning4j
        1. Getting DL4J
      2. MNIST dataset
      3. Loading the data
      4. Building models
        1. Building a single-layer regression model
        2. Building a deep belief network
        3. Building a multilayer convolutional network
    3. Summary
  14. Activity Recognition with Mobile Phone Sensors
    1. Introducing activity recognition
      1. Mobile phone sensors
      2. Activity recognition pipeline
      3. The plan
    2. Collecting data from a mobile phone
      1. Installing Android Studio
      2. Loading the data collector
        1. Feature extraction
      3. Collecting training data
    3. Building a classifier
      1. Reducing spurious transitions
      2. Plugging the classifier into a mobile app
    4. Summary
  15. Text Mining with Mallet - Topic Modeling and Spam Detection
    1. Introducing text mining
      1. Topic modeling
      2. Text classification
    2. Installing Mallet
    3. Working with text data
      1. Importing data
        1. Importing from directory
        2. Importing from file
      2. Pre-processing text data
    4. Topic modeling for BBC News
      1. BBC dataset
      2. Modeling
      3. Evaluating a model
      4. Reusing a model
        1. Saving a model
        2. Restoring a model
    5. Detecting email spam
      1. Email spam dataset
      2. Feature generation
      3. Training and testing
        1. Model performance
    6. Summary
  16. What Is Next?
    1. Machine learning in real life
      1. Noisy data
      2. Class unbalance
      3. Feature selection
      4. Model chaining
      5. The importance of evaluation
      6. Getting models into production
      7. Model maintenance
    2. Standards and markup languages
      1. CRISP-DM
      2. SEMMA methodology
      3. Predictive model markup language
    3. Machine learning in the cloud
      1. Machine learning as a service
    4. Web resources and competitions
      1. Datasets
      2. Online courses
      3. Competitions
      4. Websites and blogs
      5. Venues and conferences
    5. Summary
  17. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think