Machine Learning: End-to-End guide for Java developers

Book Description

Develop, implement, and tune up your machine learning applications using the power of Java programming

About This Book

  • Detailed coverage on key machine learning topics with an emphasis on both theoretical and practical aspects
  • Address predictive modeling problems using the most popular machine learning Java libraries
  • A comprehensive course covering a wide spectrum of topics such as machine learning and natural language processing through practical use cases

Who This Book Is For

This course is the right resource for anyone with some knowledge of Java programming who wants to get started with Data Science and Machine learning as quickly as possible. If you want to gain meaningful insights from big data and develop intelligent applications using Java, this course is also a must-have.

What You Will Learn

  • Understand key data analysis techniques centered around machine learning
  • Implement Java APIs and various techniques such as classification, clustering, anomaly detection, and more
  • Master key Java machine learning libraries, their functionality, and various kinds of problems that can be addressed using each of them
  • Apply machine learning to real-world data for fraud detection, recommendation engines, text classification, and human activity recognition
  • Experiment with semi-supervised learning and stream-based data mining, building high-performing and real-time predictive models
  • Develop intelligent systems centered around various domains such as security, Internet of Things, social networking, and more

In Detail

Machine learning is one of the core areas of artificial intelligence, in which computers are trained to learn and improve on their own without being explicitly programmed. In this course, we cover how Java is employed to build powerful machine learning models that address problems in data science. The course demonstrates complex data extraction and statistical analysis techniques supported by Java, applies various machine learning methods, explores machine learning sub-domains, and works through real-world use cases such as recommendation systems, fraud detection, and natural language processing. The course begins with an introduction to data science and basic data science tasks such as data collection, data cleaning, data analysis, and data visualization. The next section gives a detailed overview of statistical techniques, covering machine learning, neural networks, and deep learning. The following sections apply machine learning methods in Java to a variety of tasks, including classification, prediction, forecasting, market basket analysis, clustering, stream learning, active learning, semi-supervised learning, probabilistic graph modeling, text mining, and deep learning.

The last section highlights real-world use cases such as activity recognition, image recognition, text classification, and anomaly detection. The course includes premium content from three of our most popular books:

  • Java for Data Science
  • Machine Learning in Java
  • Mastering Java Machine Learning

On completion of this course, you will understand various machine learning techniques and the Java machine learning algorithms you can use to gain insights from data, build data models to analyze larger, more complex data sets, and develop applications using Java and machine learning algorithms in the field of artificial intelligence.

Style and approach

This comprehensive course proceeds from being a tutorial to a practical guide, providing an introduction to machine learning and different machine learning techniques, exploring machine learning with Java libraries, and demonstrating real-world machine learning use cases using the Java platform.

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code files e-mailed directly to you.

Table of Contents

  1. Machine Learning: End-to-End guide for Java developers
    1. Table of Contents
    2. Machine Learning: End-to-End guide for Java developers
    3. Credits
    4. Preface
      1. What this learning path covers
      2. What you need for this learning path
      3. Who this learning path is for
      4. Reader feedback
      5. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    5. 1. Module 1
      1. 1. Getting Started with Data Science
        1. Problems solved using data science
        2. Understanding the data science problem-solving approach
          1. Using Java to support data science
        3. Acquiring data for an application
        4. The importance and process of cleaning data
        5. Visualizing data to enhance understanding
        6. The use of statistical methods in data science
        7. Machine learning applied to data science
        8. Using neural networks in data science
        9. Deep learning approaches
        10. Performing text analysis
        11. Visual and audio analysis
        12. Improving application performance using parallel techniques
        13. Assembling the pieces
        14. Summary
      2. 2. Data Acquisition
        1. Understanding the data formats used in data science applications
          1. Overview of CSV data
          2. Overview of spreadsheets
          3. Overview of databases
          4. Overview of PDF files
          5. Overview of JSON
          6. Overview of XML
          7. Overview of streaming data
          8. Overview of audio/video/images in Java
        2. Data acquisition techniques
          1. Using the HttpUrlConnection class
          2. Web crawlers in Java
            1. Creating your own web crawler
            2. Using the crawler4j web crawler
          3. Web scraping in Java
          4. Using API calls to access common social media sites
            1. Using OAuth to authenticate users
            2. Handling Twitter
            3. Handling Wikipedia
            4. Handling Flickr
            5. Handling YouTube
              1. Searching by keyword
        3. Summary
      3. 3. Data Cleaning
        1. Handling data formats
          1. Handling CSV data
          2. Handling spreadsheets
            1. Handling Excel spreadsheets
          3. Handling PDF files
          4. Handling JSON
            1. Using JSON streaming API
            2. Using the JSON tree API
        2. The nitty gritty of cleaning text
          1. Using Java tokenizers to extract words
            1. Java core tokenizers
            2. Third-party tokenizers and libraries
          2. Transforming data into a usable form
            1. Simple text cleaning
            2. Removing stop words
          3. Finding words in text
            1. Finding and replacing text
          4. Data imputation
          5. Subsetting data
          6. Sorting text
          7. Data validation
            1. Validating data types
            2. Validating dates
            3. Validating e-mail addresses
            4. Validating ZIP codes
            5. Validating names
        3. Cleaning images
          1. Changing the contrast of an image
          2. Smoothing an image
          3. Brightening an image
          4. Resizing an image
          5. Converting images to different formats
        4. Summary
      4. 4. Data Visualization
        1. Understanding plots and graphs
          1. Visual analysis goals
        2. Creating index charts
        3. Creating bar charts
          1. Using country as the category
          2. Using decade as the category
        4. Creating stacked graphs
        5. Creating pie charts
        6. Creating scatter charts
        7. Creating histograms
        8. Creating donut charts
        9. Creating bubble charts
        10. Summary
      5. 5. Statistical Data Analysis Techniques
        1. Working with mean, mode, and median
          1. Calculating the mean
            1. Using simple Java techniques to find mean
            2. Using Java 8 techniques to find mean
            3. Using Google Guava to find mean
            4. Using Apache Commons to find mean
          2. Calculating the median
            1. Using simple Java techniques to find median
            2. Using Apache Commons to find the median
          3. Calculating the mode
            1. Using ArrayLists to find multiple modes
            2. Using a HashMap to find multiple modes
            3. Using Apache Commons to find multiple modes
        2. Standard deviation
        3. Sample size determination
        4. Hypothesis testing
        5. Regression analysis
          1. Using simple linear regression
          2. Using multiple regression
        6. Summary
      6. 6. Machine Learning
        1. Supervised learning techniques
          1. Decision trees
            1. Decision tree types
            2. Decision tree libraries
            3. Using a decision tree with a book dataset
            4. Testing the book decision tree
          2. Support vector machines
            1. Using an SVM for camping data
            2. Testing individual instances
          3. Bayesian networks
            1. Using a Bayesian network
        2. Unsupervised machine learning
          1. Association rule learning
            1. Using association rule learning to find buying relationships
        3. Reinforcement learning
        4. Summary
      7. 7. Neural Networks
        1. Training a neural network
          1. Getting started with neural network architectures
        2. Understanding static neural networks
          1. A basic Java example
        3. Understanding dynamic neural networks
          1. Multilayer perceptron networks
            1. Building the model
            2. Evaluating the model
            3. Predicting other values
            4. Saving and retrieving the model
          2. Learning vector quantization
          3. Self-Organizing Maps
            1. Using a SOM
            2. Displaying the SOM results
        4. Additional network architectures and algorithms
          1. The k-Nearest Neighbors algorithm
          2. Instantaneously trained networks
          3. Spiking neural networks
          4. Cascading neural networks
          5. Holographic associative memory
          6. Backpropagation and neural networks
        5. Summary
      8. 8. Deep Learning
        1. Deeplearning4j architecture
          1. Acquiring and manipulating data
            1. Reading in a CSV file
          2. Configuring and building a model
            1. Using hyperparameters in ND4J
            2. Instantiating the network model
          3. Training a model
          4. Testing a model
        2. Deep learning and regression analysis
          1. Preparing the data
          2. Setting up the class
          3. Reading and preparing the data
          4. Building the model
          5. Evaluating the model
        3. Restricted Boltzmann Machines
          1. Reconstruction in an RBM
          2. Configuring an RBM
        4. Deep autoencoders
          1. Building an autoencoder in DL4J
            1. Configuring the network
            2. Building and training the network
            3. Saving and retrieving a network
            4. Specialized autoencoders
        5. Convolutional networks
          1. Building the model
          2. Evaluating the model
        6. Recurrent Neural Networks
        7. Summary
      9. 9. Text Analysis
        1. Implementing named entity recognition
          1. Using OpenNLP to perform NER
          2. Identifying location entities
        2. Classifying text
          1. Word2Vec and Doc2Vec
          2. Classifying text by labels
          3. Classifying text by similarity
        3. Understanding tagging and POS
          1. Using OpenNLP to identify POS
          2. Understanding POS tags
        4. Extracting relationships from sentences
          1. Using OpenNLP to extract relationships
        5. Sentiment analysis
          1. Downloading and extracting the Word2Vec model
          2. Building our model and classifying text
        6. Summary
      10. 10. Visual and Audio Analysis
        1. Text-to-speech
          1. Using FreeTTS
          2. Getting information about voices
          3. Gathering voice information
        2. Understanding speech recognition
          1. Using CMUSphinx to convert speech to text
          2. Obtaining more detail about the words
        3. Extracting text from an image
          1. Using Tess4j to extract text
        4. Identifying faces
          1. Using OpenCV to detect faces
        5. Classifying visual data
          1. Creating a Neuroph Studio project for classifying visual images
          2. Training the model
        6. Summary
      11. 11. Mathematical and Parallel Techniques for Data Analysis
        1. Implementing basic matrix operations
          1. Using GPUs with DeepLearning4j
        2. Using map-reduce
          1. Using Apache's Hadoop to perform map-reduce
          2. Writing the map method
          3. Writing the reduce method
          4. Creating and executing a new Hadoop job
        3. Various mathematical libraries
          1. Using the jblas API
          2. Using the Apache Commons math API
          3. Using the ND4J API
        4. Using OpenCL
        5. Using Aparapi
          1. Creating an Aparapi application
          2. Using Aparapi for matrix multiplication
        6. Using Java 8 streams
          1. Understanding Java 8 lambda expressions and streams
          2. Using Java 8 to perform matrix multiplication
          3. Using Java 8 to perform map-reduce
        7. Summary
      12. 12. Bringing It All Together
        1. Defining the purpose and scope of our application
        2. Understanding the application's architecture
        3. Data acquisition using Twitter
        4. Understanding the TweetHandler class
          1. Extracting data for a sentiment analysis model
          2. Building the sentiment model
          3. Processing the JSON input
          4. Cleaning data to improve our results
          5. Removing stop words
          6. Performing sentiment analysis
          7. Analysing the results
        5. Other optional enhancements
        6. Summary
    6. 2. Module 2
      1. 1. Applied Machine Learning Quick Start
        1. Machine learning and data science
          1. What kind of problems can machine learning solve?
          2. Applied machine learning workflow
        2. Data and problem definition
          1. Measurement scales
        3. Data collection
          1. Find or observe data
          2. Generate data
          3. Sampling traps
        4. Data pre-processing
          1. Data cleaning
          2. Fill missing values
          3. Remove outliers
          4. Data transformation
          5. Data reduction
        5. Unsupervised learning
          1. Find similar items
            1. Euclidean distances
            2. Non-Euclidean distances
            3. The curse of dimensionality
          2. Clustering
        6. Supervised learning
          1. Classification
            1. Decision tree learning
            2. Probabilistic classifiers
            3. Kernel methods
            4. Artificial neural networks
            5. Ensemble learning
            6. Evaluating classification
              1. Precision and recall
              2. ROC curves
          2. Regression
            1. Linear regression
            2. Evaluating regression
              1. Mean squared error
              2. Mean absolute error
              3. Correlation coefficient
        7. Generalization and evaluation
          1. Underfitting and overfitting
            1. Train and test sets
            2. Cross-validation
            3. Leave-one-out validation
            4. Stratification
        8. Summary
      2. 2. Java Libraries and Platforms for Machine Learning
        1. The need for Java
        2. Machine learning libraries
          1. Weka
          2. Java machine learning
          3. Apache Mahout
          4. Apache Spark
          5. Deeplearning4j
          6. MALLET
          7. Comparing libraries
        3. Building a machine learning application
          1. Traditional machine learning architecture
          2. Dealing with big data
            1. Big data application architecture
        4. Summary
      3. 3. Basic Algorithms – Classification, Regression, and Clustering
        1. Before you start
        2. Classification
          1. Data
          2. Loading data
          3. Feature selection
          4. Learning algorithms
          5. Classify new data
          6. Evaluation and prediction error metrics
          7. Confusion matrix
          8. Choosing a classification algorithm
        3. Regression
          1. Loading the data
          2. Analyzing attributes
          3. Building and evaluating regression model
            1. Linear regression
            2. Regression trees
          4. Tips to avoid common regression problems
        4. Clustering
          1. Clustering algorithms
          2. Evaluation
        5. Summary
      4. 4. Customer Relationship Prediction with Ensembles
        1. Customer relationship database
          1. Challenge
          2. Dataset
          3. Evaluation
        2. Basic naive Bayes classifier baseline
          1. Getting the data
          2. Loading the data
        3. Basic modeling
          1. Evaluating models
          2. Implementing naive Bayes baseline
        4. Advanced modeling with ensembles
          1. Before we start
          2. Data pre-processing
          3. Attribute selection
          4. Model selection
          5. Performance evaluation
        5. Summary
      5. 5. Affinity Analysis
        1. Market basket analysis
          1. Affinity analysis
        2. Association rule learning
          1. Basic concepts
            1. Database of transactions
            2. Itemset and rule
            3. Support
            4. Confidence
          2. Apriori algorithm
          3. FP-growth algorithm
        3. The supermarket dataset
        4. Discover patterns
          1. Apriori
          2. FP-growth
        5. Other applications in various areas
          1. Medical diagnosis
          2. Protein sequences
          3. Census data
          4. Customer relationship management
          5. IT Operations Analytics
        6. Summary
      6. 6. Recommendation Engine with Apache Mahout
        1. Basic concepts
          1. Key concepts
          2. User-based and item-based analysis
          3. Approaches to calculate similarity
            1. Collaborative filtering
            2. Content-based filtering
            3. Hybrid approach
          4. Exploitation versus exploration
        2. Getting Apache Mahout
          1. Configuring Mahout in Eclipse with the Maven plugin
        3. Building a recommendation engine
          1. Book ratings dataset
          2. Loading the data
            1. Loading data from file
            2. Loading data from database
            3. In-memory database
          3. Collaborative filtering
            1. User-based filtering
            2. Item-based filtering
            3. Adding custom rules to recommendations
            4. Evaluation
            5. Online learning engine
        4. Content-based filtering
        5. Summary
      7. 7. Fraud and Anomaly Detection
        1. Suspicious and anomalous behavior detection
          1. Unknown-unknowns
        2. Suspicious pattern detection
        3. Anomalous pattern detection
          1. Analysis types
            1. Pattern analysis
          2. Transaction analysis
          3. Plan recognition
        4. Fraud detection of insurance claims
          1. Dataset
          2. Modeling suspicious patterns
            1. Vanilla approach
            2. Dataset rebalancing
        5. Anomaly detection in website traffic
          1. Dataset
          2. Anomaly detection in time series data
            1. Histogram-based anomaly detection
            2. Loading the data
            3. Creating histograms
            4. Density based k-nearest neighbors
        6. Summary
      8. 8. Image Recognition with Deeplearning4j
        1. Introducing image recognition
          1. Neural networks
            1. Perceptron
            2. Feedforward neural networks
            3. Autoencoder
            4. Restricted Boltzmann machine
            5. Deep convolutional networks
        2. Image classification
          1. Deeplearning4j
            1. Getting DL4J
          2. MNIST dataset
          3. Loading the data
          4. Building models
            1. Building a single-layer regression model
            2. Building a deep belief network
            3. Build a Multilayer Convolutional Network
        3. Summary
      9. 9. Activity Recognition with Mobile Phone Sensors
        1. Introducing activity recognition
          1. Mobile phone sensors
          2. Activity recognition pipeline
          3. The plan
        2. Collecting data from a mobile phone
          1. Installing Android Studio
          2. Loading the data collector
            1. Feature extraction
          3. Collecting training data
        3. Building a classifier
          1. Reducing spurious transitions
          2. Plugging the classifier into a mobile app
        4. Summary
      10. 10. Text Mining with Mallet – Topic Modeling and Spam Detection
        1. Introducing text mining
          1. Topic modeling
          2. Text classification
        2. Installing Mallet
        3. Working with text data
          1. Importing data
            1. Importing from directory
            2. Importing from file
          2. Pre-processing text data
        4. Topic modeling for BBC news
          1. BBC dataset
          2. Modeling
          3. Evaluating a model
          4. Reusing a model
            1. Saving a model
            2. Restoring a model
        5. E-mail spam detection
          1. E-mail spam dataset
          2. Feature generation
          3. Training and testing
            1. Model performance
        6. Summary
      11. 11. What is Next?
        1. Machine learning in real life
          1. Noisy data
          2. Class unbalance
          3. Feature selection is hard
          4. Model chaining
          5. Importance of evaluation
          6. Getting models into production
          7. Model maintenance
        2. Standards and markup languages
          1. CRISP-DM
          2. SEMMA methodology
          3. Predictive Model Markup Language
        3. Machine learning in the cloud
          1. Machine learning as a service
        4. Web resources and competitions
          1. Datasets
          2. Online courses
          3. Competitions
          4. Websites and blogs
          5. Venues and conferences
        5. Summary
      12. A. References
    7. 3. Module 3
      1. 1. Machine Learning Review
        1. Machine learning – history and definition
        2. What is not machine learning?
        3. Machine learning – concepts and terminology
        4. Machine learning – types and subtypes
        5. Datasets used in machine learning
        6. Machine learning applications
        7. Practical issues in machine learning
        8. Machine learning – roles and process
          1. Roles
          2. Process
        9. Machine learning – tools and datasets
          1. Datasets
        10. Summary
      2. 2. Practical Approach to Real-World Supervised Learning
        1. Formal description and notation
          1. Data quality analysis
          2. Descriptive data analysis
            1. Basic label analysis
            2. Basic feature analysis
          3. Visualization analysis
            1. Univariate feature analysis
              1. Categorical features
              2. Continuous features
            2. Multivariate feature analysis
        2. Data transformation and preprocessing
          1. Feature construction
          2. Handling missing values
          3. Outliers
          4. Discretization
          5. Data sampling
            1. Is sampling needed?
            2. Undersampling and oversampling
              1. Stratified sampling
          6. Training, validation, and test set
        3. Feature relevance analysis and dimensionality reduction
          1. Feature search techniques
          2. Feature evaluation techniques
            1. Filter approach
              1. Univariate feature selection
                1. Information theoretic approach
                2. Statistical approach
              2. Multivariate feature selection
                1. Minimal redundancy maximal relevance (mRMR)
                2. Correlation-based feature selection (CFS)
            2. Wrapper approach
            3. Embedded approach
        4. Model building
          1. Linear models
            1. Linear Regression
              1. Algorithm input and output
              2. How does it work?
              3. Advantages and limitations
            2. Naïve Bayes
              1. Algorithm input and output
              2. How does it work?
              3. Advantages and limitations
            3. Logistic Regression
              1. Algorithm input and output
              2. How does it work?
              3. Advantages and limitations
          2. Non-linear models
            1. Decision Trees
              1. Algorithm inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            2. K-Nearest Neighbors (KNN)
              1. Algorithm inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            3. Support vector machines (SVM)
              1. Algorithm inputs and outputs
              2. How does it work?
              3. Advantages and limitations
          3. Ensemble learning and meta learners
            1. Bootstrap aggregating or bagging
              1. Algorithm inputs and outputs
              2. How does it work?
                1. Random Forest
              3. Advantages and limitations
            2. Boosting
              1. Algorithm inputs and outputs
              2. How does it work?
              3. Advantages and limitations
        5. Model assessment, evaluation, and comparisons
          1. Model assessment
          2. Model evaluation metrics
            1. Confusion matrix and related metrics
            2. ROC and PRC curves
            3. Gain charts and lift curves
          3. Model comparisons
            1. Comparing two algorithms
              1. McNemar's Test
                1. Paired-t test
              2. Wilcoxon signed-rank test
            2. Comparing multiple algorithms
              1. ANOVA test
              2. Friedman's test
        6. Case Study – Horse Colic Classification
          1. Business problem
          2. Machine learning mapping
          3. Data analysis
            1. Label analysis
              1. Features analysis
          4. Supervised learning experiments
            1. Weka experiments
              1. Sample end-to-end process in Java
              2. Weka experimenter and model selection
            2. RapidMiner experiments
              1. Visualization analysis
              2. Feature selection
              3. Model process flow
              4. Model evaluation metrics
                1. Evaluation on Confusion Metrics
          5. Results, observations, and analysis
        7. Summary
        8. References
      3. 3. Unsupervised Machine Learning Techniques
        1. Issues in common with supervised learning
        2. Issues specific to unsupervised learning
        3. Feature analysis and dimensionality reduction
          1. Notation
          2. Linear methods
            1. Principal component analysis (PCA)
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            2. Random projections (RP)
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            3. Multidimensional Scaling (MDS)
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
          3. Nonlinear methods
            1. Kernel Principal Component Analysis (KPCA)
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            2. Manifold learning
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
        4. Clustering
          1. Clustering algorithms
            1. k-Means
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            2. DBSCAN
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            3. Mean shift
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            4. Expectation maximization (EM) or Gaussian mixture modeling (GMM)
              1. Input and output
              2. How does it work?
              3. Advantages and limitations
            5. Hierarchical clustering
              1. Input and output
              2. How does it work?
              3. Advantages and limitations
            6. Self-organizing maps (SOM)
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
          2. Spectral clustering
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
          3. Affinity propagation
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
          4. Clustering validation and evaluation
            1. Internal evaluation measures
              1. Notation
              2. R-Squared
              3. Dunn's Indices
              4. Davies-Bouldin index
                1. Silhouette's index
            2. External evaluation measures
              1. Rand index
              2. F-Measure
              3. Normalized mutual information index
        5. Outlier or anomaly detection
          1. Outlier algorithms
            1. Statistical-based
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            2. Distance-based methods
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            3. Density-based methods
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            4. Clustering-based methods
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            5. High-dimensional-based methods
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            6. One-class SVM
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
          2. Outlier evaluation techniques
            1. Supervised evaluation
            2. Unsupervised evaluation
        6. Real-world case study
          1. Tools and software
          2. Business problem
          3. Machine learning mapping
          4. Data collection
          5. Data quality analysis
          6. Data sampling and transformation
          7. Feature analysis and dimensionality reduction
            1. PCA
            2. Random projections
            3. ISOMAP
            4. Observations on feature analysis and dimensionality reduction
          8. Clustering models, results, and evaluation
            1. Observations and clustering analysis
          9. Outlier models, results, and evaluation
            1. Observations and analysis
        7. Summary
        8. References
      4. 4. Semi-Supervised and Active Learning
        1. Semi-supervised learning
          1. Representation, notation, and assumptions
          2. Semi-supervised learning techniques
            1. Self-training SSL
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            2. Co-training SSL or multi-view SSL
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            3. Cluster and label SSL
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            4. Transductive graph label propagation
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            5. Transductive SVM (TSVM)
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
          3. Case study in semi-supervised learning
            1. Tools and software
            2. Business problem
            3. Machine learning mapping
            4. Data collection
              1. Data quality analysis
            5. Data sampling and transformation
            6. Datasets and analysis
              1. Feature analysis results
            7. Experiments and results
              1. Analysis of semi-supervised learning
        2. Active learning
          1. Representation and notation
          2. Active learning scenarios
          3. Active learning approaches
            1. Uncertainty sampling
              1. How does it work?
                1. Least confident sampling
                2. Smallest margin sampling
                3. Label entropy sampling
              2. Advantages and limitations
          4. Version space sampling
            1. Query by disagreement (QBD)
              1. How does it work?
                1. Query by Committee (QBC)
              2. How does it work?
          5. Advantages and limitations
          6. Data distribution sampling
            1. How does it work?
              1. Expected model change
              2. Expected error reduction
                1. Variance reduction
                2. Density weighted methods
          7. Advantages and limitations
        3. Case study in active learning
          1. Tools and software
          2. Business problem
          3. Machine learning mapping
          4. Data collection
          5. Data sampling and transformation
          6. Feature analysis and dimensionality reduction
          7. Models, results, and evaluation
            1. Pool-based scenarios
            2. Stream-based scenarios
          8. Analysis of active learning results
        4. Summary
        5. References
      5. 5. Real-Time Stream Machine Learning
        1. Assumptions and mathematical notations
        2. Basic stream processing and computational techniques
          1. Stream computations
          2. Sliding windows
          3. Sampling
        3. Concept drift and drift detection
          1. Data management
          2. Partial memory
            1. Full memory
            2. Detection methods
              1. Monitoring model evolution
                1. Widmer and Kubat
                2. Drift Detection Method or DDM
                3. Early Drift Detection Method or EDDM
              2. Monitoring distribution changes
                1. Welch's t test
                  1. Kolmogorov-Smirnov's test
                  2. CUSUM and Page-Hinckley test
            3. Adaptation methods
              1. Explicit adaptation
              2. Implicit adaptation
        4. Incremental supervised learning
          1. Modeling techniques
            1. Linear algorithms
              1. Online linear models with loss functions
                1. Inputs and outputs
                2. How does it work?
                3. Advantages and limitations
              2. Online Naïve Bayes
                1. Inputs and outputs
                2. How does it work?
                3. Advantages and limitations
            2. Non-linear algorithms
              1. Hoeffding trees or very fast decision trees (VFDT)
                1. Inputs and outputs
                2. How does it work?
                3. Advantages and limitations
            3. Ensemble algorithms
              1. Weighted majority algorithm
                1. Inputs and outputs
                2. How does it work?
                3. Advantages and limitations
              2. Online Bagging algorithm
                1. Inputs and outputs
                2. How does it work?
                3. Advantages and limitations
              3. Online Boosting algorithm
                1. Inputs and outputs
                2. How does it work?
                3. Advantages and limitations
          2. Validation, evaluation, and comparisons in online setting
            1. Model validation techniques
              1. Prequential evaluation
              2. Holdout evaluation
              3. Controlled permutations
              4. Evaluation criteria
              5. Comparing algorithms and metrics
        5. Incremental unsupervised learning using clustering
          1. Modeling techniques
            1. Partition based
              1. Online k-Means
                1. Inputs and outputs
                2. How does it work?
                3. Advantages and limitations
            2. Hierarchical based and micro clustering
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
              4. Inputs and outputs
              5. How does it work?
              6. Advantages and limitations
            3. Density based
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            4. Grid based
              1. Inputs and outputs
              2. How does it work?
              3. Advantages and limitations
            5. Validation and evaluation techniques
              1. Key issues in stream cluster evaluation
              2. Evaluation measures
                1. Cluster Mapping Measures (CMM)
                2. V-Measure
                3. Other external measures
        6. Unsupervised learning using outlier detection
          1. Partition-based clustering for outlier detection
            1. Inputs and outputs
            2. How does it work?
            3. Advantages and limitations
          2. Distance-based clustering for outlier detection
            1. Inputs and outputs
            2. How does it work?
              1. Exact Storm
              2. Abstract-C
              3. Direct Update of Events (DUE)
              4. Micro Clustering based Algorithm (MCOD)
              5. Approx Storm
                1. Advantages and limitations
            3. Validation and evaluation techniques
        7. Case study in stream learning
          1. Tools and software
          2. Business problem
          3. Machine learning mapping
          4. Data collection
          5. Data sampling and transformation
            1. Feature analysis and dimensionality reduction
          6. Models, results, and evaluation
            1. Supervised learning experiments
              1. Concept drift experiments
            2. Clustering experiments
            3. Outlier detection experiments
          7. Analysis of stream learning results
        8. Summary
        9. References
      6. 6. Probabilistic Graph Modeling
        1. Probability revisited
          1. Concepts in probability
            1. Conditional probability
            2. Chain rule and Bayes' theorem
            3. Random variables, joint, and marginal distributions
            4. Marginal independence and conditional independence
            5. Factors
              1. Factor types
            6. Distribution queries
              1. Probabilistic queries
              2. MAP queries and marginal MAP queries
        2. Graph concepts
          1. Graph structure and properties
          2. Subgraphs and cliques
          3. Path, trail, and cycles
        3. Bayesian networks
          1. Representation
            1. Definition
            2. Reasoning patterns
              1. Causal or predictive reasoning
              2. Evidential or diagnostic reasoning
              3. Intercausal reasoning
              4. Combined reasoning
            3. Independencies, flow of influence, D-Separation, I-Map
              1. Flow of influence
              2. D-Separation
              3. I-Map
          2. Inference
            1. Elimination-based inference
              1. Variable elimination algorithm
                1. Input and output
                2. How does it work?
                3. Advantages and limitations
              2. Clique tree or junction tree algorithm
                1. Input and output
                2. How does it work?
                3. Advantages and limitations
            2. Propagation-based techniques
              1. Belief propagation
                1. Factor graph
                2. Messaging in factor graph
                3. Input and output
                4. How does it work?
                5. Advantages and limitations
            3. Sampling-based techniques
              1. Forward sampling with rejection
                1. Input and output
                2. How does it work?
                3. Advantages and limitations
          3. Learning
            1. Learning parameters
              1. Maximum likelihood estimation for Bayesian networks
              2. Bayesian parameter estimation for Bayesian network
                1. Prior and posterior using the Dirichlet distribution
            2. Learning structures
              1. Measures to evaluate structures
              2. Methods for learning structures
                1. Constraint-based techniques
                  1. Inputs and outputs
                  2. How does it work?
                  3. Advantages and limitations
                2. Search and score-based techniques
                  1. Inputs and outputs
                  2. How does it work?
                  3. Advantages and limitations
        4. Markov networks and conditional random fields
          1. Representation
            1. Parameterization
              1. Gibbs parameterization
              2. Factor graphs
              3. Log-linear models
            2. Independencies
              1. Global
              2. Pairwise Markov
                1. Markov blanket
          2. Inference
          3. Learning
          4. Conditional random fields
        5. Specialized networks
          1. Tree augmented network
            1. Input and output
            2. How does it work?
            3. Advantages and limitations
          2. Markov chains
            1. Hidden Markov models
            2. Most probable path in HMM
            3. Posterior decoding in HMM
        6. Tools and usage
          1. OpenMarkov
          2. Weka Bayesian Network GUI
        7. Case study
          1. Business problem
          2. Machine learning mapping
          3. Data sampling and transformation
          4. Feature analysis
          5. Models, results, and evaluation
          6. Analysis of results
        8. Summary
        9. References
      7. 7. Deep Learning
        1. Multi-layer feed-forward neural network
          1. Inputs, neurons, activation function, and mathematical notation
          2. Multi-layered neural network
            1. Structure and mathematical notations
            2. Activation functions in NN
              1. Sigmoid function
              2. Hyperbolic tangent ("tanh") function
            3. Training neural network
              1. Empirical risk minimization
                1. Parameter initialization
                2. Loss function
                3. Gradients
                  1. Gradient at the output layer
                  2. Gradient at the hidden layer
                  3. Parameter gradient
                4. Feed forward and backpropagation
                5. How does it work?
                6. Regularization
                  1. L2 regularization
                  2. L1 regularization
        2. Limitations of neural networks
          1. Vanishing gradients, local optimum, and slow training
        3. Deep learning
          1. Building blocks for deep learning
            1. Rectified linear activation function
            2. Restricted Boltzmann Machines
              1. Definition and mathematical notation
              2. Conditional distribution
              3. Free energy in RBM
              4. Training the RBM
              5. Sampling in RBM
              6. Contrastive divergence
                1. Inputs and outputs
                2. How does it work?
              7. Persistent contrastive divergence
            3. Autoencoders
              1. Definition and mathematical notations
              2. Loss function
              3. Limitations of Autoencoders
              4. Denoising Autoencoder
            4. Unsupervised pre-training and supervised fine-tuning
            5. Deep feed-forward NN
              1. Inputs and outputs
              2. How does it work?
            6. Deep Autoencoders
            7. Deep Belief Networks
              1. Inputs and outputs
              2. How does it work?
            8. Deep learning with dropouts
              1. Definition and mathematical notation
              2. Inputs and outputs
                1. How does it work?
              3. Learning: training and testing with dropouts
            9. Sparse coding
            10. Convolutional Neural Network
              1. Local connectivity
              2. Parameter sharing
              3. Discrete convolution
              4. Pooling or subsampling
              5. Normalization using ReLU
            11. CNN Layers
            12. Recurrent Neural Networks
              1. Structure of Recurrent Neural Networks
              2. Learning and associated problems in RNNs
              3. Long Short Term Memory
              4. Gated Recurrent Units
        4. Case study
          1. Tools and software
          2. Business problem
          3. Machine learning mapping
          4. Data sampling and transformation
          5. Feature analysis
          6. Models, results, and evaluation
            1. Basic data handling
            2. Multi-layer perceptron
              1. Parameters used for MLP
              2. Code for MLP
            3. Convolutional Network
              1. Parameters used for ConvNet
              2. Code for CNN
            4. Variational Autoencoder
              1. Parameters used for the Variational Autoencoder
              2. Code for Variational Autoencoder
            5. DBN
            6. Parameter search using Arbiter
            7. Results and analysis
        5. Summary
        6. References
      8. 8. Text Mining and Natural Language Processing
        1. NLP, subfields, and tasks
          1. Text categorization
          2. Part-of-speech tagging (POS tagging)
          3. Text clustering
          4. Information extraction and named entity recognition
          5. Sentiment analysis and opinion mining
          6. Coreference resolution
          7. Word sense disambiguation
          8. Machine translation
          9. Semantic reasoning and inferencing
          10. Text summarization
          11. Automating question and answers
        2. Issues with mining unstructured data
        3. Text processing components and transformations
          1. Document collection and standardization
            1. Inputs and outputs
            2. How does it work?
          2. Tokenization
            1. Inputs and outputs
            2. How does it work?
          3. Stop words removal
            1. Inputs and outputs
            2. How does it work?
          4. Stemming or lemmatization
            1. Inputs and outputs
            2. How does it work?
          5. Local/global dictionary or vocabulary?
          6. Feature extraction/generation
            1. Lexical features
              1. Character-based features
              2. Word-based features
              3. Part-of-speech tagging features
              4. Taxonomy features
            2. Syntactic features
            3. Semantic features
          7. Feature representation and similarity
            1. Vector space model
              1. Binary
              2. Term frequency (TF)
              3. Inverse document frequency (IDF)
              4. Term frequency-inverse document frequency (TF-IDF)
            2. Similarity measures
              1. Euclidean distance
              2. Cosine distance
              3. Pairwise-adaptive similarity
              4. Extended Jaccard coefficient
              5. Dice coefficient
          8. Feature selection and dimensionality reduction
            1. Feature selection
              1. Information theoretic techniques
              2. Statistical-based techniques
              3. Frequency-based techniques
            2. Dimensionality reduction
        4. Topics in text mining
          1. Text categorization/classification
          2. Topic modeling
            1. Probabilistic latent semantic analysis (PLSA)
              1. Input and output
              2. How does it work?
              3. Advantages and limitations
          3. Text clustering
            1. Feature transformation, selection, and reduction
            2. Clustering techniques
              1. Generative probabilistic models
                1. Input and output
                2. How does it work?
                3. Advantages and limitations
              2. Distance-based text clustering
              3. Non-negative matrix factorization (NMF)
                1. Input and output
                2. How does it work?
                3. Advantages and limitations
            3. Evaluation of text clustering
          4. Named entity recognition
            1. Hidden Markov models for NER
              1. Input and output
              2. How does it work?
              3. Advantages and limitations
            2. Maximum entropy Markov models for NER
              1. Input and output
              2. How does it work?
              3. Advantages and limitations
          5. Deep learning and NLP
        5. Tools and usage
          1. Mallet
          2. KNIME
          3. Topic modeling with Mallet
          4. Business problem
          5. Machine Learning mapping
          6. Data collection
          7. Data sampling and transformation
          8. Feature analysis and dimensionality reduction
          9. Models, results, and evaluation
          10. Analysis of text processing results
        6. Summary
        7. References
      9. 9. Big Data Machine Learning – The Final Frontier
        1. What are the characteristics of Big Data?
        2. Big Data Machine Learning
          1. General Big Data framework
            1. Big Data cluster deployment frameworks
              1. Hortonworks Data Platform
              2. Cloudera CDH
              3. Amazon Elastic MapReduce
              4. Microsoft Azure HDInsight
            2. Data acquisition
              1. Publish-subscribe frameworks
              2. Source-sink frameworks
              3. SQL frameworks
              4. Message queueing frameworks
              5. Custom frameworks
            3. Data storage
              1. HDFS
              2. NoSQL
                1. Key-value databases
                2. Document databases
                3. Columnar databases
                4. Graph databases
            4. Data processing and preparation
              1. Hive and HQL
              2. Spark SQL
              3. Amazon Redshift
              4. Real-time stream processing
            5. Machine Learning
            6. Visualization and analysis
        3. Batch Big Data Machine Learning
          1. H2O as Big Data Machine Learning platform
            1. H2O architecture
            2. Machine learning in H2O
            3. Tools and usage
        4. Case study
          1. Business problem
          2. Machine Learning mapping
          3. Data collection
          4. Data sampling and transformation
            1. Experiments, results, and analysis
              1. Feature relevance and analysis
              2. Evaluation on test data
              3. Analysis of results
          5. Spark MLlib as Big Data Machine Learning platform
            1. Spark architecture
            2. Machine Learning in MLlib
            3. Tools and usage
            4. Experiments, results, and analysis
              1. k-Means
              2. k-Means with PCA
              3. Bisecting k-Means (with PCA)
              4. Gaussian Mixture Model
              5. Random Forest
                1. Analysis of results
            5. Real-time Big Data Machine Learning
              1. SAMOA as a real-time Big Data Machine Learning framework
                1. SAMOA architecture
              2. Machine Learning algorithms
              3. Tools and usage
              4. Experiments, results, and analysis
                1. Analysis of results
            6. The future of Machine Learning
            7. Summary
            8. References
      10. A. Linear Algebra
        1. Vector
          1. Scalar product of vectors
        2. Matrix
          1. Transpose of a matrix
            1. Matrix addition
            2. Scalar multiplication
            3. Matrix multiplication
              1. Properties of matrix product
                1. Linear transformation
                2. Matrix inverse
                3. Eigendecomposition
                4. Positive definite matrix
            4. Singular value decomposition (SVD)
      11. B. Probability
        1. Axioms of probability
        2. Bayes' theorem
          1. Density estimation
          2. Mean
          3. Variance
          4. Standard deviation
          5. Gaussian standard deviation
          6. Covariance
          7. Correlation coefficient
          8. Binomial distribution
          9. Poisson distribution
          10. Gaussian distribution
          11. Central limit theorem
          12. Error propagation
      12. D. Bibliography
    8. Index