Learn about unbalanced sequential data in machine learning. Topics to be covered include:
- About sequential data. We define sequential data with the help of two examples and then describe the types of learning tasks that machine learning algorithms can perform on sequential data.
- Sequential vs. IID data. We cover the difference between standard IID (independent and identically distributed) data and sequential data, and describe how to formulate each data type mathematically: IID data has a single formulation, while sequential data has two.
- Modeling approaches. We explore modeling approaches for unbalanced sequential data and discuss the limitations of classical machine learning algorithms for modeling sequential data. We introduce two main neural network architectures (RNNs and LSTMs) and explain why both are a good fit for sequential data. We conclude with an explanation of gradient boosting and its advantages over other existing methods.
- Recurrent Neural Networks. We describe the key components of our proposed approach for handling unbalanced sequential data. We describe recurrent neural networks (RNNs) and the reasons RNNs share weights across time steps, followed by training RNNs with backpropagation through time (BPTT). We conclude with RNN use cases, including machine translation, image captioning, video classification, speech recognition, time series prediction, and text generation.
- Long Short-Term Memory. We discuss LSTM (Long Short-Term Memory) networks, their structure, and the gradient boosting ensemble technique. We cover the learning process of gradient boosting in detail, including an algorithm for binary classification.
- Gradient boosting. We explain gradient boosting for regression problems and how to apply it to binary classification.
- Keras and datasets. Learn the experimental setup, which includes Keras and its most commonly used layers, such as Dense and LSTM. We also cover the nature of the datasets and their distributions. We conclude with a description of the network architecture and the benefits of using an Embedding layer instead of one-hot encoding (OHE).
- Performance metrics. Master practical tips for training neural networks, and review the main plots (AUC curves for each dataset and train/test loss curves for boosting) along with other performance metrics, such as precision, recall, and F-score, on a real dataset. The sigmoid, hyperbolic tangent (tanh), and ReLU activation functions are also discussed.
- Title: Unbalanced Sequential Data in Machine Learning
- Release date: September 2017
- Publisher(s): Technics Publications
- ISBN: 9781634622837