Live Online Training

Advanced Machine Learning: Create Real-World ML Projects

Develop Your Own Face Recognizer, Text Analyzer, Network Security Checker and Stock Market Predictor

Topic: Data
Noureddin Sadawi

Machine learning (ML) is a fast-growing field and its applications are becoming ubiquitous. One real-life application of ML is automatic face detection and recognition, which can be used (and currently is used) at airports, train stations, shopping centers and other places to automatically identify known criminals, thieves or dangerous individuals.

Another real-world application is the analysis of text. ML can be used to analyze, gain insight into and extract patterns from short text (e.g. tweets) as well as long text (e.g. research or news articles available on the web or on paper), for example to check whether an email is spam. In addition, ML can be used in computer and cyber security. The number of botnets is ever increasing, so the ability to analyze computer network traffic and use ML to identify malicious traffic is a big plus for computer network security.

Furthermore, being able to automatically predict how a time series (such as average sales over a period of time, the spread of a disease or stock market data) will evolve is a skill that is in high demand in the current job market. Deep learning is a suitable technology to use in this context.

In this course you will learn how to build accurate systems that handle tasks such as these. The course uses deep learning (with TensorFlow and Keras) as well as traditional machine learning algorithms such as RandomForest and NaiveBayes in Python. It covers the entire process, from preparing and preprocessing data (e.g. scaling, missing value imputation, etc.) to building models ready for real-life use.
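As a rough illustration of the two preprocessing steps named above (missing value imputation and scaling), here is a minimal sketch in plain Python. The function names `impute_mean` and `min_max_scale` and the sample column are hypothetical and are not taken from the course material; the course itself may use different techniques or library helpers.

```python
import math


def impute_mean(column):
    """Replace missing values (None or NaN) with the mean of the observed values."""
    observed = [v for v in column if v is not None and not math.isnan(v)]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None or math.isnan(v) else v for v in column]


def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical feature column with one missing entry.
ages = [20.0, None, 40.0, 30.0]
filled = impute_mean(ages)      # the missing entry becomes the mean of 20, 40 and 30
scaled = min_max_scale(filled)  # the smallest value maps to 0.0, the largest to 1.0
print(filled, scaled)
```

In a real project the mean, minimum and maximum would normally be computed on the training data only and then reused on the test data, so that no information leaks from the test set into the model.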

Python is a programming language that is in high demand in today’s job market. Being able to use Python for computer vision and security, and for text or time-series data analysis, adds a valued component to your skillset. In particular, being able to automate these processes is a strong plus for your CV.

This course shows how to use powerful deep/machine learning methods to locate, track and recognize human faces, and to analyze text, computer network traffic and time-series data (time-series data is everywhere, and you may be surprised by how many applications it has).

The course will illustrate how to use various techniques such as convolutional neural networks (CNNs), recurrent neural networks (RNNs) and long short-term memory neural networks (LSTMs). It will also involve using several traditional machine learning algorithms such as RandomForest and NaiveBayes as well as several key data preparation and pre-processing approaches.

What you'll learn and how you can apply it

  • Use Python to build a human face detection and recognition system (using deep learning; namely using convolutional neural networks, or CNNs for short)
  • Use Python to build a system that analyses computer network traffic and distinguishes malicious traffic from normal, or safe, traffic
  • Use Python to build a fully working textual data analysis system to classify documents or tweets (this will involve connecting to Twitter and downloading tweets programmatically)
  • Use Python to build a deep learning based time-series prediction system. This can be used for many applications where data is time dependent. This training will involve programmatically downloading stock market data and building a deep neural network (namely a recurrent neural network, or RNN, and/or a long short-term memory neural network, or LSTM)
  • All of the above will involve several key data preparation and pre-processing tasks such as data clean-up, scaling, normalisation, missing data imputation and more
  • Professionally written Python code will be shared with you so that you can use it for your own projects and purposes

This training course is for you because...

  • You are familiar with Python and machine/deep learning and want to use them to build many fully functioning real life applications
  • You would like to learn how real-time face detection and recognition works and want to use Python and CNNs to build a system that does exactly that
  • You would like to learn how automatic text processing and analysis works and want to use Python to build a system that does exactly that (this is a great start to the field of natural language processing or NLP)
  • You would like to learn how to use Python to perform analysis of computer network traffic for computer and cyber security purposes. Here you will learn how to build a system that automatically distinguishes between safe and malicious network traffic!
  • You would like to learn how deep learning systems that predict future stock market prices work and how to use Python and RNNs/LSTMs to build a system that does exactly that
  • You would like to learn how to prepare data for modelling by applying many key data preparation and pre-processing techniques such as data clean-up, scaling, normalisation, missing data imputation and more.
  • In general, this course will empower you and strengthen your Python skills in the above fields
  • Code snippets and ideas in this course are not limited to these fields only but they can be easily adapted in other fields
  • Skills gained in this course can help you establish a business or build a commercial tool (imagine how many real life applications any of the systems developed in this course can have).

Prerequisites

  • Familiarity with Python
  • Familiarity with basic machine learning in Python
  • Familiarity with basic computer vision in Python
  • Familiarity with convolutional neural networks (CNNs) in TensorFlow and/or Keras
  • Familiarity with recurrent neural networks (RNNs) and long short-term memory neural networks (LSTMs) in TensorFlow and/or Keras

Course Set-up

  • Any operating system is fine
  • Python 3.5 or above (Anaconda distribution)
  • Speedy internet connection

About your instructor

  • Dr. Noureddin Sadawi is a consultant in machine learning and data science. He has several years’ experience in various areas involving data manipulation and analysis. He received his PhD from the University of Birmingham, United Kingdom. During his PhD he developed a technique to extract precise information from bitmap images of chemical structure diagrams. He developed a tool called MolRec and used it to participate in evaluation contests at two international events - TREC2011 and CLEF2012 - and won both of them.

    Noureddin is an avid scientific software researcher and developer who has a passion for learning and teaching new technologies. He has been involved in several projects spanning a variety of fields such as bioinformatics, drug discovery, omics data analysis and much more. He has taught at multiple universities in the UK and has worked as a software engineer in different roles. One of his latest positions was as a research associate at the highly respected Imperial College London, where he contributed significantly to the PhenoMeNal project (a project that heavily uses Docker). Currently, he is a research fellow at the department of computer science, Brunel University – London, where he developed deep learning techniques for the analysis of human gesture data.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Day 1

Part 1: Human Face Detection and Recognition (60 minutes)

  • A short overview of CNNs and how they work
  • How to connect to webcam using Python
  • Real-time human face detection with Python and OpenCV (using deep learning)
  • Identifying the rectangle around faces in a video or an image
  • Detailed explanation of how to build a deep CNN based system to perform face recognition
  • Detailed explanation of how to evaluate the performance of this system
  • Python code for real-time human eye detection (time permitting)
  • Real-time human mouth detection (time permitting)
  • Real-time human nose detection (time permitting)
  • Live demonstration of the above techniques
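One common way to check how well a detected face rectangle matches a hand-labelled one is intersection over union (IoU). The sketch below is only an illustration of that idea and is not taken from the course material: it assumes rectangles are given as `(x, y, w, h)` tuples, and the function name `iou` and the sample boxes are hypothetical. The course may evaluate its detector in a different way.

```python
def iou(box_a, box_b):
    """Intersection over union of two rectangles given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap extent along each axis (zero when the rectangles do not overlap).
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Hypothetical detector output and hand-labelled ground truth.
detected = (10, 10, 100, 100)
labelled = (60, 10, 100, 100)
score = iou(detected, labelled)
print(round(score, 3))
```

A score of 1.0 means the two rectangles coincide and 0.0 means they do not overlap at all; a detection is often counted as correct when the score exceeds a chosen threshold.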

Q&A (10 minutes)

Break (10 minutes)

Part 2: Textual Data Analysis - natural language processing (NLP) (60 minutes)

  • Introduction to text analysis
  • Learn how to prepare text documents
  • Learn how to use Python to connect to Twitter and retrieve tweets
  • Detailed explanation of several text pre-processing techniques such as stop-word removal, stemming, tokenization, TF-IDF and more (with accompanying Python code)
  • Learn how to build a data-matrix ready for modelling (with a focus on how to deal with sparse data)
  • Applying principal component analysis (PCA) to explore and visualize the data
  • Detailed explanation of how to build a text (or tweet) classification system using your favourite method (e.g. deep learning, RandomForest, NaiveBayes or others)
  • Detailed explanation of how to evaluate the performance of this system
  • Live demonstration of the above techniques
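To make the pre-processing steps listed above more concrete, here is a minimal plain-Python sketch of tokenization, stop-word removal and TF-IDF weighting that produces a small data matrix. It is an illustration under stated assumptions, not the course's code: the stop-word list is a tiny made-up sample, the names `tokenize` and `tf_idf` are hypothetical, stemming is left out, and the weighting uses the basic unsmoothed formula (term frequency times the log of the inverse document frequency). Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed, normalised variant by default, so their numbers will differ.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "is", "and", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tf_idf(documents):
    """Return (vocabulary, matrix) with one TF-IDF row per document."""
    tokenized = [tokenize(d) for d in documents]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(t for tokens in tokenized for t in set(tokens))
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        row = [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocabulary]
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical short texts standing in for downloaded tweets.
tweets = ["The market is up", "The market is down", "Cats and dogs"]
vocab, rows = tf_idf(tweets)
print(vocab)
print(rows[0])
```

Most entries in each row are zero because each text uses only a few of the vocabulary terms, which is why sparse data handling matters once the vocabulary grows.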

Q&A (10 minutes)

Day 2

Part 1: Computer Network Traffic Analysis for Cyber Security (60 minutes)

  • Introduction to cyber security and different attack types
  • A quick overview of Botnets
  • How to obtain network traffic data (live data capture or downloading existing datasets)
  • Pre-processing captured data (includes detailed explanation of several essential steps to prepare data for modelling)
  • Applying principal component analysis (PCA) to explore and visualize the data
  • Detailed explanation of how to build a network traffic data classification system using your favourite method (e.g. deep learning, RandomForest, NaiveBayes or any other classifier)
  • Detailed explanation of how to evaluate the performance of this system
  • Live demonstration of the above techniques
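When evaluating a traffic classifier, accuracy alone can be misleading if malicious flows are rare, so precision and recall for the malicious class are often reported as well. The following is a minimal sketch of those metrics in plain Python; the function name `binary_metrics`, the label strings and the sample predictions are hypothetical and not taken from the course material.

```python
def binary_metrics(y_true, y_pred, positive="malicious"):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 8 normal flows and 2 malicious ones, one of which is missed.
y_true = ["normal"] * 8 + ["malicious"] * 2
y_pred = ["normal"] * 8 + ["malicious", "normal"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this made-up example the classifier is right on nine out of ten flows, yet it misses half of the malicious traffic, which only the recall figure reveals.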

Q&A (10 minutes)

Break (10 minutes)

Part 2: Time-series Data Analysis (60 minutes)

  • Introduction to time-series data and why it is unique
  • Overview and examples of everyday time-series data (e.g. power consumption, stock market data, rainfall, etc.)
  • Learn how to use Python to obtain time-series data (e.g. stock market data)
  • Apply data exploration and visualisation
  • Learn how to segment this type of data (i.e. time series data) so it can be used as input to a deep learning predictive model (a deep RNN or LSTM)
  • Detailed explanation of how to build a deep RNN or LSTM based system that learns from past data and predicts future data (e.g. stock market data predictions)
  • Detailed explanation of how to evaluate the performance of this system
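The segmentation step mentioned above is commonly done with a sliding window: each run of consecutive past values becomes one input sample, and the value that follows becomes its target. The sketch below illustrates this in plain Python under stated assumptions; the function name `make_windows`, its `window` and `horizon` parameters and the sample prices are hypothetical, and the course may segment its data differently.

```python
def make_windows(series, window, horizon=1):
    """Split a sequence into (inputs, targets) pairs for supervised learning.

    Each input holds `window` consecutive values; its target is the value
    `horizon` steps after the end of that window.
    """
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        inputs.append(series[start:start + window])
        targets.append(series[start + window + horizon - 1])
    return inputs, targets


# Hypothetical closing prices.
prices = [10, 11, 12, 13, 14, 15]
X, y = make_windows(prices, window=3)
for past, future in zip(X, y):
    print(past, "->", future)
```

Before feeding such windows to a Keras recurrent layer, they would typically be converted to an array of shape (samples, timesteps, features), with one feature per time step in this univariate case.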

Q&A (10 minutes)