Hands-On Machine Learning for Cybersecurity

Book description

Get into the world of smart data security using machine learning algorithms and Python libraries

Key Features

  • Learn machine learning algorithms and cybersecurity fundamentals
  • Automate your daily workflow by applying use cases to many facets of security
  • Implement smart machine learning solutions to detect various cybersecurity problems

Book Description

Cyber threats are among the costliest risks an organization can face today. This book applies machine learning to the major problems in the cybersecurity domain.

The book begins with the basics of ML in cybersecurity using Python and its libraries. You will explore various ML domains (such as time series analysis and ensemble modeling) to get your foundations right. You will implement examples such as building a system to identify malicious URLs and building a program to detect fraudulent emails and spam. Later, you will learn how to use the k-means algorithm to develop a solution that detects malicious activity in the network and alerts you to it. You will also learn how to use biometrics and fingerprints to validate whether a user is legitimate.

Finally, you will see how we change the game with TensorFlow and learn how deep learning is effective for creating models and training systems.

What you will learn

  • Use machine learning algorithms with complex datasets to implement cybersecurity concepts
  • Implement machine learning algorithms such as clustering, k-means, and Naive Bayes to solve real-world problems
  • Learn to speed up a system using Python libraries such as NumPy and scikit-learn, together with CUDA
  • Understand how to combat malware, detect spam, and fight financial fraud to mitigate cyber crimes
  • Use TensorFlow in the cybersecurity domain and implement real-world examples
  • Learn how machine learning and Python can be used in complex cyber issues

Who this book is for

This book is for data scientists, machine learning developers, security researchers, and anyone keen to apply machine learning to improve computer security. Some working knowledge of Python and familiarity with the basics of machine learning and cybersecurity will help you get the most out of the book.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Machine Learning for Cybersecurity
  3. About Packt
    1. Why subscribe?
    2. Packt.com
  4. Contributors
    1. About the authors
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. Basics of Machine Learning in Cybersecurity
    1. What is machine learning?
      1. Problems that machine learning solves
      2. Why use machine learning in cybersecurity?
      3. Current cybersecurity solutions
      4. Data in machine learning
        1. Structured versus unstructured data
        2. Labelled versus unlabelled data
        3. Machine learning phases
        4. Inconsistencies in data
          1. Overfitting
          2. Underfitting
      5. Different types of machine learning algorithm
        1. Supervised learning algorithms
        2. Unsupervised learning algorithms
        3. Reinforcement learning
        4. Another categorization of machine learning
        5. Classification problems
        6. Clustering problems
        7. Regression problems
        8. Dimensionality reduction problems
        9. Density estimation problems
        10. Deep learning
      6. Algorithms in machine learning
        1. Support vector machines
        2. Bayesian networks
        3. Decision trees
        4. Random forests
        5. Hierarchical algorithms
        6. Genetic algorithms
        7. Similarity algorithms
        8. ANNs
      7. The machine learning architecture
        1. Data ingestion
        2. Data store
        3. The model engine
          1. Data preparation
          2. Feature generation
          3. Training
          4. Testing
        4. Performance tuning
          1. Mean squared error
          2. Mean absolute error
          3. Precision, recall, and accuracy
        5. How can model performance be improved?
          1. Fetching the data to improve performance
          2. Switching machine learning algorithms
          3. Ensemble learning to improve performance
      8. Hands-on machine learning
        1. Python for machine learning
        2. Comparing Python 2.x with 3.x
        3. Python installation
        4. Python interactive development environment
          1. Jupyter Notebook installation
        5. Python packages
          1. NumPy
          2. SciPy
          3. Scikit-learn
          4. pandas
          5. Matplotlib
        6. MongoDB with Python
          1. Installing MongoDB
          2. PyMongo
        7. Setting up the development and testing environment
          1. Use case
          2. Data
          3. Code
    2. Summary
  7. Time Series Analysis and Ensemble Modeling
    1. What is a time series?
      1. Time series analysis
        1. Stationarity of time series models
        2. Strictly stationary process
        3. Correlation in time series
          1. Autocorrelation
          2. Partial autocorrelation function
    2. Classes of time series models
      1. Stochastic time series model
      2. Artificial neural network time series model
      3. Support vector time series models
      4. Time series components
        1. Systematic models
        2. Non-systematic models
    3. Time series decomposition
      1. Level
      2. Trend
      3. Seasonality
      4. Noise
    4. Use cases for time series
      1. Signal processing
      2. Stock market predictions
      3. Weather forecasting
      4. Reconnaissance detection
    5. Time series analysis in cybersecurity
    6. Time series trends and seasonal spikes
      1. Detecting distributed denial of service with time series
      2. Dealing with the time element in time series
      3. Tackling the use case
      4. Importing packages
        1. Importing data in pandas
        2. Data cleansing and transformation
      5. Feature computation
    7. Predicting DDoS attacks
      1. ARMA
      2. ARIMA
      3. ARFIMA
    8. Ensemble learning methods
      1. Types of ensembling
        1. Averaging
        2. Majority vote
        3. Weighted average
      2. Types of ensemble algorithm
        1. Bagging
        2. Boosting
        3. Stacking
        4. Bayesian parameter averaging
        5. Bayesian model combination
        6. Bucket of models
      3. Cybersecurity with ensemble techniques
    9. Voting ensemble method to detect cyber attacks
    10. Summary
  8. Segregating Legitimate and Lousy URLs
    1. Introduction to the types of abnormalities in URLs
      1. URL blacklisting
        1. Drive-by download URLs
        2. Command and control URLs
        3. Phishing URLs
    2. Using heuristics to detect malicious pages
      1. Data for the analysis
      2. Feature extraction
        1. Lexical features
      3. Web-content-based features
      4. Host-based features
      5. Site-popularity features
    3. Using machine learning to detect malicious URLs
    4. Logistic regression to detect malicious URLs
      1. Dataset
      2. Model
        1. TF-IDF
    5. SVM to detect malicious URLs
    6. Multiclass classification for URL classification
      1. One-versus-rest
    7. Summary
  9. Knocking Down CAPTCHAs
    1. Characteristics of CAPTCHA
    2. Using artificial intelligence to crack CAPTCHA
      1. Types of CAPTCHA
      2. reCAPTCHA
        1. No CAPTCHA reCAPTCHA
      3. Breaking a CAPTCHA
      4. Solving CAPTCHAs with a neural network
        1. Dataset
        2. Packages
        3. Theory of CNN
        4. Model
      5. Code
        1. Training the model
        2. Testing the model
    3. Summary
  10. Using Data Science to Catch Email Fraud and Spam
    1. Email spoofing
      1. Bogus offers
      2. Requests for help
      3. Types of spam emails
        1. Deceptive emails
        2. CEO fraud
        3. Pharming
        4. Dropbox phishing
        5. Google Docs phishing
    2. Spam detection
      1. Types of mail servers
      2. Data collection from mail servers
      3. Using the Naive Bayes theorem to detect spam
      4. Laplace smoothing
      5. Featurization techniques that convert text-based emails into numeric values
        1. Log-space
        2. TF-IDF
        3. N-grams
        4. Tokenization
      6. Logistic regression spam filters
        1. Logistic regression
        2. Dataset
        3. Python
        4. Results
    3. Summary
  11. Efficient Network Anomaly Detection Using k-means
    1. Stages of a network attack
      1. Phase 1 – Reconnaissance
      2. Phase 2 – Initial compromise
      3. Phase 3 – Command and control
      4. Phase 4 – Lateral movement
      5. Phase 5 – Target attainment
      6. Phase 6 – Exfiltration, corruption, and disruption
    2. Dealing with lateral movement in networks
    3. Using Windows event logs to detect network anomalies
      1. Logon/Logoff events
      2. Account logon events
      3. Object access events
      4. Account management events
        1. Active Directory events
    4. Ingesting Active Directory data
    5. Data parsing
    6. Modeling
    7. Detecting anomalies in a network with k-means
      1. Network intrusion data
        1. Coding the network intrusion attack
        2. Model evaluation
          1. Sum of squared errors
        3. Choosing k for k-means
        4. Normalizing features
        5. Manual verification
    8. Summary
  12. Decision Tree and Context-Based Malicious Event Detection
    1. Adware
    2. Bots
    3. Bugs
    4. Ransomware
    5. Rootkit
    6. Spyware
    7. Trojan horses
    8. Viruses
    9. Worms
    10. Malicious data injection within databases
    11. Malicious injections in wireless sensors
    12. Use case
      1. The dataset
      2. Importing packages
      3. Features of the data
      4. Model
        1. Decision tree
        2. Types of decision trees
          1. Categorical variable decision tree
          2. Continuous variable decision tree
        3. Gini coefficient
        4. Random forest
        5. Anomaly detection
          1. Isolation forest
          2. Supervised and outlier detection with Knowledge Discovery in Databases (KDD)
    13. Revisiting malicious URL detection with decision trees
    14. Summary
  13. Catching Impersonators and Hackers Red Handed
    1. Understanding impersonation
    2. Different types of impersonation fraud
      1. Impersonators gathering information
      2. How an impersonation attack is constructed
        1. Using data science to detect domains that are impersonations
    3. Levenshtein distance
      1. Finding domain similarity between malicious URLs
      2. Authorship attribution
        1. AA detection for tweets
      3. Difference between test and validation datasets
        1. Sklearn pipeline
      4. Naive Bayes classifier for multinomial models
      5. Identifying impersonation as a means of intrusion detection
    4. Summary
  14. Changing the Game with TensorFlow
    1. Introduction to TensorFlow
    2. Installation of TensorFlow
    3. TensorFlow for Windows users
    4. Hello world in TensorFlow
    5. Importing the MNIST dataset
    6. Computation graphs
      1. What is a computation graph?
    7. Tensor processing unit
    8. Using TensorFlow for intrusion detection
    9. Summary
  15. Financial Fraud and How Deep Learning Can Mitigate It
    1. Machine learning to detect financial fraud
      1. Imbalanced data
      2. Handling imbalanced datasets
        1. Random under-sampling
        2. Random oversampling
        3. Cluster-based oversampling
        4. Synthetic minority oversampling technique
        5. Modified synthetic minority oversampling technique
      3. Detecting credit card fraud
        1. Logistic regression
        2. Loading the dataset
        3. Approach
    2. Logistic regression classifier – under-sampled data
      1. Tuning hyperparameters
        1. Detailed classification reports
        2. Predictions on test sets and plotting a confusion matrix
      2. Logistic regression classifier – skewed data
      3. Investigating precision-recall curve and area
    3. Deep learning time
      1. Adam gradient optimizer
    4. Summary
  16. Case Studies
    1. Introduction to our password dataset
      1. Text feature extraction
      2. Feature extraction with scikit-learn
      3. Using the cosine similarity to quantify bad passwords
      4. Putting it all together
    2. Summary
  17. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Hands-On Machine Learning for Cybersecurity
  • Author(s): Soma Halder, Sinan Ozdemir
  • Release date: December 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781788992282