O'Reilly live online training

Fraud analytics using Python


Design and develop machine learning models to catch fraudsters

Harshit Tyagi

A typical financial organization loses 5% of its yearly revenue to fraud. A recent survey of 2,000 firms in the UK found that 923,000 cases of suspected money laundering were reported and more than 1.15 million prospective customers were refused services. At a time when financial and ecommerce companies are growing at an unprecedented pace, every organization is facing challenges in tackling fraud. However, risk analysts, financial crime analysts, and data scientists are working on implementing models to prevent these crimes.

Join expert Harshit Tyagi for a hands-on dive into implementing machine learning models to analyze risk, using Python. You’ll put your programming and machine learning skills to work to analyze transactional data to detect fraud. Along the way, you'll get an introduction to the financial crime domain and learn resampling techniques to prepare unbalanced data for modeling. You'll also discover how to flag fraudsters with supervised learning, using labeled data.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • How to prepare and deal with highly unbalanced data for training
  • How to use supervised learning and unsupervised learning techniques for risk analysis

And you’ll be able to:

  • Design machine learning models to detect fraudulent transactions
  • Use text processing to perform sentiment analysis

This training course is for you because...

  • You're a programmer or data scientist with a little understanding of finance or ecommerce, and you want to pursue a career in finance, risk, or compliance.
  • You're a data analyst or someone with strong fundamentals in programming and statistics, and you want to work for a fintech organization.

Prerequisites


  • Experience programming with Python using pandas and Matplotlib
  • Familiarity with basic statistical concepts, linear algebra, calculus, and supervised and unsupervised learning

Recommended preparation:

About your instructor

  • Harshit Tyagi is a full stack developer and data engineer at Elucidata, a Cambridge-based biotech company, where he develops algorithms for research scientists at some of the world's leading medical schools, including Yale, UCLA, and MIT. Before Elucidata, he worked as a systems development engineer at the investment management firm Tradelogic, where he designed a framework that analyzes financial news from prominent sources to produce trading signals. He is a Python evangelist who contributes to tech communities such as Google Developers Groups and the Python Delhi User Group, as well as to e-learning platforms. Having served as a mentor and reviewer on e-learning platforms for more than three years, he aims to share enterprise-grade practices that help produce more skilled data scientists and quantitative traders.


Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Data preparation techniques (50 minutes)

  • Lecture: Fraud detection scale and complexity; how to process unbalanced data for analysis using SMOTE and logistic regression
  • Group discussion: What are the different types of fraud happening in the financial sector? How can Python help solve this problem? Why use Python for fraud detection?
  • Hands-on exercise: Set up and validate your environment; read and plot the data, correcting the format of the extracted data
  • Q&A
  • Break (10 minutes)
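
To give a feel for the resampling idea named in the lecture above, here is a minimal sketch of SMOTE-style oversampling using only the Python standard library. It is a simplified illustration of the core interpolation step, not the imbalanced-learn implementation and not necessarily the approach used in the course. The function name `smote_like_oversample` and the toy `(amount, hour_of_day)` records are hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: (amount, hour_of_day) for 4 fraud and 12 genuine records.
fraud = [(900.0, 2.0), (950.0, 3.0), (870.0, 1.0), (920.0, 4.0)]
genuine = [(float(20 + 5 * i), float(9 + i % 8)) for i in range(12)]

# Generate enough synthetic fraud records to match the genuine class size.
new_fraud = smote_like_oversample(fraud, n_new=len(genuine) - len(fraud))
balanced_fraud = fraud + new_fraud
print(len(genuine), len(balanced_fraud))
```

Because each synthetic record lies on a line segment between two real fraud records, the new points stay within the region the minority class already occupies. In practice, resampling is typically applied to the training split only, so that evaluation data keeps its original class balance.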

Supervised learning (50 minutes)

  • Lecture: Flagging fraudsters with supervised learning (random forest classifier and logistic regression), using labeled data; performance evaluation
  • Hands-on demonstration and exercise: Code the random forest classifier and check different classification methods; code the algorithms and run them over practice datasets
  • Break (10 minutes)
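
The performance evaluation mentioned in the lecture above matters because accuracy alone can be misleading when fraud is rare. The following sketch, which uses only the standard library and hypothetical labels (1 = fraud, 0 = genuine), compares a model that never flags fraud with one that catches some of it. The helper names are illustrative and not taken from the course material.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 marks fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision_recall_f1(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical data: 5 fraud cases among 100 transactions.
y_true = [1] * 5 + [0] * 95
always_genuine = [0] * 100                     # never flags anything
model_pred = [1, 1, 1, 0, 0] + [1, 1] + [0] * 93  # 3 hits, 2 misses, 2 false alarms

print("naive accuracy:", accuracy(y_true, always_genuine))
print("naive P/R/F1:  ", precision_recall_f1(y_true, always_genuine))
print("model accuracy:", accuracy(y_true, model_pred))
print("model P/R/F1:  ", precision_recall_f1(y_true, model_pred))
```

The naive predictor scores high accuracy while catching no fraud at all, which is why precision and recall on the fraud class are usually the more informative numbers for this kind of problem.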

Regression techniques (50 minutes)

  • Lecture: Using GridSearchCV to find optimal parameters; logistic regression and voting classifier
  • Hands-on demonstration and exercise: Use scikit-learn to test the algorithms and optimize them; code the regression methods in Jupyter notebooks
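
As a rough sketch of how the pieces in this session could fit together, the example below tunes a logistic regression with scikit-learn's `GridSearchCV` and then combines it with a random forest in a `VotingClassifier`. The data is a synthetic stand-in generated with `make_classification`, and the parameter grid, class weights, and scoring choice are assumptions made for illustration rather than the course's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for transaction data: roughly 5% of rows are "fraud" (class 1).
X, y = make_classification(
    n_samples=1000,
    n_features=8,
    n_informative=4,
    weights=[0.95, 0.05],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed grid: search over the regularization strength C, scoring on recall
# because missed fraud cases are usually the costlier error.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(
    LogisticRegression(class_weight="balanced", max_iter=1000),
    param_grid,
    scoring="recall",
    cv=3,
)
search.fit(X_train, y_train)

# Soft voting averages the predicted probabilities of both models.
voter = VotingClassifier(
    estimators=[
        ("lr", search.best_estimator_),
        (
            "rf",
            RandomForestClassifier(
                n_estimators=50, class_weight="balanced", random_state=0
            ),
        ),
    ],
    voting="soft",
)
voter.fit(X_train, y_train)

predictions = voter.predict(X_test)
test_recall = recall_score(y_test, predictions)
print("best C:", search.best_params_["C"])
print("fraud recall on held-out data:", test_recall)
```

`VotingClassifier` refits clones of the estimators it is given, so the tuned logistic regression contributes its selected hyperparameters rather than its already fitted state.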

Wrap-up and Q&A (10 minutes)