O'Reilly Live Online Training

Fraud Analytics using Python

Design and develop machine learning models to catch fraudsters

Harshit Tyagi

This is a one-day live online training session in which you'll use your programming and machine learning skills to analyze transactional data and detect fraud. This hands-on course covers the basics of machine learning and how to apply it in Python to analyze risk.

We'll start with an introduction to the financial crime domain, then perform exploratory analysis on transactional data and the types of fraud. We'll learn resampling techniques to prepare imbalanced data for modeling. The final section covers how to flag fraudsters with supervised learning (labeled data).
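Resampling rebalances the classes before training. One widely used technique is SMOTE (Synthetic Minority Over-sampling Technique). The sketch below is a minimal NumPy version of it: it creates synthetic fraud rows by interpolating between a fraud row and one of its nearest fraud neighbours. The `smote` helper and the toy data are hypothetical and were written for illustration; they are not course material. In practice you would more likely use a maintained implementation, such as `SMOTE` from the imbalanced-learn package (`imblearn.over_sampling`).

```python
import numpy as np


def smote(X_minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE (hypothetical helper): create synthetic minority rows by
    interpolating between a randomly chosen minority row and one of its k
    nearest minority neighbours."""
    n = len(X_minority)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = np.random.default_rng(seed)
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority rows
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]  # k nearest per row

    synthetic = np.empty((n_synthetic, X_minority.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)                  # random minority row
        nb = neighbours[base, rng.integers(k)]  # one of its k neighbours
        gap = rng.random()                      # position on the segment, in [0, 1)
        synthetic[i] = X_minority[base] + gap * (X_minority[nb] - X_minority[base])
    return synthetic


# Toy data (hypothetical): 95 legitimate and 5 fraudulent transactions, 2 features each
rng = np.random.default_rng(42)
X_legit = rng.normal(0.0, 1.0, size=(95, 2))
X_fraud = rng.normal(3.0, 0.5, size=(5, 2))

# Generate 90 synthetic fraud rows so both classes have 95 rows
X_new = smote(X_fraud, n_synthetic=90, k=3)
X_bal = np.vstack([X_legit, X_fraud, X_new])
y_bal = np.array([0] * 95 + [1] * (5 + 90))
print(np.bincount(y_bal))  # → [95 95]
```

Whichever implementation you use, resample only the training split. If you oversample before the train/test split, synthetic rows derived from test-set frauds leak into training and inflate the evaluation scores.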

This training course is for you because...

  • You're a programmer or data scientist with some understanding of finance or e-commerce, and you want to pursue a career in finance, risk, or compliance. You should have a basic knowledge of supervised and unsupervised learning.
  • You're a data analyst, or someone with strong fundamentals in programming and statistics, and you're planning to join a FinTech organization.

What you'll learn and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • How to prepare highly imbalanced data for training.
  • How supervised and unsupervised learning apply to risk analysis.

And you’ll be able to:

  • Design ML models that detect fraudulent transactions, a skill many organizations are looking for.
  • Use text processing to perform sentiment analysis.


About your instructor

  • Harshit Tyagi is a full stack developer and data engineer at Elucidata, a Cambridge-based biotech company, where he develops algorithms for research scientists at leading institutions such as Yale, UCLA, and MIT. Before Elucidata, he was a systems development engineer at Tradelogic, an investment management firm, where he designed a framework that analyzes financial news from prominent sources to produce trading signals. He is a Python evangelist who contributes to tech communities such as Google Developer Groups and the Python Delhi User Group, as well as to e-learning platforms. He has spent more than three years as a mentor and reviewer on e-learning platforms, and he aims to share enterprise-grade practices that help produce more skilled data scientists and quantitative traders.


The timeframes are only estimates and may vary according to how the class progresses.

Data preparation techniques (50 mins)

  • Presentation (15 mins): The scale and complexity of fraud detection, and a brief overview of what we'll cover in the course.
  • Discussion (5 mins): What types of fraud occur in the financial sector? How can we use Python to detect them, and why choose Python for fraud detection?
  • Presentation (10 mins): Processing imbalanced data for analysis with SMOTE (Synthetic Minority Over-sampling Technique) and logistic regression.
  • Presentation (5 mins): Setup, a quick validation of your environment so you can follow along.
  • Exercise (15 mins): Reading and plotting the data, and correcting the format of the extracted data.
  • Checkpoint: Poll to find out the ratio of fraud to non-fraud cases.
  • Q&A (5 mins)
  • Break (5 mins)
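The reading, format-correction, and checkpoint steps can be sketched with pandas. The CSV below is a tiny hypothetical extract, not the course dataset. Its column names (`transaction_id`, `timestamp`, `amount`, `is_fraud`) are assumptions, as are its formats: currency strings such as `"$1,250.00"` and `yes`/`no` labels. Plotting is left out so the snippet stays self-contained.

```python
import io

import pandas as pd

# Hypothetical raw extract: amounts exported as text with "$" and thousands
# separators, timestamps as strings, and the fraud label stored as "yes"/"no".
raw_csv = """transaction_id,timestamp,amount,is_fraud
1,2023-01-05 10:15:00,"$1,250.00",no
2,2023-01-05 10:17:30,$35.20,no
3,2023-01-05 11:02:10,"$9,800.00",yes
4,2023-01-06 09:45:00,$12.99,no
5,2023-01-06 13:20:45,$560.00,no
"""

df = pd.read_csv(io.StringIO(raw_csv))

# Correct the formats: parse timestamps, strip "$" and "," from amounts, map labels to 0/1
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["amount"] = df["amount"].str.replace(r"[$,]", "", regex=True).astype(float)
df["is_fraud"] = df["is_fraud"].map({"no": 0, "yes": 1})

# Checkpoint: ratio of fraud to non-fraud cases
counts = df["is_fraud"].value_counts()
fraud_ratio = counts.get(1, 0) / counts.get(0, 0)
print(df.dtypes)
print(f"fraud : non-fraud = {fraud_ratio:.2f}")  # → fraud : non-fraud = 0.25
```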

Supervised learning (50 mins)

  • Presentation (20 mins): Flagging fraudsters using labeled data with supervised learning algorithms (random forest classifier, logistic regression).
  • Walkthrough (20 mins): Code the random forest classifier and compare different classification methods.
  • Presentation (5 mins): Performance evaluation.
  • Exercise (15 mins): Code the algorithms covered and run them on practice datasets.
  • Break (5 mins)
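Here is a minimal scikit-learn sketch of the supervised-learning and performance-evaluation steps. It trains a logistic regression and a random forest on a synthetic, imbalanced dataset from `make_classification`, which stands in for labeled transactions (it is not the course data). It then reports precision, recall, F1, and the confusion matrix. The dataset size, the roughly 97/3 class split, and the model settings are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labeled transactions: about 3% positive (fraud) class
X, y = make_classification(n_samples=5000, n_features=10, n_informative=5,
                           weights=[0.97, 0.03], random_state=42)

# Stratify so the rare fraud class appears in both splits in the same proportion
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        # Rows are actual classes (legit, fraud); columns are predicted classes
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
    r = results[name]
    print(f"{name}: precision={r['precision']:.3f} "
          f"recall={r['recall']:.3f} f1={r['f1']:.3f}")
    print(r["confusion_matrix"])
```

Accuracy is left out on purpose. With roughly 97% legitimate rows, a model that never flags fraud would already score about 97% accuracy. Recall (how many frauds are caught) and precision (how many flags are real frauds) are more informative.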

Regression techniques (60 mins)

  • Presentation (15 mins): Using GridSearchCV to find optimal hyperparameters.
  • Walkthrough (5 mins): Using scikit-learn to test and optimize the algorithms.
  • Presentation (15 mins): Logistic regression and voting classifiers.
  • Exercise (15 mins): Practice coding the regression methods in Jupyter notebooks.
  • Q&A (10 mins)
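Here is a minimal sketch of tuning and combining models with scikit-learn. `GridSearchCV` searches a small random forest grid using F1 as the score, and a soft-voting `VotingClassifier` then combines the tuned forest with a logistic regression. The synthetic data, the grid values, and the choice of F1 as the tuning metric are assumptions for illustration; a real grid depends on the data and the compute budget.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic, imbalanced stand-in for labeled transactions (about 5% fraud)
X, y = make_classification(n_samples=3000, n_features=10, n_informative=5,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Small illustrative grid: 2 x 2 x 2 = 8 combinations
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 8],
    "class_weight": [None, "balanced"],
}
# Stratified folds keep the fraud proportion stable in every fold
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      scoring="f1", cv=cv)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print(f"best cross-validated F1: {search.best_score_:.3f}")

# Soft voting averages the predicted probabilities of the two models
voting = VotingClassifier(
    estimators=[
        ("rf", search.best_estimator_),
        ("lr", LogisticRegression(max_iter=1000, class_weight="balanced")),
    ],
    voting="soft",
)
voting.fit(X_train, y_train)
test_f1 = f1_score(y_test, voting.predict(X_test), zero_division=0)
print(f"voting classifier F1 on test set: {test_f1:.3f}")
```

Tuning on F1 rather than accuracy keeps the search focused on the rare fraud class. Soft voting needs every estimator to implement `predict_proba`, which both models here do.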