How to do it…

In the following steps, you will visualize the HIPAA breaches dataset in pandas and use TF-IDF to extract important keywords from the descriptions of the breaches. Let's get started:

  1.  Load and clean the HIPAA breaches dataset using pandas:

import pandas as pd

df = pd.read_csv("HIPAA-breach-report-2009-to-2017.csv")
df = df.dropna()

The output of the preceding code is shown in the following screenshot:

  2. Plot a histogram of the number of individuals affected per breach, showing how often breaches of each size occur:

%matplotlib inline
def_fig_size = (15, 6)
df["Individuals Affected"].plot( ...
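
As a rough illustration of this kind of plot, here is a minimal, self-contained sketch that draws a histogram from a pandas column. It does not reconstruct the truncated call above. The DataFrame `toy` and its values are made up for demonstration and are not taken from the HIPAA dataset, and the bin count, log-scaled y-axis, and axis labels are assumptions rather than the recipe's settings.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import pandas as pd

# Synthetic stand-in values (assumption): NOT real breach figures.
toy = pd.DataFrame(
    {"Individuals Affected": [500, 750, 1200, 3000, 15000, 80000, 1200000]}
)

fig_size = (15, 6)  # same figure size as def_fig_size in the recipe

# Series.plot(kind="hist") bins the values and draws one bar per bin.
# bins=20 and logy=True are illustrative choices, not the book's settings.
ax = toy["Individuals Affected"].plot(
    kind="hist",
    bins=20,
    figsize=fig_size,
    logy=True,
    title="Individuals affected per breach (toy data)",
)
ax.set_xlabel("Individuals affected")
ax.set_ylabel("Number of breaches")
```

In a Jupyter notebook, you would typically drop the `matplotlib.use("Agg")` line and rely on `%matplotlib inline` to display the figure. Outside a notebook, `ax.figure.savefig(...)` writes it to disk. A log-scaled y-axis tends to help when a few very large values would otherwise flatten the rest of the distribution.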
