How to do it…

In the following steps, you will visualize the HIPAA breaches dataset in pandas and use TF-IDF to extract important keywords from the descriptions of the breaches. Let's get started:

  1.  Load and clean the HIPAA breaches dataset using pandas:

    import pandas as pd

    df = pd.read_csv("HIPAA-breach-report-2009-to-2017.csv")
    df = df.dropna()

The output of the preceding code is shown in the following screenshot:
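Separately from that output, here is a minimal sketch of what `dropna()` does in this step, assuming a small made-up table. Only the `Individuals Affected` column name comes from the recipe; the other column names and all values are hypothetical and are not taken from the HIPAA dataset.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for the real file; all values are made up.
csv_text = """Name of Covered Entity,Individuals Affected,Web Description
Clinic A,1200,Laptop stolen from office
Clinic B,,Email sent to wrong recipient
Clinic C,560,
Clinic D,3100,Unencrypted drive lost
"""

toy = pd.read_csv(io.StringIO(csv_text))

# With no arguments, dropna() removes every row that has at least one missing value.
cleaned = toy.dropna()

print(len(toy), "rows before,", len(cleaned), "rows after")
```

Because a row is dropped when any single field is missing, this kind of cleaning can discard a large share of a sparse dataset, so it may be worth comparing the row counts before and after.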

  2.  Plot a histogram of the number of individuals affected by a breach against the frequency of such breaches by using the following code:

    %matplotlib inline
    def_fig_size = (15, 6)
    df["Individuals Affected"].plot( ...
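The plotting call above is cut off, so the following does not attempt to complete it. As a rough illustration of what a histogram of this column summarizes, the sketch below bins a handful of made-up values and counts how many fall into each bin. The values and the bin edges are assumptions chosen for the example, not taken from the dataset or from the recipe.

```python
import pandas as pd

# Hypothetical values standing in for the "Individuals Affected" column.
affected = pd.Series(
    [500, 750, 1200, 1800, 2500, 4000, 9500, 52000],
    name="Individuals Affected",
)

# Hypothetical bin edges; each bin is open on the left and closed on the right.
edges = [0, 1000, 5000, 10000, 100000]

# Assign each value to a bin, then count the rows per bin in bin order.
counts = pd.cut(affected, bins=edges).value_counts().sort_index()

print(counts)
```

Each count corresponds to the height of one bar: the horizontal axis is the number of individuals affected and the vertical axis is how many breaches fall in that range.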
