In the following steps, you will visualize the HIPAA breaches dataset in pandas and use TF-IDF to extract important keywords from the descriptions of the breaches. Let's get started:
- Load and clean the HIPAA breaches dataset using pandas:
import pandas as pd
df = pd.read_csv("HIPAA-breach-report-2009-to-2017.csv")
df = df.dropna()
The output of the preceding code is shown in the following screenshot:
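To make the cleaning step concrete, here is a minimal sketch of what `dropna()` does with its default arguments. The toy rows and the `Name of Covered Entity` column are assumptions made up for illustration; they are not taken from the real CSV file.

```python
import pandas as pd

# Hypothetical toy rows standing in for the breach report (not real data).
# "Name of Covered Entity" is an assumed column name used only for illustration.
toy = pd.DataFrame(
    {
        "Name of Covered Entity": ["Clinic A", "Hospital B", None, "Insurer D"],
        "Individuals Affected": [1200.0, None, 560.0, 8000.0],
    }
)

# With default arguments, dropna() removes every row that has
# at least one missing value in any column.
cleaned = toy.dropna()

rows_before = len(toy)
rows_after = len(cleaned)
print(f"rows before: {rows_before}, rows after: {rows_after}")
```

Because a row is dropped when any single field is missing, this kind of cleaning can discard a noticeable share of a dataset, so it is worth comparing the row counts before and after.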
- Plot a histogram of the number of individuals affected per breach, which shows how frequently breaches of each size occur, by using the following code:
%matplotlib inline
def_fig_size = (15, 6)
df["Individuals Affected"].plot( ...
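The plotting call above is cut off in this text, so its remaining arguments are not known. As a rough, self-contained sketch of how such a histogram might be drawn, the following uses made-up values and assumed settings (`bins=20`, `logy=True`, and the title are all illustrative choices, not the recipe's own). The `%matplotlib inline` magic is left out so that the sketch also runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required

import pandas as pd

# Hypothetical breach sizes for illustration only (not from the real dataset).
toy = pd.DataFrame(
    {"Individuals Affected": [500, 750, 1200, 3000, 4500, 20000, 150000, 1000000]}
)

def_fig_size = (15, 6)

# kind="hist" bins the values and plots how many breaches fall into each bin.
# bins, logy, and title are assumed settings chosen for this sketch.
ax = toy["Individuals Affected"].plot(
    kind="hist",
    figsize=def_fig_size,
    bins=20,
    logy=True,  # log-scaled y-axis, helpful when counts differ widely across bins
    title="Histogram of individuals affected per breach (toy data)",
)
ax.set_xlabel("Individuals affected")

# Each bar is a rectangle patch; the bar heights add up to the number of rows.
total_count = sum(patch.get_height() for patch in ax.patches)
print(f"bars: {len(ax.patches)}, breaches counted: {int(total_count)}")
```

Breach sizes tend to be heavily skewed, with many small incidents and a few very large ones, which is why a log-scaled axis is a common choice for this kind of plot.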