- We start, as usual, with the necessary imports:
import gzip
import pickle
import random
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix
%matplotlib inline
- Then we load the data. We will use pandas to navigate it:
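# Load the gzip-compressed fit matrix and the ordered feature names.
with gzip.open('balanced_fit.npy.gz', 'rb') as f:
    fit = np.load(f)
with open('ordered_features', 'rb') as f:
    ordered_features = np.load(f)
num_features = len(ordered_features)
# list(...) keeps the concatenation valid whether ordered_features is a list or an array.
fit_df = pd.DataFrame(fit, columns=list(ordered_features) + ['pos', 'error'])
num_samples = 80
del fit  # only fit_df is used from here on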
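If you don't have the data files at hand, the loading step can be tried out on synthetic data. The sketch below is only an illustration: the feature names (`feat_a`, `feat_b`, `feat_c`), the random matrix, and the in-memory buffer are all made up and stand in for `balanced_fit.npy.gz` and `ordered_features`. It writes an array as a gzip-compressed `.npy`, reads it back, and labels the columns the same way as above.

```python
import gzip
import io

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real files: three made-up feature names
# and a small random matrix with two extra columns ('pos' and 'error').
feature_names = ['feat_a', 'feat_b', 'feat_c']
rng = np.random.default_rng(0)
data = rng.normal(size=(10, len(feature_names) + 2))

# Write the array as a gzip-compressed .npy into an in-memory buffer
# (a real file name would work the same way).
buffer = io.BytesIO()
with gzip.open(buffer, 'wb') as fh:
    np.save(fh, data)

# Read it back through gzip, as done for the real data.
buffer.seek(0)
with gzip.open(buffer, 'rb') as fh:
    loaded = np.load(fh)

# Label the feature columns, followed by the two extra columns.
demo_df = pd.DataFrame(loaded, columns=feature_names + ['pos', 'error'])
print(demo_df.shape)
print(demo_df.columns.tolist())
```

The resulting `demo_df` has one column per feature name plus `pos` and `error`, which is the layout `fit_df` is assumed to have.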
- Let's ask pandas to show a histogram of every annotation:
fig, ax = plt.subplots(figsize=(16, 9))
fit_df.hist(column=ordered_features, ax=ax)
The following histogram is generated: