2. Statistical Inferences
2.1. First step: visualizing data
In data analysis, a picture is often worth a thousand words. The first step in analyzing data should not be to run a complex statistical tool but to visualize the data in a graph, which reveals the basic nature of the data. The following techniques are generally used for this purpose: scatter plot, histogram, boxplot and Q-Q plot.
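As a rough illustration of the four techniques just listed, the sketch below draws them side by side. It is not taken from the book: the function names `quick_look` and `normal_quantiles` are hypothetical, and the Q-Q plot assumes a comparison against the standard normal distribution.

```python
from statistics import NormalDist

import numpy as np
from matplotlib import pyplot as plt


def normal_quantiles(n):
    """Return n theoretical quantiles of the standard normal distribution."""
    # Plotting positions (k - 0.5) / n for k = 1..n; one common convention.
    probs = (np.arange(1, n + 1) - 0.5) / n
    return np.array([NormalDist().inv_cdf(p) for p in probs])


def quick_look(x, y):
    """Scatter plot of (x, y), then histogram, boxplot and Q-Q plot of x."""
    fig, axes = plt.subplots(2, 2, figsize=(8, 6))

    axes[0, 0].plot(x, y, '.')
    axes[0, 0].set_title('Scatter plot')

    axes[0, 1].hist(x, bins=20)
    axes[0, 1].set_title('Histogram')

    axes[1, 0].boxplot(x)
    axes[1, 0].set_title('Boxplot')

    # Q-Q plot: sorted sample values against theoretical normal quantiles.
    # Points close to a straight line suggest approximately normal data.
    axes[1, 1].plot(normal_quantiles(len(x)), np.sort(x), '.')
    axes[1, 1].set_title('Q-Q plot')

    fig.tight_layout()
    return fig
```

With synthetic data such as `x = np.random.default_rng(0).normal(size=200)` and `y = 2 * x`, calling `quick_look(x, y)` followed by `plt.show()` displays the four panels.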
2.1.1. Scatter plot
A scatter plot uses Cartesian coordinates to display the values of pairs of variables. It can suggest various kinds of correlations and trends between two variables, and it is commonly used to determine whether the relationship between them is linear. The following program displays the scatter plots of all pairs among six of the exogenous variables of the dataset statsmodels.api.datasets.star98:
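# -*- coding: utf-8 -*-
"""
Created on Sun Jul 3 07:53:57 2016
****** scatterplotexample
@author: maurice
"""
import numpy as np
import statsmodels.api as sm
from matplotlib import pyplot as plt

data = sm.datasets.star98.load()
P = 6                   # number of variables displayed
C = P * (P - 1) // 2    # number of distinct pairs (15 for P = 6)
DP = 13                 # index of the first selected exogenous variable
# np.asarray keeps the slicing valid whether exog is an array or a DataFrame
X0 = np.asarray(data.exog)[:, DP:DP + P]
plt.figure(1); plt.clf()
cp = 0
for i1 in range(P):
    for i2 in range(i1 + 1, P):
        cp = cp + 1
        # 4 x 4 grid: 15 pairs plus one panel reserved for the names
        plt.subplot(4, 4, cp)
        plt.plot(X0[:, i1], X0[:, i2], '.')
        plt.xticks([]); plt.yticks([])
        plt.title('(%i,%i)' % (i1, i2), fontsize=10)
plt.subplot(4, 4, 16)
for ip in range(P):
    # exog_name lists the columns of data.exog, so its indices match X0
    nametext = ('%i: %s' % (ip, data.exog_name[DP + ip]))
    # ...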
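The remark that a scatter plot helps decide whether a relationship is linear can be illustrated numerically. The following sketch uses synthetic data, not the star98 dataset, and the variable names are chosen only for illustration. It compares the Pearson correlation coefficient for a linear relation and for a quadratic one: the coefficient is expected to be close to 1 in the first case and close to 0 in the second, although the second pair is strongly dependent, which is something a scatter plot shows at a glance.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed; synthetic data only
x = rng.uniform(-1.0, 1.0, size=500)

# Linear relation with a small amount of additive noise
y_linear = 2.0 * x + 0.1 * rng.normal(size=500)
# Quadratic relation: strongly dependent on x, but not linearly
y_curved = x**2 + 0.1 * rng.normal(size=500)

# np.corrcoef returns the 2 x 2 correlation matrix; take the off-diagonal term
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_curved = np.corrcoef(x, y_curved)[0, 1]

print('linear relation:    r = %.2f' % r_linear)
print('quadratic relation: r = %.2f' % r_curved)
```

Plotting `x` against `y_curved` with `plt.plot(x, y_curved, '.')` would reveal the parabola that the correlation coefficient alone does not capture.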