# 2 Statistical Inferences

## 2.1. First step: visualizing data

In data analysis, a picture is often worth a thousand words. The first step in analyzing data should not be to run a complex statistical tool but to visualize the data in a graph, which reveals their basic nature. The following techniques are generally considered for this purpose: scatter plot, histogram, boxplot and Q-Q plot.

### 2.1.1. Scatter plot

A scatter plot uses Cartesian coordinates to display the values of pairs of variables. It can suggest various kinds of correlations and trends between two variables, and it is commonly used to determine whether the relationship between them is linear. The following program displays the pairwise scatter plots of 6 of the exogenous variables of the dataset `statsmodels.api.datasets.star98`:

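```python
# -*- coding: utf-8 -*-
"""
Created on Sun Jul 3 07:53:57 2016
****** scatterplotexample
@author: maurice
"""
import numpy as np
import statsmodels.api as sm
from matplotlib import pyplot as plt

data = sm.datasets.star98.load()
P = 6                       # number of variables to display
C = int((P - 1) * P / 2)    # number of distinct pairs (15 for P = 6)
DP = 13                     # index of the first selected exogenous column
# np.asarray lets the slicing work whether exog is an array or a DataFrame
X0 = np.asarray(data.exog)[:, DP:DP + P]

plt.figure(1)
plt.clf()
cp = 0
for i1 in range(P):
    for i2 in range(i1 + 1, P):
        cp = cp + 1
        plt.subplot(4, 4, cp)
        plt.plot(X0[:, i1], X0[:, i2], '.')
        plt.xticks([])
        plt.yticks([])
        plt.title('(%i,%i)' % (i1, i2), fontsize=10)
# the 16th cell of the 4x4 grid is kept for the list of variable names
plt.subplot(4, 4, 16)
for ip in range(P):
    # exog_name holds the exog column names, so its indices line up with X0
    nametext = ('%i: %s' % (ip, data.exog_name[DP + ip]))
    ...
```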
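To illustrate how a scatter plot helps judge whether a relationship is linear, here is a minimal sketch on synthetic data (not the `star98` dataset). The helper names `make_pairs` and `pearson`, the sample size, the noise level and the seed are all assumptions chosen for illustration. It draws one linear and one quadratic relationship side by side and puts the Pearson correlation coefficient in each title. The coefficient only measures linear association, so it can be close to zero even when the scatter plot shows a clear dependence.

```python
import numpy as np
from matplotlib import pyplot as plt


def make_pairs(n=200, seed=0):
    """Hypothetical helper: x with a linear and a quadratic response."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n)
    y_lin = 2.0 * x + 0.1 * rng.standard_normal(n)    # linear in x
    y_quad = x ** 2 + 0.1 * rng.standard_normal(n)    # nonlinear in x
    return x, y_lin, y_quad


def pearson(a, b):
    """Pearson correlation coefficient of two 1-D samples."""
    return float(np.corrcoef(a, b)[0, 1])


x, y_lin, y_quad = make_pairs()
r_lin = pearson(x, y_lin)
r_quad = pearson(x, y_quad)

# One panel per relationship; the correlation is shown in the title
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].plot(x, y_lin, '.')
axes[0].set_title('linear, r = %.2f' % r_lin)
axes[1].plot(x, y_quad, '.')
axes[1].set_title('quadratic, r = %.2f' % r_quad)
```

Calling `plt.show()` afterwards displays the figure. With these settings, the left panel should show points clustered along a straight line and the right panel a parabola-shaped cloud. A correlation coefficient alone would not reveal the dependence in the right panel.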
