In data analysis, a picture is often worth a thousand words. The first step in analyzing data should not be to run a complex statistical tool but to visualize the data in a graph, which reveals the basic nature of the data. The following techniques are generally used for this purpose: *scatter plot, histogram, boxplot* and *Q-Q plot*.

A scatter plot uses Cartesian coordinates to display the values of a pair of variables. It can suggest various kinds of correlations and trends between the two variables, and it is commonly used to check whether their relationship is linear. The following program displays the pairwise scatter plots of 6 of the exogenous variables of the dataset `statsmodels.api.datasets.star98`:

```python
# -*- coding: utf-8 -*-
"""
Created on Sun Jul 3 07:53:57 2016
****** scatterplotexample
@author: maurice
"""
import numpy as np
import statsmodels.api as sm
from matplotlib import pyplot as plt

data = sm.datasets.star98.load()
P = 6                      # number of variables to compare
C = int((P - 1) * P / 2)   # number of distinct pairs (15 for P = 6)
DP = 13                    # index of the first selected exogenous column
# np.asarray lets the slice work whether exog is an ndarray or a DataFrame
X0 = np.asarray(data.exog)[:, DP:DP + P]

plt.figure(1)
plt.clf()
cp = 0
for i1 in range(P):
    for i2 in range(i1 + 1, P):
        cp = cp + 1
        plt.subplot(4, 4, cp)
        plt.plot(X0[:, i1], X0[:, i2], '.')
        plt.xticks([])
        plt.yticks([])
        plt.title('(%i,%i)' % (i1, i2), fontsize=10)

# last cell of the 4x4 grid (the 15 pairs fill the first 15 cells)
plt.subplot(4, 4, 16)
for ip in range(P):
    nametext = ('%i: %s' % (ip, data.names[DP + ip]))  # ...
```
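The use of a scatter plot to judge linearity, mentioned above, can be illustrated with a minimal sketch on synthetic data. It is not part of the original program and does not use the `star98` dataset: the function names `make_examples`, `pearson` and `plot_examples`, the sample size and the noise levels are all assumptions chosen for illustration. The sketch compares a linear and a quadratic relationship and prints the Pearson correlation coefficient of each, a number that measures only linear association.

```python
import numpy as np


def make_examples(n=500, seed=0):
    """Return x and two responses: one linear in x, one quadratic in x.

    Synthetic data, assumed for illustration only.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    y_linear = 2.0 * x + rng.normal(0.0, 0.1, size=n)
    y_quadratic = x ** 2 + rng.normal(0.0, 0.05, size=n)
    return x, y_linear, y_quadratic


def pearson(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def plot_examples(x, y_linear, y_quadratic):
    """Draw both relationships side by side and return the figure."""
    # imported here so that the numeric part runs without matplotlib
    from matplotlib import pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(8, 3))
    pairs = zip(axes, (y_linear, y_quadratic), ('linear', 'quadratic'))
    for ax, y, label in pairs:
        ax.plot(x, y, '.')
        ax.set_title('%s: r = %.2f' % (label, pearson(x, y)))
    return fig


x, y_linear, y_quadratic = make_examples()
r_linear = pearson(x, y_linear)
r_quadratic = pearson(x, y_quadratic)
print('linear:    r = %.3f' % r_linear)
print('quadratic: r = %.3f' % r_quadratic)
```

Under these assumptions the linear case should give a coefficient close to 1, while the quadratic case should give one close to 0 even though the second variable depends strongly on the first. Calling `plot_examples(x, y_linear, y_quadratic)` and then `matplotlib.pyplot.show()` draws the two panels, where the curved pattern is immediately visible.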
