In data analysis, a picture is often worth more than a thousand words. When we start analyzing data, the first step should not be to run a complex statistical tool but to visualize the data in a graph. This lets us understand the basic nature of the data. The following techniques are commonly used for this purpose: scatter plot, histogram, boxplot, and Q-Q plot.
A scatter plot uses Cartesian coordinates to display the values of a pair of variables. It can suggest various kinds of correlations and trends between the two variables, and it is commonly used to determine whether their relationship is linear. The following program displays pairwise scatter plots of six of the exogenous variables of the star98 dataset:
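# -*- coding: utf-8 -*-
"""
Created on Sun Jul 3 07:53:57 2016
"""
import numpy as np
import statsmodels.api as sm
from matplotlib import pyplot as plt

data = sm.datasets.star98.load()
P = 6                      # number of exogenous variables to compare
C = int((P - 1) * P / 2)   # number of distinct variable pairs
DP = 13                    # index of the first selected column
# np.asarray lets the slice work whether load() returns arrays or DataFrames
X0 = np.asarray(data.exog)[:, DP:DP + P]
plt.figure(1); plt.clf(); cp = 0
for i1 in range(P):
    for i2 in range(i1 + 1, P):
        # Assumed loop body (not present in the listing): one panel per pair
        cp += 1
        plt.subplot(3, C // 3, cp)
        plt.plot(X0[:, i1], X0[:, i2], '.')
for ip in range(P):
    nametext = ('%i: %s' % (ip, data.names[DP + ip]))  # ...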
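The pair enumeration used in the listing above (the nested loops over i1 < i2, giving P(P-1)/2 panels) can be sketched independently of the dataset. The following is a minimal illustration, not the author's program: the helper names column_pairs, make_synthetic_columns and plot_pairs are hypothetical, the data are synthetic Gaussian samples, and the three-row panel layout is an assumption.

```python
from itertools import combinations
import random


def column_pairs(p):
    """Return every unordered pair (i, j) with i < j among p columns."""
    return list(combinations(range(p), 2))


def make_synthetic_columns(p, n, seed=0):
    """Hypothetical stand-in data: p columns of n Gaussian samples each."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]


def plot_pairs(columns, n_rows=3):
    """Draw one scatter panel per column pair (requires matplotlib)."""
    from matplotlib import pyplot as plt  # imported lazily, only when plotting

    pairs = column_pairs(len(columns))
    n_cols = -(-len(pairs) // n_rows)  # ceiling division
    fig, axes = plt.subplots(n_rows, n_cols, squeeze=False)
    panels = list(axes.flat)
    for ax, (i, j) in zip(panels, pairs):
        ax.scatter(columns[i], columns[j], s=4)
        ax.set_title('%i vs %i' % (i, j))
    for ax in panels[len(pairs):]:
        ax.set_visible(False)  # hide panels that have no pair assigned
    return fig
```

Calling plot_pairs(make_synthetic_columns(6, 200)) returns a figure with 15 panels, one per pair, which can then be shown with pyplot's show() or saved with the figure's savefig(). For independent synthetic columns the panels should look like structureless clouds; a linear relationship would instead appear as points clustered along a straight line.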