13 Spatial Auto-Correlation and Auto-Regression
13.1 LATTICE DATA: SPATIAL AUTO-CORRELATION AND AUTO-REGRESSION
In lattice spatial data, the spatial domain is divided into regions, and the observations or
variable values are associated with those regions. There are two types of lattice data: regular, e.g., grid or
raster, and irregular, e.g., polygons. Each variable takes a single value for an entire region. The regions
have a neighborhood structure given by the distances between centroids or by the amount of shared
border.
One important analysis method of lattice data is spatial auto-correlation. Its objective is to
detect spatial patterns based on correlation of a variable among regions, given the neighborhood
structure. This information is useful to understand spatial patterning and to make decisions regarding
the applicability of correlation and regression methods among variables. Another important
method is spatial auto-regression (SAR). Its objective is to predict the outcome or value of a
variable in a region based partially on the values of the same variable in neighboring regions and
partially on other variables.
13.2 SPATIAL STRUCTURE AND VARIANCE INFLATION
An important reason for analyzing auto-correlation is to determine whether the assumption of
no serial correlation, which regression requires, is appropriate. Recall two important
aspects of regression. First, simple regression assumes that the values of the independent variable
are independent observations, i.e., they are uncorrelated. This is why we checked for auto-correlation
in time when doing exploratory data analysis. Second, for multiple regression, we demonstrated that
the independent variables should not be correlated or collinear, because this would lead to
distorted values of the regression coefficients, giving more importance to some variables and causing
variance inflation.
Correlation among values of the independent variable can occur because they have spatial dependence.
Therefore, we need to make sure that the spatial structure does not affect the estimation of
the coefficients. We investigate the potential for this problem using spatial auto-correlation, and
we include the effect of spatial structure using spatial auto-regression.
13.3 NEIGHBORHOOD STRUCTURE
Neighborhood structure provides the covariance structure needed for spatial auto-correlation
and auto-regression. There are several ways of defining neighbor regions: one is by the amount
of shared border, and another is by the distance separating a reference point in each
region. For example, Figure 13.1 illustrates nine regions, each identified by its label. The
neighborhood structure is not necessarily symmetric, because it depends on how we define
neighbors.
430 Data Analysis and Statistics for Geography, Environmental Science, and Engineering
The neighborhood structure can be stored in a binary matrix W: entry $w_{ij}$ is 1 if
regions i and j are neighbors, and 0 if they are not. For the aforementioned
example, defining neighbors as those regions sharing borders, we have
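\[
W = \begin{bmatrix}
0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 & 1 & 1 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0
\end{bmatrix}
\tag{13.1}
\]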
Note that we excluded self-neighbors, i.e., we write 0 on the main diagonal. W is an n × n
matrix, where n is the number of regions. Because every non-zero entry equals 1, the sum of all
entries of W is simply the number of 1s.
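As a minimal sketch of how such a matrix could be built and checked, the Python snippet below assembles W from border-sharing neighbor lists for the nine regions. The neighbor lists are read off Equation 13.1; the list for region 7 (regions 4, 5, and 8) is an assumption taken from the corresponding entries in rows 4, 5, and 8. The variable names and the use of NumPy are illustrative choices, not part of the original text.

```python
import numpy as np

# Border-sharing neighbors of each region, read off Equation 13.1.
# Region 7's list is assumed from the entries of rows 4, 5, and 8.
neighbors = {
    1: [2, 4, 5],
    2: [1, 3, 5, 6],
    3: [2, 6, 9],
    4: [1, 5, 7],
    5: [1, 2, 4, 6, 7, 8],
    6: [2, 3, 5, 8, 9],
    7: [4, 5, 8],
    8: [5, 6, 7, 9],
    9: [3, 6, 8],
}

n = len(neighbors)
W = np.zeros((n, n), dtype=int)
for i, js in neighbors.items():
    for j in js:
        W[i - 1, j - 1] = 1  # regions are labeled from 1, array indices start at 0

print(W)
print("zero diagonal:", bool(np.all(np.diag(W) == 0)))
print("symmetric:", bool(np.array_equal(W, W.T)))
print("number of 1s:", int(W.sum()))  # 34, i.e., 17 shared borders counted in both directions
print("neighbors per region:", W.sum(axis=1))
```

With this border-sharing definition the matrix happens to be symmetric; other neighbor definitions or weightings need not be.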
By representing each neighbor relationship graphically with a line, we obtain another
useful diagram. Nodes represent regions, and lines are links connecting the nodes when the
regions are neighbors (Figure 13.2). The matrix W corresponds to Figure 13.2.
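The correspondence between W and the node-link diagram can be sketched in code: each link is a pair of regions whose entry in W is 1. The snippet below is an illustrative sketch, assuming the binary matrix of Equation 13.1 (with row 7 taken as the counterpart of the entries in rows 4, 5, and 8); the variable names are hypothetical.

```python
import numpy as np

# Binary neighbor matrix of Equation 13.1 (row 7 assumed from rows 4, 5, and 8)
W = np.array([
    [0, 1, 0, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 1, 1, 0],
    [0, 1, 1, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 1, 0],
])

# In a symmetric W every link appears twice (i-j and j-i),
# so only the part above the main diagonal is scanned.
rows, cols = np.nonzero(np.triu(W, k=1))
links = [(int(i) + 1, int(j) + 1) for i, j in zip(rows, cols)]

print(links)       # pairs of region labels joined by a link
print(len(links))  # 17 links
```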
Another way to dene neighbors is by distance separating their centroids. For example, neigh-
bors are those regions with distance between centroids shorter than a threshold or cutoff distance.
For example, if we look at distance between region 1 and all other regions (Figure 13.3), we may
decide that only regions 1 and 4, 1 and 2 are neighbors. However, increasing the cutoff distance, also
regions 1 and 5 would be neighbors.
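The effect of the cutoff distance can be illustrated with a short sketch. The centroid coordinates below are hypothetical (the text does not give them) and were chosen only so that regions 2 and 4 are closest to region 1 and region 5 is somewhat farther away; the function names and cutoff values are likewise assumptions for illustration.

```python
import numpy as np

# HYPOTHETICAL centroid coordinates; row k holds the centroid of region k + 1
coords = np.array([
    [1.0, 3.0],  # region 1
    [2.0, 3.1],  # region 2
    [3.1, 3.0],  # region 3
    [0.9, 2.0],  # region 4
    [2.1, 1.9],  # region 5
    [3.0, 2.0],  # region 6
    [1.0, 1.0],  # region 7
    [2.0, 0.9],  # region 8
    [3.0, 1.0],  # region 9
])

def distance_neighbors(coords, cutoff):
    """Binary neighbor matrix: 1 if centroid distance is below the cutoff."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    W = (dist < cutoff).astype(int)
    np.fill_diagonal(W, 0)  # exclude self-neighbors
    return W

def neighbors_of(W, region):
    """Labels (starting at 1) of the neighbors of a region."""
    return [int(j) + 1 for j in np.nonzero(W[region - 1])[0]]

print(neighbors_of(distance_neighbors(coords, 1.2), 1))  # [2, 4]
print(neighbors_of(distance_neighbors(coords, 1.6), 1))  # [2, 4, 5]
```

With these assumed coordinates, raising the cutoff from 1.2 to 1.6 adds region 5 to the neighbors of region 1, mirroring the situation described above.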
We can also assign values other than 1 to obtain a weighted neighbor matrix. For example, the
amount of border shared between regions 1 and 5 is smaller than the amount shared between 1 and 2 or between 1
and 4. We can also assign weights based on the distances or lengths of the links: the shorter the link
connecting two nodes, the higher the weight. Consider Figure 13.2. The following matrix summarizes
approximate weights based on distances:
FIGURE 13.1 Lattice data: irregular or polygon.
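\[
W = \begin{bmatrix}
0 & 1/3 & 0 & 1/3 & 1/3 & 0 & 0 & 0 & 0 \\
0.15 & 0 & 0.35 & 0 & 0.15 & 0.35 & 0 & 0 & 0 \\
0 & 1/3 & 0 & 0 & 0 & 1/3 & 0 & 0 & 1/3 \\
0.25 & 0 & 0 & 0 & 0.5 & 0 & 0.25 & 0 & 0 \\
0.1 & 0.1 & 0 & 0.3 & 0 & 0.3 & 0.1 & 0.1 & 0 \\
0 & 0.22 & 0.22 & 0 & 0.22 & 0 & 0 & 0.12 & 0.22 \\
0 & 0 & 0 & 0.4 & 0.4 & 0 & 0 & 0.2 & 0 \\
0 & 0 & 0 & 0 & 0.2 & 0.3 & 0.2 & 0 & 0.3 \\
0 & 0 & 1/3 & 0 & 0 & 1/3 & 0 & 1/3 & 0
\end{bmatrix}
\tag{13.2}
\]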
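One common way to turn link lengths into weights of this kind is to take the inverse of the distance on each link and then rescale each row so that it sums to 1. The sketch below illustrates that idea only; inverse-distance weighting and row standardization are assumptions here, not necessarily how Equation 13.2 was obtained, and the centroid coordinates are hypothetical, so the resulting numbers are not expected to reproduce Equation 13.2.

```python
import numpy as np

# Binary border-sharing matrix of Equation 13.1 (row 7 assumed from rows 4, 5, and 8)
W = np.array([
    [0, 1, 0, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 1, 1, 0],
    [0, 1, 1, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 1, 0],
])

# HYPOTHETICAL centroid coordinates; row k holds the centroid of region k + 1
coords = np.array([
    [1.0, 3.0], [2.0, 3.1], [3.1, 3.0],
    [0.9, 2.0], [2.1, 1.9], [3.0, 2.0],
    [1.0, 1.0], [2.0, 0.9], [3.0, 1.0],
])

# Pairwise centroid distances (length of each potential link)
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Inverse-distance weight on each existing link; non-neighbors stay at 0.
# The `where` mask avoids dividing by the zero distances on the diagonal.
raw = np.divide(W, dist, out=np.zeros_like(dist), where=W > 0)

# Row standardization: each row of weights sums to 1
weights = raw / raw.sum(axis=1, keepdims=True)

print(np.round(weights, 2))
print("row sums:", weights.sum(axis=1))
print("symmetric:", bool(np.allclose(weights, weights.T)))
```

Row standardization generally makes the weighted matrix asymmetric even when the binary matrix is symmetric, because each row is divided by a different total.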
FIGURE 13.2 Neighborhood node-link diagram.
FIGURE 13.3 Distance between centroids from region 1 to all other regions.
