Data Analysis and Statistics for Geography, Environmental Science, and Engineering

Spatial Auto-Correlation and Auto-Regression

13.1 LATTICE DATA: SPATIAL AUTO-CORRELATION AND AUTO-REGRESSION

In lattice spatial data, the spatial domain is divided into regions, and the observations or variable values are associated with those regions. There are two types of lattice data: regular, e.g., a grid or raster, and irregular, e.g., polygons. Each variable has a single value for an entire region. The regions have a neighborhood structure given by the distances between centroids or by the amount of shared border.

One important method for analyzing lattice data is spatial auto-correlation. Its objective is to detect spatial patterns based on the correlation of a variable among regions, given the neighborhood structure. This information is useful for understanding spatial patterning and for deciding whether correlation and regression methods among variables are applicable. Another important method is spatial auto-regression (SAR). Its objective is to predict the outcome or value of a variable in a region based partially on the values of the same variable in neighboring regions and partially on other variables.

13.2 SPATIAL STRUCTURE AND VARIANCE INFLATION

An important reason for examining spatial auto-correlation is to determine whether the assumption of no serial correlation, which regression requires, is appropriate. Recall two important aspects of regression. First, simple regression assumes that the values of the independent variable are independent observations, i.e., that they are uncorrelated. This is why we checked for auto-correlation in time when doing exploratory data analysis. Second, for multiple regression, we demonstrated that the independent variables should not be correlated or collinear, because this would distort the values of the regression coefficients, giving more importance to some variables and causing variance inflation.

Correlation among values of the independent variable can occur because they are spatially dependent. Therefore, we need to make sure that the spatial structure does not affect the estimation of the coefficients. We investigate the potential for this problem using spatial auto-correlation, and we account for the effect of spatial structure using spatial auto-regression.

13.3 NEIGHBORHOOD STRUCTURE

Neighborhood structure provides the covariance structure needed for spatial auto-correlation and auto-regression. There are several ways of defining neighboring regions: one is by the amount of common border, and another is by the distance separating a reference point of each region. For example, Figure 13.1 illustrates nine regions, each identified by its label. The neighborhood structure is not necessarily symmetric, because it depends on how we define neighbors.

The neighborhood structure can be stored in a binary matrix W whose entries $w_{ij}$ are 1 or 0: 1 if regions i and j are neighbors, and 0 if they are not. For the aforementioned example, defining neighbors as those regions sharing borders, we have

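\[
W =
\begin{bmatrix}
0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 & 1 & 1 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0
\end{bmatrix}
\tag{13.1}
\]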

Note that we exclude self-neighbors, i.e., we write 0 on the main diagonal. W is an n × n matrix, where n is the number of regions (here n = 9). Because the matrix is binary, the sum of all its entries equals the number of 1s.
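As a minimal sketch of how such a matrix could be assembled, the following Python builds a binary W from per-region neighbor lists and checks the properties just noted. The `neighbors` dictionary and the function name `binary_weights` are illustrative assumptions, not the book's code; the lists are transcribed from W under the border-sharing definition, which makes the matrix symmetric.

```python
import numpy as np

# Neighbor lists transcribed from the border-sharing matrix W (assumed transcription).
neighbors = {
    1: [2, 4, 5],
    2: [1, 3, 5, 6],
    3: [2, 6, 9],
    4: [1, 5, 7],
    5: [1, 2, 4, 6, 7, 8],
    6: [2, 3, 5, 8, 9],
    7: [4, 5, 8],
    8: [5, 6, 7, 9],
    9: [3, 6, 8],
}

def binary_weights(neighbors):
    """Return the n x n binary neighbor matrix for 1-based region labels."""
    n = len(neighbors)
    W = np.zeros((n, n), dtype=int)
    for i, js in neighbors.items():
        for j in js:
            W[i - 1, j - 1] = 1  # region labels start at 1, array indices at 0
    return W

W = binary_weights(neighbors)

print(W)
print("zero diagonal:", np.trace(W) == 0)
print("symmetric:", bool((W == W.T).all()))
print("number of 1s:", int(W.sum()))
```

Storing neighbor lists rather than the full matrix is often convenient for large lattices, because most entries of W are 0.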

By drawing a line wherever a neighboring relationship exists, we obtain another interesting diagram. Nodes represent regions, and lines are links connecting the nodes when the regions are neighbors (Figure 13.2). The matrix W corresponds to Figure 13.2.
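The correspondence between W and a node-link diagram can be sketched in Python: each 1 above the main diagonal is one link, and each row sum is the number of links at a node. The function name `links_from_matrix` is an assumption made for illustration, and the rows are transcribed from W assuming it is symmetric.

```python
import numpy as np

# Rows of the binary matrix W (assumed transcription, one string per region).
rows = ["010110000", "101011000", "010001001",
        "100010100", "110101110", "011010011",
        "000110010", "000011101", "001001010"]
W = np.array([[int(c) for c in r] for r in rows])

def links_from_matrix(W):
    """List each link once as a pair of 1-based region labels (i < j)."""
    i_idx, j_idx = np.nonzero(np.triu(W, k=1))  # upper triangle avoids duplicates
    return [(int(i) + 1, int(j) + 1) for i, j in zip(i_idx, j_idx)]

links = links_from_matrix(W)
degree = W.sum(axis=1)  # number of links attached to each node

print(links)
print(degree)
```

Listing only the upper triangle assumes a symmetric W; for an asymmetric neighbor definition, each direction would need to be listed separately.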

Another way to define neighbors is by the distance separating their centroids. For example, neighbors are those regions whose centroids are closer than a threshold or cutoff distance. If we look at the distances between region 1 and all other regions (Figure 13.3), we may decide that only regions 2 and 4 are neighbors of region 1. Increasing the cutoff distance, however, would also make regions 1 and 5 neighbors.
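The effect of the cutoff can be illustrated with a short Python sketch. The centroid coordinates below are hypothetical (a unit 3 × 3 arrangement, not the coordinates of Figure 13.3), and the function names are assumptions; they are chosen only so that a short cutoff links region 1 to regions 2 and 4, while a longer cutoff also links it to region 5.

```python
import numpy as np

# Hypothetical centroids for regions 1..9 (NOT taken from Figure 13.3).
centroids = np.array([[0, 2], [1, 2], [2, 2],
                      [0, 1], [1, 1], [2, 1],
                      [0, 0], [1, 0], [2, 0]], dtype=float)

def distance_neighbors(centroids, cutoff):
    """Binary matrix: 1 where centroid distance is below the cutoff."""
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))  # pairwise Euclidean distances
    W = (dist < cutoff).astype(int)
    np.fill_diagonal(W, 0)  # exclude self-neighbors
    return W

def neighbors_of(W, region):
    """1-based labels of the neighbors of a 1-based region label."""
    return [int(j) + 1 for j in np.nonzero(W[region - 1])[0]]

W_short = distance_neighbors(centroids, cutoff=1.2)
W_long = distance_neighbors(centroids, cutoff=1.5)

print(neighbors_of(W_short, 1))
print(neighbors_of(W_long, 1))
```

A distance-based definition with a single cutoff always yields a symmetric matrix, since the distance from i to j equals the distance from j to i.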

We can also assign values other than 1 to obtain a weighted neighbor matrix. For example, the amount of border shared between regions 1 and 5 is smaller than the amount shared between 1 and 2 or between 1 and 4, so the pair 1 and 5 could receive a smaller weight. We can also assign weights based on the distances or lengths of the links: the shorter the link connecting two nodes, the higher the weight. Consider Figure 13.2. The following matrix summarizes approximate weights based on distances:

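\[
W =
\begin{bmatrix}
0 & 1/3 & 0 & 1/3 & 1/3 & 0 & 0 & 0 & 0 \\
0.15 & 0 & 0.35 & 0 & 0.15 & 0.35 & 0 & 0 & 0 \\
0 & 1/3 & 0 & 0 & 0 & 1/3 & 0 & 0 & 1/3 \\
0.25 & 0 & 0 & 0 & 0.5 & 0 & 0.25 & 0 & 0 \\
0.1 & 0.1 & 0 & 0.3 & 0 & 0.3 & 0.1 & 0.1 & 0 \\
0 & 0.22 & 0.22 & 0 & 0.22 & 0 & 0 & 0.12 & 0.22 \\
0 & 0 & 0 & 0.4 & 0.4 & 0 & 0 & 0.2 & 0 \\
0 & 0 & 0 & 0 & 0.2 & 0.3 & 0.2 & 0 & 0.3 \\
0 & 0 & 1/3 & 0 & 0 & 1/3 & 0 & 1/3 & 0
\end{bmatrix}
\tag{13.2}
\]

FIGURE 13.1 Lattice data: irregular or polygon.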

FIGURE 13.2 Neighborhood node-link diagram.

FIGURE 13.3 Distance between centroids from region 1 to all other regions.
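One way to obtain distance-based weights of this kind is sketched below in Python. It assumes that each weight is proportional to the inverse of the link length and that each row is scaled to sum to 1; the text states only that shorter links receive higher weights, so this specific rule, the link lengths, and the function names are all illustrative assumptions.

```python
import numpy as np

def inverse_distance_row(link_lengths, n):
    """One row of a weighted neighbor matrix.

    link_lengths maps a 1-based neighbor label to a link length; weights are
    proportional to 1/length (an assumed rule) and scaled to sum to 1.
    """
    row = np.zeros(n)
    for j, d in link_lengths.items():
        row[j - 1] = 1.0 / d
    return row / row.sum()

def row_standardize(W):
    """Scale each row of W to sum to 1, leaving all-zero rows unchanged."""
    W = np.asarray(W, dtype=float)
    sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums > 0)

# Hypothetical link lengths from region 4 to its neighbors 1, 5, and 7:
# the link to region 5 is half as long as the other two.
row4 = inverse_distance_row({1: 2.0, 5: 1.0, 7: 2.0}, n=9)
print(row4)

# Row-standardizing a binary row with three neighbors gives 1/3 each.
print(row_standardize(np.array([[0, 1, 1, 1], [1, 0, 0, 0],
                                [1, 0, 0, 0], [1, 0, 0, 0]])))
```

With these hypothetical lengths, region 4 gives weight 0.5 to region 5 and 0.25 each to regions 1 and 7. A row-standardized matrix is generally not symmetric, because the same link is scaled by a different row total at each end.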

Data Analysis and Statistics for Geography, Environmental Science, and Engineering by Miguel F. Acevedo
