# Chapter 10 Spatial autocorrelation

Session 10

## 10.1 Introduction

In the previous week, we saw that we could combine the spatial location (as a constraint on the clustering procedure) and the other attributes of that location in a single clustering algorithm. This is a powerful combination, partly because of one of Geography’s central tenets (often referred to as Tobler’s first law):

> Everything is related to everything else, but near things are more related than far things.

This notion holds true (to different degrees) for many social processes (I’m sure you can think of many yourself!). Apart from visually inspecting this tendency for similar things (or values) to cluster together in space, we can also directly quantify to what degree a variable is clustered in space. That is, rather than looking for individual clusters in space, we can calculate whether the pattern shows clear signs of clustering in the first place. This is also often referred to as spatial autocorrelation: the degree to which a variable correlates with itself across space. As we will discuss next week, this has major implications for many statistical procedures (e.g., it violates one of the central assumptions in regression analysis) but for now we will just focus on describing the degree to which this might be present in a dataset.

## 10.2 Lagged average over neighbours

To calculate spatial autocorrelation, we need to work through steps similar to those we took for spatial clustering in the previous chapter. Our ultimate goal is to look at the relation between a variable (in our case remaining lease years) and a *spatially lagged* version of that variable.

The intuition behind spatial lag is the following: if I didn’t know the average remaining lease for one hexagon, how could I estimate it if I knew the value of that variable for its neighbours? One thing I could do is take the average over all values of the variable for my hexagon’s neighbours, and use that as an estimator for the unknown average remaining lease.
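To make this concrete, here is a minimal base-R sketch of that idea, using made-up values for five hypothetical areas (not our actual hexagon data):

```r
# Hypothetical remaining-lease values (in years) for five areas
vals <- c(70, 72, 90, 88, 71)

# Pretend area 3's value is unknown, and areas 2 and 4 are its neighbours:
# estimate it by averaging the neighbours' values
neighbours_of_3 <- c(2, 4)
estimate_3 <- mean(vals[neighbours_of_3])
estimate_3  # (72 + 88) / 2 = 80
```

The spatial lag does exactly this, but for every area at once, using each area's own set of neighbours.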

The first step is to construct a list of neighbours for each area. Here we use the same `remaining_lease` data as earlier, which you can download here.

```
library(sf)     # spatial data frames
library(sp)     # the 'Spatial' classes
library(spdep)  # poly2nb() and other neighbourhood tools

remaining_lease <- readRDS(here::here("data/remaining_lease.rds"))
hex_sp <- as(remaining_lease, 'Spatial')
hex_neighbours <- poly2nb(hex_sp)
plot(hex_neighbours, coordinates(hex_sp))
```

Once we have our list of neighbours, we need to assign a weight to the relationship with each neighbour. For clustering, we used the similarity in ‘data space’ to determine this weight, but here we use a row-standardised approach: if a hexagon has 4 neighbours, each neighbour will weigh 1/4.

*Note:* In the first assignment, you computed the moving average of a variable over time. “Averaging” in that context meant assigning weights to observations before and after some time \(t\). We are doing the same thing here: we assign weights to neighbouring observations (but here we do it in space, not in time).
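The `hex_weights` object used below can be built from the neighbour list with `spdep`’s `nb2listw()`; `style = "W"` gives exactly the row-standardised 1/k weighting described above (the object name follows this chapter’s convention, the function and argument are standard `spdep`):

```r
library(spdep)

# Row-standardised ("W") weights: each of a hexagon's k neighbours weighs 1/k
hex_weights <- nb2listw(hex_neighbours, style = "W")
```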

Open `hex_weights` in your Environment and observe the `weights` attribute!

Once we have the weights, we can calculate a *spatially lagged* version of the variable of interest. In our case this produces, for each hexagon, the average years of lease left over all of its neighbours. `lag.listw` is an `spdep` function that does just that.
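A call along these lines computes the lagged variable, assuming the `hex_weights` listw object from above; the column name `remaining_lease` inside the data frame is an assumption based on how the object is described:

```r
# Neighbourhood average of remaining lease years, one value per hexagon
lease_lag <- lag.listw(hex_weights, remaining_lease$remaining_lease)
```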

`lease_lag` is a numeric vector of length equal to the number of hexagons we have. We can plot it, but first we must assign each `lease_lag` value to its hexagon. Since our `remaining_lease` object is a data frame containing all hexagon indices, their average remaining lease and the geometry of each hexagon, we can simply add the lagged version to it and plot.

```
library(tibble)  # add_column()
library(tmap)    # thematic maps

lease_lag_df <- remaining_lease %>%
  add_column(lease_lag = lease_lag)

current.mode <- tmap_mode("plot")
tm_shape(lease_lag_df) +
  tm_polygons(col = "lease_lag")
```

### 10.2.1 Exercise

Compare the plot of `lease_lag` that we have just produced with the plot of `remaining_lease` (you can reuse `lease_lag_df` since it has a `remaining_lease` column, or try out `tm_facets`). What do you observe?
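As a starting point, a side-by-side comparison can be sketched with `tm_facets`; this uses tmap v3 syntax, so you may need to adjust it for your version:

```r
# Map the original and the lagged variable next to each other
tm_shape(lease_lag_df) +
  tm_polygons(col = c("remaining_lease", "lease_lag")) +
  tm_facets(ncol = 2)
```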

## 10.3 Moran’s I

Lab practice 10.2

We saw in class that we measure spatial autocorrelation by looking at how the point average (the average in one hexagon, contained in `remaining_lease`) varies with the moving average (the average over a neighbourhood, contained in `lease_lag`). Let’s plot them both:
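A minimal sketch of this plot, a scatter of the two variables with `ggplot2` (the regression line is added further below):

```r
library(ggplot2)

# Point average vs. neighbourhood average, one point per hexagon
ggplot(lease_lag_df, aes(x = remaining_lease, y = lease_lag)) +
  geom_point()
```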

What is this plot telling us?

- When `remaining_lease` is low, `lease_lag` is low too.
- When `remaining_lease` is high, `lease_lag` is high too.

In other words, there is a positive correlation between `remaining_lease` and `lease_lag`!

To measure this correlation, we compare `lease_lag` with the original `remaining_lease` variable by regressing the spatially lagged version on the original variable. The slope of this regression is the metric we call ‘Moran’s I’.
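The regression can be run with `lm()`; the call below is reconstructed from the `Call:` line of the output that follows:

```r
lm(lease_lag ~ remaining_lease, data = lease_lag_df)
```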

```
##
## Call:
## lm(formula = lease_lag ~ remaining_lease, data = lease_lag_df)
##
## Coefficients:
## (Intercept) remaining_lease
## 35.565 0.504
```

The coefficient, or slope of the line, is about 0.50, indicating that there is indeed positive autocorrelation: 0 would mean no autocorrelation, -1 perfectly negative and 1 perfectly positive autocorrelation.

```
library(ggplot2)

ggplot(lease_lag_df,
       aes(x = remaining_lease, y = lease_lag)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE)
```

### 10.3.1 Exercise

Based on our bootstrapping approach in Chapter 4, can you think of a way to calculate the statistical significance for this coefficient?
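One option is a permutation test: shuffle the variable’s values across the areas (destroying any spatial structure), recompute the slope each time, and see where the observed slope falls in that null distribution. A self-contained base-R sketch with toy data (a ring of six areas, two neighbours each; not our hexagon data):

```r
set.seed(1)
n <- 6
vals <- c(10, 11, 12, 30, 31, 32)

# Row-standardised weight matrix for a ring: each area has two neighbours
W <- matrix(0, n, n)
for (i in 1:n) {
  W[i, (i %% n) + 1] <- 0.5        # next area on the ring
  W[i, ((i - 2) %% n) + 1] <- 0.5  # previous area on the ring
}

# Observed slope of lag ~ original
observed <- unname(coef(lm(as.vector(W %*% vals) ~ vals))[2])

# Null distribution: slopes after randomly reshuffling values over areas
perm <- replicate(999, {
  v <- sample(vals)
  unname(coef(lm(as.vector(W %*% v) ~ v))[2])
})

# Pseudo p-value: how often a permuted slope is at least as large
p_value <- (sum(perm >= observed) + 1) / (length(perm) + 1)
```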

We can also use the `moran.test` function from the `spdep` package to calculate Moran’s I in a more automatic fashion. It should give (almost) the same result as our manual regression analysis. What do we conclude from this number?
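The call, assuming the `hex_weights` object from before, would look like this:

```r
# Global Moran's I plus an analytical significance test
moran.test(lease_lag_df$remaining_lease, hex_weights)
```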

## 10.4 Local value of autocorrelation

Lab practice 10.3

Just as we can calculate spatial autocorrelation globally, we can also look at it on a more local scale (autocorrelation depends strongly on scale in any case!). In `spdep`, Moran’s I has a local counterpart called `localmoran`. Instead of a single global metric, it indicates how much clustering is present around each polygon and whether the neighbourhood values point to a ‘cold’ cluster or a ‘hot’ cluster. In our case, cold means fewer years left on the lease!

```
moran_local <- localmoran(lease_lag_df$remaining_lease, hex_weights)
lease_lag_df$moran <- moran_local[, 1]

tm_shape(lease_lag_df) +
  tm_polygons(col = "moran", palette = "-RdBu")
```

### 10.4.1 Exercise

Which are cold and hot clusters for the price per square metre variable?