# Chapter 10 Spatial autocorrelation

Session 10

## 10.1 Introduction

In the previous week, we saw that we could combine both the spatial location (as a constraint on the clustering procedure) and the other attributes of that location together in a single clustering algorithm. This is a powerful combination partly because of one of Geography’s central tenets (often referred to as Tobler’s first law):

Everything is related to everything else, but near things are more related than far things.

This notion holds true (to different degrees) for many social processes (I’m sure you can think of many yourself!). Apart from visually inspecting this tendency for similar things (or values) to cluster together in space, we can also directly quantify to what degree a variable is clustered in space. That is, rather than looking for individual clusters in space, we can calculate whether the pattern shows clear signs of clustering in the first place. This is also often referred to as spatial autocorrelation: the degree to which a variable correlates with itself across space. As we will discuss next week, this has major implications for many statistical procedures (e.g., it violates one of the central assumptions in regression analysis) but for now we will just focus on describing the degree to which this might be present in a dataset.

## 10.2 Lagged average over neighbours

Lab practices 10.0 + 10.1

To calculate spatial autocorrelation, we need to walk through some similar steps as we did for the spatial clustering in the previous Chapter. Our ultimate goal is to look at the relation between a variable (in our case remaining lease years) and a spatially lagged version of that variable.

The intuition behind spatial lag is the following: if I didn’t know the average remaining lease for one hexagon, how could I estimate it if I knew the value of that variable for its neighbours? One thing I could do is take the average over all values of the variable for my hexagon’s neighbours, and use that as an estimator for the unknown average remaining lease.

The first step is to construct a list of neighbours for each area. Here we use the same remaining_lease data as earlier, which you can download here.

remaining_lease <- readRDS(here::here("data/remaining_lease.rds"))
hex_sp <- as(remaining_lease, 'Spatial')
hex_neighbours <- poly2nb(hex_sp)
plot(hex_neighbours, coordinates(hex_sp))

Once we have our list of neighbours, we need to assign a weight to the relationship to each neighbour. For clustering, we used the similarity in ‘data space’ to determine this weight but here we used a weighted approach: if a hexagon has 4 neighbours, each neighbour will weigh 1/4.

Note: In the first assignment, you computed the moving average of a variable over time. “Averaging” in that context meant assigning weights to observations before and after some time $$t$$. We are doing the same thing here: we assign weights to neighbouring observations (but here we do it in space, not in time).

hex_weights <- nb2listw(hex_neighbours, style="W", zero.policy=TRUE)

Open hex_weights in your Environment and observe the weights attribute!

Once we have the weights, we can calculate a spatially lagged version of the variable of interest. In this case this creates an average years of lease left for all the neighbours of a hexagon. lag.listw is an spdep function that does just that.

## 10.4 Local value of autocorrelation

Lab practice 10.3

Just as we can calculate spatial autocorrelation globally, we can also look at this on a more local scale (autocorrelation is very dependent on the scale regardless!). In spdep Moran’s I has a local counterpart called localmoran. Instead of a global metric, it will give an indication on how much clustering is present around each polygon and indicates whether the neighbourhood values indicate a ‘cold’ cluster or a ‘hot’ cluster. In our case: cold means fewer years left on the lease!

moran_local <- localmoran(lease_lag_df$remaining_lease, hex_weights) lease_lag_df$moran <- moran_local[,1]

tm_shape(lease_lag_df) +
tm_polygons(col = "moran", palette = "-RdBu")

### 10.4.1 Exercise

Which are cold and hot clusters for the price per square metre variable?