Introduction to Spatial Analysis

Day 4 - Measurement

Jonathan Phillips

January, 2019

Spatial Measurement

What patterns in our data do we want to measure?
- Where events take place
- Whether events are ‘clustered’ in space
- Whether characteristics are ‘clustered’ together (given fixed locations)
- Whether groups are ‘segregated’

Measures of Central Tendency

What is the ‘average’ place where violence occurs in a city?
What is the ‘average’ place where protests happen?
We use the ‘centroid’, the average of all points’ coordinates
- Remember, nothing may happen at the average itself
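
As a minimal sketch, the centroid is just the mean of the x coordinates and the mean of the y coordinates. The event locations below are hypothetical, and the code assumes projected (planar) coordinates rather than latitude/longitude.

```python
def centroid(points):
    """Return the mean x and mean y of a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return mean_x, mean_y

# Hypothetical event locations at the four corners of a square.
events = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
centre = centroid(events)
print(centre)  # (2.0, 2.0): the 'average' place, where no event occurred
```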

Measures of Central Tendency

Centre of Mass of Brazilian Protests, 2013. Source: Bastos et al 2014

Measures of Central Tendency

We can also measure the ‘distribution’ of spatial events in two dimensions
- A ‘spatial ellipse’
- Location, dispersion and direction
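
One way to sketch a spatial ellipse is from the covariance of the coordinates: the centre gives location, the axis lengths give dispersion, and the rotation of the major axis gives direction. GIS packages use several scaling conventions for this ellipse, so the version below (one standard deviation along each principal axis) is an assumption for illustration, as are the example points.

```python
import math

def spatial_ellipse(points):
    """Centre, axis lengths (1 std. dev.) and rotation of a point pattern."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix of the coordinates.
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Eigenvalues of the covariance matrix are the variances along each axis.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major = math.sqrt(half_trace + root)
    minor = math.sqrt(max(half_trace - root, 0.0))
    # Angle of the major axis, measured anticlockwise from the x axis.
    angle = math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))
    return {"centre": (cx, cy), "major": major, "minor": minor, "angle_deg": angle}

# Hypothetical events lying along a diagonal street.
diagonal = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
ellipse = spatial_ellipse(diagonal)
print(ellipse)
```

For these points the ellipse is centred at (2, 2), stretched along the 45 degree diagonal, and has no width at all in the perpendicular direction.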

Measures of Central Tendency

Population and Mortality in South Asia, Shi et al 2018

Spatial Point Patterns

Is the location of protests random? Or do protestors target specific places?
Is the distribution of hospitals in a city uniform or biased?
We can calculate the number of ‘events’ (points) per \(km^2\)
- But is the ‘expected frequency’ of an event constant across space?
- Often crucial to take into account the background population

Spatial Point Patterns

Our null hypothesis is of ‘complete spatial randomness’
- Each location has an equal probability of an event occurring (Poisson model)
- How far away is our distribution of points from this random distribution?
- Simple statistical test
  - Is p<0.05?
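
One possible version of this test, sketched under assumptions: divide the study area into quadrats, summarise the counts with the variance-to-mean ratio (about 1 under complete spatial randomness), and compare it with the same statistic for simulated random patterns. The square study area, grid size, number of simulations and example data are all illustrative choices, and the test is one-sided (it only looks for clustering).

```python
import random

def quadrat_counts(points, size, k):
    """Count points in each cell of a k x k grid over a size x size square."""
    counts = [0] * (k * k)
    for x, y in points:
        col = min(int(x / size * k), k - 1)
        row = min(int(y / size * k), k - 1)
        counts[row * k + col] += 1
    return counts

def dispersion_index(counts):
    """Variance-to-mean ratio of the quadrat counts."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean

def csr_test(points, size=1.0, k=4, n_sim=999, seed=0):
    """Monte Carlo p-value: share of random patterns at least as clustered."""
    rng = random.Random(seed)
    observed = dispersion_index(quadrat_counts(points, size, k))
    n = len(points)
    at_least = 0
    for _ in range(n_sim):
        sim = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n)]
        if dispersion_index(quadrat_counts(sim, size, k)) >= observed:
            at_least += 1
    return observed, (1 + at_least) / (1 + n_sim)

# Hypothetical clustered pattern: 40 events packed into one corner.
data_rng = random.Random(1)
clustered = [(data_rng.uniform(0, 0.2), data_rng.uniform(0, 0.2)) for _ in range(40)]
stat, p = csr_test(clustered)
print(stat, p)  # a ratio far above 1 and a small p-value reject randomness
```

This sketch ignores the background population; where population varies across space, the simulated points would need to be drawn in proportion to it rather than uniformly.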

Spatial Point Patterns

Location of Baghdad IEDs. Source: Anselin N.D.

Spatial Point Patterns

Location of Baghdad IEDs. Source: Anselin N.D.

Clustering

In many cases, our spatial units are fixed - states, homes, lakes - but we want to know if the characteristics of these objects follow any spatial pattern
First, we need to understand the ‘space’ we are working in

Neighbours

Remember the First Law of Geography?
Spatial analysis depends on some units being closer to each other than others
But when am I ‘closer’ to you?
- Full distance matrices are hard to calculate
- So we normally just identify your ‘neighbour’
- ‘Neighbours’ are other units that are considered ‘closer’

Neighbours

Contiguity-based Neighbours
- For polygons, contiguity means ‘touching’

Neighbours

Contiguity-based Neighbours
- W = Spatial Weights Matrix (NxN)
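
A minimal sketch of a contiguity-based W, assuming the units are cells of a regular grid and that ‘touching’ means sharing an edge (often called rook contiguity). Real polygons would need a GIS library to detect shared borders; the grid is only for illustration.

```python
def rook_weights(n_rows, n_cols):
    """N x N matrix with 1 where two grid cells share an edge, else 0."""
    n = n_rows * n_cols
    W = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[i][rr * n_cols + cc] = 1
    return W

# Four units arranged in a 2 x 2 grid, numbered row by row.
W = rook_weights(2, 2)
for row in W:
    print(row)
# [0, 1, 1, 0]
# [1, 0, 0, 1]
# [1, 0, 0, 1]
# [0, 1, 1, 0]
```

Row i of W marks the neighbours of unit i. Diagonal cells (0 and 3) only touch at a corner, so they are not neighbours under this definition.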

Neighbours

Distance-based Neighbours
- Usually used to identify ‘nearest-neighbour’ or \(K\) nearest neighbours
- Might not be ‘close’, but is ‘closer’
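
A short sketch of \(K\) nearest neighbours using straight-line distance between hypothetical points. The brute-force search is an illustrative choice that is fine for small data only.

```python
import math

def k_nearest_neighbours(points, k):
    """For each point, the indices of its k closest other points."""
    neighbours = []
    for i, p in enumerate(points):
        others = [
            (math.hypot(p[0] - q[0], p[1] - q[1]), j)
            for j, q in enumerate(points)
            if j != i
        ]
        others.sort()  # ties in distance are broken by the lower index
        neighbours.append([j for _, j in others[:k]])
    return neighbours

# Three units close together and one far away.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]
knn = k_nearest_neighbours(points, k=1)
print(knn)  # [[1], [0], [1], [2]]
```

Unit 3 still gets a neighbour (unit 2) even though it is 8 units away: not ‘close’, but ‘closer’. The relationship is also not symmetric: unit 3's neighbour is unit 2, but unit 2's neighbour is unit 1.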

Clustering

We want to measure how similar neighbouring units are
- The degree of Spatial Autocorrelation
Again, our benchmark is a spatially random distribution of characteristics across our units

Clustering

Random is not one end of the scale, with clustering at the other end
- Random data has clusters! Randomly!
- Just not too many
Random is more like the middle of the scale
- Positive autocorrelation (clustering) on one end
- Negative autocorrelation (dispersion) on the other end

Clustering

Does this data look clustered?
- How do we prove it?
- How much is it clustered?

2016 US Presidential Vote Share

Clustering

We need a measure of spatial autocorrelation (clustering)
- When are two neighbours more likely to have similar characteristics than would be expected at random?
Moran’s I measure of Spatial Autocorrelation
- -1: Perfect negative autocorrelation
- 0: No autocorrelation at all (in large samples)
- 1: Perfect positive autocorrelation

Clustering

Moran’s I:

\[ I = \frac{N}{W} \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2} \]

where:
\(i\) and \(j\) index units
\(x_i\) and \(x_j\) are the characteristic of interest for units \(i\) and \(j\)
\(w_{ij}\) is the spatial weight between units \(i\) and \(j\)
\(N\) is the total number of units
\(W\) is the sum of the spatial weights
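
A direct sketch of the calculation, using the standard definition of Moran's I in which the denominator is the sum of squared deviations from the mean. The weights matrix and values are hypothetical: four units in a 2 x 2 grid where units sharing an edge are neighbours.

```python
def morans_i(values, W):
    """Global Moran's I for a list of values and an N x N weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in W)
    # Cross-products of deviations, counted only for neighbouring pairs.
    cross = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)

# Units numbered row by row in a 2 x 2 grid; 1 marks a shared edge.
W = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
checkerboard = [1, 0, 0, 1]  # every neighbour has the opposite value
print(morans_i(checkerboard, W))  # -1.0, perfect negative autocorrelation
```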

Clustering

Moran’s I:
- We can statistically test whether our value of Moran’s I is higher or lower than we would expect if the characteristic was randomly distributed in space
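
One common way to run this test is by permutation: shuffle the values across the units many times, recompute Moran's I each time, and see how often the shuffled value is at least as large as the observed one. The sketch below assumes a 5 x 5 grid with edge-sharing neighbours and a hypothetical west-to-east gradient; it is one-sided, testing only for positive autocorrelation.

```python
import random

def rook_weights(n_rows, n_cols):
    """N x N matrix with 1 where two grid cells share an edge, else 0."""
    n = n_rows * n_cols
    W = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[r * n_cols + c][rr * n_cols + cc] = 1
    return W

def morans_i(values, W):
    """Global Moran's I (sum of squared deviations in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in W)
    cross = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)

def permutation_test(values, W, n_perm=999, seed=0):
    """Observed I and the share of shuffles with I at least as large."""
    rng = random.Random(seed)
    observed = morans_i(values, W)
    shuffled = list(values)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, W) >= observed:
            at_least += 1
    return observed, (1 + at_least) / (1 + n_perm)

# Hypothetical gradient: each cell's value is its column number (0 to 4).
W = rook_weights(5, 5)
gradient = [c for r in range(5) for c in range(5)]
observed, p_value = permutation_test(gradient, W)
print(observed, p_value)  # observed I is 0.75, with a small p-value
```

The average of the shuffled values also approximates the expected I under randomness, which is slightly negative, \(-1/(N-1)\), rather than exactly zero.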

Clustering

The Moran’s I of Brazilian Presidential voting in 2014:
- I = 0.85
- Expected I = -0.00017
- P-value = 0.00000001

Clustering

Spatial Autocorrelation is complicated and occurs at different distances
- Not just among neighbours
- We can look at patterns of spatial autocorrelation at multiple scales by using a Variogram
A variogram shows the average squared difference in characteristics (e.g. vote share) between pairs of units separated by distance \(d\), plotted for many distances
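
A minimal sketch of an empirical variogram: group every pair of units into distance bins and average the squared difference in their values within each bin. The bin width, number of bins and data are assumptions for illustration. Note that many texts and packages report the semivariogram, which is half of this quantity.

```python
import math

def empirical_variogram(points, values, bin_width, n_bins):
    """Average squared difference in values for pairs in each distance bin.

    Bin b covers distances from b * bin_width up to (b + 1) * bin_width.
    Bins containing no pairs are returned as None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            b = int(d // bin_width)
            if b < n_bins:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Hypothetical units along a line, with values rising steadily eastwards.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
values = [0, 1, 2, 3, 4]
variogram = empirical_variogram(points, values, bin_width=1.0, n_bins=5)
print(variogram)  # [None, 1.0, 4.0, 9.0, 16.0]
```

Differences grow with distance here, which is what positive spatial autocorrelation looks like in a variogram: nearby units are more alike than distant ones.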

Clustering

Variogram

Clustering

Local Clustering

But Moran’s I is a global statistic
- It does not tell us where the clustering exists
We can use the calculations of Moran’s I to categorize each unit
Local Indicators of Spatial Autocorrelation (LISA)
- A ‘High’ unit in a ‘High’ cluster
- A ‘Low’ unit in a ‘Low’ cluster
- A ‘High’ unit in a ‘Low’ cluster -> Surprising!
- A ‘Low’ unit in a ‘High’ cluster -> Surprising!
We can also calculate the statistical significance of each unit’s classification
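
A sketch of the classification step, under assumptions: each unit's deviation from the mean is compared with the average deviation of its neighbours (row-standardised weights), and their product, scaled by the variance, is the local Moran statistic. The chain of five units and their values are hypothetical, every unit is assumed to have at least one neighbour, and the significance test (usually done by permutation) is omitted.

```python
def lisa(values, W):
    """Local Moran statistic and unit-neighbourhood label for each unit."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    variance = sum(d * d for d in dev) / n
    results = []
    for i in range(n):
        # Average deviation among unit i's neighbours (the 'spatial lag').
        lag = sum(W[i][j] * dev[j] for j in range(n)) / sum(W[i])
        local_i = dev[i] * lag / variance
        label = ("High" if dev[i] > 0 else "Low") + "-" + ("High" if lag > 0 else "Low")
        results.append((local_i, label))
    return results

# Five units in a row; each unit neighbours the units next to it.
W = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
values = [1, 1, 9, 1, 1]  # one high unit surrounded by low units
results = lisa(values, W)
labels = [label for _, label in results]
print(labels)
# ['Low-Low', 'Low-High', 'High-Low', 'Low-High', 'Low-Low']
```

The first word describes the unit and the second its neighbours, so the middle unit is a ‘High’ unit in a ‘Low’ neighbourhood. Its local statistic is negative, while the two end units have positive values.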

Local Clustering

Romao et al 2017

Spatial Segregation

Sometimes we want to study not a single characteristic but the distribution of multiple (two or more) groups in space
- Racial groups in cities
- Skilled workers
- Vote shares for multiple parties
We want to know how ‘segregated’ these groups are into separate spatial areas

Spatial Segregation

Vaughan 1999

Spatial Segregation

NYC Segregation, NY Times

US Segregation, Washington Post

Spatial Segregation

One approach is the Spatial Dissimilarity Index
- A measure of evenness vs. clustering
- On average, how different is the composition of each unit’s local neighbourhood to the composition of the entire region as a whole?

0: Evenness
1: Clustering (segregation)

Spatial Segregation

\[D = \sum_n \sum_i \frac{N_n}{2NI} |t_{ni} - t_i|\]

where:
\(i\) indexes groups
\(n\) indexes neighbourhoods
\(N\) is the total population
\(N_n\) is the population of neighbourhood \(n\)
\(t_i\) is the proportion of group \(i\) in the overall population
\(t_{ni}=\frac{L_{ni}}{L_n}\) is the standardized intensity of group \(i\) in neighbourhood \(n\)
\(I = \sum_i (t_i)(1-t_i)\)
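
A sketch of the calculation under a simplifying assumption: each unit's local neighbourhood is taken to be the unit itself, so \(t_{ni}\) is just the share of group \(i\) among the residents of \(n\). A spatial version would instead compute that share over a weighted area around each unit. The population counts are hypothetical, and every neighbourhood is assumed to be non-empty.

```python
def dissimilarity(counts):
    """Multi-group dissimilarity index.

    counts[n][i] is the population of group i in neighbourhood n.
    """
    n_groups = len(counts[0])
    N = sum(sum(row) for row in counts)
    # Overall proportion of each group.
    t = [sum(row[i] for row in counts) / N for i in range(n_groups)]
    I = sum(t_i * (1 - t_i) for t_i in t)
    D = 0.0
    for row in counts:
        N_n = sum(row)
        for i in range(n_groups):
            t_ni = row[i] / N_n  # assumption: neighbourhood = the unit itself
            D += N_n / (2 * N * I) * abs(t_ni - t[i])
    return D

segregated = [[100, 0], [0, 100]]  # each group lives in its own neighbourhood
mixed = [[50, 50], [50, 50]]       # every neighbourhood mirrors the region
print(dissimilarity(segregated))  # 1.0
print(dissimilarity(mixed))       # 0.0
```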

Spatial Segregation

Segregation (Black/White) in US Cities
- Detroit: 0.867
- New York: 0.843
- Chicago: 0.836
- San Francisco: 0.656
- Jacksonville: 0.371

Spatial Segregation

Dissimilarity is a global measure
- But we can also measure local dissimilarity

Local Spatial Dissimilarity Index

Spatial Segregation

Spatial Exposure: The average proportion of group \(j\) in the neighbourhood of group \(i\)

Exposure of White to Coloured Population (https://complexsystemstheory.net/complexity-of-segregation/)
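
A sketch of an exposure calculation, assuming again that each unit's neighbourhood is the unit itself: for every member of group \(i\), take the share of group \(j\) where they live, then average over all members of group \(i\). The counts are hypothetical.

```python
def exposure(counts, i, j):
    """Average share of group j in the neighbourhoods of group i's members.

    counts[n][g] is the population of group g in neighbourhood n.
    """
    total_i = sum(row[i] for row in counts)
    result = 0.0
    for row in counts:
        population = sum(row)
        if population == 0:
            continue  # empty neighbourhoods contribute nothing
        # Weight each neighbourhood by the share of group i living there.
        result += (row[i] / total_i) * (row[j] / population)
    return result

segregated = [[100, 0], [0, 100]]
mixed = [[50, 50], [50, 50]]
partial = [[90, 10], [10, 90]]
print(exposure(segregated, 0, 1))  # 0.0: group 0 never meets group 1
print(exposure(mixed, 0, 1))       # 0.5: group 1's share of the population
print(exposure(partial, 0, 1))     # roughly 0.18
```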

Spatial Segregation

Segregation is complicated
- It varies a lot depending on the scale at which you assess it
- And how we define each unit’s neighbourhood