Introduction to Spatial Analysis

Day 4 - Measurement

Jonathan Phillips

January, 2019

Spatial Measurement

  • What patterns in our data do we want to measure?
    • Where events take place
    • Whether events are ‘clustered’ in space
    • Whether characteristics are ‘clustered’ together (given fixed locations)
    • Whether groups are ‘segregated’

Measures of Central Tendency

  • What is the ‘average’ place where violence occurs in a city?
  • What is the ‘average’ place where protests happen?

  • We use the ‘centroid’, the average of all points’ coordinates
    • Remember, nothing may happen at the average itself
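
As a minimal sketch of the centroid idea, the snippet below averages the x and y coordinates separately. The event locations are hypothetical and assumed to be in projected (planar) coordinates; averaging raw longitude and latitude is only an approximation over large areas.

```python
def centroid(points):
    """Mean centre of (x, y) points: the average of each coordinate."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return mean_x, mean_y


# Hypothetical events at the four corners of a square
events = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
print(centroid(events))  # (2.0, 2.0): a location where no event occurred
```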

Measures of Central Tendency

Centre of Mass of Brazilian Protests, 2013. Source: Bastos et al 2014

Measures of Central Tendency

  • We can also measure the ‘distribution’ of spatial events in two-dimensions
    • A ‘spatial ellipse’
    • Location, dispersion and direction
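
One way to sketch location, dispersion and direction is through the covariance of the coordinates. The function below is an illustrative covariance-based ellipse, not necessarily the exact definition used in the figures that follow: the centre is the mean point, the axis lengths are standard deviations along the principal directions, and the angle is the orientation of the major axis. The function name and the sample points are assumptions.

```python
import math


def spatial_ellipse(points):
    """Return (centre, (major_sd, minor_sd), angle_degrees) for (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Variances and covariance of the coordinates
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form eigenvalues of the 2x2 covariance matrix
    mid = (sxx + syy) / 2
    half_gap = math.hypot((sxx - syy) / 2, sxy)
    major = math.sqrt(mid + half_gap)
    minor = math.sqrt(max(mid - half_gap, 0.0))
    # Orientation of the major axis, anticlockwise from the x-axis
    angle = math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))
    return (mx, my), (major, minor), angle


# Hypothetical events lying along a diagonal line
centre, axes, angle = spatial_ellipse([(0, 0), (1, 1), (2, 2), (3, 3)])
print(centre, axes, angle)
```

For points on a straight diagonal the minor axis collapses to zero and the direction is 45 degrees, which is a quick sanity check on the calculation.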

Measures of Central Tendency

Population and Mortality in South Asia, Shi et al 2018

Spatial Point Patterns

  • Is the location of protests random? Or do protestors target specific places?
  • Is the distribution of hospitals in a city uniform or biased?

  • We can calculate the number of ‘events’ (points) per \(km^2\)
    • But is the ‘expected frequency’ of an event constant across space?
    • Often crucial to take into account the background population

Spatial Point Patterns

  • Our null hypothesis is of ‘complete spatial randomness’
    • Each location has an equal probability of an event occurring (a Poisson model)
    • How far away is our distribution of points from this random distribution?
    • Simple statistical test
      • Is p<0.05?
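
One simple test of this kind is the Clark-Evans nearest-neighbour test, sketched below as an illustration rather than as the specific test used in this course. It compares the observed mean nearest-neighbour distance with the value expected under complete spatial randomness. The sketch assumes planar coordinates, a known study area, and no edge correction, and it ignores the background population.

```python
import math


def clark_evans(points, area):
    """Return (R, z, p) for the Clark-Evans nearest-neighbour test.

    R < 1 suggests clustering, R > 1 suggests dispersion.
    """
    n = len(points)
    # Distance from each point to its nearest other point
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    density = n / area
    expected = 0.5 / math.sqrt(density)         # mean distance under randomness
    std_err = 0.26136 / math.sqrt(n * density)  # its standard error
    z = (observed - expected) / std_err
    p = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal p-value
    return observed / expected, z, p


# Hypothetical pattern: 25 events packed tightly into one corner of a 100 x 100 area
clustered = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
ratio, z, p = clark_evans(clustered, area=100 * 100)
print(ratio, z, p < 0.05)
```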

Spatial Point Patterns

Location of Baghdad IEDs. Source: Anselin N.D.

Spatial Point Patterns

Location of Baghdad IEDs. Source: Anselin N.D.

Clustering

  • In many cases, our spatial units are fixed - states, homes, lakes - but we want to know if the characteristics of these objects follow any spatial pattern

  • First, we need to understand the ‘space’ we are working in

Neighbours

  • Remember the First Law of Geography?
  • Spatial analysis depends on some units being closer to each other than others

  • But when am I ‘closer’ to you?
    • Full distance matrices are hard to calculate
    • So we normally just identify your ‘neighbour’
    • ‘Neighbours’ are other units that are considered ‘closer’

Neighbours

  1. Contiguity-based Neighbours
    • For polygons, contiguity means ‘touching’

Neighbours

  1. Contiguity-based Neighbours
    • W = Spatial Weights Matrix (NxN)
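
To make the NxN weights matrix concrete, here is a small sketch that assumes the units are cells of a regular grid and that two cells are neighbours when they share an edge (rook contiguity). Real polygons would need a geometry library to detect touching borders; the function name and grid layout are illustrative assumptions.

```python
def rook_weights(n_rows, n_cols):
    """Binary NxN contiguity matrix for a grid; cells are numbered row by row."""
    n = n_rows * n_cols
    w = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            # Check the cells above, below, right and left of cell i
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i][rr * n_cols + cc] = 1
    return w


# A 2 x 2 grid: each cell touches two others, and no cell neighbours itself
for row in rook_weights(2, 2):
    print(row)
```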

Neighbours

  2. Distance-based Neighbours
    • Usually used to identify ‘nearest-neighbour’ or \(K\) nearest neighbours
    • Might not be ‘close’, but is ‘closer’
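
A minimal sketch of \(K\) nearest neighbours, assuming point locations in planar coordinates and straight-line distance. The points are hypothetical; note that the isolated point still receives a neighbour, and that the relationship need not be mutual.

```python
import math


def k_nearest_neighbours(points, k):
    """For each point, return the indices of its k closest other points."""
    neighbours = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append(others[:k])
    return neighbours


# Hypothetical units along a road; the last one is far from everything
units = [(0, 0), (1, 0), (5, 0), (100, 0)]
print(k_nearest_neighbours(units, k=1))  # [[1], [0], [1], [2]]
```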

Clustering

  • We want to measure how similar neighbouring units are
    • The degree of Spatial Autocorrelation
  • Again, our benchmark is a spatially random distribution of characteristics across our units

Clustering

  • Random is not one end of the scale, with clustering at the other end
    • Random data has clusters! Randomly!
    • Just not too many
  • Random is more like the middle of the scale
    • Positive autocorrelation (clustering) on one end
    • Negative autocorrelation (dispersion) on the other end

Clustering

  • Does this data look clustered?
    • How do we prove it?
    • How much is it clustered?

2016 US Presidential Vote Share

Clustering

  • We need a measure of spatial autocorrelation (clustering)
    • When are two neighbours more likely to have similar characteristics than would be expected at random?
  • Moran’s I measure of Spatial Autocorrelation
    • -1: Perfect negative autocorrelation
    • 0: No autocorrelation at all (in large samples)
    • 1: Perfect positive autocorrelation

Clustering

  • Moran’s I:

\[ I = \frac{N}{W} \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2} \]

where:
\(i\) and \(j\) index units
\(x_i\) and \(x_j\) are the values of the characteristic of interest for units \(i\) and \(j\)
\(\bar{x}\) is the mean of the characteristic across all units
\(w_{ij}\) is the spatial weight between units \(i\) and \(j\)
\(N\) is the total number of units
\(W\) is the sum of all the spatial weights, \(\sum_i \sum_j w_{ij}\)
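
A direct, unoptimised sketch of Moran's I, using the usual definition with the sum of squared deviations in the denominator. The weights are assumed to be a plain NxN list of lists, and the example uses four hypothetical units in a chain where each unit neighbours the next.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and an NxN spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)  # W: sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    squares = sum(d * d for d in dev)
    return (n * cross) / (w_sum * squares)


# Four units in a line: 0-1, 1-2 and 2-3 are neighbours
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(morans_i([1, 1, 0, 0], chain))  # about 0.33: similar values sit together
print(morans_i([1, 0, 1, 0], chain))  # about -1: neighbours always differ
```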

Clustering

  • Moran’s I:
    • We can statistically test whether our value of Moran’s I is higher or lower than we would expect if the characteristic was randomly distributed in space
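
One common way to run this test is by permutation: shuffle the values across the fixed locations many times and see how often the shuffled Moran's I is at least as large as the observed one. The sketch below is one-sided (testing for clustering); the function names, the 999 permutations and the seed are illustrative assumptions.

```python
import random


def morans_i(values, weights):
    """Moran's I for a list of values and an NxN spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n * cross) / (w_sum * sum(d * d for d in dev))


def moran_permutation_test(values, weights, n_perm=999, seed=0):
    """Return (observed I, pseudo p-value) under random reshuffling of values."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between value and location
        if morans_i(shuffled, weights) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical data: 20 units in a chain, low values at one end, high at the other
n = 20
chain = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
values = [0] * 10 + [1] * 10
observed, p_value = moran_permutation_test(values, chain)
print(round(observed, 3), p_value)
```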

Clustering

  • The Moran’s I of Brazilian Presidential voting in 2014:
    • I = 0.85
    • Expected I = -0.00017
    • P-value = 0.00000001

Clustering

  • Spatial Autocorrelation is complicated and occurs at different distances
    • Not just among neighbours
    • We can look at patterns of spatial autocorrelation at multiple scales by using a Variogram
  • A variogram shows the average squared difference in characteristics (e.g. vote share) between pairs of units separated by distance \(d\), plotted for many distances
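
A rough sketch of an empirical variogram: group every pair of units into distance bins and average the squared difference in their values within each bin. It follows the definition given here; many texts plot half of this quantity (the semivariogram) instead. The bin width, number of bins and data are hypothetical.

```python
import math
from itertools import combinations


def empirical_variogram(points, values, bin_width, n_bins):
    """Average squared difference in values for pairs in each distance bin.

    Bin b covers distances from b * bin_width up to (b + 1) * bin_width.
    Bins containing no pairs are reported as None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        b = int(d // bin_width)
        if b < n_bins:
            sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical units on a line whose values rise steadily with position
points = [(0, 0), (1, 0), (2, 0), (3, 0)]
values = [0, 1, 2, 3]
print(empirical_variogram(points, values, bin_width=1.0, n_bins=4))
```

In this example the average squared difference grows with distance, which is the signature of positive spatial autocorrelation: nearby units are more alike than distant ones.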

Clustering

  • Variogram

Local Clustering

  • But Moran’s I is a global statistic
    • It does not tell us where the clustering exists
  • We can use the calculations of Moran’s I to categorize each unit

  • Local Indicators of Spatial Autocorrelation (LISA)
    • A ‘High’ unit in a ‘High’ cluster
    • A ‘Low’ unit in a ‘Low’ cluster
    • A ‘High’ unit in a ‘Low’ cluster -> Surprising!
    • A ‘Low’ unit in a ‘High’ cluster -> Surprising!
  • We can also calculate the statistical significance of each unit’s classification
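
The four categories can be sketched by comparing each unit's deviation from the mean with the average deviation of its neighbours. This simplified illustration assumes the neighbour average is weighted by the unit's row of the weights matrix, and it omits the significance step, which would normally use permutations. The labels and data are hypothetical.

```python
def lisa_quadrants(values, weights):
    """Label each unit 'own-neighbours', e.g. 'High-Low' for a high unit among low ones."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    labels = []
    for i in range(n):
        row_sum = sum(weights[i])
        # Weighted average deviation of unit i's neighbours (0 if it has none)
        lag = (
            sum(weights[i][j] * dev[j] for j in range(n)) / row_sum
            if row_sum
            else 0.0
        )
        # Values exactly at the mean are counted as 'Low' in this sketch
        own = "High" if dev[i] > 0 else "Low"
        nearby = "High" if lag > 0 else "Low"
        labels.append(f"{own}-{nearby}")
    return labels


# Five hypothetical units in a chain
chain = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
print(lisa_quadrants([10, 9, 1, 2, 10], chain))
```

The last unit has a high value but its only neighbour is low, so it falls in one of the "surprising" categories.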

Local Clustering

Romao et al 2017

Spatial Segregation

  • Sometimes we want to study not a single characteristic but the distribution of multiple (two or more) groups in space
    • Racial groups in cities
    • Skilled workers
    • Vote shares for multiple parties
  • We want to know how ‘segregated’ these groups are into separate spatial areas

Spatial Segregation

Vaughan 1999

Spatial Segregation

NYC Segregation, NY Times

US Segregation, Washington Post

Spatial Segregation

  • One approach is the Spatial Dissimilarity Index
    • A measure of evenness vs. clustering
    • On average, how different is the composition of each unit’s local neighbourhood from the composition of the region as a whole?
    • 0: Evenness
    • 1: Clustering (segregation)

Spatial Segregation

\[D = \sum_n \sum_i \frac{N_n}{2NI} |t_{ni} - t_i|\]

where:
\(i\) indexes groups
\(n\) indexes neighbourhoods
\(N\) is the total population
\(N_n\) is the population of neighbourhood \(n\)
\(t_i\) is the proportion of group \(i\) in the overall population
\(t_{ni}=\frac{L_{ni}}{L_n}\) is the standardized intensity of group \(i\) in neighbourhood \(n\)
\(I = \sum_i (t_i)(1-t_i)\)
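
The formula can be sketched in a few lines. As a simplifying assumption, each neighbourhood here is just a single areal unit, so \(t_{ni}\) is the raw share of group \(i\) in that unit; a fully spatial version would first smooth the counts over each unit's surrounding area. The counts are hypothetical.

```python
def dissimilarity(counts):
    """Multigroup dissimilarity index; counts[n][i] is group i's population in unit n."""
    n_groups = len(counts[0])
    unit_totals = [sum(row) for row in counts]
    total = sum(unit_totals)  # N
    # t_i: each group's share of the overall population
    t = [sum(row[i] for row in counts) / total for i in range(n_groups)]
    interaction = sum(share * (1 - share) for share in t)  # I
    d = 0.0
    for row, unit_total in zip(counts, unit_totals):
        if unit_total == 0:
            continue  # empty units contribute nothing
        for i in range(n_groups):
            t_ni = row[i] / unit_total
            d += unit_total / (2 * total * interaction) * abs(t_ni - t[i])
    return d


print(dissimilarity([[100, 0], [0, 100]]))  # completely separated groups
print(dissimilarity([[50, 50], [50, 50]]))  # identical mix everywhere
```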

Spatial Segregation

  • Segregation (Black/White) in US Cities
    • Detroit: 0.867
    • New York: 0.843
    • Chicago: 0.836
    • San Francisco: 0.656
    • Jacksonville: 0.371

Spatial Segregation

  • Dissimilarity is a global measure
    • But we can also measure local dissimilarity

Local Spatial Dissimilarity Index

Spatial Segregation

  • Spatial Exposure: The average proportion of group \(j\) in the neighbourhood of group \(i\)

Exposure of White to Coloured Population (https://complexsystemstheory.net/complexity-of-segregation/)
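
A minimal sketch of exposure, again assuming each neighbourhood is a single areal unit: for every member of group \(i\), take the share of group \(j\) in their unit, then average over all members of group \(i\). The counts and the function name are hypothetical.

```python
def exposure(counts, i, j):
    """Average share of group j in the units where members of group i live.

    counts[n][g] is the population of group g in unit n.
    """
    total_i = sum(row[i] for row in counts)
    return sum(
        (row[i] / total_i) * (row[j] / sum(row))
        for row in counts
        if sum(row) > 0  # skip empty units
    )


# Two hypothetical units with mirrored compositions
counts = [[90, 10], [10, 90]]
print(exposure(counts, 0, 1))  # exposure of group 0 to group 1
print(exposure(counts, 0, 0))  # exposure of group 0 to itself (isolation)
```

Exposure is generally not symmetric: when one group is much smaller than the other, the minority's exposure to the majority can be high while the majority's exposure to the minority stays low.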

Spatial Segregation

  • Segregation is complicated
    • It varies a lot depending on the scale at which you assess it
    • And how we define each unit’s neighbourhood