1. Create a new R Markdown document in RStudio. Load the tidyverse, zeligverse, knitr, stargazer and cepespR packages.
library(tidyverse)
library(zeligverse)
library(knitr)
library(cepespR)
library(stargazer)
  2. Use the CEPESP-R API to download candidate-level voting data for prefeito (mayor) from the 2016 municipal elections, using the code below.
data <- cepespdata(year = 2016, position = "Prefeito", regional_aggregation = "Municipality", political_aggregation = "Candidate")
  3. Run a regression to assess whether men receive more votes than women (DESCRICAO_SEXO) in the first round of the election (NUM_TURNO), controlling for race (DESCRICAO_COR_RACA).
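
As a starting point for this kind of regression, the sketch below fits the model on a small invented data frame rather than the real download, and uses base R's lm in place of Zelig. The vote-count column name QTDE_VOTOS and the category labels are assumptions; check them against the columns in your own data.

```r
# Toy stand-in for the CEPESP data: all values are invented for illustration.
# QTDE_VOTOS as the vote-count column is an assumption, not given in the text.
toy <- data.frame(
  NUM_TURNO          = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2),
  DESCRICAO_SEXO     = c("MASCULINO", "FEMININO", "MASCULINO", "FEMININO",
                         "MASCULINO", "FEMININO", "MASCULINO", "FEMININO",
                         "MASCULINO", "FEMININO"),
  DESCRICAO_COR_RACA = c("BRANCA", "BRANCA", "PARDA", "PARDA",
                         "PRETA", "PRETA", "BRANCA", "PARDA",
                         "BRANCA", "BRANCA"),
  QTDE_VOTOS         = c(5200, 4100, 3900, 2500, 3100, 2000, 6100, 2700,
                         90000, 85000)
)

# Keep first-round rows only, so run-off totals do not enter the comparison.
first_round <- subset(toy, NUM_TURNO == 1)

# OLS of votes on gender, holding race constant.
# FEMININO sorts first alphabetically, so it becomes the baseline category.
fit_votes <- lm(QTDE_VOTOS ~ factor(DESCRICAO_SEXO) + factor(DESCRICAO_COR_RACA),
                data = first_round)
coef(fit_votes)
```

The coefficient on the MASCULINO level is the estimated gap in votes between men and women of the same race category.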

  4. Create a dummy variable (0/1) indicating whether each candidate was elected (COD_SIT_TOT_TURNO is ELEITO). Run the appropriate regression to assess whether men are more likely to be elected than women, controlling for race (DESCRICAO_COR_RACA).
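
A minimal sketch of the dummy-plus-binary-outcome pattern, again on invented toy data and with base R's glm rather than Zelig. A logit is one common choice for a 0/1 outcome; the "NAO ELEITO" label is a toy placeholder, and the toy column stores the label ELEITO only because the question is worded that way, so check which column holds it in your download.

```r
# Toy data: values are invented; any label other than "ELEITO" maps to 0.
toy <- data.frame(
  DESCRICAO_SEXO     = rep(c("MASCULINO", "FEMININO"), each = 6),
  DESCRICAO_COR_RACA = rep(c("BRANCA", "BRANCA", "BRANCA",
                             "PARDA", "PARDA", "PARDA"), times = 2),
  COD_SIT_TOT_TURNO  = c("ELEITO", "ELEITO", "NAO ELEITO",
                         "ELEITO", "NAO ELEITO", "ELEITO",
                         "ELEITO", "NAO ELEITO", "NAO ELEITO",
                         "NAO ELEITO", "ELEITO", "NAO ELEITO")
)

# 0/1 dummy: 1 if the candidate was elected, 0 otherwise.
toy$elected <- as.integer(toy$COD_SIT_TOT_TURNO == "ELEITO")

# Logit model of being elected on gender, controlling for race.
fit_elected <- glm(elected ~ factor(DESCRICAO_SEXO) + factor(DESCRICAO_COR_RACA),
                   family = binomial(link = "logit"), data = toy)
coef(fit_elected)
```

The coefficients are on the log-odds scale, so a positive value for the MASCULINO level means higher odds of election for men than for women of the same race category.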

  5. Produce a neatly formatted table of the regression in question 4 using stargazer.

  6. Does being both black and female make it even harder to get elected? Use an interaction term in a regression to assess this hypothesis. Show the formatted output table.
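
One way to set up an interaction term is sketched below on invented toy data, using base R's glm rather than Zelig. The helper columns female and black are hypothetical names introduced for illustration, and treating the label PRETA as "black" is an assumption about how the race categories are coded.

```r
# Toy data: four gender-by-race cells with four invented candidates each.
toy <- data.frame(
  DESCRICAO_SEXO     = rep(c("MASCULINO", "FEMININO", "MASCULINO", "FEMININO"),
                           each = 4),
  DESCRICAO_COR_RACA = rep(c("BRANCA", "BRANCA", "PRETA", "PRETA"), each = 4),
  elected            = c(1, 1, 1, 0,   1, 1, 1, 0,   1, 1, 1, 0,   1, 0, 0, 0)
)

# Hypothetical 0/1 indicators for the two characteristics of interest.
toy$female <- as.integer(toy$DESCRICAO_SEXO == "FEMININO")
toy$black  <- as.integer(toy$DESCRICAO_COR_RACA == "PRETA")

# female * black expands to female + black + female:black.
fit_inter <- glm(elected ~ female * black, family = binomial, data = toy)
coef(fit_inter)
```

The female:black coefficient measures any additional penalty (or bonus) for candidates who are both female and black, beyond the sum of the two separate effects. With the real data, a black indicator like this one lumps all other race categories into the baseline.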

  7. Use your regression from question 3 to predict how many votes, on average, an indigenous man would receive in an average election for prefeito. (Remember to convert both explanatory variables to factors before you run the regression.)
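
The idea of predicting for one chosen profile can be sketched in base R with predict, shown here on invented toy data instead of Zelig's setx and sim. QTDE_VOTOS and the ASCII label INDIGENA are assumptions; the real data may spell the labels differently.

```r
# Toy data: values are invented for illustration.
toy <- data.frame(
  DESCRICAO_SEXO     = c("MASCULINO", "FEMININO", "MASCULINO", "FEMININO",
                         "MASCULINO", "FEMININO", "MASCULINO", "FEMININO"),
  DESCRICAO_COR_RACA = c("BRANCA", "BRANCA", "PARDA", "PARDA",
                         "INDIGENA", "INDIGENA", "BRANCA", "PARDA"),
  QTDE_VOTOS         = c(5200, 4100, 3900, 2500, 1800, 900, 6100, 2700)
)

# Convert both explanatory variables to factors before fitting.
toy$DESCRICAO_SEXO     <- factor(toy$DESCRICAO_SEXO)
toy$DESCRICAO_COR_RACA <- factor(toy$DESCRICAO_COR_RACA)

fit <- lm(QTDE_VOTOS ~ DESCRICAO_SEXO + DESCRICAO_COR_RACA, data = toy)

# One-row profile: an indigenous man, with the same factor levels as the data.
profile <- data.frame(
  DESCRICAO_SEXO     = factor("MASCULINO", levels = levels(toy$DESCRICAO_SEXO)),
  DESCRICAO_COR_RACA = factor("INDIGENA", levels = levels(toy$DESCRICAO_COR_RACA))
)

# Expected votes for that profile under the fitted model.
pred_man <- predict(fit, newdata = profile)
pred_man
```

predict returns a single expected value here, whereas Zelig's sim returns a whole distribution of simulated values for the same profile.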

  8. Create a histogram (geom_histogram) of the full set of predictions from question 7.

  9. Can you see anything unusual or inconsistent in the predicted values shown in your histogram from question 8? Run the same code again, but change the model to a Poisson model, which predicts only non-negative integer values. Which histogram makes more sense?
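
To see why the choice of model matters for a count outcome, the sketch below compares fitted values from a linear model and a Poisson model on a tiny invented data set, using base R rather than Zelig. QTDE_VOTOS and the category labels are assumptions, and the numbers are chosen so that the contrast is easy to see.

```r
# Toy data: one invented candidate per gender-by-race cell.
toy <- data.frame(
  DESCRICAO_SEXO     = factor(c("MASCULINO", "FEMININO", "MASCULINO", "FEMININO")),
  DESCRICAO_COR_RACA = factor(c("BRANCA", "BRANCA", "INDIGENA", "INDIGENA")),
  QTDE_VOTOS         = c(1000, 10, 20, 10)
)

# Linear model: nothing constrains its fitted values to be non-negative.
fit_ols <- lm(QTDE_VOTOS ~ DESCRICAO_SEXO + DESCRICAO_COR_RACA, data = toy)

# Poisson model with a log link: fitted means are exp(linear predictor),
# so they are always positive.
fit_pois <- glm(QTDE_VOTOS ~ DESCRICAO_SEXO + DESCRICAO_COR_RACA,
                family = poisson(link = "log"), data = toy)

# Compare the range of fitted values; one OLS fitted value falls below zero.
range(fitted(fit_ols))
range(fitted(fit_pois))
```

A negative number of votes is impossible, which is the kind of inconsistency to look for in the histogram of linear-model predictions.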

  10. How does the number of votes received change if we change our indigenous man to an indigenous woman? (Hint: use setx1 after setx to specify a second set of x values. Use either the ls or the poisson model.) Report the average of the predicted values for each gender.