1. Create a new Rmarkdown document in Rstudio. Load the tidyverse,zeligverse, knitr and cepespR packages. Install and load the readxl package.
library(tidyverse)
library(zeligverse)
library(knitr)
library(cepespR)
library(readxl)
  1. Let’s use the CEPESP-R API to download the prefeito candidate voting data from the 2016 municipal elections. See here and the code below.
data <- cepespdata(year=2016, position="Prefeito", regional_aggregation="Municipality",political_aggregation="Candidate")
  1. Let’s also get data from IBGE on municipalities:
download.file("ftp://ftp.ibge.gov.br/Perfil_Municipios/2005/base_MUNIC_2005.zip","perfil.zip")
unzip("perfil.zip")
ibge <- read_excel(paste0(getwd(),"/Base 2005.xls"),"Leg e inst planej")
  1. Now we have two tables and want to link them. First, we need a key in each table for each unit, and luckily here we can use each municipality’s IBGE code. In the cepesp data, this is called COD_MUN_IBGE and in the IBGE data it is called A1. But COD_MUN_IBGE has 7 digits while A has 6. Use the function separate on the cepesp data to make a column with the 6 digit IBGE code and call it COD_MUN_IBGE_6 (the seventh digit is a ‘check’ digit so it doesn’t matter if we drop it).
data <- data %>% separate(COD_MUN_IBGE,into=c("COD_MUN_IBGE_6","Ignore"),sep=6)
  1. Now we need to make sure the key column in each table has the same name. Rename the IBGE column A1 to also be named COD_MUN_IBGE_6.
ibge <- ibge %>% rename("COD_MUN_IBGE_6"="A1")
  1. One more problem: the two columns are now of different types. Make sure both columns COD_MUN_IBGE_6 are numeric.
data <- data %>% mutate(COD_MUN_IBGE_6=as.numeric(COD_MUN_IBGE_6))
  1. Now we can start with the cepesp electoral data and merge in the ibge dataset with a left_join. How many rows are there?
data %>% left_join(ibge,by="COD_MUN_IBGE_6") %>% count()
  1. Note that in question 7 above the IBGE information was duplicated for every candidate in each municipality. There was a “many-to-one” relationship. Now let’s try a merge with a one-to-one relationship between our tables, which means we need to restructure our electoral data to have only one row for each municipality. Using the original cepesp data, calculate the total number of votes in each municipality and save this as a new data.frame.
total_mun <- data %>% group_by(COD_MUN_IBGE_6) %>% summarize(QTDE_VOTOS=sum(QTDE_VOTOS,na.rm=TRUE))
  1. Merge this new electoral data with the IBGE data using a left_join. How many rows are there? How many rows are there with a right_join? With an inner_join? With a full_join?
total_mun %>% left_join(ibge,by="COD_MUN_IBGE_6") %>% count()
total_mun %>% right_join(ibge,by="COD_MUN_IBGE_6") %>% count()
total_mun %>% inner_join(ibge,by="COD_MUN_IBGE_6") %>% count()
total_mun %>% full_join(ibge,by="COD_MUN_IBGE_6") %>% count()
  1. The row numbers above suggest that some municipalities are in each dataset that are not in the other. Can you identify which (using their COD_MUN_IBGE_6)?
#In electoral data but not IBGE:
total_mun$COD_MUN_IBGE_6[which(!(total_mun$COD_MUN_IBGE_6 %in% ibge$COD_MUN_IBGE_6))]

#In IBGE but not in electoral data:
ibge$COD_MUN_IBGE_6[which(ibge$COD_MUN_IBGE_6 %in% total_mun$COD_MUN_IBGE_6==FALSE)]
  1. Let’s estimate if municipalities with more voters are more likely to have a Conselho Municipal de Política urbana, variable A64 in the IBGE data. Use case_when to turn this variable into a binary 0/1 variable so we can use it as the outcome variable in a logit regression.
total_mun2 <- total_mun %>% inner_join(ibge,by="COD_MUN_IBGE_6") %>% 
  mutate(Conselho=case_when(A64=="Sim"~1,
                            TRUE~0))
  1. According to a quick logit regression in Zelig, are municipalities with more voters more likely to have a Conselho Municipal?
total_mun2 %>% zelig(Conselho~QTDE_VOTOS,data=.,model="logit")
  1. Now use case_when to recode variable A67, the year in which the Lei orgânica do município was passed, into a binary variable which is 1 if the law was passed before 1990. Also remember to recode the value Ignorado to NA - to do this you need to specify the new value as NA_real within case_when.
total_mun3 <- total_mun %>% inner_join(ibge,by="COD_MUN_IBGE_6") %>% 
  mutate(Lei=case_when(as.numeric(A67)>1990~0,
                       as.numeric(A67)<=1990~1,
                       A76=="Ignorado"~NA_real_))
  1. According to a quick logit regresion, are municipalities with more voters more likely to have passed a Lei orgânica do município before 1990?
total_mun3 %>% zelig(Lei~QTDE_VOTOS,data=.,model="logit")
  1. Organize your results to the above questions in a clean R Markdown document (as PDF or HTML), using chunk parameters to block the warnings, messages and raw code itself. The final document should contain only explanation text, tables and graphics.

  2. Download the bibliography file example.bib to your working directory using the code below. Reference this bibliography file in your document and reference the Fujiwara paper in your document using its key @Fujiwara2011.

download.file("https://raw.githubusercontent.com/JonnyPhillips/Curso_R/master/Example.bib?token=AAg1ZYnoaVwdKWkwwnQbBLH2YO6b8T9Gks5a40uBwA%3D%3D",destfile="example.bib")