Comparing poststratification, raking, and calibration with ANES data
The following makes use of the packages survey and memisc. You may need to
install them from CRAN using the code
install.packages(c("survey","memisc"))
if you want to run this on your computer. (The packages are already installed on the notebook container, however.)
library(survey)
Loading required package: grid
Loading required package: Matrix
Loading required package: survival
Attaching package: 'survey'
The following object is masked from 'package:graphics':
dotchart
library(memisc)
Loading required package: lattice
Loading required package: MASS
Attaching package: 'memisc'
The following object is masked from 'package:Matrix':
as.array
The following objects are masked from 'package:stats':
contr.sum, contr.treatment, contrasts
The following object is masked from 'package:base':
as.array
The following code loads the survey design objects created in earlier examples.
load("anes-2016-vprevote-design.RData")
load("anes-2016-prevote-desgn-post.RData")
load("anes-2016-prevote-desgn-rake.RData")
load("anes-2016-prevote-desgn-calib.RData")
Let’s compare the effects of poststratification, raking, and calibration on the relation between two variables: recalled vote in 2012 (recall12) and vote in 2016 (vote16).
First, we create a table from the data with valid responses about voting behaviour in 2012 and 2016.
tab <- svytable(~ vote16 + recall12,
design = anes_2016_vprevote_desgn)
percentages(vote16 ~ recall12, data=tab)
recall12
vote16 Obama Romney Other No vote
Clinton 76.187161 3.813877 9.241746 22.757762
Trump 8.644019 74.683608 35.531586 23.783194
Other 8.228826 7.974947 55.226668 4.516499
No vote 6.939995 13.527568 0.000000 48.942545
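svytable() adds up the sampling weights within each cell of the cross-classification, and percentages(vote16 ~ recall12, ...) then reports the distribution of vote16 within each category of recall12, so each column sums to 100. The following base-R sketch reproduces these two steps on made-up respondent data. The data frame df and its weight column w are hypothetical, and xtabs() and prop.table() stand in for the survey and memisc functions.

```r
# Hypothetical respondent-level data (not ANES): two votes and a sampling weight
df <- data.frame(
  vote16   = c("Clinton", "Clinton", "Trump", "Trump",  "Clinton", "Trump"),
  recall12 = c("Obama",   "Obama",   "Obama", "Romney", "Romney",  "Romney"),
  w        = c(1.5, 0.5, 1.0, 2.0, 0.5, 1.5)
)

# Weighted cell totals: the sum of the weights in each vote16 x recall12 cell
wtab <- xtabs(w ~ vote16 + recall12, data = df)

# Percentages of vote16 within each recall12 category (column percentages)
col_pct <- 100 * prop.table(wtab, margin = 2)
print(round(col_pct, 2))
```

Each column of col_pct sums to 100. This is why the tables in this section are read down the columns: they answer how, say, recalled Obama voters split in 2016.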
Second, we create a table from the poststratified data.
tab_post <- svytable(~ vote16 + recall12,
design = anes_2016_prevote_desgn_post)
percentages(vote16 ~ recall12, data=tab_post)
recall12
vote16 Obama Romney Other No vote
Clinton 76.187161 3.813877 9.241746 22.757762
Trump 8.644019 74.683608 35.531586 23.783194
Other 8.228826 7.974947 55.226668 4.516499
No vote 6.939995 13.527568 0.000000 48.942545
Third, we create a table from the raked data.
tab_rak <- svytable(~ vote16 + recall12,
design = anes_2016_prevote_desgn_rake)
percentages(vote16 ~ recall12, data=tab_rak)
recall12
vote16 Obama Romney Other No vote
Clinton 70.403656 3.152370 12.125475 12.579417
Trump 8.177831 63.198219 47.727460 13.458918
Other 4.213195 3.652234 40.147065 1.383226
No vote 17.205318 29.997177 0.000000 72.578439
Fourth, we create a table from the calibrated data.
tab_calib <- svytable(~ vote16 + recall12,
design = anes_2016_prevote_desgn_calib)
percentages(vote16 ~ recall12, data=tab_calib)
recall12
vote16 Obama Romney Other No vote
Clinton 69.137748 3.114927 11.193203 13.406539
Trump 8.016145 62.183500 43.631304 14.227998
Other 3.637356 3.547990 45.175493 1.694680
No vote 19.208751 31.153583 0.000000 70.670783
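Poststratification does not alter percentages that are conditional on the variable used for poststratification: within each poststratum, all weights are multiplied by the same factor, and this common factor cancels out when percentages are computed within that poststratum. Here the first two tables are identical. Raking and calibration, by contrast, do change the conditional percentages.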
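The argument can be checked with a small base-R sketch on a made-up 2 x 2 table. The population totals col_target and row_target are hypothetical, not taken from the ANES designs. Poststratifying on recall12 rescales whole columns, so the column percentages survive. A raking step that also matches a vote16 margin rescales rows, so the column percentages change. sweep() multiplies each row or column by its own factor.

```r
# Hypothetical weighted table: rows = vote16, columns = recall12
tab0 <- matrix(c(40, 10,
                 20, 30),
               nrow = 2, byrow = TRUE,
               dimnames = list(vote16 = c("Clinton", "Trump"),
                               recall12 = c("Obama", "Romney")))

# Poststratification on recall12: every cell in a recall12 column is
# multiplied by the same factor (target total / current total)
col_target <- c(Obama = 50, Romney = 50)   # hypothetical population totals
tab_ps <- sweep(tab0, 2, col_target / colSums(tab0), "*")

# Percentages of vote16 within recall12 are unchanged
all.equal(prop.table(tab_ps, 2), prop.table(tab0, 2))   # TRUE

# A raking step towards a vote16 margin rescales rows instead,
# which does change the within-recall12 percentages
row_target <- c(Clinton = 40, Trump = 60)  # hypothetical population totals
tab_row <- sweep(tab_ps, 1, row_target / rowSums(tab_ps), "*")
print(round(100 * prop.table(tab_row, 2), 2))
```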
To examine whether raking and calibration affect the relation between recalled vote in 2012 and vote in 2016, we compute log-odds ratios with the following function:
log.odds <- function(x) log((x[1,1]/x[1,2])/(x[2,1]/x[2,2]))
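Log-odds ratios are a way to describe the relation between two dichotomous variables. Unlike percentages, they do not depend on the marginal distributions of the two variables: multiplying all cells in a row, or all cells in a column, by a positive constant leaves them unchanged. (In this respect they resemble correlations, which are unaffected by rescaling a variable.) The function above uses only the first two rows (Clinton, Trump) and the first two columns (Obama, Romney) of the table.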
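Column percentages are the cell counts divided by their column totals, which is just a column rescaling. So the log-odds ratio can also be recomputed from the percentages printed above for the unadjusted design. This quick check uses those published figures and then shows that rescaling a row and a column leaves the result unchanged. The factors 2 and 0.5 are arbitrary.

```r
# Log-odds ratio of the upper-left 2 x 2 sub-table, as defined above
log.odds <- function(x) log((x[1,1]/x[1,2])/(x[2,1]/x[2,2]))

# Clinton/Trump percentages among recalled Obama/Romney voters,
# copied from the table for the unadjusted design
pct <- matrix(c(76.187161,  3.813877,
                 8.644019, 74.683608),
              nrow = 2, byrow = TRUE)
lo <- log.odds(pct)
print(lo)  # close to the 5.15094 obtained from the weighted counts

# Rescaling a whole row and a whole column cancels out of the ratio
scaled <- pct
scaled[1, ] <- 2 * scaled[1, ]
scaled[, 2] <- 0.5 * scaled[, 2]
all.equal(log.odds(scaled), lo)   # TRUE
```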
log.odds(tab)
[1] 5.15094
log.odds(tab_post)
[1] 5.15094
log.odds(tab_rak)
[1] 5.15094
log.odds(tab_calib)
[1] 5.148527
Clearly, both poststratification and raking leave log-odds ratios unaffected. Calibration has an effect, but it appears to be minor (at least in the present case: 5.1485 instead of 5.1509).
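Why does raking leave the log-odds ratio intact while calibration changes it slightly? The text above does not say which variables or distance function the adjusted designs used, so the following sketch rests on assumptions. Raking is taken to adjust the margins of vote16 and recall12. Calibration is taken to use linear (chi-square) distance on the same margins, which is the default calfun in survey::calibrate. With raking, each cell total ends up multiplied by a row factor times a column factor, and such factors cancel out of the odds ratio. With linear calibration, each weight is multiplied by 1 + a_r + b_c, and these additive factors do not cancel in general. The helpers rake_ipf() and calib_linear() are hypothetical base-R illustrations on a made-up 2 x 2 table, not the survey package's implementation.

```r
tab0 <- matrix(c(40, 10,
                 20, 30), nrow = 2, byrow = TRUE)  # hypothetical weighted table
row_target <- c(40, 60)   # hypothetical vote16 totals
col_target <- c(50, 50)   # hypothetical recall12 totals (same grand total)

log.odds <- function(x) log((x[1,1]/x[1,2])/(x[2,1]/x[2,2]))

# Raking by iterative proportional fitting: alternately rescale rows and columns
rake_ipf <- function(x, rows, cols, maxit = 1000, tol = 1e-10) {
  for (i in seq_len(maxit)) {
    x <- sweep(x, 1, rows / rowSums(x), "*")
    x <- sweep(x, 2, cols / colSums(x), "*")
    if (max(abs(rowSums(x) - rows)) < tol) break
  }
  x
}

# Linear calibration on cell totals: each cell gets the factor
# g = 1 + a_r + b_c (row effects a_r, column effects b_c with b_1 = 0)
calib_linear <- function(x, rows, cols) {
  nr <- nrow(x); nc <- ncol(x)
  r  <- as.vector(row(x)); cl <- as.vector(col(x)); w <- as.vector(x)
  # Indicator matrix: all row dummies plus column dummies except the first
  X <- cbind(outer(r, seq_len(nr), "=="), outer(cl, seq_len(nc)[-1], "==")) * 1
  target <- c(rows, cols[-1])
  lambda <- solve(crossprod(X, w * X), target - crossprod(X, w))
  matrix(w * (1 + X %*% lambda), nr, nc)
}

raked <- rake_ipf(tab0, row_target, col_target)
calib <- calib_linear(tab0, row_target, col_target)
print(c(original   = log.odds(tab0),
        raked      = log.odds(raked),
        calibrated = log.odds(calib)))
```

The raked and calibrated tables have identical margins, yet only the raked one keeps the original log-odds ratio of log(6), about 1.79. The calibrated table's is about 1.71. In the ANES tables the calibration effect is much smaller.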
- R file: comparing-ANES2016.R
- Rmarkdown file: comparing-ANES2016.Rmd
- Jupyter notebook file: comparing-ANES2016.ipynb