Summarizing data with dplyr
This example uses the dplyr package. To run it on your own computer, you may
first need to install the package from CRAN with
install.packages("dplyr")
(The package is already installed on the notebook container.)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
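The masking messages above mean that, once dplyr is attached, bare names such as `filter` and `lag` resolve to dplyr's versions. The masked functions remain reachable through the `::` operator. A minimal sketch, using a toy vector and data frame that are made up for illustration and are not part of the BES data:

```r
library(dplyr)

# Toy inputs, invented for this illustration only
x  <- c(1, 2, 3, 4, 5)
df <- data.frame(id = 1:5, value = x)

# stats::filter() applies a linear filter to a series
# (here a centred 3-point moving average)
ma <- stats::filter(x, rep(1/3, 3))

# dplyr::filter() keeps the rows of a data frame that satisfy a condition
kept <- dplyr::filter(df, value > 3)

ma
kept
```

Writing the package prefix explicitly is optional for the dplyr functions once the package is attached, but it makes clear which `filter` is meant.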
Here we use data from the British Election Study 2010. The data set bes2010feelings.RData is prepared from the original available at https://www.britishelectionstudy.com/data-object/2010-bes-cross-section/ by removing identifying information and scrambling the data.
load("bes2010feelings.RData")
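Because `load()` restores objects under the names they were saved with, it can help to check what it actually created. A small sketch with a throwaway file; the object `toy` and the temporary path are hypothetical stand-ins, whereas with the real file one would expect to find `bes2010feelings`:

```r
# Save a toy object to a temporary file so the example is self-contained
toy <- data.frame(a = 1:3)
tmp <- tempfile(fileext = ".RData")
save(toy, file = tmp)
rm(toy)

# load() invisibly returns the names of the objects it restored
loaded <- load(tmp)
loaded

# Inspect the structure of the first restored object
str(get(loaded[1]))
```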
# A convenience function: the mean of x, ignoring missing values (NA)
Mean <- function(x, ...) mean(x, na.rm = TRUE, ...)
bes2010feelings %>%
    group_by(wave, region) %>%
    summarize(Brown   = Mean(flng.brown),
              Cameron = Mean(flng.cameron),
              Clegg   = Mean(flng.clegg),
              N       = n())
`summarise()` has grouped output by 'wave'. You can override using the
`.groups` argument.
# A tibble: 7 x 6
# Groups:   wave [2]
  wave  region   Brown Cameron Clegg     N
  <fct> <fct>    <dbl>   <dbl> <dbl> <int>
1 Pre   England   4.09    5.28  4.62  1159
2 Pre   Scotland  5.40    4.50  4.41   207
3 Pre   Wales     4.33    4.77  4.59   132
4 Pre   NA        4.51    4.93  4.43   437
5 Post  England   4.14    5.44  5.16  2175
6 Post  Scotland  5.51    4.54  4.51   665
7 Post  Wales     4.31    4.86  4.81   235
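The message printed before the table says that the result is still grouped by `wave` and that the `.groups` argument can override this. The following sketch, assuming dplyr 1.0.0 or later, shows one way to return an ungrouped result and to compute all the means in one step with `across()` instead of a helper function. The data frame `toy` is an invented stand-in for `bes2010feelings`, and its values are not taken from the study:

```r
library(dplyr)

# Toy stand-in for bes2010feelings; all values are invented for illustration
toy <- data.frame(
  wave         = factor(c("Pre", "Pre", "Pre", "Post", "Post", "Post")),
  region       = factor(c("England", "England", "Wales",
                          "England", "Wales", "Wales")),
  flng.brown   = c(4, NA, 6, 5, 3, 7),
  flng.cameron = c(5, 7, 3, 6, NA, 2)
)

res <- toy %>%
  group_by(wave, region) %>%
  summarize(
    # one mean per feeling variable, ignoring missing values
    across(starts_with("flng."), ~ mean(.x, na.rm = TRUE)),
    # number of rows per group, including rows with missing values
    N = n(),
    # return an ungrouped tibble, which also suppresses the message
    .groups = "drop"
  )
res
```

Unlike the call above, `across()` keeps the original column names (`flng.brown`, `flng.cameron`) rather than renaming them, so the explicit form may still be preferable when short labels such as `Brown` are wanted.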
- R file: summarizing-data-with-dplyr.R
- Rmarkdown file: summarizing-data-with-dplyr.Rmd
- Jupyter notebook file: summarizing-data-with-dplyr.ipynb