Aggregating data frames
In the following we aggregate data from the British Election Study 2010. The data set bes2010feelings.RData is prepared from the original available at https://www.britishelectionstudy.com/data-object/2010-bes-cross-section/ by removing identifying information and scrambling the data.
load("bes2010feelings.RData")
Here we obtain the average affect towards the leaders of the major parties, using an ‘old-style’ call of the function aggregate().
Mean <- function(x,...)mean(x,...,na.rm=TRUE)
aggregate(bes2010feelings[c("flng.brown","flng.cameron",
                            "flng.clegg","flng.salmond")],
          with(bes2010feelings,
               list(Region=region,Wave=wave)),
          Mean)
    Region Wave flng.brown flng.cameron flng.clegg flng.salmond
1  England  Pre   4.092674     5.284810   4.618690          NaN
2 Scotland  Pre   5.395000     4.502591   4.405229     4.412371
3    Wales  Pre   4.328244     4.774194   4.592233          NaN
4  England Post   4.140990     5.441454   5.160313          NaN
5 Scotland Post   5.510769     4.539075   4.513793     4.228707
6    Wales Post   4.307692     4.855895   4.814480          NaN
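The NaN entries appear where every value in a group is missing: once na.rm=TRUE has removed the missing values, mean() is applied to an empty vector and returns NaN. A minimal sketch of the same old-style call on made-up data (the data frame toy and its columns region, wave and score are hypothetical, not part of the BES data) illustrates this:

```r
# Hypothetical toy data: 'score' is entirely missing for region "B"
toy <- data.frame(
  region = c("A", "A", "B", "B"),
  wave   = c("Pre", "Post", "Pre", "Post"),
  score  = c(4, 6, NA, NA)
)

# Same wrapper as in the text: always drop missing values
Mean <- function(x, ...) mean(x, ..., na.rm = TRUE)

# Old-style call: data columns first, then a named list of grouping
# variables, then the summary function
res <- aggregate(toy["score"],
                 list(Region = toy$region, Wave = toy$wave),
                 Mean)
print(res)
```

The names given in the list (Region, Wave) become the column names of the grouping variables in the result.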
More recent versions of R also provide a slightly more convenient way of calling aggregate() using a formula argument:
aggregate(cbind(flng.brown,
                flng.cameron,
                flng.clegg,
                flng.salmond
                )~region+wave,
          data=bes2010feelings,
          Mean)
    region wave flng.brown flng.cameron flng.clegg flng.salmond
1 Scotland  Pre   5.466667     4.500000   4.460000     4.480000
2 Scotland Post   5.513986     4.513986   4.498252     4.270979
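The formula call returns only the Scottish rows because the formula method of aggregate() uses na.action = na.omit by default: any respondent with a missing value in one of the four variables is dropped before aggregation, and, as the NaN entries in the first table indicate, flng.salmond is never observed in England and Wales. This also explains why the Scottish averages differ slightly from those above. A minimal sketch on made-up data (toy, x and y are hypothetical names) shows the effect and how na.action = na.pass keeps such rows:

```r
# Hypothetical toy data: y is never observed in region "B"
toy <- data.frame(
  region = c("A", "A", "B", "B"),
  x      = c(1, 3, 2, 4),
  y      = c(5, 7, NA, NA)
)

Mean <- function(x, ...) mean(x, ..., na.rm = TRUE)

# Default na.action = na.omit: rows with any NA are dropped first,
# so region "B" disappears from the result
dropped <- aggregate(cbind(x, y) ~ region, data = toy, FUN = Mean)

# na.action = na.pass keeps those rows; Mean() then removes the
# missing values separately for each column
kept <- aggregate(cbind(x, y) ~ region, data = toy, FUN = Mean,
                  na.action = na.pass)

print(dropped)
print(kept)
```

With na.pass the formula interface should behave like the old-style call as far as missing values in the aggregated variables are concerned.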
The memisc package has a somewhat more flexible variant of aggregate(), the function Aggregate(). Here we reproduce the results of aggregate(). You may need to install this package using install.packages("memisc") from CRAN if you want to run this on your computer. (The package is already installed on the notebook container, however.)
library(memisc)
Loading required package: lattice
Loading required package: MASS
Attaching package: 'memisc'
The following object is masked _by_ '.GlobalEnv':
Mean
The following objects are masked from 'package:stats':
contr.sum, contr.treatment, contrasts
The following object is masked from 'package:base':
as.array
Aggregate(c(Brown=Mean(flng.brown),
            Cameron=Mean(flng.cameron),
            Clegg=Mean(flng.clegg),
            Salmond=Mean(flng.salmond))~region+wave,
          data=bes2010feelings)
    region wave    Brown  Cameron    Clegg  Salmond
1  England  Pre 4.092674 5.284810 4.618690      NaN
2 Scotland  Pre 5.395000 4.502591 4.405229 4.412371
3    Wales  Pre 4.328244 4.774194 4.592233      NaN
4     <NA>  Pre 4.507143 4.929870 4.426573 4.760563
5  England Post 4.140990 5.441454 5.160313      NaN
6 Scotland Post 5.510769 4.539075 4.513793 4.228707
7    Wales Post 4.307692 4.855895 4.814480      NaN
8     <NA> Post       NA       NA       NA       NA
However, Aggregate() also allows several different summary functions to be used in one call.
Var <- function(x,...) var(x,...,na.rm=TRUE)
Aggregate(c(Mean(flng.brown),Var(flng.brown))~region+wave,
          data=bes2010feelings)
    region wave Mean(flng.brown) Var(flng.brown)
1  England  Pre         4.092674        7.287340
2 Scotland  Pre         5.395000        8.210025
3    Wales  Pre         4.328244        8.776042
4     <NA>  Pre         4.507143        7.754125
5  England Post         4.140990        7.109491
6 Scotland Post         5.510769        6.376617
7    Wales Post         4.307692        7.647408
8     <NA> Post               NA              NA
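For comparison, something similar can be approximated with base R's aggregate() by letting the summary function return a named vector. This is only a sketch of a base-R idiom, not how Aggregate() works internally, and the data frame toy with its columns region and score is made up for illustration:

```r
# Hypothetical toy data
toy <- data.frame(
  region = c("A", "A", "A", "B", "B"),
  score  = c(2, 4, 6, 1, 3)
)

# FUN returns a named vector, so aggregate() stores the summaries
# as a two-column matrix inside the single column 'score'
res <- aggregate(score ~ region, data = toy,
                 FUN = function(v) c(mean = mean(v), var = var(v)))

# Flatten the matrix column into the ordinary columns
# 'score.mean' and 'score.var'
flat <- do.call(data.frame, res)
print(flat)
```

The flattening step is needed because the matrix column otherwise counts as one variable of the data frame, which is one reason the Aggregate() syntax may be more convenient.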
- R file: aggregating.R
- Rmarkdown file: aggregating.Rmd
- Jupyter notebook file: aggregating.ipynb