Aggregating data frames

In the following we aggregate data from the British Election Study 2010. The data set bes2010feelings.RData is prepared from the original available at https://www.britishelectionstudy.com/data-object/2010-bes-cross-section/ by removing identifying information and scrambling the data.

load("bes2010feelings.RData")

Here we obtain the average feelings towards the leaders of the major parties (Brown, Cameron, Clegg, and Salmond) by region and wave, using an ‘old-style’ call to the function aggregate().

Mean <- function(x,...)mean(x,...,na.rm=TRUE)
aggregate(bes2010feelings[c("flng.brown","flng.cameron",
                            "flng.clegg","flng.salmond")],
          with(bes2010feelings,
               list(Region=region,Wave=wave)),
          Mean)
    Region Wave flng.brown flng.cameron flng.clegg flng.salmond
1  England  Pre   4.092674     5.284810   4.618690          NaN
2 Scotland  Pre   5.395000     4.502591   4.405229     4.412371
3    Wales  Pre   4.328244     4.774194   4.592233          NaN
4  England Post   4.140990     5.441454   5.160313          NaN
5 Scotland Post   5.510769     4.539075   4.513793     4.228707
6    Wales Post   4.307692     4.855895   4.814480          NaN
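The NaN entries for flng.salmond in England and Wales arise when a group contains no non-missing values at all: with na.rm=TRUE, the mean is taken over an empty vector. The following is a minimal sketch of that behaviour, using a small made-up data frame (toy, group, and score are hypothetical names, not part of the BES data):

```r
# Toy data (hypothetical, not the BES data): 'score' is entirely
# missing in group "B", much like flng.salmond outside Scotland.
toy <- data.frame(group = c("A", "A", "B", "B"),
                  score = c(4, 6, NA, NA))

# Same wrapper as in the text: a mean that ignores missing values.
Mean <- function(x, ...) mean(x, ..., na.rm = TRUE)

# 'Old-style' call: data frame of variables, list of grouping factors, function.
res <- aggregate(toy["score"], list(Group = toy$group), Mean)
print(res)

# Group "B" has nothing left after removing NAs, so its mean is NaN.
is.nan(res$score[res$Group == "B"])
```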

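More recent versions of R also provide a slightly more convenient way of calling aggregate() using a formula argument. Note that the formula method by default drops every row with a missing value in any of the variables involved (na.action=na.omit), so the result below covers only respondents with valid answers on all four variables, which here leaves only Scotland: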

aggregate(cbind(flng.brown,
                flng.cameron,
                flng.clegg,
                flng.salmond
                )~region+wave,
          data=bes2010feelings,
          Mean)
    region wave flng.brown flng.cameron flng.clegg flng.salmond
1 Scotland  Pre   5.466667     4.500000   4.460000     4.480000
2 Scotland Post   5.513986     4.513986   4.498252     4.270979
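The formula call returns fewer rows than the 'old-style' call because aggregate() with a formula removes incomplete rows before aggregating. If the intent is to leave the handling of missing values to the summary function, the na.action argument can be set to na.pass. A minimal sketch with made-up data (toy, a, and b are hypothetical names, not part of the BES data):

```r
# Toy data (hypothetical): 'b' is only observed in region "S".
toy <- data.frame(region = c("E", "E", "S", "S"),
                  a = c(1, 3, 5, 7),
                  b = c(NA, NA, 2, 4))

Mean <- function(x, ...) mean(x, ..., na.rm = TRUE)

# Default na.action = na.omit: rows with any NA are dropped first,
# so region "E" disappears from the result entirely.
complete_only <- aggregate(cbind(a, b) ~ region, data = toy, FUN = Mean)

# na.action = na.pass keeps all rows and leaves NA handling to Mean().
all_rows <- aggregate(cbind(a, b) ~ region, data = toy, FUN = Mean,
                      na.action = na.pass)

print(complete_only)
print(all_rows)
```

With na.pass, each variable is averaged over its own non-missing values, which matches what the 'old-style' call does.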

The memisc package has a somewhat more flexible variant of aggregate(), the function Aggregate(). Here we reproduce the results of aggregate(); note that the output of Aggregate() additionally contains rows for respondents whose region is missing (shown as <NA>). You may need to install this package from CRAN using install.packages("memisc") if you want to run this on your computer. (The package is already installed on the notebook container, however.)

library(memisc)
Loading required package: lattice
Loading required package: MASS

Attaching package: 'memisc'

The following object is masked _by_ '.GlobalEnv':

    Mean

The following objects are masked from 'package:stats':

    contr.sum, contr.treatment, contrasts

The following object is masked from 'package:base':

    as.array
Aggregate(c(Brown=Mean(flng.brown),
            Cameron=Mean(flng.cameron),
            Clegg=Mean(flng.clegg),
            Salmond=Mean(flng.salmond))~region+wave,
            data=bes2010feelings)
    region wave    Brown  Cameron    Clegg  Salmond
1  England  Pre 4.092674 5.284810 4.618690      NaN
2 Scotland  Pre 5.395000 4.502591 4.405229 4.412371
3    Wales  Pre 4.328244 4.774194 4.592233      NaN
4     <NA>  Pre 4.507143 4.929870 4.426573 4.760563
5  England Post 4.140990 5.441454 5.160313      NaN
6 Scotland Post 5.510769 4.539075 4.513793 4.228707
7    Wales Post 4.307692 4.855895 4.814480      NaN
8     <NA> Post       NA       NA       NA       NA
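The <NA> rows in this output are groups in which region is missing; base R's aggregate() omits such groups. One way to keep them in base R, sketched here under the assumption that a separate group for missing values is wanted, is to turn NA into an explicit factor level with addNA(). The data frame toy and its columns are made up for illustration:

```r
# Toy data (hypothetical): two observations have no region recorded.
toy <- data.frame(region = c("E", "E", NA, NA, "S"),
                  x = c(1, 3, 5, 7, 9))

# By default, rows with a missing grouping value are left out.
dropped <- aggregate(toy["x"], list(region = toy$region), mean)

# addNA() makes NA an ordinary factor level, so it forms its own group.
kept <- aggregate(toy["x"], list(region = addNA(toy$region)), mean)

print(dropped)
print(kept)
```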

However, it also allows the use of different summary functions in a single call.

Var <- function(x,...) var(x,...,na.rm=TRUE)
Aggregate(c(Mean(flng.brown),Var(flng.brown))~region+wave,
          data=bes2010feelings)
    region wave Mean(flng.brown) Var(flng.brown)
1  England  Pre         4.092674        7.287340
2 Scotland  Pre         5.395000        8.210025
3    Wales  Pre         4.328244        8.776042
4     <NA>  Pre         4.507143        7.754125
5  England Post         4.140990        7.109491
6 Scotland Post         5.510769        6.376617
7    Wales Post         4.307692        7.647408
8     <NA> Post               NA              NA
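For comparison, base R's aggregate() can produce several summaries at once if the summary function returns a named vector, although the result is then stored as a matrix column that usually needs flattening. The following is only a sketch with made-up data (toy and x are hypothetical names), not a replacement for Aggregate():

```r
# Toy data (hypothetical, not the BES data).
toy <- data.frame(region = c("E", "E", "E", "S", "S", "S"),
                  x = c(1, 2, NA, 4, 6, 8))

Mean <- function(x, ...) mean(x, ..., na.rm = TRUE)
Var  <- function(x, ...) var(x, ..., na.rm = TRUE)

# The summary function returns two named values per group.
res <- aggregate(x ~ region, data = toy,
                 FUN = function(v) c(mean = Mean(v), var = Var(v)),
                 na.action = na.pass)

# res$x is a two-column matrix; flatten it into ordinary columns.
flat <- data.frame(region = res$region, res$x)
print(flat)
```

The extra flattening step is what the formula syntax of Aggregate() avoids, since each summary there becomes its own column directly.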