Merging data frames: An example with data from the British Election Study

Here we merge data from the British Election Study. Here we use data from the British Election Study 2010. The data set bes2010feelings-prepost-for-merge.RData is prepared from the original available at https://www.britishelectionstudy.com/data-object/2010-bes-cross-section/ by removing identifying information and scrambling the data.

load("bes2010feelings-prepost-for-merge.RData")

A peek into a data frame about respondents’ feelings about parties:

head(bes2010flngs_parties_pre)
        id    refno vote   flng.labour flng.cons flng.libdem flng.snp flng.pcym
40103.1 40103 312   NA     5           6         4           NA       NA       
40107.1 40107 312   NA     1           6         7           NA       NA       
40109.1 40109 312   NA     3           4         5           NA       NA       
40110.1 40110 312   Labour 6           6         5           NA       NA       
40111.1 40111 312   Labour 8           4         5           NA       NA       
40112.1 40112 312   Labour 5           1         4           NA       NA       
        flng.green flng.ukip flng.bnp region 
40103.1 7           3        0        England
40107.1 6           0        0        NA     
40109.1 5           0        0        England
40110.1 5           3        2        England
40111.1 4          NA        2        NA     
40112.1 4           0        0        England

And anotehr peek into a data frame about respondents’ feelings about party leaders:

head(bes2010flngs_leaders_pre)
        id    flng.brown flng.cameron flng.clegg flng.salmond flng.jones
40103.1 40103 6          3            3          NA            5        
40107.1 40107 3          7            5          NA            3        
40109.1 40109 8          7            4          NA           10        
40110.1 40110 4          4            3          NA            7        
40111.1 40111 5          5            5          NA            5        
40112.1 40112 5          0            4          NA            1        

The variable that identifies individual respondents in both data frames is ‘id’, so we use this variable to match the rows in both data frames:

bes2010flngs_pre_merged <- merge(
    bes2010flngs_parties_pre,
    bes2010flngs_leaders_pre,
    by="id"
)

merge() also allows for identifier variables that may have different names in the two data frame. In such cases one can use the named arguments by.x= and by.y=:

bes2010flngs_pre_merged <- merge(
    bes2010flngs_parties_pre,
    bes2010flngs_leaders_pre,
    by.x="id",
    by.y="id"
)

It is not absolutely necessary to provide a by= argument, if the merged data frames share a variable (with the same name in both) that idenfies cases or observations. Therefore, we can call merge() here without any by=, by.x=, or by.y= arguments:

bes2010flngs_pre_merged <- merge(
    bes2010flngs_parties_pre,
    bes2010flngs_leaders_pre
)
head(bes2010flngs_pre_merged)
  id    refno vote   flng.labour flng.cons flng.libdem flng.snp flng.pcym
1 40103 312   NA     5           6         4           NA       NA       
2 40107 312   NA     1           6         7           NA       NA       
3 40109 312   NA     3           4         5           NA       NA       
4 40110 312   Labour 6           6         5           NA       NA       
5 40111 312   Labour 8           4         5           NA       NA       
6 40112 312   Labour 5           1         4           NA       NA       
  flng.green flng.ukip flng.bnp region  flng.brown flng.cameron flng.clegg
1 7           3        0        England 6          3            3         
2 6           0        0        NA      3          7            5         
3 5           0        0        England 8          7            4         
4 5           3        2        England 4          4            3         
5 4          NA        2        NA      5          5            5         
6 4           0        0        England 5          0            4         
  flng.salmond flng.jones
1 NA            5        
2 NA            3        
3 NA           10        
4 NA            7        
5 NA            5        
6 NA            1        

The data frame constwin contains data about relectoral districts, that is which party won the respective district seat in 2005 and 2010. The variable that identifies the electoral district is both in the individual-level data frame and the district-level data frame is named refno, so we use this as a matching variable.

bes2010pre_merged <- merge(
    bes2010flngs_pre_merged,
    constwin,
    by = "refno" # Not necessary in the present case, because
)                # it is the same in both data frames.

As can be glimpsed from the output of str, the result of merge is sorted by the matching variable, i.e. “refno”

head(bes2010pre_merged)
  refno id    vote          flng.labour flng.cons flng.libdem flng.snp
1 1     77920 Plaid Cymru    6          5         5           NA      
2 1     57911 NA             5          3         3           NA      
3 1     57905 Labour        10          0         3           NA      
4 1     57906 Labour        10          0         4           NA      
5 1     57910 Conservatives  0          9         3           NA      
6 1     57902 Conservatives  8          9         6           NA      
  flng.pcym flng.green flng.ukip ⋯ flng.brown flng.cameron flng.clegg
1 7         7          5         ⋯  0         8            NA        
2 3         4          0         ⋯  6         5             5        
3 5         3          4         ⋯  8         0             4        
4 3         0          6         ⋯ 10         0             0        
5 3         6          2         ⋯  0         9             0        
6 5         5          6         ⋯  4         8             6        
  flng.salmond flng.jones seat     win05  win10  maj05 maj10
1 NA            4         Aberavon Labour Labour 46.3  35.7 
2 NA            6         Aberavon Labour Labour 46.3  35.7 
3 NA            0         Aberavon Labour Labour 46.3  35.7 
4 NA           10         Aberavon Labour Labour 46.3  35.7 
5 NA            0         Aberavon Labour Labour 46.3  35.7 
6 NA            7         Aberavon Labour Labour 46.3  35.7