Counting words in the UK Labour Party manifesto for the 2017 general election

The file “UKLabourParty_201706.csv” was downloaded from the Manifesto Project website. Redistribution of the data is prohibited, so readers who want to reproduce the following steps need to download their own copy of the data set and upload it to the virtual machine that runs this notebook. To do this:

  1. Pull down the “File” menu and select “Open”.
  2. An overview of the folder that contains the notebook opens.
  3. The folder view has a button labelled “Upload”. Use it to upload the file that you downloaded from the Manifesto Project website.

Note that the uploaded data will disappear once you “Quit” the notebook (and the Jupyter instance).
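
Because the upload has to be repeated in every new session, it may be convenient to check that the file is present before trying to read it. The following is only a minimal sketch: 'check_upload()' is a hypothetical helper that is not part of the original notebook, and the demonstration uses a temporary stand-in file rather than the real data.

```r
# Hypothetical helper (an assumption, not part of the original notebook):
# stop with a readable message if the file is not where R looks for it.
check_upload <- function(filename) {
  if (!file.exists(filename)) {
    stop("File '", filename, "' not found in ", getwd(),
         "; please upload it first.")
  }
  invisible(TRUE)
}

# Demonstration with a temporary stand-in file, so the sketch runs anywhere
demo_file <- tempfile(fileext = ".csv")
writeLines("content,cmp_code,eu_code", demo_file)
check_upload(demo_file)   # passes silently because the file exists
```

With the real data one would call 'check_upload("UKLabourParty_201706.csv")' before 'read.csv()'.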

# First, the data are read in
Labour.2017 <- read.csv("UKLabourParty_201706.csv",
                        stringsAsFactors=FALSE)
# Second, the typographic apostrophe (UTF-8 bytes E2 80 99) is replaced
# by a plain ASCII apostrophe
Labour.2017$content <- gsub("\xE2\x80\x99","'",Labour.2017$content)
str(Labour.2017)
'data.frame':	1396 obs. of  3 variables:
 $ content : chr  "CREATING AN ECONOMY THAT WORKS FOR ALL" "Labour's economic strategy is about delivering a fairer, more prosperous society for the many, not just the few." "We will measure our economic success not by the number of billionaires, but by the ability of our people to live richer lives." "Labour understands that the creation of wealth is a collective endeavour between workers, entrepreneurs, invest"| __truncated__ ...
 $ cmp_code: chr  "H" "503" "503" "405" ...
 $ eu_code : logi  NA NA NA NA NA NA ...
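
The substitution step can be tried out on a toy string. The sketch below writes the typographic apostrophe with the Unicode escape '\u2019' instead of its raw bytes, which assumes that the text is handled as UTF-8. The names 'x', 'y' and 'has_non_ascii()' are made up for this illustration.

```r
# Toy string with a typographic apostrophe (U+2019)
x <- "Labour\u2019s economic strategy"

# fixed = TRUE treats the pattern as a literal string, not as a regular expression
y <- gsub("\u2019", "'", x, fixed = TRUE)
y

# Hypothetical helper: TRUE if a single string contains a code point above 127
has_non_ascii <- function(s) any(utf8ToInt(s) > 127L)
has_non_ascii(x)   # the original still contains a non-ASCII character
has_non_ascii(y)   # the cleaned string does not
```

Applying such a check to every line of the real text would show whether other non-ASCII characters, such as dashes or curly quotation marks, remain after the substitution.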
# The variable 'content' contains the text of the manifesto;
# from here on 'Labour.2017' is just this character vector
Labour.2017 <- Labour.2017$content
Labour.2017[1:5]
[1] "CREATING AN ECONOMY THAT WORKS FOR ALL"                                                                                            
[2] "Labour's economic strategy is about delivering a fairer, more prosperous society for the many, not just the few."                  
[3] "We will measure our economic success not by the number of billionaires, but by the ability of our people to live richer lives."    
[4] "Labour understands that the creation of wealth is a collective endeavour between workers, entrepreneurs, investors and government."
[5] "Each contributes and each must share fairly in the rewards."                                                                       
# The headings in the manifesto are all uppercase, which helps
# to identify them:
Labour.2017.hlno <- which(Labour.2017==toupper(Labour.2017))
Labour.2017.headings <- Labour.2017[Labour.2017.hlno]
Labour.2017.headings[1:4]
[1] "CREATING AN ECONOMY THAT WORKS FOR ALL"
[2] "A FAIR TAXATION SYSTEM"                
[3] "BALANCING THE BOOKS"                   
[4] "INFRASTRUCTURE INVESTMENT"             
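
The comparison with 'toupper()' treats every line without lowercase letters as a heading, which would include a line that consists only of digits or punctuation. Whether such lines occur in the manifesto is not checked here. The following sketch uses made-up lines to show the effect, together with one possible refinement that is an assumption and not part of the original analysis.

```r
toy <- c("A FAIR TAXATION SYSTEM",
         "We will measure our economic success.",
         "2017",
         "BALANCING THE BOOKS")

# Same test as above: a line that equals its own uppercase version
is_upper <- toy == toupper(toy)
which(is_upper)                 # also flags the line "2017"

# Possible refinement: additionally require at least one letter
has_letter <- grepl("[[:alpha:]]", toy)
which(is_upper & has_letter)    # only the two real headings
```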
# The headings are dropped and the remaining text is changed to lowercase
labour.2017 <- tolower(Labour.2017[-Labour.2017.hlno])
labour.2017[1:5]
[1] "labour's economic strategy is about delivering a fairer, more prosperous society for the many, not just the few."                                                           
[2] "we will measure our economic success not by the number of billionaires, but by the ability of our people to live richer lives."                                             
[3] "labour understands that the creation of wealth is a collective endeavour between workers, entrepreneurs, investors and government."                                         
[4] "each contributes and each must share fairly in the rewards."                                                                                                                
[5] "this manifesto sets out labour's plan to upgrade our economy and rewrite the rules of a rigged system, so that our economy really works for the many, and not only the few."
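
Dropping elements with a negative index works here because headings were found. If 'which()' returned an empty vector, the negative index would select nothing at all instead of everything. The sketch below illustrates this with made-up lines and shows a logical index as one alternative that avoids the problem; the variable names are chosen only for this example.

```r
toy <- c("First sentence.", "Second sentence.")

# No line is all uppercase, so the index vector is empty
hl <- which(toy == toupper(toy))
length(hl)

# Negative indexing with an empty index returns an empty vector
dropped <- toy[-hl]
length(dropped)

# A logical index keeps all lines when there is nothing to drop
is_heading <- toy == toupper(toy)
kept <- tolower(toy[!is_heading])
kept
```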
# All lines that contain the pattern 'econom' are collected
ecny.labour.2017 <- grep("econom",labour.2017,value=TRUE)
ecny.labour.2017[1:5]
[1] "labour's economic strategy is about delivering a fairer, more prosperous society for the many, not just the few."                                                           
[2] "we will measure our economic success not by the number of billionaires, but by the ability of our people to live richer lives."                                             
[3] "this manifesto sets out labour's plan to upgrade our economy and rewrite the rules of a rigged system, so that our economy really works for the many, and not only the few."
[4] "britain is the only major developed economy where earnings have fallen even as growth has returned after the financial crisis."                                             
[5] "we will upgrade our economy, breaking down the barriers that hold too many of us back,"                                                                                     
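
The pattern 'econom' matches any line that contains this sequence of letters, so it also picks up words such as "economical". The sketch below, which uses made-up lines, shows the difference between returning positions, returning the lines themselves, and counting matches. The stricter pattern at the end is only an illustration of how whole words could be matched.

```r
toy <- c("our economy really works for the many",
         "each contributes and each must share fairly",
         "labour's economic strategy",
         "an economical and ecological plan")

grep("econom", toy)                 # positions of the matching lines
grep("econom", toy, value = TRUE)   # the matching lines themselves
sum(grepl("econom", toy))           # number of matching lines

# Illustrative stricter pattern: only the whole words 'economy' or 'economic'
strict <- grep("\\becon(omy|omic)\\b", toy, value = TRUE, perl = TRUE)
strict
```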
# Using 'strsplit()' the lines are split into words at spaces,
# commas, full stops, semicolons and colons
labour.2017.words <- strsplit(labour.2017,"[ ,.;:]+")
str(labour.2017.words[1:5])
List of 5
 $ : chr [1:18] "labour's" "economic" "strategy" "is" ...
 $ : chr [1:23] "we" "will" "measure" "our" ...
 $ : chr [1:17] "labour" "understands" "that" "the" ...
 $ : chr [1:10] "each" "contributes" "and" "each" ...
 $ : chr [1:32] "this" "manifesto" "sets" "out" ...
# The result is a list. We change it into a character vector.
labour.2017.words <- unlist(labour.2017.words)
labour.2017.words[1:20]
 [1] "labour's"   "economic"   "strategy"   "is"         "about"     
 [6] "delivering" "a"          "fairer"     "more"       "prosperous"
[11] "society"    "for"        "the"        "many"       "not"       
[16] "just"       "the"        "few"        "we"         "will"      
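
Two properties of this way of splitting are worth keeping in mind: a line that starts with a separator yields an empty string as its first "word", and characters that are not listed in the pattern, such as brackets or quotation marks, stay attached to the words. The following sketch uses made-up lines to show both effects and one simple way to drop the empty strings.

```r
toy <- c("each contributes, and each must share.",
         " (a leading space) and 'quotes'")

parts <- strsplit(toy, "[ ,.;:]+")
parts[[1]]   # six words, the trailing full stop is removed
parts[[2]]   # starts with "" and keeps the brackets and quotes

# Flatten the list and drop empty strings
words <- unlist(parts)
words <- words[nzchar(words)]
length(words)
```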
# We now count the words and look at the 20 most common ones.
labour.2017.nwords <- table(labour.2017.words)
labour.2017.nwords <- sort(labour.2017.nwords,decreasing=TRUE)
labour.2017.nwords[1:20]
labour.2017.words
   the    and     to   will     of      a     we     in labour    for    our 
  1202    947    832    664    625    438    418    369    313    312    244 
  that     on   with     by     is    are     as   have ensure 
   232    212    185    161    161    134    112    108    104
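
The counting step can be reproduced on a small made-up vector of words. The sketch also converts the counts into percentages with 'prop.table()', which is an addition for illustration and was not part of the analysis above.

```r
toy.words <- c("the", "many", "not", "the", "few", "the", "many")

# Count each distinct word and sort by frequency
counts <- sort(table(toy.words), decreasing = TRUE)
counts[1:2]   # the two most common words

# Share of each word among all words, in percent
shares <- round(100 * prop.table(counts), 1)
shares
```

As in the manifesto, the most frequent entries of such a table tend to be function words like "the", which say little about the content of a text on their own.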