Counting words in the UK Labour Party manifesto for the 2017 election
The file “UKLabourParty_201706.csv” was downloaded from the Manifesto Project website. Redistribution of the data is prohibited, so readers who want to reproduce the following steps need to download their own copy of the data set and upload it to the virtual machine that runs this notebook. To do this:
- Pull down the “File” menu and select “Open”.
- An overview of the folder that contains the notebook opens.
- The folder view has a button labelled “Upload”. Use it to upload the file that you downloaded from the Manifesto Project website.
Note that the uploaded data will disappear once you “Quit” the notebook (and the Jupyter instance).
# First, the data are read in
Labour.2017 <- read.csv("UKLabourParty_201706.csv",
                        stringsAsFactors = FALSE)
# Second, typographic apostrophes (U+2019, encoded in UTF-8 as the bytes
# E2 80 99) are replaced by plain ASCII apostrophes
Labour.2017$content <- gsub("\xE2\x80\x99","'",Labour.2017$content)
str(Labour.2017)
'data.frame': 1396 obs. of 3 variables:
$ content : chr "CREATING AN ECONOMY THAT WORKS FOR ALL" "Labour's economic strategy is about delivering a fairer, more prosperous society for the many, not just the few." "We will measure our economic success not by the number of billionaires, but by the ability of our people to live richer lives." "Labour understands that the creation of wealth is a collective endeavour between workers, entrepreneurs, invest"| __truncated__ ...
$ cmp_code: chr "H" "503" "503" "405" ...
$ eu_code : logi NA NA NA NA NA NA ...
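The substitution above handles only the typographic apostrophe. As a minimal sketch, one could check which lines still contain non-ASCII characters afterwards. The toy vector `content` is hypothetical and stands in for `Labour.2017$content`; the apostrophe is written as the Unicode escape `"\u2019"` (the same character as the bytes `\xE2\x80\x99`), which assumes a UTF-8 capable R session.

```r
# Hypothetical toy input standing in for Labour.2017$content
content <- c("Labour\u2019s plan", "a \u201cfair\u201d deal", "plain ASCII")

# Replace the typographic apostrophe (U+2019) with an ASCII apostrophe
content <- gsub("\u2019", "'", content, fixed = TRUE)

# Flag lines that still contain a character outside the ASCII range
has.non.ascii <- grepl("[^\\x01-\\x7F]", content, perl = TRUE)
content[has.non.ascii]        # "a “fair” deal"

# List the distinct non-ASCII characters that remain
remaining <- regmatches(content,
                        gregexpr("[^\\x01-\\x7F]", content, perl = TRUE))
unique(unlist(remaining))     # the curly double quotes
```

Any characters listed at the end could be substituted in the same way as the apostrophe, if that suits the analysis.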
# The column 'content' holds the text of the manifesto; from here on,
# Labour.2017 is just this character vector
Labour.2017 <- Labour.2017$content
Labour.2017[1:5]
[1] "CREATING AN ECONOMY THAT WORKS FOR ALL"
[2] "Labour's economic strategy is about delivering a fairer, more prosperous society for the many, not just the few."
[3] "We will measure our economic success not by the number of billionaires, but by the ability of our people to live richer lives."
[4] "Labour understands that the creation of wealth is a collective endeavour between workers, entrepreneurs, investors and government."
[5] "Each contributes and each must share fairly in the rewards."
# The headings in the manifesto are written entirely in uppercase,
# which helps to identify them:
Labour.2017.hlno <- which(Labour.2017==toupper(Labour.2017))
Labour.2017.headings <- Labour.2017[Labour.2017.hlno]
Labour.2017.headings[1:4]
[1] "CREATING AN ECONOMY THAT WORKS FOR ALL"
[2] "A FAIR TAXATION SYSTEM"
[3] "BALANCING THE BOOKS"
[4] "INFRASTRUCTURE INVESTMENT"
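The uppercase test also flags lines that contain no letters at all, such as bare numbers or empty strings, since `toupper()` leaves them unchanged. Whether the manifesto contains such lines has not been checked here; the toy lines below are hypothetical and only illustrate a stricter variant that also requires at least one letter. (In the `str()` output, the first line, a heading, has `cmp_code` "H"; if that coding is consistent, which this notebook does not verify, it could serve as a cross-check before the data frame is reduced to its `content` column.)

```r
# Hypothetical toy lines; the last two contain no letters at all
lines <- c("A FAIR TAXATION SYSTEM", "We will act.", "2017", "")

# The test used above: a line counts as a heading if uppercasing changes nothing
is.heading <- lines == toupper(lines)
is.heading            # TRUE FALSE TRUE TRUE

# Stricter variant: additionally require at least one letter
is.heading.strict <- is.heading & grepl("[[:alpha:]]", lines)
is.heading.strict     # TRUE FALSE FALSE FALSE
```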
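# The headings are dropped and the remaining text is changed to lowercase;
# the result goes into a new variable, 'labour.2017', with a lowercase 'l'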
labour.2017 <- tolower(Labour.2017[-Labour.2017.hlno])
labour.2017[1:5]
[1] "labour's economic strategy is about delivering a fairer, more prosperous society for the many, not just the few."
[2] "we will measure our economic success not by the number of billionaires, but by the ability of our people to live richer lives."
[3] "labour understands that the creation of wealth is a collective endeavour between workers, entrepreneurs, investors and government."
[4] "each contributes and each must share fairly in the rewards."
[5] "this manifesto sets out labour's plan to upgrade our economy and rewrite the rules of a rigged system, so that our economy really works for the many, and not only the few."
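Dropping headings with a negative index works here because `which()` found some headings. As a hedged illustration on a hypothetical two-line vector: if no line had matched, `which()` would return `integer(0)`, and `x[-integer(0)]` selects nothing rather than everything. A logical index avoids this edge case:

```r
x <- c("ONE HEADING", "some text")

# If no line matches, which() returns integer(0) ...
idx <- which(x == "NO SUCH LINE")
length(idx)               # 0

# ... and negative indexing with it selects nothing, not everything
length(x[-idx])           # 0

# A logical index does not have this problem
is.heading <- x == toupper(x)
tolower(x[!is.heading])   # "some text"
```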
# All lines that contain the pattern 'econom' are collected
ecny.labour.2017 <- grep("econom",labour.2017,value=TRUE)
ecny.labour.2017[1:5]
[1] "labour's economic strategy is about delivering a fairer, more prosperous society for the many, not just the few."
[2] "we will measure our economic success not by the number of billionaires, but by the ability of our people to live richer lives."
[3] "this manifesto sets out labour's plan to upgrade our economy and rewrite the rules of a rigged system, so that our economy really works for the many, and not only the few."
[4] "britain is the only major developed economy where earnings have fallen even as growth has returned after the financial crisis."
[5] "we will upgrade our economy, breaking down the barriers that hold too many of us back,"
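`grep(..., value = TRUE)` returns the matching lines themselves. A small sketch on hypothetical toy lines shows two related counts: how many lines match, via `grepl()`, and which words beginning with "econom" occur, via `gregexpr()` and `regmatches()`. Note that the unanchored pattern would also match inside words such as "socioeconomic".

```r
# Hypothetical toy lines standing in for labour.2017
lines <- c("our economy and our economic plan", "no match here",
           "a fairer economy")

# Number of lines that contain the pattern at least once
n.lines <- sum(grepl("econom", lines))
n.lines                 # 2

# Individual words that start with 'econom', with their frequencies
hits <- regmatches(lines, gregexpr("econom[[:alpha:]]*", lines))
table(unlist(hits))     # economic: 1, economy: 2
```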
# Using strsplit(), the lines are split into words at every run of
# spaces, commas, full stops, semicolons and colons
labour.2017.words <- strsplit(labour.2017,"[ ,.;:]+")
str(labour.2017.words[1:5])
List of 5
$ : chr [1:18] "labour's" "economic" "strategy" "is" ...
$ : chr [1:23] "we" "will" "measure" "our" ...
$ : chr [1:17] "labour" "understands" "that" "the" ...
$ : chr [1:10] "each" "contributes" "and" "each" ...
$ : chr [1:32] "this" "manifesto" "sets" "out" ...
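The split pattern treats only spaces, commas, full stops, semicolons and colons as separators, so other punctuation stays attached to the words. The sketch below contrasts it, on one hypothetical line, with an alternative pattern (our choice, not the notebook's) that splits at everything except letters, digits and apostrophes. Keeping apostrophes means that "labour's" and "labour" are still counted as different words.

```r
line <- "we will (finally) act - won't we?"

# The pattern used above splits only at spaces and , . ; :
strsplit(line, "[ ,.;:]+")[[1]]
# "we" "will" "(finally)" "act" "-" "won't" "we?"

# Alternative: split at any run of characters that are neither
# letters, digits nor apostrophes
strsplit(line, "[^[:alnum:]']+")[[1]]
# "we" "will" "finally" "act" "won't" "we"
```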
# The result is a list. We change it into a character vector.
labour.2017.words <- unlist(labour.2017.words)
labour.2017.words[1:20]
[1] "labour's" "economic" "strategy" "is" "about"
[6] "delivering" "a" "fairer" "more" "prosperous"
[11] "society" "for" "the" "many" "not"
[16] "just" "the" "few" "we" "will"
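When a line begins with a separator, `strsplit()` returns an empty string as its first element (a separator at the end of a line produces nothing). Whether any manifesto line starts with a space has not been checked; the hypothetical lines below show how such empty strings can be removed before counting:

```r
lines <- c(" leading space, here", "second line")
words <- unlist(strsplit(lines, "[ ,.;:]+"))
words                   # "" "leading" "space" "here" "second" "line"

# Drop the empty strings before counting
words <- words[nzchar(words)]
length(words)           # 5
```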
# We now count the words and look at the 20 most common ones.
labour.2017.nwords <- table(labour.2017.words)
labour.2017.nwords <- sort(labour.2017.nwords,decreasing=TRUE)
labour.2017.nwords[1:20]
labour.2017.words
the and to will of a we in labour for our
1202 947 832 664 625 438 418 369 313 312 244
that on with by is are as have ensure
232 212 185 161 161 134 112 108 104
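The most frequent words are function words such as "the", "and" and "to". A minimal sketch of removing them before counting follows; the toy word vector and the short stop-word list are hypothetical, and a real analysis would use a longer, curated list. It also uses `head()`, which returns at most 20 entries however short the table is, and `prop.table()` for relative frequencies.

```r
words <- c("the", "economy", "the", "and", "economy", "for", "the", "many")

# A short, hypothetical stop-word list
stop.words <- c("the", "and", "for", "a", "of", "to", "we", "will")

# Count only the words that are not stop words, most frequent first
counts <- sort(table(words[!words %in% stop.words]), decreasing = TRUE)
head(counts, 20)           # economy: 2, many: 1

# Relative frequencies
round(prop.table(counts), 3)
```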
- R file: counting-words-in-a-manifesto.R
- Rmarkdown file: counting-words-in-a-manifesto.Rmd
- Jupyter notebook file: counting-words-in-a-manifesto.ipynb