Filling and completing with tidyr
This example uses the packages tidyr and readr. To run it on your own computer, you may need to install them from CRAN with install.packages(c("tidyr","readr")). (The packages are already installed on the notebook container.)
Filling missing values with fill()
library(tidyr)
library(readr)
messy_data_str <- "
country, year,var1, var2
Rodinia, 1297, 67, -3.0
, 1298, 69, -2.9
, 1299, 70, -2.8
Pannotia, 1296, 73, -4.1
, 1297, 74, -3.9
, 1298, 75, -3.9
Pangaea, 1296, 54, -1.2
, 1297, 53, -1.1
, 1298, 52, -1.0
, 1299, 51, -0.9
"
# Parse the literal CSV text; empty country cells are read as NA
messy_data_str %>% read_csv() -> messy_data
messy_data
# A tibble: 10 x 4
   country   year  var1  var2
   <chr>    <dbl> <dbl> <dbl>
 1 Rodinia   1297    67  -3
 2 NA        1298    69  -2.9
 3 NA        1299    70  -2.8
 4 Pannotia  1296    73  -4.1
 5 NA        1297    74  -3.9
 6 NA        1298    75  -3.9
 7 Pangaea   1296    54  -1.2
 8 NA        1297    53  -1.1
 9 NA        1298    52  -1
10 NA        1299    51  -0.9
# Carry the last non-missing country name down into the NA rows below it
messy_data %>% fill(country) -> filled_data
filled_data
# A tibble: 10 x 4
   country   year  var1  var2
   <chr>    <dbl> <dbl> <dbl>
 1 Rodinia   1297    67  -3
 2 Rodinia   1298    69  -2.9
 3 Rodinia   1299    70  -2.8
 4 Pannotia  1296    73  -4.1
 5 Pannotia  1297    74  -3.9
 6 Pannotia  1298    75  -3.9
 7 Pangaea   1296    54  -1.2
 8 Pangaea   1297    53  -1.1
 9 Pangaea   1298    52  -1
10 Pangaea   1299    51  -0.9
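By default, fill() carries values downward, which suits the data above because each country name sits on the first row of its block. When the label sits on the last row instead, the .direction argument can reverse the fill. The following is a minimal sketch; the tibble toy and its columns are hypothetical and not part of the original example.

```r
library(tidyr)

# Hypothetical toy data: each label sits on the LAST row of its block
toy <- tibble::tibble(
  group = c(NA, NA, "a", NA, "b"),
  value = 1:5
)

# Default direction is "down": the leading NAs have nothing above them and stay NA
down <- fill(toy, group)

# "up" fills each NA from the next non-missing value below it
up <- fill(toy, group, .direction = "up")

down
up
```

With this layout, only the "up" variant assigns every row to a group.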
Completing missing combinations with complete()
# Add a row for every country-year combination; values absent from the data become NA
filled_data %>% complete(crossing(country,year))
# A tibble: 12 x 4
   country   year  var1  var2
   <chr>    <dbl> <dbl> <dbl>
 1 Pangaea   1296    54  -1.2
 2 Pangaea   1297    53  -1.1
 3 Pangaea   1298    52  -1
 4 Pangaea   1299    51  -0.9
 5 Pannotia  1296    73  -4.1
 6 Pannotia  1297    74  -3.9
 7 Pannotia  1298    75  -3.9
 8 Pannotia  1299    NA  NA
 9 Rodinia   1296    NA  NA
10 Rodinia   1297    67  -3
11 Rodinia   1298    69  -2.9
12 Rodinia   1299    70  -2.8
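The rows that complete() adds contain NA in every column that is not part of the combination. If a different placeholder is wanted, the fill argument of complete() takes a named list of replacement values. The sketch below uses a small stand-in for filled_data (the tibble small is hypothetical, and 0 is an arbitrary placeholder). It also passes the columns to complete() directly rather than through crossing(), which should yield the same set of combinations.

```r
library(tidyr)

# Small stand-in for filled_data: Pannotia has no row for 1298
small <- tibble::tibble(
  country = c("Rodinia", "Rodinia", "Pannotia"),
  year    = c(1297, 1298, 1297),
  var1    = c(67, 69, 74)
)

# All country-year combinations; the added row has NA in var1
completed <- complete(small, country, year)

# Same completion, but missing var1 values are replaced by 0
completed_zero <- complete(small, country, year, fill = list(var1 = 0))

completed
completed_zero
```

Note that fill replaces missing values in the named columns, so choose a placeholder that cannot be mistaken for a real measurement.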
- R file: filling-and-completing-with-tidyr.R
- Rmarkdown file: filling-and-completing-with-tidyr.Rmd
- Jupyter notebook file: filling-and-completing-with-tidyr.ipynb