Answer the following questions and complete the exercises in
RMarkdown. Please embed all of your code and push your final work to
your repository. Your final lab report should be organized, clean, and
run free from errors. Remember, you must remove the # for
the included code chunks to run. Be sure to add your name to the author
header above. For any included plots, make sure they are clearly
labeled. You are free to use any plot type that you feel best
communicates the results of your analysis.
Make sure to use the formatting conventions of RMarkdown to make your report neat and clean!
library(tidyverse)
library(janitor)
library(naniar)
options(scipen = 999) # print large numbers without scientific notation
The idea for this assignment came from Rebecca Barter’s ggplot tutorial, so if you get stuck, that tutorial is a good place to look.
For this assignment, we are going to use the gapminder dataset. Gapminder includes information about economics, population, and life expectancy for countries all over the world. You will need to install the package before use.
#install.packages("gapminder")
library("gapminder")
gapminder <- gapminder
glimpse(gapminder)
## Rows: 1,704
## Columns: 6
## $ country <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
## $ year <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ lifeExp <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
## $ pop <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
miss_var_summary(gapminder)
## # A tibble: 6 × 3
## variable n_miss pct_miss
## <chr> <int> <num>
## 1 country 0 0
## 2 continent 0 0
## 3 year 0 0
## 4 lifeExp 0 0
## 5 pop 0 0
## 6 gdpPercap 0 0
# Count the distinct countries recorded for each continent
gapminder %>%
  group_by(continent) %>%
  summarize(n = n_distinct(country))
## # A tibble: 5 × 2
## continent n
## <fct> <int>
## 1 Africa 52
## 2 Americas 25
## 3 Asia 33
## 4 Europe 30
## 5 Oceania 2
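# Bar plot of the number of countries per continent
gapminder %>%
  group_by(continent) %>%
  summarize(n = n_distinct(country)) %>%
  ggplot(aes(x = continent, y = n, fill = continent)) +
  geom_col() + # same result as geom_bar(stat = "identity")
  labs(title = "Number of Countries by Continent",
       x = "Continent",
       y = "Number of countries")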
# Population change per country between 1952 and 2007, largest increase first
gapminder %>%
  select(country, year, pop) %>%
  filter(year %in% c(1952, 2007)) %>%
  pivot_wider(names_from = year,
              names_prefix = "yr_",
              values_from = pop) %>%
  mutate(delta = yr_2007 - yr_1952) %>%
  arrange(desc(delta))
## # A tibble: 142 × 4
## country yr_1952 yr_2007 delta
## <fct> <int> <int> <int>
## 1 China 556263527 1318683096 762419569
## 2 India 372000000 1110396331 738396331
## 3 United States 157553000 301139947 143586947
## 4 Indonesia 82052000 223547000 141495000
## 5 Brazil 56602560 190010647 133408087
## 6 Pakistan 41346560 169270617 127924057
## 7 Bangladesh 46886859 150448339 103561480
## 8 Nigeria 33119096 135031164 101912068
## 9 Mexico 30144317 108700891 78556574
## 10 Philippines 22438691 91077287 68638596
## # ℹ 132 more rows
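As a minimal sketch of how the reshaping step above works, the toy example below applies the same pivot_wider() pattern to a small made-up tibble. The object names `toy` and `toy_wide` and all of the values are hypothetical and are not taken from gapminder; only the dplyr and tidyr calls mirror the chunk above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not from gapminder): two countries, two years
toy <- tibble(
  country = c("A", "A", "B", "B"),
  year    = c(1952L, 2007L, 1952L, 2007L),
  pop     = c(100L, 250L, 400L, 450L)
)

# Reshape to one row per country and one column per year.
# names_prefix gives syntactic column names (yr_1952, yr_2007),
# so mutate() can refer to them without backticks.
toy_wide <- toy %>%
  pivot_wider(names_from = year,
              names_prefix = "yr_",
              values_from = pop) %>%
  mutate(delta = yr_2007 - yr_1952) %>%
  arrange(desc(delta))

toy_wide
```

Without names_prefix, the new columns would be named 1952 and 2007 and would have to be wrapped in backticks, as in the GDP per capita chunk later in this report.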
# Line plot of China's population across the survey years
gapminder %>%
  filter(country == "China") %>%
  select(country, year, pop) %>%
  ggplot(aes(x = as.factor(year), y = pop, group = country)) +
  geom_line() +
  labs(title = "Population Growth in China",
       x = "Year",
       y = "Population")
# Minimum, mean, and maximum life expectancy across countries for each year
gapminder %>%
  group_by(year) %>%
  summarize(min = min(lifeExp),
            mean = mean(lifeExp),
            max = max(lifeExp))
## # A tibble: 12 × 4
## year min mean max
## <int> <dbl> <dbl> <dbl>
## 1 1952 28.8 49.1 72.7
## 2 1957 30.3 51.5 73.5
## 3 1962 32.0 53.6 73.7
## 4 1967 34.0 55.7 74.2
## 5 1972 35.4 57.6 74.7
## 6 1977 31.2 59.6 76.1
## 7 1982 38.4 61.5 77.1
## 8 1987 39.9 63.2 78.7
## 9 1992 23.6 64.2 79.4
## 10 1997 36.1 65.0 80.7
## 11 2002 39.2 65.7 82
## 12 2007 39.6 67.0 82.6
# Mean life expectancy per continent over time
gapminder %>%
  group_by(year, continent) %>%
  summarize(mean = mean(lifeExp), .groups = "keep") %>%
  ggplot(aes(x = as.factor(year), y = mean, group = continent, color = continent)) +
  geom_line() +
  labs(title = "Life Expectancy by Continent",
       x = "Year",
       y = "Mean life expectancy (years)")
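# Relationship between GDP per capita (log10 scale) and life expectancy
gapminder %>%
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +
  geom_smooth(method = "lm", se = FALSE) + # FALSE rather than F, which can be overwritten
  labs(title = "GDP vs. Life Expectancy",
       x = "GDP per capita (log10 scale)",
       y = "Life expectancy (years)")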
## `geom_smooth()` using formula = 'y ~ x'
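# Change in GDP per capita per country between 1952 and 2007, largest increase first
gapminder %>%
  select(country, year, gdpPercap) %>%
  filter(year %in% c(1952, 2007)) %>%
  pivot_wider(names_from = year,
              values_from = gdpPercap) %>%
  mutate(delta = `2007` - `1952`) %>% # numeric column names need backticks
  arrange(desc(delta))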
## # A tibble: 142 × 4
## country `1952` `2007` delta
## <fct> <dbl> <dbl> <dbl>
## 1 Singapore 2315. 47143. 44828.
## 2 Norway 10095. 49357. 39262.
## 3 Hong Kong, China 3054. 39725. 36671.
## 4 Ireland 5210. 40676. 35466.
## 5 Austria 6137. 36126. 29989.
## 6 United States 13990. 42952. 28961.
## 7 Iceland 7268. 36181. 28913.
## 8 Japan 3217. 31656. 28439.
## 9 Netherlands 8942. 36798. 27856.
## 10 Taiwan 1207. 28718. 27511.
## # ℹ 132 more rows
# GDP per capita over time for the five countries with the largest 1952-2007 increase
gapminder %>%
  filter(country %in% c("Singapore", "Norway", "Hong Kong, China",
                        "Ireland", "Austria")) %>%
  select(year, country, gdpPercap) %>%
  ggplot(aes(x = as.factor(year), y = gdpPercap, group = country, color = country)) +
  geom_line() +
  labs(title = "GDP per Capita Growth",
       x = "Year",
       y = "GDP per Capita")
Please knit your work as a .pdf or .html file and upload it to Canvas. Homework is due before the start of the next lab. No late work is accepted.