Instructions

Answer the following questions and complete the exercises in RMarkdown. Please embed all of your code and push your final work to your repository. Your final lab report should be organized, clean, and run free from errors. Remember, you must remove the # for the included code chunks to run. Be sure to add your name to the author header above. For any included plots, make sure they are clearly labeled. You are free to use any plot type that you feel best communicates the results of your analysis.

Make sure to use the formatting conventions of RMarkdown to make your report neat and clean!

Load the libraries

library(tidyverse)
library(janitor)
library(naniar)
options(scipen = 999)

Resources

The idea for this assignment came from Rebecca Barter’s ggplot tutorial, so it is a good place to look if you get stuck.

Gapminder

For this assignment, we are going to use the dataset gapminder. Gapminder includes information about economics, population, and life expectancy from countries all over the world. You will need to install it before use.

#install.packages("gapminder")
library("gapminder")
  1. Use the function(s) of your choice to get an idea of the overall structure of the data frame, including its dimensions, column names, variable classes, etc. As part of this, determine how NAs are treated in the data.
gapminder <- gapminder
glimpse(gapminder)
## Rows: 1,704
## Columns: 6
## $ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
miss_var_summary(gapminder)
## # A tibble: 6 × 3
##   variable  n_miss pct_miss
##   <chr>      <int>    <num>
## 1 country        0        0
## 2 continent      0        0
## 3 year           0        0
## 4 lifeExp        0        0
## 5 pop            0        0
## 6 gdpPercap      0        0
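
The summary above reports no missing values in any column. As a cross-check that does not depend on naniar, a minimal base R sketch might look like the following. The range scan rests on an assumption: if missing data were stored as a placeholder (for example a negative number), it would show up as an implausible minimum. The names `na_counts` and `numeric_ranges` are illustrative only.

```r
library(gapminder)

# Count literal NA values in each column using base R only
na_counts <- colSums(is.na(gapminder))
na_counts

# Range of each numeric column; an implausible minimum (e.g. a negative
# population) would hint that a placeholder stands in for missing data
numeric_cols <- sapply(gapminder, is.numeric)
numeric_ranges <- sapply(gapminder[numeric_cols], range)
numeric_ranges
```

If every count is zero and every minimum looks plausible, that is consistent with the dataset having no NAs, whether explicit or encoded.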
  2. How many countries are represented in this dataset? Make a table and a plot that shows the number of countries by continent.
gapminder %>%
  group_by(continent) %>%
  summarize(n=n_distinct(country))
## # A tibble: 5 × 2
##   continent     n
##   <fct>     <int>
## 1 Africa       52
## 2 Americas     25
## 3 Asia         33
## 4 Europe       30
## 5 Oceania       2
gapminder %>%
  group_by(continent) %>%
  summarize(n=n_distinct(country)) %>%
  ggplot(aes(x=continent, y=n, fill=continent))+
  geom_col()+
  labs(title = "Number of Countries by Continent",
       x = "Continent",
       y = "Number of Countries")

  3. Which country has had the largest population growth since 1952? Show this as a table.
gapminder %>% 
  select(country, year, pop) %>% 
  filter(year %in% c(1952, 2007)) %>%
  pivot_wider(names_from = year,
              names_prefix = "yr_",
              values_from = pop) %>% 
  mutate(delta= yr_2007-yr_1952) %>% 
  arrange(desc(delta))
## # A tibble: 142 × 4
##    country         yr_1952    yr_2007     delta
##    <fct>             <int>      <int>     <int>
##  1 China         556263527 1318683096 762419569
##  2 India         372000000 1110396331 738396331
##  3 United States 157553000  301139947 143586947
##  4 Indonesia      82052000  223547000 141495000
##  5 Brazil         56602560  190010647 133408087
##  6 Pakistan       41346560  169270617 127924057
##  7 Bangladesh     46886859  150448339 103561480
##  8 Nigeria        33119096  135031164 101912068
##  9 Mexico         30144317  108700891  78556574
## 10 Philippines    22438691   91077287  68638596
## # ℹ 132 more rows
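
The table above ranks every country, so the answer has to be read off the first row. A minimal sketch that returns only the top country, assuming absolute growth (2007 population minus 1952 population) is the intended measure, could use `slice_max()`. The names `pop_growth` and `top_growth` are illustrative only.

```r
library(dplyr)
library(tidyr)
library(gapminder)

# Population in 1952 and 2007, one row per country
pop_growth <- gapminder %>%
  filter(year %in% c(1952, 2007)) %>%
  select(country, year, pop) %>%
  pivot_wider(names_from = year,
              names_prefix = "yr_",
              values_from = pop) %>%
  mutate(delta = yr_2007 - yr_1952)

# Keep only the row with the largest absolute growth
top_growth <- pop_growth %>%
  slice_max(delta, n = 1)
top_growth
```

A relative measure such as `yr_2007 / yr_1952` may rank countries differently, so it is worth stating which definition of growth the table uses.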
  4. Make a plot that shows population growth for the country you found in question #3. This plot should show the change over time.
gapminder %>% 
  filter(country=="China") %>% 
  select(country, year, pop) %>% 
  ggplot(aes(x=as.factor(year), y=pop, group=country))+
  geom_line()+
  labs(title = "Population Growth in China",
       x = "Year",
       y = "Population")

  5. How has global life expectancy changed between 1952 and 2007? Show the min, mean, and max for all countries in the dataset. Show this as a table.
gapminder %>% 
  group_by(year) %>% 
  summarize(min=min(lifeExp),
            mean=mean(lifeExp),
            max=max(lifeExp))
## # A tibble: 12 × 4
##     year   min  mean   max
##    <int> <dbl> <dbl> <dbl>
##  1  1952  28.8  49.1  72.7
##  2  1957  30.3  51.5  73.5
##  3  1962  32.0  53.6  73.7
##  4  1967  34.0  55.7  74.2
##  5  1972  35.4  57.6  74.7
##  6  1977  31.2  59.6  76.1
##  7  1982  38.4  61.5  77.1
##  8  1987  39.9  63.2  78.7
##  9  1992  23.6  64.2  79.4
## 10  1997  36.1  65.0  80.7
## 11  2002  39.2  65.7  82  
## 12  2007  39.6  67.0  82.6
  6. Make a plot that shows how mean life expectancy has changed over time for each continent. What is your interpretation of what happened in Africa between 1987 and 2002?
gapminder %>% 
  group_by(year, continent) %>% 
  summarize(mean=mean(lifeExp), .groups = 'keep') %>% 
  ggplot(aes(x=as.factor(year), y=mean, group=continent, color=continent))+
  geom_line()+
  labs(title = "Life Expectancy by Continent",
       x = "Year",
       y = "Life Expectancy")
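
The question about Africa is easier to answer with the numbers behind the line in view. A minimal sketch that tabulates African life expectancy for the survey years from 1987 through 2002 might look like this. The name `africa_life` and its column names are illustrative only, and the interpretation is left to the reader.

```r
library(dplyr)
library(gapminder)

# Mean and minimum life expectancy in Africa, 1987 through 2002
africa_life <- gapminder %>%
  filter(continent == "Africa", year >= 1987, year <= 2002) %>%
  group_by(year) %>%
  summarize(mean_lifeExp = mean(lifeExp),
            min_lifeExp = min(lifeExp),
            n_countries = n_distinct(country))
africa_life
```

Comparing the mean with the minimum for each year shows whether any change is spread across the continent or driven by a few countries.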

  7. We are interested in the relationship between per capita GDP and life expectancy; i.e. does having more money help you live longer? Show this as a plot.
gapminder %>%
  ggplot(aes(x=gdpPercap, y=lifeExp))+
  geom_point()+
  scale_x_log10()+
  geom_smooth(method="lm", se=FALSE)+
  labs(title = "GDP per Capita vs. Life Expectancy",
       x = "GDP per Capita (log10 scale)",
       y = "Life Expectancy")
## `geom_smooth()` using formula = 'y ~ x'

  8. Which five countries have had the highest GDP per capita growth over the years represented in this dataset? Show this as a table.
gapminder %>% 
  select(country, year, gdpPercap) %>% 
  filter(year %in% c(1952, 2007)) %>%
  pivot_wider(names_from = year,
              values_from = gdpPercap) %>% 
  mutate(delta= `2007`-`1952`) %>% 
  arrange(desc(delta))
## # A tibble: 142 × 4
##    country          `1952` `2007`  delta
##    <fct>             <dbl>  <dbl>  <dbl>
##  1 Singapore         2315. 47143. 44828.
##  2 Norway           10095. 49357. 39262.
##  3 Hong Kong, China  3054. 39725. 36671.
##  4 Ireland           5210. 40676. 35466.
##  5 Austria           6137. 36126. 29989.
##  6 United States    13990. 42952. 28961.
##  7 Iceland           7268. 36181. 28913.
##  8 Japan             3217. 31656. 28439.
##  9 Netherlands       8942. 36798. 27856.
## 10 Taiwan            1207. 28718. 27511.
## # ℹ 132 more rows
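
The question asks for five countries, while the table above lists all 142. A minimal sketch that keeps only the top five and stores their names for reuse could use `slice_max()`. The names `top5` and `top5_countries` are illustrative only, and growth is again assumed to mean the absolute change between 1952 and 2007.

```r
library(dplyr)
library(tidyr)
library(gapminder)

# Five countries with the largest absolute gain in GDP per capita
top5 <- gapminder %>%
  filter(year %in% c(1952, 2007)) %>%
  select(country, year, gdpPercap) %>%
  pivot_wider(names_from = year,
              names_prefix = "yr_",
              values_from = gdpPercap) %>%
  mutate(delta = yr_2007 - yr_1952) %>%
  slice_max(delta, n = 5)
top5

# Country names as a character vector, ordered by growth
top5_countries <- as.character(top5$country)
```

The vector `top5_countries` can then stand in for a hand-typed list of names, for example in `filter(country %in% top5_countries)`.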
  9. How does per capita GDP growth compare between these same five countries? Show this as a plot.
gapminder %>% 
  filter(country %in% c("Singapore", "Norway", "Hong Kong, China", "Ireland", "Austria")) %>%
  select(year, country, gdpPercap) %>% 
  ggplot(aes(x=as.factor(year), y=gdpPercap, group=country, color=country))+
  geom_line()+
  labs(title = "GDP per Capita Growth",
       x = "Year",
       y = "GDP per Capita")

  10. Do one analysis of your choice that includes a table and plot as outputs.

Knit and Upload

Please knit your work as a .pdf or .html file and upload to Canvas. Homework is due before the start of the next lab. No late work is accepted.