At the end of this exercise, you will be able to:
1. Build line plots, histograms, and density plots.
2. Adjust colors using R’s built-in color options.
3. Create new categories with case_when() and use those categories to build plots.
library(tidyverse)
library(janitor)
options(scipen = 999) # penalizes scientific notation so heavily that it is effectively turned off for this session
For these exercises, we are going to use Gapminder data on population
(pop), GDP per capita (gdp), and life expectancy (lex). These data were
downloaded and compiled using the gapminder_compile.Rmd
script in the data folder.
gap <- read_csv("data/gapminder.csv") %>% clean_names()
Line plots are great for showing change over time. Let’s see if we can find downturns in the US economy between 2000 and 2025.
Let’s start by being clear about x and y so we know what we are going to plot: year goes on the x-axis and GDP per capita (gdp) goes on the y-axis.
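A minimal sketch of that line plot follows. It assumes the data frame has columns named country and year. The document only confirms gdp, lex, and pop, so check both names against your copy of gapminder.csv. It also assumes the US is spelled "United States" in the data. The plot is wrapped in a small function so you can try other countries or year ranges.

```r
library(tidyverse)

# Line plot of GDP per capita over time for one country.
# Assumption: `data` has columns `country`, `year`, and `gdp`
# (`country` and `year` are guesses about gapminder.csv's column names).
plot_gdp_over_time <- function(data, which_country = "United States",
                               from = 2000, to = 2025) {
  data %>%
    filter(country == which_country, year >= from, year <= to) %>%
    ggplot(aes(x = year, y = gdp)) +  # x = time, y = GDP per capita
    geom_line() +
    geom_point() +                     # points make single-year dips easier to see
    labs(x = "Year", y = "GDP per capita",
         title = paste0(which_country, ", ", from, " to ", to))
}
```

With the data loaded above, plot_gdp_over_time(gap) draws the 2000 to 2025 series. Dips in the line mark years when GDP per capita fell. Run distinct(gap, country) first if you are unsure how the country is spelled.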
Histograms show the distribution of a continuous variable. As
students, you have seen histograms of grade distributions. A histogram
groups the observations into bins, each covering a range of values, and
shows how many observations fall into each bin. For grades this is
easy, because each bin corresponds to a letter grade (A to F). For a
continuous variable you have to choose the number of bins or their
width yourself. By default, ggplot2’s geom_histogram() uses 30 bins and
prints a message asking you to pick a better value (base R’s hist()
instead uses Sturges’ formula), so some adjustment is usually required.
What was the distribution of life expectancy in 1935?
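One way to draw it is sketched below. It assumes the year column is named year. The binwidth of 5 years is an arbitrary starting point rather than a recommended value. Passing binwidth, or bins, explicitly also silences the default 30-bin message.

```r
library(tidyverse)

# Histogram of life expectancy (`lex`) for one year.
# Assumption: the year column is called `year`.
# binwidth = 5 (years of life expectancy per bar) is an arbitrary starting
# point; try other values, or use `bins =` to set the number of bars instead.
plot_lex_hist <- function(data, which_year = 1935, binwidth = 5) {
  data %>%
    filter(year == which_year) %>%
    ggplot(aes(x = lex)) +
    geom_histogram(binwidth = binwidth) +
    labs(x = "Life expectancy (years)", y = "Number of countries",
         title = paste("Life expectancy in", which_year))
}
```

Call it as plot_lex_hist(gap). Then compare plot_lex_hist(gap, binwidth = 2) with plot_lex_hist(gap, binwidth = 10) to see how the bin choice changes the shape.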
Let’s rebuild the histogram, but this time we will specify the color (the outline of each bar) and the fill (the inside of each bar). Do a little experimentation on your own with different colors; colors() lists every color name R recognizes.
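Here is a sketch of the same histogram with explicit colors, under the same column-name assumptions as before. The choice of "white" and "steelblue" is just one example. Any name returned by colors() works, and so do hex codes such as "#1b9e77".

```r
library(tidyverse)

# The same histogram with explicit colors:
#   color = outline of each bar, fill = inside of each bar.
# "white" and "steelblue" are built-in R color names (see colors()).
plot_lex_hist_colored <- function(data, which_year = 1935, binwidth = 5,
                                  outline = "white", inside = "steelblue") {
  data %>%
    filter(year == which_year) %>%
    ggplot(aes(x = lex)) +
    geom_histogram(binwidth = binwidth, color = outline, fill = inside) +
    labs(x = "Life expectancy (years)", y = "Number of countries",
         title = paste("Life expectancy in", which_year))
}
```

For example, plot_lex_hist_colored(gap, outline = "black", inside = "darkorange") swaps in two other built-in colors.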
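Density plots are similar to histograms, but instead of counting observations in bins they use a smoothing function (a kernel density estimate) to draw a continuous curve, which gives a smoother, cleaner-looking picture of the distribution. They do not use bins; how smooth the curve is depends on a bandwidth instead, and the area under the curve adds up to 1.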
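A minimal density-plot sketch follows, again assuming a year column. In ggplot2, geom_density() passes adjust to stat_density(), where it multiplies the default bandwidth. Raising adjust smooths the curve and lowering it makes the curve follow the data more closely. The fill and transparency values are arbitrary choices.

```r
library(tidyverse)

# Kernel density estimate of life expectancy for one year.
# `adjust` multiplies the default bandwidth: values above 1 give a smoother
# curve, values below 1 a wigglier one.
plot_lex_density <- function(data, which_year = 1935, adjust = 1) {
  data %>%
    filter(year == which_year) %>%
    ggplot(aes(x = lex)) +
    geom_density(adjust = adjust, fill = "steelblue", alpha = 0.4) +
    labs(x = "Life expectancy (years)", y = "Density",
         title = paste("Life expectancy in", which_year))
}
```

Try plot_lex_density(gap, adjust = 0.5) and plot_lex_density(gap, adjust = 2) to see the effect of the bandwidth.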
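I like to see both the histogram and the density curve, so I often plot them together. For the two to share a y-axis, the histogram has to show density instead of counts; in ggplot2 that means mapping y = after_stat(density) in the histogram layer. Note that I assign the density plot a different color.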
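The overlay can be sketched like this, under the same column-name assumptions. The grey bars and the "firebrick" curve are arbitrary color choices. The linewidth argument requires ggplot2 3.4.0 or later; older versions used size.

```r
library(tidyverse)

# Histogram and density curve on the same axes.
# after_stat(density) rescales the bars so their total area is 1, which puts
# them on the same y scale as the density curve.
plot_lex_hist_density <- function(data, which_year = 1935, binwidth = 5) {
  data %>%
    filter(year == which_year) %>%
    ggplot(aes(x = lex)) +
    geom_histogram(aes(y = after_stat(density)), binwidth = binwidth,
                   color = "white", fill = "grey70") +
    geom_density(color = "firebrick", linewidth = 1) +  # a different color for the curve
    labs(x = "Life expectancy (years)", y = "Density",
         title = paste("Life expectancy in", which_year))
}
```

Without after_stat(density), the bars would be measured in counts while the curve stays near zero, so the curve would look almost flat.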
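We can use the fill aesthetic to differentiate the two years.

case_when() is a very handy function from dplyr that lets us calculate a new variable from
other variables. We use case_when() inside mutate() to do this. case_when() allows us to
specify multiple conditions: they are checked in order, each row gets the value attached to
the first condition that is TRUE, and rows that match no condition get NA.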
The approximate World Bank Income Categories are:
- Low income: $1,100 or less
- Lower middle income: $1,100 to $4,500
- Upper middle income: $4,500 to $13,900
- High income: $13,900 or more
Let’s use case_when() to create a new column
income_category that categorizes countries into these
income categories based on their GDP per capita in 2000.
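A sketch of that case_when() call is shown below. It compares gdp directly against the approximate thresholds listed above. The list's ranges share their endpoints, so the sketch follows "or less" and "or more" literally: exactly $1,100 counts as low income and exactly $13,900 as high income. That boundary handling is this sketch's choice, not an official rule.

```r
library(tidyverse)

# Label each row with an approximate World Bank income category based on
# GDP per capita (`gdp`). Conditions are checked top to bottom and each row
# takes the label of the first TRUE condition; rows matching none (for
# example, a missing gdp) get NA.
add_income_category <- function(data) {
  data %>%
    mutate(income_category = case_when(
      gdp <= 1100  ~ "Low income",
      gdp <= 4500  ~ "Lower middle income",   # only reached when gdp > 1100
      gdp < 13900  ~ "Upper middle income",   # only reached when gdp > 4500
      gdp >= 13900 ~ "High income"
    ))
}
```

For the year 2000 this could be used as gap_2000 <- gap %>% filter(year == 2000) %>% add_income_category(). Both gap_2000 and the year column name are illustrative assumptions.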
How many countries are in each of the World Bank Income Categories as of the year 2000?
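One way to answer is sketched below. It assumes the data are already restricted to 2000 and carry the income_category column built with case_when(). It also assumes a country column, which is a guess about the file's column names. dplyr’s count() puts the tallies in a column named n.

```r
library(tidyverse)

# Count countries per income category.
# Assumes `data` is already restricted to 2000 and carries the
# income_category column made with case_when(); `country` is an assumed
# column name.
count_by_income <- function(data) {
  data %>%
    distinct(country, income_category) %>%  # guard against duplicate rows per country
    count(income_category, sort = TRUE)     # counts go in a column named `n`
}
```

Countries with a missing gdp show up as an NA category, which is worth checking before you read the counts.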
Make a bar plot showing the number of countries in each income category in the year 2000.
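A sketch of the bar plot follows, assuming an income_category column like the one described above. geom_bar() counts the rows in each category itself, so no separate count() step is needed. Converting the column to a factor keeps the bars in income order instead of the default alphabetical order. Dropping the NA category is a choice made here for readability.

```r
library(tidyverse)

# Bar chart of countries per income category, ordered from low to high income.
plot_income_bars <- function(data) {
  income_levels <- c("Low income", "Lower middle income",
                     "Upper middle income", "High income")
  data %>%
    filter(!is.na(income_category)) %>%
    mutate(income_category = factor(income_category, levels = income_levels)) %>%
    ggplot(aes(x = income_category)) +
    geom_bar(fill = "steelblue") +  # geom_bar() counts rows per category
    labs(x = "Income category (2000)", y = "Number of countries")
}
```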