Learning Goals

At the end of this exercise, you will be able to:
1. Produce box plots using ggplot.
2. Customize labels on axes using labs and themes.
3. Use color and fill to customize plots and improve aesthetics.

Review

Now that you have been introduced to ggplot, let’s review the plot types from last week and learn how to manipulate their aesthetics to better suit our needs. Aesthetics make a significant visual difference, but you can take it too far so remember that the goal is to produce clean plots that are not distracting.

##Resources - ggplot2 cheatsheet - ggplot themes - Rebecca Barter ggplot Tutorial

Load the libraries

library("tidyverse")
library("janitor")

Load the data

Let’s revisit the mammal life history data to practice our ggplot skills. The data are from: S. K. Morgan Ernest. 2003. Life history characteristics of placental non-volant mammals. Ecology 84:3402.

life_history <- read_csv("data/mammal_lifehistories_v2.csv", na="-999") %>% clean_names()

Bar Plots

Bar plots count the number of observations in a categorical variable. What is the difference between geom_bar and geom_col?

Make two bar plots showing the number of observations for each order using each geom type.

geom_col

life_history %>% 
  count(order, sort=T) %>% 
  ggplot(aes(x=order, y=n))+
  geom_col()+
  coord_flip()

geom_bar

life_history %>% 
  ggplot(aes(x=order))+
  geom_bar()+
  coord_flip()

Remember that ggplot builds plots in layers. These layers can significantly improve the appearance of the plot. What if we wanted a bar plot of the mean mass for each order?

life_history %>% 
  group_by(order) %>% 
  summarize(mean_mass=mean(mass, na.rm=T)) %>% 
  ggplot(aes(x=order, y=mean_mass))+
  geom_col()+
  coord_flip() #flips the axes

There are a few problems here. First, the y-axis is in scientific notation. We can fix this by adjusting the options for the session.

options(scipen=999)#cancels scientific notation for the session

Next, the y-axis is not on a log scale. We can fix this by adding scale_y_log10().

life_history %>% 
  group_by(order) %>% 
  summarize(mean_mass=mean(mass, na.rm=T)) %>% 
  ggplot(aes(x=order, y=mean_mass))+
  geom_col()+
  coord_flip()+
  scale_y_log10()

Lastly, we can adjust the x-axis labels to make them more readable. We do this using reorder.

life_history %>% 
  group_by(order) %>% 
  summarize(mean_mass=mean(mass, na.rm=T)) %>% 
  ggplot(aes(x=reorder(order, mean_mass), y=mean_mass))+ #reorder the x-axis
  geom_col()+
  coord_flip()+
  scale_y_log10()

Scatterplots

Scatter plots allow for comparisons of two continuous variables. Make a scatterplot below that compares gestation time and weaning mass.

life_history %>% 
  ggplot(aes(x=gestation, y=wean_mass))+
  geom_jitter(na.rm=T)+
  scale_y_log10()

Boxplots

Box plots help us visualize a range of values. So, on the x-axis we typically have something categorical and the y-axis is the range. Let’s make a box plot that compares mass across orders.

life_history %>% 
  ggplot(aes(x=order, y=mass))+
  geom_boxplot(na.rm=T)+
  coord_flip()

life_history %>% 
  filter(order!="Cetacea") %>% 
  ggplot(aes(x=order, y=mass))+
  geom_boxplot(na.rm=T)+
  coord_flip()+
  scale_y_log10()

Aesthetics: Labels

Now that we have practiced scatter plots, bar plots, and box plots we need to learn how to adjust their appearance to suit our needs. Let’s start with labeling x and y axes.

For this exercise, let’s use the ElephantsMF data. These data are from Phyllis Lee, Stirling University, and are related to Lee, P., et al. (2013), “Enduring consequences of early experiences: 40-year effects on survival and success among African elephants (Loxodonta africana),” Biology Letters, 9: 20130011. kaggle.

elephants <- read_csv("data/elephantsMF.csv") %>% clean_names()
## Rows: 288 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Sex
## dbl (2): Age, Height
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Make a plot that compares age and height of elephants.

elephants %>% 
  ggplot(aes(x=age, y=height)) + 
  geom_point() + 
  geom_smooth(method=lm, se=F)
## `geom_smooth()` using formula = 'y ~ x'

The plot looks clean, but it is incomplete. A reader unfamiliar with the data might have a difficult time interpreting the labels. To add custom labels, we use the labs command.

elephants %>% 
  ggplot(aes(x=age, y=height)) + 
  geom_point() + 
  geom_smooth(method=lm, se=F)+
  labs(title="Elephant Age vs. Height",
       x="Age",
       y="Height")
## `geom_smooth()` using formula = 'y ~ x'

We can improve the plot further by adjusting the size and face of the text. We do this using theme(). The rel() option changes the relative size of the title to keep things consistent. Adding hjust allows control of title position.

elephants %>% 
  ggplot(aes(x=age, y=height)) + 
  geom_point() + 
  geom_smooth(method=lm, se=F)+
  labs(title="Elephant Age vs. Height",
       x="Age",
       y="Height")+
  theme(plot.title=element_text(size=rel(1.5), hjust=.5))
## `geom_smooth()` using formula = 'y ~ x'

Other Aesthetics

There are lots of options for aesthetics. An aesthetic can be assigned to either numeric or categorical data. fill is a common grouping option; notice that an appropriate key is displayed when you use one of these options.

elephants %>% 
  ggplot(aes(x=sex, fill=sex))+
  geom_bar()

size adjusts the size of points relative to a continuous variable.

life_history %>% 
  ggplot(aes(x=gestation, y=log10(mass), size=mass))+
  geom_point(na.rm=T)

That’s it! Let’s take a break and then move on to part 2!

–>Home