ggplot part 3

At the end of this exercise, you will be able to:
1. Produce box plots using ggplot.
2. Customize labels on axes using labs and themes.
3. Use color and fill to customize plots and improve aesthetics.
Now that you have been introduced to ggplot, let’s review the plot types from last week and learn how to manipulate their aesthetics to better suit our needs. Aesthetics make a significant visual difference, but you can take them too far, so remember that the goal is to produce clean plots that are not distracting.
## Resources
- ggplot2 cheatsheet
- ggplot themes
- Rebecca Barter ggplot Tutorial
library("tidyverse")
library("janitor")
Let’s revisit the mammal life history data to practice our ggplot skills. The data are from: S. K. Morgan Ernest. 2003. Life history characteristics of placental non-volant mammals. Ecology 84:3402.
life_history <- read_csv("data/mammal_lifehistories_v2.csv", na="-999") %>% clean_names()
Bar plots count the number of observations in a categorical variable.
What is the difference between geom_bar and
geom_col?
Make two bar plots showing the number of observations for each order using each geom type.
geom_col
life_history %>%
count(order, sort=T) %>%
ggplot(aes(x=order, y=n))+
geom_col()+
coord_flip()
geom_bar
life_history %>%
ggplot(aes(x=order))+
geom_bar()+
coord_flip()
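The difference between the two geoms comes down to who does the counting. geom_bar() counts the rows in each category itself (its default stat is "count"), while geom_col() plots y values that you have already computed. The sketch below illustrates this with the built-in mpg data from ggplot2, used here only so the example runs without the course data files; the object names are illustrative.

```r
library(ggplot2)
library(dplyr)

# geom_bar(): supply only x; ggplot counts the rows in each class
p_bar <- mpg %>%
  ggplot(aes(x = class)) +
  geom_bar()

# geom_col(): count first, then map the precomputed count to y
p_col <- mpg %>%
  count(class) %>%
  ggplot(aes(x = class, y = n)) +
  geom_col()

# layer_data() exposes the values each plot actually draws;
# the bar heights (y) should match between the two plots
bar_heights <- layer_data(p_bar)$y
col_heights <- layer_data(p_col)$y
```

If you already have a summary table (counts, means, totals), geom_col() is usually the natural choice; if you have raw observations and only want counts, geom_bar() saves a step.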
Remember that ggplot builds plots in layers. These layers can significantly improve the appearance of the plot. What if we wanted a bar plot of the mean mass for each order?
life_history %>%
group_by(order) %>%
summarize(mean_mass=mean(mass, na.rm=T)) %>%
ggplot(aes(x=order, y=mean_mass))+
geom_col()+
coord_flip() #flips the axes
There are a few problems here. First, the y-axis is in scientific notation. We can fix this by adjusting the options for the session.
options(scipen=999)#cancels scientific notation for the session
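Because options(scipen=999) changes how numbers print for the whole session, a per-plot alternative can be handy. The sketch below assumes the scales package is available (it is installed as a dependency of ggplot2) and uses a small made-up data frame, not the mammal data.

```r
library(ggplot2)

# Hypothetical values, large enough to trigger scientific notation
toy <- data.frame(group = c("a", "b", "c"),
                  value = c(1500000, 32000000, 450000))

p <- ggplot(toy, aes(x = group, y = value)) +
  geom_col() +
  # format the axis labels with commas for this plot only
  scale_y_continuous(labels = scales::label_comma()) +
  coord_flip()

# The labeller is an ordinary function, so it can be tried on its own
comma_label <- scales::label_comma()(1500000)
```

This leaves the session options untouched, which may matter if other output in the same document should keep the default number formatting.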
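Next, the y-axis is not on a log scale. Mean mass spans several orders of magnitude across orders, so a few very large values flatten all the other bars. We can fix this by adding scale_y_log10().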
life_history %>%
group_by(order) %>%
summarize(mean_mass=mean(mass, na.rm=T)) %>%
ggplot(aes(x=order, y=mean_mass))+
geom_col()+
coord_flip()+
scale_y_log10()
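Lastly, we can sort the bars by mean mass to make the plot easier to read. By default, the orders are arranged alphabetically. We change this using reorder(), which orders the categories on the x-axis by the values of a second variable.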
life_history %>%
group_by(order) %>%
summarize(mean_mass=mean(mass, na.rm=T)) %>%
ggplot(aes(x=reorder(order, mean_mass), y=mean_mass))+ #reorder the x-axis
geom_col()+
coord_flip()+
scale_y_log10()
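To see what reorder() is doing, it can help to look at the factor levels it returns. This is a minimal sketch on a made-up summary table (the values are hypothetical, not taken from the mammal data).

```r
library(ggplot2)

# Hypothetical summary table, not the mammal data
toy <- data.frame(order = c("A", "B", "C"),
                  mean_mass = c(300, 10, 45))

# reorder() returns a factor whose levels are sorted by the second argument
levels_sorted <- levels(reorder(toy$order, toy$mean_mass))

# Negate the second argument to sort in descending order instead
levels_desc <- levels(reorder(toy$order, -toy$mean_mass))

p <- ggplot(toy, aes(x = reorder(order, mean_mass), y = mean_mass)) +
  geom_col() +
  coord_flip() +
  labs(x = "order") # replaces the default "reorder(order, mean_mass)" axis title
```

ggplot draws discrete categories in factor-level order, which is why changing the levels changes the order of the bars.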
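Scatter plots allow for comparisons of two continuous variables. Make a scatterplot below that compares gestation time and weaning mass. The example uses geom_jitter(), which works like geom_point() but adds a small amount of random noise to each point to reduce overplotting.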
life_history %>%
ggplot(aes(x=gestation, y=wean_mass))+
geom_jitter(na.rm=T)+
scale_y_log10()
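Box plots summarize the distribution of a continuous variable: the box spans the first to third quartiles and the line inside it marks the median. We typically put a categorical variable on the x-axis and the continuous variable on the y-axis. Let’s make a box plot that compares mass across orders.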
life_history %>%
ggplot(aes(x=order, y=mass))+
geom_boxplot(na.rm=T)+
coord_flip()
On the original scale, a few very large values compress most of the boxes. Removing Cetacea and putting mass on a log scale makes the differences among the remaining orders easier to see.
life_history %>%
filter(order!="Cetacea") %>%
ggplot(aes(x=order, y=mass))+
geom_boxplot(na.rm=T)+
coord_flip()+
scale_y_log10()
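To check what a box actually represents, you can pull out the summary statistics that ggplot computes for each group. The sketch below uses made-up masses for two hypothetical groups, not the mammal data.

```r
library(ggplot2)

# Hypothetical masses for two groups, not the mammal data
toy <- data.frame(group = rep(c("a", "b"), each = 5),
                  mass = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 500))

p <- ggplot(toy, aes(x = group, y = mass)) +
  geom_boxplot()

# layer_data() returns the statistics computed for each box:
# lower = 25th percentile, middle = median, upper = 75th percentile
box_stats <- layer_data(p)[, c("lower", "middle", "upper")]

# The same medians computed directly, for comparison
direct_medians <- tapply(toy$mass, toy$group, median)
```

Comparing box_stats$middle with direct_medians is a quick way to confirm that the line inside each box is the group median.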
Now that we have practiced scatter plots, bar plots, and box plots we need to learn how to adjust their appearance to suit our needs. Let’s start with labeling x and y axes.
For this exercise, let’s use the ElephantsMF data. These data are from Phyllis Lee, Stirling University, and are related to Lee, P., et al. (2013), “Enduring consequences of early experiences: 40-year effects on survival and success among African elephants (Loxodonta africana),” Biology Letters, 9: 20130011. Source: Kaggle.
elephants <- read_csv("data/elephantsMF.csv") %>% clean_names()
## Rows: 288 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Sex
## dbl (2): Age, Height
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Make a plot that compares age and height of elephants.
elephants %>%
ggplot(aes(x=age, y=height)) +
geom_point() +
geom_smooth(method=lm, se=F)
## `geom_smooth()` using formula = 'y ~ x'
The plot looks clean, but it is incomplete. A reader unfamiliar with the data might have a difficult time interpreting the default labels, which are just the column names. To add custom labels and a title, we use the labs() function.
elephants %>%
ggplot(aes(x=age, y=height)) +
geom_point() +
geom_smooth(method=lm, se=F)+
labs(title="Elephant Age vs. Height",
x="Age",
y="Height")
## `geom_smooth()` using formula = 'y ~ x'
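We can improve the plot further by adjusting the size and face of the text. We do this using theme(). The rel() function sets a size relative to the inherited text size rather than as an absolute value, so the title scales consistently with the rest of the plot. Adding hjust controls the horizontal position of the title: 0 is left-aligned, 0.5 is centered, and 1 is right-aligned.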
elephants %>%
ggplot(aes(x=age, y=height)) +
geom_point() +
geom_smooth(method=lm, se=F)+
labs(title="Elephant Age vs. Height",
x="Age",
y="Height")+
theme(plot.title=element_text(size=rel(1.5), hjust=.5))
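The same approach extends to other text elements and to the font face. The sketch below is one possible combination, shown on the built-in mtcars data so it runs without the course files; the specific sizes are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Weight vs. Fuel Economy",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon") +
  theme(plot.title = element_text(size = rel(1.5), face = "bold", hjust = 0.5),
        axis.title = element_text(size = rel(1.2)),  # both axis titles
        axis.text  = element_text(size = rel(0.9)))  # tick labels
```

Valid values for face are "plain", "italic", "bold", and "bold.italic".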
## `geom_smooth()` using formula = 'y ~ x'
There are lots of options for aesthetics. An aesthetic can be mapped to either numeric or categorical data. fill is a common grouping option; notice that an appropriate legend is displayed when you use one of these options.
elephants %>%
ggplot(aes(x=sex, fill=sex))+
geom_bar()
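It is worth distinguishing between mapping an aesthetic to a variable and setting it to a constant. The sketch below shows both on the built-in mpg data (not the elephant data); the color "steelblue" is an arbitrary choice.

```r
library(ggplot2)

# Mapped: fill varies with a variable, so it goes inside aes() and gets a legend
p_mapped <- ggplot(mpg, aes(x = class, fill = class)) +
  geom_bar()

# Constant: one fill for every bar, set outside aes(); no legend is drawn.
# color controls the bar outline, fill controls the interior.
p_constant <- ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "steelblue", color = "black")

# Inspect the fill values each plot ends up using
fills_mapped <- unique(layer_data(p_mapped)$fill)
fills_constant <- unique(layer_data(p_constant)$fill)
```

A common mistake is to write aes(fill = "steelblue"), which treats the string as a one-level category and maps it to a default palette color instead of the color named.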
size adjusts the size of points relative to a continuous
variable.
life_history %>%
ggplot(aes(x=gestation, y=log10(mass), size=mass))+
geom_point(na.rm=T)