ggplot part 3
At the end of this exercise, you will be able to:
1. Produce box plots using ggplot.
2. Customize labels on axes using labs and themes.
3. Use color and fill to customize plots and improve aesthetics.
Now that you have been introduced to ggplot, let’s review the plot types from last week and learn how to manipulate their aesthetics to better suit our needs. Aesthetics make a significant visual difference, but you can take them too far, so remember that the goal is to produce clean plots that are not distracting.
## Resources
- ggplot2 cheatsheet
- ggplot themes
- Rebecca Barter ggplot Tutorial
library("tidyverse")
library("janitor")
Let’s revisit the mammal life history data to practice our ggplot skills. The data are from: S. K. Morgan Ernest. 2003. Life history characteristics of placental non-volant mammals. Ecology 84:3402.
life_history <- read_csv("data/mammal_lifehistories_v2.csv", na="-999") %>% clean_names()
Bar plots count the number of observations in a categorical variable.
What is the difference between geom_bar and geom_col?
Make two bar plots showing the number of observations for each order using each geom type.
geom_col
life_history %>%
count(order, sort=T) %>%
ggplot(aes(x=order, y=n))+
geom_col()+
coord_flip()
geom_bar
life_history %>%
ggplot(aes(x=order))+
geom_bar()+
coord_flip()
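To make the difference between the two geoms concrete, here is a minimal sketch on a tiny made-up data frame (the `toy` values are invented for illustration and are not from the mammal data). geom_bar counts rows for us, while geom_col expects a y value that we have already computed.

```r
library(ggplot2)

# Toy data (made up for illustration): one row per observation
toy <- data.frame(
  order = c("Rodentia", "Rodentia", "Rodentia", "Carnivora", "Carnivora", "Primates")
)

# geom_bar() counts the rows in each category on its own
p_bar <- ggplot(toy, aes(x = order)) +
  geom_bar()

# geom_col() plots a y value that we computed ourselves
toy_counts <- as.data.frame(table(order = toy$order), responseName = "n")
p_col <- ggplot(toy_counts, aes(x = order, y = n)) +
  geom_col()

# Pull the bar heights that each plot will draw
bar_heights <- layer_data(p_bar)$count
col_heights <- layer_data(p_col)$y
```

Both plots draw bars of the same height; the only difference is who does the counting.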
Remember that ggplot builds plots in layers. These layers can significantly improve the appearance of the plot. What if we wanted a bar plot of the mean mass for each order?
life_history %>%
group_by(order) %>%
summarize(mean_mass=mean(mass, na.rm=T)) %>%
ggplot(aes(x=order, y=mean_mass))+
geom_col()+
coord_flip() #flips the axes
There are a few problems here. First, the y-axis is in scientific notation. We can fix this by adjusting the options for the session.
options(scipen=999)#cancels scientific notation for the session
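If you would rather not change an option for the whole session, one alternative is to format the labels of a single plot. The sketch below assumes the scales package (installed alongside ggplot2) and uses made-up values; it is not part of the original exercise.

```r
library(ggplot2)

# Toy data (made up for illustration): mean mass for three hypothetical groups
toy <- data.frame(
  order = c("A", "B", "C"),
  mean_mass = c(120, 45000, 3200000)
)

p <- ggplot(toy, aes(x = order, y = mean_mass)) +
  geom_col() +
  # label_comma() writes axis labels with thousands separators
  # instead of scientific notation, for this plot only
  scale_y_continuous(labels = scales::label_comma()) +
  coord_flip()

# The labeller is just a function that turns numbers into text
formatted <- scales::label_comma()(c(1000, 1000000))
```

This keeps the formatting choice attached to the plot rather than to the R session.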
Next, the y-axis is not on a log scale, so the orders with the largest mean mass dwarf all of the others. We can fix this by adding scale_y_log10().
life_history %>%
group_by(order) %>%
summarize(mean_mass=mean(mass, na.rm=T)) %>%
ggplot(aes(x=order, y=mean_mass))+
geom_col()+
coord_flip()+
scale_y_log10()
Lastly, we can make the plot easier to read by sorting the orders on the x-axis by their mean mass instead of alphabetically. We do this using reorder.
life_history %>%
group_by(order) %>%
summarize(mean_mass=mean(mass, na.rm=T)) %>%
ggplot(aes(x=reorder(order, mean_mass), y=mean_mass))+ #reorder the x-axis
geom_col()+
coord_flip()+
scale_y_log10()
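It can help to see what reorder does outside of a plot. The following sketch uses made-up names and values: reorder returns a factor whose levels are sorted by the second argument, and ggplot draws discrete axes in level order.

```r
# Toy values (made up for illustration)
order_name <- c("a", "b", "c")
mean_mass <- c(300, 10, 2000)

# Default factor levels are alphabetical
alphabetical <- levels(factor(order_name))

# reorder() sorts the levels by increasing mean_mass
ascending <- levels(reorder(order_name, mean_mass))

# Negating the values sorts the levels in decreasing order
descending <- levels(reorder(order_name, -mean_mass))
```

Here `ascending` puts "b" first because it has the smallest value, and `descending` puts "c" first.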
Scatter plots allow for comparisons of two continuous variables. Make a scatterplot below that compares gestation time and weaning mass.
life_history %>%
ggplot(aes(x=gestation, y=wean_mass))+
geom_jitter(na.rm=T)+
scale_y_log10()
Box plots help us visualize the range and distribution of a continuous variable within groups. So, on the x-axis we typically have a categorical variable and on the y-axis the continuous variable whose range we want to see. Let’s make a box plot that compares mass across orders.
life_history %>%
ggplot(aes(x=order, y=mass))+
geom_boxplot(na.rm=T)+
coord_flip()
The very large masses compress most of the boxes. Removing Cetacea and putting mass on a log scale makes the remaining orders easier to compare.
life_history %>%
filter(order!="Cetacea") %>%
ggplot(aes(x=order, y=mass))+
geom_boxplot(na.rm=T)+
coord_flip()+
scale_y_log10()
Now that we have practiced scatter plots, bar plots, and box plots, we need to learn how to adjust their appearance to suit our needs. Let’s start with labeling the x and y axes.
For this exercise, let’s use the ElephantsMF data. These data are from Phyllis Lee, Stirling University, and are related to Lee, P., et al. (2013), “Enduring consequences of early experiences: 40-year effects on survival and success among African elephants (Loxodonta africana),” Biology Letters, 9: 20130011. Source: kaggle.
elephants <- read_csv("data/elephantsMF.csv") %>% clean_names()
## Rows: 288 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Sex
## dbl (2): Age, Height
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Make a plot that compares age and height of elephants.
elephants %>%
ggplot(aes(x=age, y=height)) +
geom_point() +
geom_smooth(method=lm, se=F)
## `geom_smooth()` using formula = 'y ~ x'
The plot looks clean, but it is incomplete. A reader unfamiliar with the data might have a difficult time interpreting the labels. To add custom labels, we use the labs command.
elephants %>%
ggplot(aes(x=age, y=height)) +
geom_point() +
geom_smooth(method=lm, se=F)+
labs(title="Elephant Age vs. Height",
x="Age",
y="Height")
## `geom_smooth()` using formula = 'y ~ x'
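labs accepts more than a title and axis names. As a small sketch on made-up data, the example below also sets a subtitle and a caption and puts units in the axis labels. The units and wording here are placeholders chosen for illustration, not taken from the elephant data.

```r
library(ggplot2)

# Toy data (made up for illustration)
toy <- data.frame(
  age = c(1, 5, 10, 20, 30),
  height = c(90, 150, 200, 250, 270)
)

p <- ggplot(toy, aes(x = age, y = height)) +
  geom_point() +
  labs(
    title = "Toy Age vs. Height",
    subtitle = "Five invented observations",   # placeholder text
    caption = "Data: made up for this example", # placeholder text
    x = "Age (years)",                          # units are an assumption
    y = "Height (cm)"                           # units are an assumption
  )

# Building the plot confirms that every label is accepted
built <- ggplot_build(p)
```

Stating units in the axis labels is often the single most helpful change for a reader who has not seen the data.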
We can improve the plot further by adjusting the size and face of the text. We do this using theme(). The rel() function sets the size of the title relative to the base text size, which keeps things consistent. Adding hjust controls the horizontal position of the title; a value of 0.5 centers it.
elephants %>%
ggplot(aes(x=age, y=height)) +
geom_point() +
geom_smooth(method=lm, se=F)+
labs(title="Elephant Age vs. Height",
x="Age",
y="Height")+
theme(plot.title=element_text(size=rel(1.5), hjust=.5))
## `geom_smooth()` using formula = 'y ~ x'
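The same pattern extends to other text elements and to complete themes. The sketch below, on made-up data, applies theme_classic() first and then adjusts individual elements with theme(). The order matters, because a complete theme added afterwards would overwrite the earlier adjustments. The particular sizes are arbitrary choices for illustration.

```r
library(ggplot2)

# Toy data (made up for illustration)
toy <- data.frame(
  age = c(1, 5, 10, 20, 30),
  height = c(90, 150, 200, 250, 270)
)

p <- ggplot(toy, aes(x = age, y = height)) +
  geom_point() +
  labs(title = "Toy Age vs. Height", x = "Age", y = "Height") +
  theme_classic() +  # complete theme first
  theme(             # then fine-tune individual elements
    plot.title = element_text(size = rel(1.5), face = "bold", hjust = 0.5),
    axis.title = element_text(size = rel(1.2)),
    axis.text = element_text(size = rel(0.9))
  )

# Building the plot confirms that the theme settings are valid
built <- ggplot_build(p)
```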
There are lots of options for aesthetics. An aesthetic can be assigned to either numeric or categorical data. fill is a common grouping option; notice that an appropriate key (legend) is displayed when you use one of these options.
elephants %>%
ggplot(aes(x=sex, fill=sex))+
geom_bar()
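For bars, fill controls the interior and color controls the outline. As a minimal sketch on made-up data, the example below maps fill to a variable inside aes() and sets a constant black outline outside of it.

```r
library(ggplot2)

# Toy data (made up for illustration)
toy <- data.frame(sex = c("F", "F", "F", "M", "M"))

p <- ggplot(toy, aes(x = sex, fill = sex)) +  # fill varies with the data
  geom_bar(color = "black", alpha = 0.8)      # outline and transparency are fixed

# Inspect the values ggplot will actually draw
drawn <- layer_data(p)
```

Because color is set outside aes(), it is the same for every bar and does not get a legend entry.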
size adjusts the size of points relative to a continuous variable.
life_history %>%
ggplot(aes(x=gestation, y=log10(mass), size=mass))+
geom_point(na.rm=T)
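There is a difference between mapping an aesthetic to a variable and setting it to a constant. The sketch below uses made-up values: inside aes(), size varies with the data and gets a legend; outside aes(), every point gets the same size.

```r
library(ggplot2)

# Toy data (made up for illustration)
toy <- data.frame(
  gestation = c(1, 3, 6, 12),
  mass = c(50, 4000, 60000, 3000000)
)

# Mapped: point size depends on mass
p_mapped <- ggplot(toy, aes(x = gestation, y = log10(mass), size = mass)) +
  geom_point()

# Set: every point has the same size and color
p_fixed <- ggplot(toy, aes(x = gestation, y = log10(mass))) +
  geom_point(size = 3, color = "steelblue")

mapped_sizes <- layer_data(p_mapped)$size
fixed_sizes <- layer_data(p_fixed)$size
```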