Learning Goals

At the end of this exercise, you will be able to:
1. Use mutate() to add columns in a dataframe.
2. Use mutate() and if_else() to replace values in a dataframe.

Load the libraries

library("tidyverse")
library("janitor")

Load the data

For this lab, we will use the following dataset:
S. K. Morgan Ernest. 2003. Life history characteristics of placental non-volant mammals. Ecology 84:3402.

Pipes %>%

Recall that we use pipes to connect the output of code to a subsequent function. This makes our code cleaner and more efficient. One way we can use pipes is to attach the clean_names() function from janitor to the read_csv() output.

mammals <- read_csv("data/mammal_lifehistories_v2.csv") %>% clean_names()
## Rows: 1440 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): order, family, Genus, species
## dbl (9): mass, gestation, newborn, weaning, wean mass, AFR, max. life, litte...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
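
To make the role of the pipe concrete, here is a minimal sketch comparing a nested call with a piped call. The `messy` tibble and its values are hypothetical stand-ins for a freshly read file; only the two column names are borrowed from the column specification printed above.

```r
library(dplyr)    # provides the %>% pipe
library(tibble)
library(janitor)

# Hypothetical toy data with untidy column names (not the mammals file)
messy <- tibble(`wean mass` = c(8900, 15900), `max. life` = c(142, 240))

# Nested call: the function wraps the data
nested <- clean_names(messy)

# Piped call: the data flows left to right into the function
piped <- messy %>% clean_names()

names(piped)              # cleaned, snake_case column names
identical(nested, piped)  # both forms should give the same tibble
```

The two forms do the same work; the piped version simply reads in the order the steps happen, which matters more as chains get longer.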

mutate()

Recall that mutate() allows us to create a new column from existing columns in a data frame. Use mutate() to make a new column that converts gestation from months to years. Which animal has the longest gestation period?

mammals %>% 
  select(genus, species, gestation) %>% 
  mutate(gestation_years = gestation/12) %>% 
  arrange(-gestation_years)
## # A tibble: 1,440 × 4
##    genus         species        gestation gestation_years
##    <chr>         <chr>              <dbl>           <dbl>
##  1 Loxodonta     africana            21.5            1.79
##  2 Elephas       maximus             21.1            1.76
##  3 Rhinoceros    sondaicus           16.5            1.38
##  4 Rhinoceros    unicornis           16.4            1.37
##  5 Diceros       bicornis            16.1            1.34
##  6 Ceratotherium simum               15.9            1.32
##  7 Physeter      catodon             15.8            1.32
##  8 Globicephala  macrorhynchus       15.2            1.27
##  9 Pseudorca     crassidens          14.9            1.24
## 10 Giraffa       camelopardalis      14.9            1.24
## # ℹ 1,430 more rows
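
The same pattern can be tried on a small table where the arithmetic is easy to check by hand. This is only a sketch: the `toy` tibble and its species labels are made up, and desc() is used here as an equivalent, more explicit alternative to the minus sign in arrange().

```r
library(dplyr)

# Hypothetical gestation lengths in months
toy <- tibble(
  species   = c("a", "b", "c"),
  gestation = c(6, 24, 12)
)

toy_years <- toy %>%
  mutate(gestation_years = gestation / 12) %>%  # months to years
  arrange(desc(gestation_years))                # longest gestation first

toy_years
```

For a numeric column, arrange(-x) and arrange(desc(x)) give the same order; desc() also works on character columns, where a minus sign would fail.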

mutate() and across()

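across() is especially helpful when cleaning data because it applies one function to many columns at once. With “wild” data, there are often mixed entries (upper and lowercase), blank spaces, odd characters, etc. These all need to be dealt with before analysis.

Here is an example that changes every entry to lowercase. Notice in the second output that applying tolower() to everything() also converts the numeric columns to character (<chr>), which is usually not what we want.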

mammals
## # A tibble: 1,440 × 13
##    order  family genus species   mass gestation newborn weaning wean_mass    afr
##    <chr>  <chr>  <chr> <chr>    <dbl>     <dbl>   <dbl>   <dbl>     <dbl>  <dbl>
##  1 Artio… Antil… Anti… americ… 4.54e4      8.13   3246.    3         8900   13.5
##  2 Artio… Bovid… Addax nasoma… 1.82e5      9.39   5480     6.5       -999   27.3
##  3 Artio… Bovid… Aepy… melamp… 4.15e4      6.35   5093     5.63     15900   16.7
##  4 Artio… Bovid… Alce… busela… 1.5 e5      7.9   10167.    6.5       -999   23.0
##  5 Artio… Bovid… Ammo… clarkei 2.85e4      6.8    -999  -999         -999 -999  
##  6 Artio… Bovid… Ammo… lervia  5.55e4      5.08   3810     4         -999   14.9
##  7 Artio… Bovid… Anti… marsup… 3   e4      5.72   3910     4.04      -999   10.2
##  8 Artio… Bovid… Anti… cervic… 3.75e4      5.5    3846     2.13      -999   20.1
##  9 Artio… Bovid… Bison bison   4.98e5      8.93  20000    10.7     157500   29.4
## 10 Artio… Bovid… Bison bonasus 5   e5      9.14  23000.    6.6       -999   30.0
## # ℹ 1,430 more rows
## # ℹ 3 more variables: max_life <dbl>, litter_size <dbl>, litters_year <dbl>
mammals %>% 
  mutate(across(everything(), tolower))
## # A tibble: 1,440 × 13
##    order    family genus species mass  gestation newborn weaning wean_mass afr  
##    <chr>    <chr>  <chr> <chr>   <chr> <chr>     <chr>   <chr>   <chr>     <chr>
##  1 artioda… antil… anti… americ… 45375 8.13      3246.36 3       8900      13.53
##  2 artioda… bovid… addax nasoma… 1823… 9.39      5480    6.5     -999      27.27
##  3 artioda… bovid… aepy… melamp… 41480 6.35      5093    5.63    15900     16.66
##  4 artioda… bovid… alce… busela… 1500… 7.9       10166.… 6.5     -999      23.02
##  5 artioda… bovid… ammo… clarkei 28500 6.8       -999    -999    -999      -999 
##  6 artioda… bovid… ammo… lervia  55500 5.08      3810    4       -999      14.89
##  7 artioda… bovid… anti… marsup… 30000 5.72      3910    4.04    -999      10.23
##  8 artioda… bovid… anti… cervic… 37500 5.5       3846    2.13    -999      20.13
##  9 artioda… bovid… bison bison   4976… 8.93      20000   10.71   157500    29.45
## 10 artioda… bovid… bison bonasus 5e+05 9.14      23000.… 6.6     -999      29.99
## # ℹ 1,430 more rows
## # ℹ 3 more variables: max_life <chr>, litter_size <chr>, litters_year <chr>

Using the across function we can specify individual columns.

mammals %>% 
  mutate(across(c("order", "family"), tolower))
## # A tibble: 1,440 × 13
##    order  family genus species   mass gestation newborn weaning wean_mass    afr
##    <chr>  <chr>  <chr> <chr>    <dbl>     <dbl>   <dbl>   <dbl>     <dbl>  <dbl>
##  1 artio… antil… Anti… americ… 4.54e4      8.13   3246.    3         8900   13.5
##  2 artio… bovid… Addax nasoma… 1.82e5      9.39   5480     6.5       -999   27.3
##  3 artio… bovid… Aepy… melamp… 4.15e4      6.35   5093     5.63     15900   16.7
##  4 artio… bovid… Alce… busela… 1.5 e5      7.9   10167.    6.5       -999   23.0
##  5 artio… bovid… Ammo… clarkei 2.85e4      6.8    -999  -999         -999 -999  
##  6 artio… bovid… Ammo… lervia  5.55e4      5.08   3810     4         -999   14.9
##  7 artio… bovid… Anti… marsup… 3   e4      5.72   3910     4.04      -999   10.2
##  8 artio… bovid… Anti… cervic… 3.75e4      5.5    3846     2.13      -999   20.1
##  9 artio… bovid… Bison bison   4.98e5      8.93  20000    10.7     157500   29.4
## 10 artio… bovid… Bison bonasus 5   e5      9.14  23000.    6.6       -999   30.0
## # ℹ 1,430 more rows
## # ℹ 3 more variables: max_life <dbl>, litter_size <dbl>, litters_year <dbl>
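
Rather than naming each column, across() can also pick columns by type. The sketch below assumes a small made-up tibble (`toy`) and uses where(is.character) so that only text columns are lowercased and numeric columns keep their type.

```r
library(dplyr)

# Hypothetical data: two text columns and one numeric column
toy <- tibble(
  order  = c("Artiodactyla", "CARNIVORA"),
  family = c("Bovidae", "Felidae"),
  mass   = c(45375, 3600)
)

lowered <- toy %>%
  mutate(across(where(is.character), tolower))  # text columns only

lowered
```

Selecting by type avoids the side effect seen with everything(), where numbers are turned into character strings.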

if_else()

We will briefly introduce if_else() here because it lets us use mutate() without changing every value in a column the same way. In a sense, it works like find and replace in a spreadsheet program. With if_else() (or base R's ifelse(), which takes the same three arguments in the same order), you first specify a logical test, then the value to use where the test is TRUE, and lastly the value to use where it is FALSE.

Have a look at the newborn values in mammals below. Notice that they include -999.00. This value is sometimes used as a placeholder for NA, which is a really bad idea because R treats it as a real measurement. We can use if_else() to replace -999.00 with NA.

mammals %>% 
  select(genus, species, newborn) %>% 
  arrange(newborn)
## # A tibble: 1,440 × 3
##    genus       species        newborn
##    <chr>       <chr>            <dbl>
##  1 Ammodorcas  clarkei           -999
##  2 Bos         javanicus         -999
##  3 Bubalus     depressicornis    -999
##  4 Bubalus     mindorensis       -999
##  5 Capra       falconeri         -999
##  6 Cephalophus niger             -999
##  7 Cephalophus nigrifrons        -999
##  8 Cephalophus natalensis        -999
##  9 Cephalophus leucogaster       -999
## 10 Cephalophus ogilbyi           -999
## # ℹ 1,430 more rows
mammals %>% 
  select(genus, species, newborn) %>%
  mutate(newborn_new = if_else(newborn == -999.00, NA_real_, newborn)) %>% 
  arrange(newborn)
## # A tibble: 1,440 × 4
##    genus       species        newborn newborn_new
##    <chr>       <chr>            <dbl>       <dbl>
##  1 Ammodorcas  clarkei           -999          NA
##  2 Bos         javanicus         -999          NA
##  3 Bubalus     depressicornis    -999          NA
##  4 Bubalus     mindorensis       -999          NA
##  5 Capra       falconeri         -999          NA
##  6 Cephalophus niger             -999          NA
##  7 Cephalophus nigrifrons        -999          NA
##  8 Cephalophus natalensis        -999          NA
##  9 Cephalophus leucogaster       -999          NA
## 10 Cephalophus ogilbyi           -999          NA
## # ℹ 1,430 more rows
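
Because -999 appears in several columns, it can be handy to replace it everywhere in one step. The sketch below uses a made-up tibble (`toy`) and shows two options: if_else() on a single column, and dplyr's na_if() combined with across() for all numeric columns. Treat it as one possible approach, not the only one.

```r
library(dplyr)

# Hypothetical data where -999 marks a missing value
toy <- tibble(
  newborn = c(3246, -999, 5093),
  weaning = c(3, -999, 5.63)
)

# One column: NA_real_ keeps both branches of if_else() numeric
one_col <- toy %>%
  mutate(newborn = if_else(newborn == -999, NA_real_, newborn))

# Every numeric column: na_if() turns matching values into NA
all_cols <- toy %>%
  mutate(across(where(is.numeric), ~ na_if(.x, -999)))

one_col
all_cols
```

In `one_col` the weaning column still contains -999, while in `all_cols` both placeholders are gone.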

Practice

  1. We are interested in the family, genus, species and max life variables. Because the max life span for several mammals is unknown, the authors have used -999 in place of NA. Replace all of these values with NA in a new column titled max_life_new. Then convert max_life_new into years. Finally, sort the data in descending order by max_life_new. Which mammal has the longest life span?
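mammals %>% 
  select(family, genus, species, max_life) %>% 
  mutate(max_life_new = ifelse(max_life == -999, NA, max_life)) %>%  # -999 becomes NA
  mutate(max_life_new = max_life_new / 12) %>%                       # months to years
  na.omit() %>%                                                      # drop rows with NA
  # arrange() sorts ascending by default, so the shortest-lived mammals print first;
  # use arrange(desc(max_life_new)) to list the longest-lived first
  arrange(max_life)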
## # A tibble: 599 × 5
##    family    genus    species        max_life max_life_new
##    <chr>     <chr>    <chr>             <dbl>        <dbl>
##  1 Muridae   Myopus   schisticolor         12         1   
##  2 Soricidae Sorex    longirostris         14         1.17
##  3 Muridae   Microtus longicaudus          14         1.17
##  4 Soricidae Myosorex varius               16         1.33
##  5 Muridae   Microtus pennsylvanicus       16         1.33
##  6 Soricidae Sorex    fumeus               17         1.42
##  7 Soricidae Sorex    arcticus             18         1.5 
##  8 Soricidae Sorex    ornatus              18         1.5 
##  9 Soricidae Sorex    monticolus           18         1.5 
## 10 Soricidae Sorex    trowbridgii          18         1.5 
## # ℹ 589 more rows
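
Since the question asks for descending order, here is a minimal sketch of the same steps on a tiny made-up tibble (`toy`, with hypothetical genus labels and life spans in months), ending with arrange(desc()) so the longest-lived entry comes first. It uses filter(!is.na()) instead of na.omit() so that only missing values in the column of interest cause a row to be dropped.

```r
library(dplyr)

# Hypothetical life spans in months; -999 marks a missing value
toy <- tibble(
  genus    = c("A", "B", "C", "D"),
  max_life = c(12, -999, 960, 240)
)

longest_first <- toy %>%
  mutate(max_life_new = if_else(max_life == -999, NA_real_, max_life) / 12) %>%
  filter(!is.na(max_life_new)) %>%   # keep rows with a known life span
  arrange(desc(max_life_new))        # longest life span first

longest_first
```
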
  2. Build a new data frame msleep24 from the msleep data that contains the name and vore variables along with a new column called sleep_total_24, which is the amount of time a species sleeps expressed as a proportion of a 24-hour day. Restrict the sleep_total_24 values to less than or equal to 0.3. Arrange the output in descending order.
msleep24 <- msleep %>% 
  mutate(sleep_total_24=sleep_total/24) %>% 
  select(name, vore, sleep_total_24, sleep_total) %>% 
  filter(sleep_total_24<=0.3) %>% 
  arrange(desc(sleep_total_24))
msleep24
## # A tibble: 20 × 4
##    name                 vore  sleep_total_24 sleep_total
##    <chr>                <chr>          <dbl>       <dbl>
##  1 Vesper mouse         <NA>          0.292          7  
##  2 Gray hyrax           herbi         0.262          6.3
##  3 Genet                carni         0.262          6.3
##  4 Gray seal            carni         0.258          6.2
##  5 Common porpoise      carni         0.233          5.6
##  6 Rock hyrax           <NA>          0.225          5.4
##  7 Goat                 herbi         0.221          5.3
##  8 Tree hyrax           herbi         0.221          5.3
##  9 Bottle-nosed dolphin carni         0.217          5.2
## 10 Brazilian tapir      herbi         0.183          4.4
## 11 Cow                  herbi         0.167          4  
## 12 Asian elephant       herbi         0.162          3.9
## 13 Sheep                herbi         0.158          3.8
## 14 Caspian seal         carni         0.146          3.5
## 15 African elephant     herbi         0.137          3.3
## 16 Donkey               herbi         0.129          3.1
## 17 Roe deer             herbi         0.125          3  
## 18 Horse                herbi         0.121          2.9
## 19 Pilot whale          carni         0.112          2.7
## 20 Giraffe              herbi         0.0792         1.9

Did dplyr do what we expected? How do we check our output? Remember, just because your code runs doesn’t mean that it did what you intended.

summary(msleep24)
##      name               vore           sleep_total_24     sleep_total   
##  Length:20          Length:20          Min.   :0.07917   Min.   :1.900  
##  Class :character   Class :character   1st Qu.:0.13542   1st Qu.:3.250  
##  Mode  :character   Mode  :character   Median :0.17500   Median :4.200  
##                                        Mean   :0.18563   Mean   :4.455  
##                                        3rd Qu.:0.22708   3rd Qu.:5.450  
##                                        Max.   :0.29167   Max.   :7.000

Histograms are also a quick way to check the output.

hist(msleep24$sleep_total)
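
Beyond summary() and a histogram, expectations can also be written down as explicit checks. The sketch below rebuilds msleep24 (msleep ships with ggplot2) and uses base R's stopifnot(), which stays silent when every condition holds and stops with an error otherwise. The three conditions are just examples of what one might want to verify.

```r
library(dplyr)
library(ggplot2)  # supplies the msleep data

msleep24 <- msleep %>%
  mutate(sleep_total_24 = sleep_total / 24) %>%
  select(name, vore, sleep_total_24, sleep_total) %>%
  filter(sleep_total_24 <= 0.3) %>%
  arrange(desc(sleep_total_24))

stopifnot(
  # the filter kept only proportions at or below 0.3
  max(msleep24$sleep_total_24) <= 0.3,
  # the new column really is sleep_total divided by 24
  isTRUE(all.equal(msleep24$sleep_total_24 * 24, msleep24$sleep_total)),
  # rows are in descending order (reversed, they must be ascending)
  !is.unsorted(rev(msleep24$sleep_total_24))
)
```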

That’s it! Let’s take a break and then move on to part 2!
