Learning Goals

At the end of this exercise, you will be able to:
1. Use filter() to extract variables of interest.
2. Use filter() to extract variables of interest under multiple conditions.

Load the tidyverse

library("tidyverse")
library("janitor")

Load the data

For this lab, we will use the following two datasets:

  1. Gaeta J., G. Sass, S. Carpenter. 2012. Biocomplexity at North Temperate Lakes LTER: Coordinated Field Studies: Large Mouth Bass Growth 2006. Environmental Data Initiative. link
fish <- read_csv("data/Gaeta_etal_CLC_data.csv")
## Rows: 4033 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): lakeid, annnumber
## dbl (4): fish_id, length, radii_length_mm, scalelength
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
  1. S. K. Morgan Ernest. 2003. Life history characteristics of placental non-volant mammals. Ecology 84:3402. link
mammals <- read_csv("data/mammal_lifehistories_v2.csv")
## Rows: 1440 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): order, family, Genus, species
## dbl (9): mass, gestation, newborn, weaning, wean mass, AFR, max. life, litte...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Let’s rename some of the mammals variables.

mammals <- rename(mammals, genus="Genus", wean_mass="wean mass", max_life= "max. life", litter_size="litter size", litters_per_year="litters/year")

Do you remember the easier way?

mammals <- clean_names(mammals)

Review

In lab 4, we practiced extracting observations of interest. For example, we can pull out all of the fish from a specific lake.

glimpse(fish)
## Rows: 4,033
## Columns: 6
## $ lakeid          <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", …
## $ fish_id         <dbl> 299, 299, 299, 300, 300, 300, 300, 301, 301, 301, 301,…
## $ annnumber       <chr> "EDGE", "2", "1", "EDGE", "3", "2", "1", "EDGE", "3", …
## $ length          <dbl> 167, 167, 167, 175, 175, 175, 175, 194, 194, 194, 194,…
## $ radii_length_mm <dbl> 2.697443, 2.037518, 1.311795, 3.015477, 2.670733, 2.13…
## $ scalelength     <dbl> 2.697443, 2.697443, 2.697443, 3.015477, 3.015477, 3.01…

Let’s convert lakeid to a factor and have a look at which states are represented in the data.

fish$lakeid <- as.factor(fish$lakeid)
table(fish$lakeid)
## 
##  AL  AR  BO  BR  CR  DY  FD  JN  LC  LJ  LR LSG  MN  RD  UB  WS 
## 383 262 197 291 343 355 302 238 173 181 292 143 293 135 191 254

Now we can pull out any state of interest.

filter(fish, lakeid=="LJ")
## # A tibble: 181 × 6
##    lakeid fish_id annnumber length radii_length_mm scalelength
##    <fct>    <dbl> <chr>      <dbl>           <dbl>       <dbl>
##  1 LJ         239 EDGE         245            4.10        4.10
##  2 LJ         239 8            245            3.87        4.10
##  3 LJ         239 7            245            3.63        4.10
##  4 LJ         239 6            245            3.42        4.10
##  5 LJ         239 5            245            3.19        4.10
##  6 LJ         239 4            245            2.84        4.10
##  7 LJ         239 3            245            2.50        4.10
##  8 LJ         239 2            245            2.12        4.10
##  9 LJ         239 1            245            1.33        4.10
## 10 LJ         240 EDGE         249            4.40        4.40
## # ℹ 171 more rows

Using filter() on multiple conditions

You can also use filter() to extract data based on multiple conditions. Below we extract only the fish that have lakeid “AL” and length >350.

filter(fish, lakeid == "AL" & length > 350)
## # A tibble: 314 × 6
##    lakeid fish_id annnumber length radii_length_mm scalelength
##    <fct>    <dbl> <chr>      <dbl>           <dbl>       <dbl>
##  1 AL         307 EDGE         353            7.55        7.55
##  2 AL         307 13           353            7.28        7.55
##  3 AL         307 12           353            6.98        7.55
##  4 AL         307 11           353            6.73        7.55
##  5 AL         307 10           353            6.48        7.55
##  6 AL         307 9            353            6.22        7.55
##  7 AL         307 8            353            5.92        7.55
##  8 AL         307 7            353            5.44        7.55
##  9 AL         307 6            353            5.06        7.55
## 10 AL         307 5            353            4.37        7.55
## # ℹ 304 more rows

Notice that the | operator generates a different result. Why?

filter(fish, lakeid == "AL" | length > 350)
## # A tibble: 948 × 6
##    lakeid fish_id annnumber length radii_length_mm scalelength
##    <fct>    <dbl> <chr>      <dbl>           <dbl>       <dbl>
##  1 AL         299 EDGE         167            2.70        2.70
##  2 AL         299 2            167            2.04        2.70
##  3 AL         299 1            167            1.31        2.70
##  4 AL         300 EDGE         175            3.02        3.02
##  5 AL         300 3            175            2.67        3.02
##  6 AL         300 2            175            2.14        3.02
##  7 AL         300 1            175            1.23        3.02
##  8 AL         301 EDGE         194            3.34        3.34
##  9 AL         301 3            194            2.97        3.34
## 10 AL         301 2            194            2.29        3.34
## # ℹ 938 more rows

Rules:
+ filter(condition1, condition2) will return rows where both conditions are met. By default the , means &.
+ filter(condition1, !condition2) will return all rows where condition one is true but condition 2 is not.
+ filter(condition1 | condition2) will return rows where condition 1 or condition 2 is met.
+ filter(xor(condition1, condition2) will return all rows where only one of the conditions is met, and not when both conditions are met.

In this case, we filter out the fish with a length over 400 and a scale length over 11 or a radii length over 8.

filter(fish, length > 400, (scalelength > 11 | radii_length_mm > 8))
## # A tibble: 23 × 6
##    lakeid fish_id annnumber length radii_length_mm scalelength
##    <fct>    <dbl> <chr>      <dbl>           <dbl>       <dbl>
##  1 AL         324 EDGE         406            8.21        8.21
##  2 AL         327 EDGE         413            8.33        8.33
##  3 AL         327 15           413            8.11        8.33
##  4 AL         328 EDGE         420            8.71        8.71
##  5 AL         328 16           420            8.41        8.71
##  6 AL         328 15           420            8.14        8.71
##  7 WS         180 EDGE         403           11.0        11.0 
##  8 WS         180 16           403           10.6        11.0 
##  9 WS         180 15           403           10.3        11.0 
## 10 WS         180 14           403            9.93       11.0 
## # ℹ 13 more rows

Practice

  1. From the mammals data, filter all members of the family Bovidae with a mass greater than 450000.
bovidae <- filter(mammals, mass>450000 )
bovidae
## # A tibble: 47 × 13
##    order   family genus species   mass gestation newborn weaning wean_mass   afr
##    <chr>   <chr>  <chr> <chr>    <dbl>     <dbl>   <dbl>   <dbl>     <dbl> <dbl>
##  1 Artiod… Bovid… Bison bison   4.98e5      8.93  20000    10.7     157500  29.4
##  2 Artiod… Bovid… Bison bonasus 5   e5      9.14  23000.    6.6       -999  30.0
##  3 Artiod… Bovid… Bos   fronta… 8   e5      9.02  23033.    4.5       -999  24.2
##  4 Artiod… Bovid… Bos   javani… 6.67e5      9.83   -999     9.5       -999  25.5
##  5 Artiod… Bovid… Buba… bubalis 9.5 e5     10.5   37500     7.5       -999  19.9
##  6 Artiod… Bovid… Sync… caffer  5.05e5     11.0   42862.    9.18    166000  47.9
##  7 Artiod… Bovid… Taur… derbia… 6.80e5      8.67   -999  -999         -999  36.4
##  8 Artiod… Giraf… Gira… camelo… 8   e5     14.9   59771.    8.25      -999  48.7
##  9 Artiod… Hippo… Hipp… amphib… 1.26e6      7.75  39747.   10.1     237500  89.9
## 10 Carniv… Odobe… Odob… rosmar… 6.5 e5     11.5   51883.   20.4     200000  52.1
## # ℹ 37 more rows
## # ℹ 3 more variables: max_life <dbl>, litter_size <dbl>, litters_per_year <dbl>
  1. From the mammals data, build a data frame that compares mass, gestation, and newborn among the primate genera Lophocebus, Erythrocebus, and Macaca. Among these genera, which species has the smallest newborn mass?
mammals2 <- select(mammals, "order", "genus", "mass", "gestation", "newborn")
mammals3 <- filter(mammals2, order=="Primates")
mammals4 <- filter(mammals3, genus %in% c("Lophocebus", "Erythrocebus", "Macaca"))
mammals %>% 
  select(genus, species, mass, gestation, newborn) %>% 
  filter(genus %in% c("Lophocebus", "Erythrocebus", "Macaca")) %>% 
  arrange(newborn)
## # A tibble: 15 × 5
##    genus        species        mass gestation newborn
##    <chr>        <chr>         <dbl>     <dbl>   <dbl>
##  1 Macaca       maura         5575       5.43    390.
##  2 Macaca       radiata       3735       5.43    391.
##  3 Macaca       cyclopis      6317.      5.4     401 
##  4 Macaca       fascicularis  3456.      5.49    408.
##  5 Macaca       silenus       4875       5.94    418 
##  6 Macaca       sinica        3495    -999       446 
##  7 Macaca       sylvanus      9753.      5.49    450 
##  8 Macaca       nigra         6212.      5.78    458.
##  9 Lophocebus   albigena      6726.      5.97    462.
## 10 Macaca       mulatta       5413.      5.47    476.
## 11 Macaca       nemestrina    6133.      5.71    476.
## 12 Macaca       arctoides     7308.      6       486.
## 13 Macaca       fuscata       8858.      5.72    505.
## 14 Macaca       thibetana    10037.      5.67    533.
## 15 Erythrocebus patas         5883.      5.56    546.

This is crazy, we need away to connect the commands…

That’s it! Let’s take a break and then move on to part 2!

–>Home