filter() on multiple conditions
At the end of this exercise, you will be able to:
1. Use filter() to extract observations of interest.
2. Use filter() to extract observations of interest under multiple conditions.
library("tidyverse")
library("janitor")
For this lab, we will use the following two datasets:
fish <- read_csv("data/Gaeta_etal_CLC_data.csv")
## Rows: 4033 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): lakeid, annnumber
## dbl (4): fish_id, length, radii_length_mm, scalelength
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mammals <- read_csv("data/mammal_lifehistories_v2.csv")
## Rows: 1440 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): order, family, Genus, species
## dbl (9): mass, gestation, newborn, weaning, wean mass, AFR, max. life, litte...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Let’s rename some of the mammals variables.
mammals <- rename(mammals, genus="Genus", wean_mass="wean mass", max_life= "max. life", litter_size="litter size", litters_per_year="litters/year")
Do you remember the easier way?
mammals <- clean_names(mammals)
In lab 4, we practiced extracting observations of interest. For example, we can pull out all of the fish from a specific lake.
glimpse(fish)
## Rows: 4,033
## Columns: 6
## $ lakeid <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", …
## $ fish_id <dbl> 299, 299, 299, 300, 300, 300, 300, 301, 301, 301, 301,…
## $ annnumber <chr> "EDGE", "2", "1", "EDGE", "3", "2", "1", "EDGE", "3", …
## $ length <dbl> 167, 167, 167, 175, 175, 175, 175, 194, 194, 194, 194,…
## $ radii_length_mm <dbl> 2.697443, 2.037518, 1.311795, 3.015477, 2.670733, 2.13…
## $ scalelength <dbl> 2.697443, 2.697443, 2.697443, 3.015477, 3.015477, 3.01…
Let’s convert lakeid to a factor and have a look at which lakes are represented in the data.
fish$lakeid <- as.factor(fish$lakeid)
table(fish$lakeid)
##
## AL AR BO BR CR DY FD JN LC LJ LR LSG MN RD UB WS
## 383 262 197 291 343 355 302 238 173 181 292 143 293 135 191 254
Now we can pull out any lake of interest.
filter(fish, lakeid=="LJ")
## # A tibble: 181 × 6
## lakeid fish_id annnumber length radii_length_mm scalelength
## <fct> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 LJ 239 EDGE 245 4.10 4.10
## 2 LJ 239 8 245 3.87 4.10
## 3 LJ 239 7 245 3.63 4.10
## 4 LJ 239 6 245 3.42 4.10
## 5 LJ 239 5 245 3.19 4.10
## 6 LJ 239 4 245 2.84 4.10
## 7 LJ 239 3 245 2.50 4.10
## 8 LJ 239 2 245 2.12 4.10
## 9 LJ 239 1 245 1.33 4.10
## 10 LJ 240 EDGE 249 4.40 4.40
## # ℹ 171 more rows
filter() on multiple conditions
You can also use filter() to extract data based on multiple conditions. Below we extract only the fish that have lakeid “AL” and length > 350.
filter(fish, lakeid == "AL" & length > 350)
## # A tibble: 314 × 6
## lakeid fish_id annnumber length radii_length_mm scalelength
## <fct> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 AL 307 EDGE 353 7.55 7.55
## 2 AL 307 13 353 7.28 7.55
## 3 AL 307 12 353 6.98 7.55
## 4 AL 307 11 353 6.73 7.55
## 5 AL 307 10 353 6.48 7.55
## 6 AL 307 9 353 6.22 7.55
## 7 AL 307 8 353 5.92 7.55
## 8 AL 307 7 353 5.44 7.55
## 9 AL 307 6 353 5.06 7.55
## 10 AL 307 5 353 4.37 7.55
## # ℹ 304 more rows
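As a minimal sketch on a small made-up tibble (toy_fish and its values are hypothetical, not taken from the lab data), the following checks that joining conditions with & keeps only the rows where both are true, and that separating the conditions with a comma selects the same rows.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration only
toy_fish <- tibble(
  lakeid = c("AL", "AL", "LJ", "LJ"),
  length = c(167, 353, 245, 400)
)

# Both conditions must be TRUE for a row to be kept
both_amp <- filter(toy_fish, lakeid == "AL" & length > 350)

# A comma between conditions is treated the same way as &
both_comma <- filter(toy_fish, lakeid == "AL", length > 350)

both_amp
identical(both_amp, both_comma)
```

Only the AL fish with length 353 meets both conditions; the LJ fish with length 400 is long enough but comes from the wrong lake.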
Notice that the | operator generates a different result. Why?
filter(fish, lakeid == "AL" | length > 350)
## # A tibble: 948 × 6
## lakeid fish_id annnumber length radii_length_mm scalelength
## <fct> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 AL 299 EDGE 167 2.70 2.70
## 2 AL 299 2 167 2.04 2.70
## 3 AL 299 1 167 1.31 2.70
## 4 AL 300 EDGE 175 3.02 3.02
## 5 AL 300 3 175 2.67 3.02
## 6 AL 300 2 175 2.14 3.02
## 7 AL 300 1 175 1.23 3.02
## 8 AL 301 EDGE 194 3.34 3.34
## 9 AL 301 3 194 2.97 3.34
## 10 AL 301 2 194 2.29 3.34
## # ℹ 938 more rows
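One way to explore the question is to count matches on a tiny made-up tibble (toy_fish and its values are hypothetical). The sketch below assumes nothing beyond dplyr's filter(): it counts the rows that meet each condition alone, both together, and at least one of the two.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration only
toy_fish <- tibble(
  lakeid = c("AL", "AL", "LJ", "LJ", "WS"),
  length = c(167, 353, 245, 400, 120)
)

n_lake   <- nrow(filter(toy_fish, lakeid == "AL"))                 # rows from lake AL
n_long   <- nrow(filter(toy_fish, length > 350))                   # rows with long fish
n_both   <- nrow(filter(toy_fish, lakeid == "AL" & length > 350))  # rows meeting both
n_either <- nrow(filter(toy_fish, lakeid == "AL" | length > 350))  # rows meeting at least one

# | keeps every row that meets at least one condition,
# so rows meeting both are counted once, not twice
n_either == n_lake + n_long - n_both
```

Because | keeps a row when either condition is true, it can never return fewer rows than & on the same pair of conditions.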
Rules:
+ filter(condition1, condition2) will return rows where both conditions are met. A comma between conditions is treated as &.
+ filter(condition1, !condition2) will return all rows where condition 1 is true but condition 2 is not.
+ filter(condition1 | condition2) will return rows where condition 1 or condition 2 is met.
+ filter(xor(condition1, condition2)) will return all rows where only one of the conditions is met, and not when both conditions are met.
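The four rules can be tried side by side on a small truth-table tibble. This is only a sketch: toy, a, and b are hypothetical names, with a and b standing in for condition 1 and condition 2.

```r
library(dplyr)

# Hypothetical truth table: one row per combination of the two conditions
toy <- tibble(
  id = 1:4,
  a  = c(TRUE, TRUE, FALSE, FALSE),  # stands in for condition 1
  b  = c(TRUE, FALSE, TRUE, FALSE)   # stands in for condition 2
)

ids_both    <- filter(toy, a, b)$id        # both conditions met
ids_a_not_b <- filter(toy, a, !b)$id       # condition 1 met, condition 2 not met
ids_either  <- filter(toy, a | b)$id       # at least one condition met
ids_xor     <- filter(toy, xor(a, b))$id   # exactly one condition met

ids_both     # 1
ids_a_not_b  # 2
ids_either   # 1 2 3
ids_xor      # 2 3
```

Row 4, where neither condition holds, is never returned by any of the four calls.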
In this case, we keep only the fish with a length over 400 and either a scale length over 11 or a radii length over 8.
filter(fish, length > 400, (scalelength > 11 | radii_length_mm > 8))
## # A tibble: 23 × 6
## lakeid fish_id annnumber length radii_length_mm scalelength
## <fct> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 AL 324 EDGE 406 8.21 8.21
## 2 AL 327 EDGE 413 8.33 8.33
## 3 AL 327 15 413 8.11 8.33
## 4 AL 328 EDGE 420 8.71 8.71
## 5 AL 328 16 420 8.41 8.71
## 6 AL 328 15 420 8.14 8.71
## 7 WS 180 EDGE 403 11.0 11.0
## 8 WS 180 16 403 10.6 11.0
## 9 WS 180 15 403 10.3 11.0
## 10 WS 180 14 403 9.93 11.0
## # ℹ 13 more rows
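When the conditions are joined with & instead of a comma, the parentheses around the | part start to matter, because R evaluates & before |. The sketch below uses a made-up tibble (toy_fish and its values are hypothetical) to show that the grouped and ungrouped versions can return different rows.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration only
toy_fish <- tibble(
  id              = 1:4,
  length          = c(410, 410, 300, 300),
  scalelength     = c(12, 5, 5, 12),
  radii_length_mm = c(3, 9, 9, 3)
)

# Grouped: long fish that ALSO meet at least one of the two scale conditions
grouped <- filter(toy_fish, length > 400 & (scalelength > 11 | radii_length_mm > 8))

# Ungrouped: read as (length > 400 & scalelength > 11) | radii_length_mm > 8,
# so a short fish with a large radii length is kept as well
ungrouped <- filter(toy_fish, length > 400 & scalelength > 11 | radii_length_mm > 8)

grouped$id    # 1 2
ungrouped$id  # 1 2 3
```

Fish 3 is only 300 long, yet the ungrouped version keeps it because its radii length alone satisfies the right-hand side of |.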
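From the mammals data, filter all members of the family Bovidae with a mass greater than 450000.
# Note: this call filters on mass only, so the result also includes families other than Bovidae.
# Adding family == "Bovidae" as a second condition would restrict it to Bovidae.
bovidae <- filter(mammals, mass > 450000)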
bovidae
## # A tibble: 47 × 13
## order family genus species mass gestation newborn weaning wean_mass afr
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Artiod… Bovid… Bison bison 4.98e5 8.93 20000 10.7 157500 29.4
## 2 Artiod… Bovid… Bison bonasus 5 e5 9.14 23000. 6.6 -999 30.0
## 3 Artiod… Bovid… Bos fronta… 8 e5 9.02 23033. 4.5 -999 24.2
## 4 Artiod… Bovid… Bos javani… 6.67e5 9.83 -999 9.5 -999 25.5
## 5 Artiod… Bovid… Buba… bubalis 9.5 e5 10.5 37500 7.5 -999 19.9
## 6 Artiod… Bovid… Sync… caffer 5.05e5 11.0 42862. 9.18 166000 47.9
## 7 Artiod… Bovid… Taur… derbia… 6.80e5 8.67 -999 -999 -999 36.4
## 8 Artiod… Giraf… Gira… camelo… 8 e5 14.9 59771. 8.25 -999 48.7
## 9 Artiod… Hippo… Hipp… amphib… 1.26e6 7.75 39747. 10.1 237500 89.9
## 10 Carniv… Odobe… Odob… rosmar… 6.5 e5 11.5 51883. 20.4 200000 52.1
## # ℹ 37 more rows
## # ℹ 3 more variables: max_life <dbl>, litter_size <dbl>, litters_per_year <dbl>
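The prompt asks for two conditions at once: a family and a mass threshold. A minimal sketch of combining them follows, using a small made-up tibble (toy_mammals and its values are hypothetical stand-ins, not rows from the mammals data).

```r
library(dplyr)

# Hypothetical toy data; the masses are invented for illustration only
toy_mammals <- tibble(
  family = c("Bovidae", "Bovidae", "Giraffidae", "Hippopotamidae"),
  genus  = c("Bison", "Gazella", "Giraffa", "Hippopotamus"),
  mass   = c(500000, 20000, 800000, 1200000)
)

# The mass condition alone keeps large animals from any family
heavy <- filter(toy_mammals, mass > 450000)

# Adding the family condition restricts the result to large Bovidae
toy_bovidae <- filter(toy_mammals, family == "Bovidae", mass > 450000)

heavy$family
toy_bovidae$genus
```

In the toy data, three animals pass the mass condition, but only one of them is in Bovidae.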
From the mammals data, build a data frame that compares mass, gestation, and newborn among the primate genera Lophocebus, Erythrocebus, and Macaca. Among these genera, which species has the smallest newborn mass?
mammals2 <- select(mammals, "order", "genus", "mass", "gestation", "newborn")
mammals3 <- filter(mammals2, order=="Primates")
mammals4 <- filter(mammals3, genus %in% c("Lophocebus", "Erythrocebus", "Macaca"))
mammals %>%
select(genus, species, mass, gestation, newborn) %>%
filter(genus %in% c("Lophocebus", "Erythrocebus", "Macaca")) %>%
arrange(newborn)
## # A tibble: 15 × 5
## genus species mass gestation newborn
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Macaca maura 5575 5.43 390.
## 2 Macaca radiata 3735 5.43 391.
## 3 Macaca cyclopis 6317. 5.4 401
## 4 Macaca fascicularis 3456. 5.49 408.
## 5 Macaca silenus 4875 5.94 418
## 6 Macaca sinica 3495 -999 446
## 7 Macaca sylvanus 9753. 5.49 450
## 8 Macaca nigra 6212. 5.78 458.
## 9 Lophocebus albigena 6726. 5.97 462.
## 10 Macaca mulatta 5413. 5.47 476.
## 11 Macaca nemestrina 6133. 5.71 476.
## 12 Macaca arctoides 7308. 6 486.
## 13 Macaca fuscata 8858. 5.72 505.
## 14 Macaca thibetana 10037. 5.67 533.
## 15 Erythrocebus patas 5883. 5.56 546.
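The %in% operator used above is a compact way of writing several == tests joined by |. The sketch below checks that equivalence on a made-up tibble (toy_primates and its newborn values are hypothetical), then sorts the result with arrange() so the smallest newborn value comes first.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration only
toy_primates <- tibble(
  genus   = c("Macaca", "Papio", "Lophocebus", "Erythrocebus"),
  newborn = c(390, 710, 462, 546)
)

genera <- c("Lophocebus", "Erythrocebus", "Macaca")

# One membership test...
with_in <- filter(toy_primates, genus %in% genera)

# ...selects the same rows as three equality tests joined by |
with_or <- filter(
  toy_primates,
  genus == "Lophocebus" | genus == "Erythrocebus" | genus == "Macaca"
)

identical(with_in, with_or)

# arrange() sorts ascending by default, so the first row has the smallest value
smallest_first <- arrange(with_in, newborn)
smallest_first
```

As the list of values grows, %in% stays readable, while the chain of | conditions gets longer with every added genus.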
This is crazy; we need a way to connect the commands…