arrange()
At the end of this exercise, you will be able to:
1. Use pipes to connect functions in dplyr.
2. Use arrange()
to order dplyr outputs.
3. Use mutate()
to add columns in a dataframe.
At this point, you should be comfortable using the
select()
and filter()
functions of
dplyr
. If you need help, please email me.
library("tidyverse")
library("janitor")
For this lab, we will use the following two datasets:
fish <- read_csv("data/Gaeta_etal_CLC_data.csv")
## Rows: 4033 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): lakeid, annnumber
## dbl (4): fish_id, length, radii_length_mm, scalelength
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mammals <- read_csv("data/mammal_lifehistories_v2.csv")
## Rows: 1440 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): order, family, Genus, species
## dbl (9): mass, gestation, newborn, weaning, wean mass, AFR, max. life, litte...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rename the mammals
variables using janitor.
mammals <- clean_names(mammals)
%>%
In order to start combining select()
,
filter()
, and other functions efficiently, we need to learn
pipes. Pipes feed the output from one function into the input of another
function. This helps us keep our code sequential and clean.
In this example, we use a pipe to select only lakeid
and
scalelength
then filter that output only for lakes “AL”.
Notice that we only need to call the data one time.
select(fish, lakeid, scalelength)
## # A tibble: 4,033 × 2
## lakeid scalelength
## <chr> <dbl>
## 1 AL 2.70
## 2 AL 2.70
## 3 AL 2.70
## 4 AL 3.02
## 5 AL 3.02
## 6 AL 3.02
## 7 AL 3.02
## 8 AL 3.34
## 9 AL 3.34
## 10 AL 3.34
## # ℹ 4,023 more rows
filter(fish, lakeid == "AL")
## # A tibble: 383 × 6
## lakeid fish_id annnumber length radii_length_mm scalelength
## <chr> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 AL 299 EDGE 167 2.70 2.70
## 2 AL 299 2 167 2.04 2.70
## 3 AL 299 1 167 1.31 2.70
## 4 AL 300 EDGE 175 3.02 3.02
## 5 AL 300 3 175 2.67 3.02
## 6 AL 300 2 175 2.14 3.02
## 7 AL 300 1 175 1.23 3.02
## 8 AL 301 EDGE 194 3.34 3.34
## 9 AL 301 3 194 2.97 3.34
## 10 AL 301 2 194 2.29 3.34
## # ℹ 373 more rows
fish %>%
select(lakeid, scalelength) %>%
filter(lakeid == "AL")
## # A tibble: 383 × 2
## lakeid scalelength
## <chr> <dbl>
## 1 AL 2.70
## 2 AL 2.70
## 3 AL 2.70
## 4 AL 3.02
## 5 AL 3.02
## 6 AL 3.02
## 7 AL 3.02
## 8 AL 3.34
## 9 AL 3.34
## 10 AL 3.34
## # ℹ 373 more rows
Here we select family, genus, and species then filter for gestation greater than 15 months.
mammals %>%
select("family", "genus", "species", "gestation") %>%
filter(gestation>=15)
## # A tibble: 8 × 4
## family genus species gestation
## <chr> <chr> <chr> <dbl>
## 1 Delphinidae Globicephala macrorhynchus 15.2
## 2 Physeteridae Physeter catodon 15.8
## 3 Rhinocerotidae Ceratotherium simum 15.9
## 4 Rhinocerotidae Diceros bicornis 16.1
## 5 Rhinocerotidae Rhinoceros unicornis 16.4
## 6 Rhinocerotidae Rhinoceros sondaicus 16.5
## 7 Elephantidae Elephas maximus 21.1
## 8 Elephantidae Loxodonta africana 21.5
fish
data. Use pipes!fish %>% # work with the fish data
select(lakeid, radii_length_mm) %>% #pull out variables of int
filter(lakeid=="AL" | lakeid=="AR") %>% #only these lakes
filter(between(radii_length_mm, 2, 4)) %>% #between 2 and 4
arrange(desc(radii_length_mm)) #sort to make easier to read
## # A tibble: 253 × 2
## lakeid radii_length_mm
## <chr> <dbl>
## 1 AR 4.00
## 2 AR 3.99
## 3 AR 3.99
## 4 AR 3.98
## 5 AR 3.98
## 6 AR 3.96
## 7 AR 3.94
## 8 AR 3.94
## 9 AR 3.94
## 10 AL 3.94
## # ℹ 243 more rows
arrange()
The arrange()
command is a bit like a sort command in
excel. Note that the default is ascending order.
fish %>%
select(lakeid, scalelength) %>%
arrange(scalelength)
## # A tibble: 4,033 × 2
## lakeid scalelength
## <chr> <dbl>
## 1 WS 0.628
## 2 WS 0.637
## 3 LSG 0.773
## 4 WS 0.832
## 5 LSG 1.15
## 6 WS 1.41
## 7 WS 1.41
## 8 WS 1.41
## 9 BO 1.43
## 10 BO 1.43
## # ℹ 4,023 more rows
To sort in decreasing order, wrap the variable name in
desc()
.
fish %>%
select(lakeid, scalelength) %>%
arrange(desc(scalelength))
## # A tibble: 4,033 × 2
## lakeid scalelength
## <chr> <dbl>
## 1 WS 11.0
## 2 WS 11.0
## 3 WS 11.0
## 4 WS 11.0
## 5 WS 11.0
## 6 WS 11.0
## 7 WS 11.0
## 8 WS 11.0
## 9 WS 11.0
## 10 WS 11.0
## # ℹ 4,023 more rows
arrange()
can be very helpful in combination with the
other functions.
fish %>%
select(lakeid, length, fish_id, scalelength) %>%
filter(lakeid=="AL") %>%
arrange(fish_id)
## # A tibble: 383 × 4
## lakeid length fish_id scalelength
## <chr> <dbl> <dbl> <dbl>
## 1 AL 167 299 2.70
## 2 AL 167 299 2.70
## 3 AL 167 299 2.70
## 4 AL 175 300 3.02
## 5 AL 175 300 3.02
## 6 AL 175 300 3.02
## 7 AL 175 300 3.02
## 8 AL 194 301 3.34
## 9 AL 194 301 3.34
## 10 AL 194 301 3.34
## # ℹ 373 more rows
300
. Extract this information from the
fish
data. Please sort the data by radii length.fish %>%
select(fish_id, radii_length_mm, scalelength) %>%
filter(fish_id==300) %>%
arrange(radii_length_mm)
## # A tibble: 4 × 3
## fish_id radii_length_mm scalelength
## <dbl> <dbl> <dbl>
## 1 300 1.23 3.02
## 2 300 2.14 3.02
## 3 300 2.67 3.02
## 4 300 3.02 3.02
From the mammals
data, filter all members of the family
Felidae
with a gestation period less than 3 months. As part
of this, remove all variables except genus
,
family
, and gestation
. Lastly, remove all NA’s
in the variable gestation
- in this data they are
represented by -999. Sort by gestation.
mammals %>%
select(genus, family, gestation) %>%
filter(family=="Felidae" & gestation<3) %>%
filter(gestation!="-999") %>%
arrange(gestation)
## # A tibble: 20 × 3
## genus family gestation
## <chr> <chr> <dbl>
## 1 Lynx Felidae 1.86
## 2 Felis Felidae 2.05
## 3 Felis Felidae 2.15
## 4 Prionailurus Felidae 2.17
## 5 Lynx Felidae 2.18
## 6 Lynx Felidae 2.18
## 7 Felis Felidae 2.2
## 8 Felis Felidae 2.2
## 9 Prionailurus Felidae 2.23
## 10 Prionailurus Felidae 2.29
## 11 Herpailurus Felidae 2.33
## 12 Lynx Felidae 2.33
## 13 Leptailurus Felidae 2.36
## 14 Otocolobus Felidae 2.37
## 15 Caracal Felidae 2.43
## 16 Oncifelis Felidae 2.45
## 17 Leopardus Felidae 2.51
## 18 Leopardus Felidae 2.64
## 19 Catopuma Felidae 2.67
## 20 Leopardus Felidae 2.67
mutate()
Mutate allows us to create a new column from existing columns in a data frame. We are doing a small introduction here and will add some additional functions later.
Let’s convert the scalelength
variable from millimeters
to centimeters.
fish %>%
mutate(scalelength_cm = scalelength/10) %>%
select(lakeid, scalelength, scalelength_cm)
## # A tibble: 4,033 × 3
## lakeid scalelength scalelength_cm
## <chr> <dbl> <dbl>
## 1 AL 2.70 0.270
## 2 AL 2.70 0.270
## 3 AL 2.70 0.270
## 4 AL 3.02 0.302
## 5 AL 3.02 0.302
## 6 AL 3.02 0.302
## 7 AL 3.02 0.302
## 8 AL 3.34 0.334
## 9 AL 3.34 0.334
## 10 AL 3.34 0.334
## # ℹ 4,023 more rows
mammals %>%
filter(mass!="-999") %>%
mutate(mass_kg = mass/1000) %>%
select(genus, mass, mass_kg) %>%
arrange(desc(mass_kg))
## # A tibble: 1,355 × 3
## genus mass mass_kg
## <chr> <dbl> <dbl>
## 1 Balaenoptera 149000000 149000
## 2 Balaena 80000000 80000
## 3 Balaenoptera 66800000 66800
## 4 Megaptera 30000000 30000
## 5 Eschrichtius 25066667. 25067.
## 6 Eubalaena 23000000 23000
## 7 Eubalaena 23000000 23000
## 8 Balaenoptera 20000000 20000
## 9 Balaenoptera 16266667. 16267.
## 10 Physeter 15400000 15400
## # ℹ 1,345 more rows
mammals %>%
select(genus, mass, wean_mass, gestation) %>%
mutate(mass_kg = mass/1000) %>%
filter(wean_mass!="-999" & gestation!="-999" & mass!="-999") %>%
mutate(wean_gestation_ratio = wean_mass/gestation) %>%
arrange(desc(wean_gestation_ratio))
## # A tibble: 372 × 6
## genus mass wean_mass gestation mass_kg wean_gestation_ratio
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Balaenoptera 149000000 19075000 10.7 149000 1781046.
## 2 Rhinoceros 1602333. 885812. 16.4 1602. 53882.
## 3 Hippopotamus 1258333. 237500 7.75 1258. 30645.
## 4 Loxodonta 3507000 600000 21.5 3507 27959.
## 5 Camelus 434000 253484 12.6 434 20054.
## 6 Bison 497667. 157500 8.93 498. 17637.
## 7 Odobenus 650000 200000 11.5 650 17391.
## 8 Mirounga 579400 130192. 7.63 579. 17063.
## 9 Mirounga 716667. 138037. 8.45 717. 16336.
## 10 Syncerus 504667. 166000 11.0 505. 15050.
## # ℹ 362 more rows
Please review the learning goals and be sure to use the code here as
a reference when completing the homework.
–>Home