At the end of this exercise, you will be able to:
1. Use distinct() to find unique observations (rows).
2. Use mutate() to create new columns from existing columns.
3. Use mutate() with across() and where() to transform multiple columns that meet specific criteria.
4. Use if_else() to conditionally change values in a column.
5. Clean data using janitor and mutate().
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("janitor")
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library("palmerpenguins") #load the palmerpenguins package
##
## Attaching package: 'palmerpenguins'
##
## The following objects are masked from 'package:datasets':
##
## penguins, penguins_raw
options(scipen=999) #turn off scientific notation
These data are from: Gorman KB, Williams TD, Fraser WR (2014). Ecological sexual dimorphism and environmental variability within a community of Antarctic penguins (genus Pygoscelis). PLoS ONE 9(3):e90081. https://doi.org/10.1371/journal.pone.0090081
Recall that the verbs select() and filter() are used to extract columns
and rows, respectively, from a data frame, and that arrange() sorts the
rows. We use the pipe operator %>% to pass the result of one function
into the next, chaining multiple functions together.
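As a minimal sketch of what the pipe does, the block below builds a small made-up tibble (`birds`, hypothetical data for illustration only) and shows that a nested call and a piped chain return the same result. The pipe simply feeds the left-hand result in as the first argument of the next function.

```r
library(dplyr)

# Hypothetical data frame, made up for illustration
birds <- tibble(
  species = c("Adelie", "Gentoo", "Adelie"),
  mass_g  = c(3700, 5000, 3400)
)

# Nested calls are read from the inside out...
nested <- arrange(select(birds, species, mass_g), mass_g)

# ...whereas the pipe passes each result into the next step,
# so the code reads top to bottom in the order the steps happen
piped <- birds %>%
  select(species, mass_g) %>%
  arrange(mass_g)

identical(nested, piped)  # TRUE
```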
penguins %>%
select(species, island, body_mass_g) %>%
arrange(body_mass_g)
## # A tibble: 344 × 3
## species island body_mass_g
## <fct> <fct> <int>
## 1 Chinstrap Dream 2700
## 2 Adelie Biscoe 2850
## 3 Adelie Biscoe 2850
## 4 Adelie Biscoe 2900
## 5 Adelie Dream 2900
## 6 Adelie Torgersen 2900
## 7 Chinstrap Dream 2900
## 8 Adelie Biscoe 2925
## 9 Adelie Dream 2975
## 10 Adelie Dream 3000
## # ℹ 334 more rows
penguins %>%
filter(island=="Biscoe" | island=="Dream")
## # A tibble: 292 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Biscoe 37.8 18.3 174 3400
## 2 Adelie Biscoe 37.7 18.7 180 3600
## 3 Adelie Biscoe 35.9 19.2 189 3800
## 4 Adelie Biscoe 38.2 18.1 185 3950
## 5 Adelie Biscoe 38.8 17.2 180 3800
## 6 Adelie Biscoe 35.3 18.9 187 3800
## 7 Adelie Biscoe 40.6 18.6 183 3550
## 8 Adelie Biscoe 40.5 17.9 187 3200
## 9 Adelie Biscoe 37.9 18.6 172 3150
## 10 Adelie Biscoe 40.5 18.9 180 3950
## # ℹ 282 more rows
## # ℹ 2 more variables: sex <fct>, year <int>
penguins %>%
filter(island==c("Biscoe", "Dream")) # caution: == recycles c("Biscoe", "Dream") position by position, so only 146 rows match
## # A tibble: 146 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Biscoe 37.8 18.3 174 3400
## 2 Adelie Biscoe 35.9 19.2 189 3800
## 3 Adelie Biscoe 38.8 17.2 180 3800
## 4 Adelie Biscoe 40.6 18.6 183 3550
## 5 Adelie Biscoe 37.9 18.6 172 3150
## 6 Adelie Dream 37.2 18.1 178 3900
## 7 Adelie Dream 40.9 18.9 184 3900
## 8 Adelie Dream 39.2 21.1 196 4150
## 9 Adelie Dream 42.2 18.5 180 3550
## 10 Adelie Dream 39.8 19.1 184 4650
## # ℹ 136 more rows
## # ℹ 2 more variables: sex <fct>, year <int>
penguins %>%
filter(island %in% c("Biscoe", "Dream")) # %in% checks membership in the set: all 292 Biscoe and Dream rows are kept
## # A tibble: 292 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Biscoe 37.8 18.3 174 3400
## 2 Adelie Biscoe 37.7 18.7 180 3600
## 3 Adelie Biscoe 35.9 19.2 189 3800
## 4 Adelie Biscoe 38.2 18.1 185 3950
## 5 Adelie Biscoe 38.8 17.2 180 3800
## 6 Adelie Biscoe 35.3 18.9 187 3800
## 7 Adelie Biscoe 40.6 18.6 183 3550
## 8 Adelie Biscoe 40.5 17.9 187 3200
## 9 Adelie Biscoe 37.9 18.6 172 3150
## 10 Adelie Biscoe 40.5 18.9 180 3950
## # ℹ 282 more rows
## # ℹ 2 more variables: sex <fct>, year <int>
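Why does `island==c("Biscoe", "Dream")` return fewer rows than the `|` and `%in%` versions? `==` compares element by element and recycles the shorter vector, so row 1 is compared with "Biscoe", row 2 with "Dream", row 3 with "Biscoe", and so on. The sketch below reproduces this on a hypothetical six-row tibble (`islands`, made up for illustration).

```r
library(dplyr)

# Hypothetical column of six island names, made up for illustration
islands <- tibble(
  island = c("Biscoe", "Biscoe", "Dream", "Dream", "Torgersen", "Biscoe")
)

# == recycles c("Biscoe", "Dream") to Biscoe, Dream, Biscoe, Dream, Biscoe, Dream
# and compares position by position
wrong <- islands %>% filter(island == c("Biscoe", "Dream"))
nrow(wrong)  # 2: only rows 1 and 4 happen to line up with the recycled vector

# %in% asks, for each row, whether the value appears anywhere in the set
right <- islands %>% filter(island %in% c("Biscoe", "Dream"))
nrow(right)  # 5: every row except the Torgersen one
```

Because the result of `==` depends on row order, it can look plausible while silently dropping rows; `%in%` is the safe choice for matching against several values.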
penguins %>%
ggplot(aes(x=flipper_length_mm, y=body_mass_g, color=species)) +
geom_point() +
geom_smooth(method="lm", se=FALSE)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
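distinct()
distinct() looks for unique observations, that is, unique rows. This is
a little tricky because you name columns inside it, so it can look like
it is working column-wise, but it actually returns the unique rows (the
unique combinations of values in the named columns).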
A helpful first step with new data is to check for duplicated rows. The dimensions of the penguins data show that it has 344 rows and 8 columns.
dim(penguins)
## [1] 344 8
Running distinct() with no arguments compares entire rows across all
columns. It returns 344 rows, the same number as the original data, so
there are no duplicated rows: every row is a unique combination of
values across all variables.
penguins %>%
distinct()
## # A tibble: 344 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## 7 Adelie Torgersen 38.9 17.8 181 3625
## 8 Adelie Torgersen 39.2 19.6 195 4675
## 9 Adelie Torgersen 34.1 18.1 193 3475
## 10 Adelie Torgersen 42 20.2 190 4250
## # ℹ 334 more rows
## # ℹ 2 more variables: sex <fct>, year <int>
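The penguins data has no duplicates, so it does not show what distinct() does when duplicates exist. The sketch below uses a hypothetical tibble (`obs`, made up for illustration) with one repeated row and counts duplicates by subtracting the number of distinct rows from the total. Applied to penguins, the same subtraction gives 344 - 344 = 0.

```r
library(dplyr)

# Hypothetical data with one exact duplicate (rows 1 and 3 are identical)
obs <- tibble(
  species = c("Adelie", "Gentoo", "Adelie", "Adelie"),
  island  = c("Dream",  "Biscoe", "Dream",  "Torgersen"),
  mass_g  = c(3700,     5000,     3700,     3400)
)

# With no columns named, distinct() compares whole rows
unique_rows <- obs %>% distinct()
nrow(unique_rows)  # 3

# Number of duplicated rows = original rows minus distinct rows
nrow(obs) - nrow(unique_rows)  # 1
```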
If we pass only species, however, distinct() shows that there are just 3 unique species in the data.
penguins %>%
distinct(species)
## # A tibble: 3 × 1
## species
## <fct>
## 1 Adelie
## 2 Gentoo
## 3 Chinstrap
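What if we want to know which islands each species occurs on? Passing both species and island returns each unique species-island combination. Adding .keep_all=TRUE keeps the remaining columns, filled with the values from the first row of each combination.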
penguins %>%
distinct(species, island, .keep_all=TRUE)
## # A tibble: 5 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Biscoe 37.8 18.3 174 3400
## 3 Adelie Dream 39.5 16.7 178 3250
## 4 Gentoo Biscoe 46.1 13.2 211 4500
## 5 Chinstrap Dream 46.5 17.9 192 3500
## # ℹ 2 more variables: sex <fct>, year <int>
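One consequence of `.keep_all = TRUE` is that the extra columns (bill length, body mass, and so on in the output above) come from the first matching row only; they are not summaries of each group. The sketch below makes this visible on a hypothetical three-row tibble (`obs`, made up for illustration).

```r
library(dplyr)

# Hypothetical data: two Adelie/Dream rows with different masses
obs <- tibble(
  species = c("Adelie", "Adelie", "Gentoo"),
  island  = c("Dream",  "Dream",  "Biscoe"),
  mass_g  = c(3700,     3400,     5000)
)

# Without .keep_all, only the named columns are returned
combos <- obs %>% distinct(species, island)
combos

# With .keep_all = TRUE, every column is kept, using the FIRST row of
# each species/island combination (mass_g = 3700, not 3400)
kept <- obs %>% distinct(species, island, .keep_all = TRUE)
kept
```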
mutate()
mutate() is another verb that acts on columns. It creates new columns
from existing columns in a data frame. By default, new columns are
added to the end of the data frame. Let’s create a new column that
converts body mass from grams to kilograms.
penguins %>%
mutate(body_mass_kg = body_mass_g/1000) %>%
select(species, body_mass_g, body_mass_kg) %>%
arrange(body_mass_kg)
## # A tibble: 344 × 3
## species body_mass_g body_mass_kg
## <fct> <int> <dbl>
## 1 Chinstrap 2700 2.7
## 2 Adelie 2850 2.85
## 3 Adelie 2850 2.85
## 4 Adelie 2900 2.9
## 5 Adelie 2900 2.9
## 6 Adelie 2900 2.9
## 7 Chinstrap 2900 2.9
## 8 Adelie 2925 2.92
## 9 Adelie 2975 2.98
## 10 Adelie 3000 3
## # ℹ 334 more rows
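mutate() can create several columns in one call, and later expressions can use columns created earlier in the same call. The sketch below shows this on a hypothetical tibble (`birds`, made up for illustration); the pound conversion factor 2.20462 is just for the example.

```r
library(dplyr)

# Hypothetical data, made up for illustration
birds <- tibble(species = c("Adelie", "Gentoo"), mass_g = c(3700, 5000))

# Expressions are evaluated in order, so mass_lb can use mass_kg
converted <- birds %>%
  mutate(
    mass_kg = mass_g / 1000,
    mass_lb = mass_kg * 2.20462  # uses mass_kg from the line above
  )
converted
```

As described above, both new columns appear at the end; mutate() also accepts `.before` and `.after` arguments if you want them placed elsewhere.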
mutate() and across()
We use across() within mutate() to apply the same function to multiple
columns. This is especially helpful when cleaning data. For example,
suppose we want to convert every column whose name ends with mm from
millimeters to centimeters. We can select those columns with
ends_with("mm") inside across().
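penguins %>%
mutate(across(ends_with("mm"), ~./10)) %>% # divide every column ending in mm by 10
select(species,
bill_length_cm=bill_length_mm, # the columns keep their _mm names after mutate(), so rename them here
bill_depth_cm=bill_depth_mm,
flipper_length_cm=flipper_length_mm)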
## # A tibble: 344 × 4
## species bill_length_cm bill_depth_cm flipper_length_cm
## <fct> <dbl> <dbl> <dbl>
## 1 Adelie 3.91 1.87 18.1
## 2 Adelie 3.95 1.74 18.6
## 3 Adelie 4.03 1.8 19.5
## 4 Adelie NA NA NA
## 5 Adelie 3.67 1.93 19.3
## 6 Adelie 3.93 2.06 19
## 7 Adelie 3.89 1.78 18.1
## 8 Adelie 3.92 1.96 19.5
## 9 Adelie 3.41 1.81 19.3
## 10 Adelie 4.2 2.02 19
## # ℹ 334 more rows
What does ~./10 mean? The ~ creates a formula that dplyr treats as an
anonymous (lambda) function. Inside it, . stands for the current column
being processed (.x works too). So ./10 means “take the current column
and divide it by 10”, and across() applies this to every column whose
name ends with mm.
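The `~` shorthand is one of several equivalent ways to pass a function to across(). The sketch below compares the `.` form, the `.x` form, and base R’s anonymous-function syntax `\(x)` (which assumes R 4.1 or later) on a hypothetical two-row tibble (`lengths_mm`, made up for illustration).

```r
library(dplyr)

# Hypothetical measurements in millimeters
lengths_mm <- tibble(
  bill_length_mm    = c(39.1, 46.5),
  flipper_length_mm = c(181, 192)
)

# Three equivalent ways to write "divide the current column by 10"
out_dot <- lengths_mm %>% mutate(across(ends_with("mm"), ~ . / 10))     # "." is the column
out_x   <- lengths_mm %>% mutate(across(ends_with("mm"), ~ .x / 10))    # ".x" is the column
out_fn  <- lengths_mm %>% mutate(across(ends_with("mm"), \(x) x / 10))  # base R lambda (R >= 4.1)

out_dot
```

All three keep the original `_mm` column names, which is why the penguin example renames the columns in select().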
Cleaning raw data is an essential but tedious step in data analysis. It’s impossible to predict every scenario that you will come across, but there are some common issues that we can learn to address.
We already learned how to use rename() to change column names, and how
to rename columns from within select(). But renaming columns one at a
time becomes very inefficient when a dataset has many columns.
Let’s have a look at some new data on mammal life histories. The data are from: S. K. Morgan Ernest. 2003. Life history characteristics of placental non-volant mammals. Ecology 84:3402.
mammals <- read_csv("data/mammal_lifehistories_v2.csv")
## Rows: 1440 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): order, family, Genus, species
## dbl (9): mass, gestation, newborn, weaning, wean mass, AFR, max. life, litte...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
What is the structure of the data? Are there any NAs or other issues?
glimpse(mammals)
## Rows: 1,440
## Columns: 13
## $ order <chr> "Artiodactyla", "Artiodactyla", "Artiodactyla", "Artiod…
## $ family <chr> "Antilocapridae", "Bovidae", "Bovidae", "Bovidae", "Bov…
## $ Genus <chr> "Antilocapra", "Addax", "Aepyceros", "Alcelaphus", "Amm…
## $ species <chr> "americana", "nasomaculatus", "melampus", "buselaphus",…
## $ mass <dbl> 45375.0, 182375.0, 41480.0, 150000.0, 28500.0, 55500.0,…
## $ gestation <dbl> 8.13, 9.39, 6.35, 7.90, 6.80, 5.08, 5.72, 5.50, 8.93, 9…
## $ newborn <dbl> 3246.36, 5480.00, 5093.00, 10166.67, -999.00, 3810.00, …
## $ weaning <dbl> 3.00, 6.50, 5.63, 6.50, -999.00, 4.00, 4.04, 2.13, 10.7…
## $ `wean mass` <dbl> 8900, -999, 15900, -999, -999, -999, -999, -999, 157500…
## $ AFR <dbl> 13.53, 27.27, 16.66, 23.02, -999.00, 14.89, 10.23, 20.1…
## $ `max. life` <dbl> 142, 308, 213, 240, -999, 251, 228, 255, 300, 324, 300,…
## $ `litter size` <dbl> 1.85, 1.00, 1.00, 1.00, 1.00, 1.37, 1.00, 1.00, 1.00, 1…
## $ `litters/year` <dbl> 1.00, 0.99, 0.95, -999.00, -999.00, 2.00, -999.00, 1.89…
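One thing to notice is that the column names are inconsistent: some are
capitalized (Genus, AFR), and some contain spaces or punctuation
(wean mass, max. life, litters/year), so they must be wrapped in
backticks in code. This is going to cause problems for us down the
line. We could rename each column, one at a time, using rename(), but
that would be tedious. Instead, we can use the clean_names() function
from the janitor package to fix all of the column names at once.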
mammals <- mammals %>%
clean_names()
glimpse(mammals)
## Rows: 1,440
## Columns: 13
## $ order <chr> "Artiodactyla", "Artiodactyla", "Artiodactyla", "Artiodac…
## $ family <chr> "Antilocapridae", "Bovidae", "Bovidae", "Bovidae", "Bovid…
## $ genus <chr> "Antilocapra", "Addax", "Aepyceros", "Alcelaphus", "Ammod…
## $ species <chr> "americana", "nasomaculatus", "melampus", "buselaphus", "…
## $ mass <dbl> 45375.0, 182375.0, 41480.0, 150000.0, 28500.0, 55500.0, 3…
## $ gestation <dbl> 8.13, 9.39, 6.35, 7.90, 6.80, 5.08, 5.72, 5.50, 8.93, 9.1…
## $ newborn <dbl> 3246.36, 5480.00, 5093.00, 10166.67, -999.00, 3810.00, 39…
## $ weaning <dbl> 3.00, 6.50, 5.63, 6.50, -999.00, 4.00, 4.04, 2.13, 10.71,…
## $ wean_mass <dbl> 8900, -999, 15900, -999, -999, -999, -999, -999, 157500, …
## $ afr <dbl> 13.53, 27.27, 16.66, 23.02, -999.00, 14.89, 10.23, 20.13,…
## $ max_life <dbl> 142, 308, 213, 240, -999, 251, 228, 255, 300, 324, 300, 3…
## $ litter_size <dbl> 1.85, 1.00, 1.00, 1.00, 1.00, 1.37, 1.00, 1.00, 1.00, 1.0…
## $ litters_year <dbl> 1.00, 0.99, 0.95, -999.00, -999.00, 2.00, -999.00, 1.89, …
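To see the renaming rules in isolation, the sketch below applies clean_names() to a hypothetical one-row tibble (`messy`, made up for illustration) that reuses five of the raw column names from the mammal file. The expected names match the glimpse() output after cleaning.

```r
library(janitor)
library(tibble)

# Hypothetical one-row tibble with some of the messy names from the raw file
messy <- tibble(
  Genus          = "Bison",
  `wean mass`    = 157500,
  `max. life`    = 300,
  `litters/year` = 1,
  AFR            = 29.45
)

clean <- clean_names(messy)
names(clean)  # "genus" "wean_mass" "max_life" "litters_year" "afr"
```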
Notice that clean_names() has converted all column names to lowercase
and replaced spaces and punctuation with underscores (for example,
max. life became max_life and litters/year became litters_year).
However, the data values themselves were not changed. What if we want
to convert observations from upper case to lower case?
mammals %>%
mutate(across(c("order", "family"), tolower)) #specific columns
## # A tibble: 1,440 × 13
## order family genus species mass gestation newborn weaning wean_mass afr
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 artio… antil… Anti… americ… 4.54e4 8.13 3246. 3 8900 13.5
## 2 artio… bovid… Addax nasoma… 1.82e5 9.39 5480 6.5 -999 27.3
## 3 artio… bovid… Aepy… melamp… 4.15e4 6.35 5093 5.63 15900 16.7
## 4 artio… bovid… Alce… busela… 1.5 e5 7.9 10167. 6.5 -999 23.0
## 5 artio… bovid… Ammo… clarkei 2.85e4 6.8 -999 -999 -999 -999
## 6 artio… bovid… Ammo… lervia 5.55e4 5.08 3810 4 -999 14.9
## 7 artio… bovid… Anti… marsup… 3 e4 5.72 3910 4.04 -999 10.2
## 8 artio… bovid… Anti… cervic… 3.75e4 5.5 3846 2.13 -999 20.1
## 9 artio… bovid… Bison bison 4.98e5 8.93 20000 10.7 157500 29.4
## 10 artio… bovid… Bison bonasus 5 e5 9.14 23000. 6.6 -999 30.0
## # ℹ 1,430 more rows
## # ℹ 3 more variables: max_life <dbl>, litter_size <dbl>, litters_year <dbl>
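Using everything() instead applies tolower() to every column. But notice what happens to the numeric columns in the output: they become <chr>, because tolower() converts its input to character.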
mammals %>%
mutate(across(everything(), tolower)) #all columns
## # A tibble: 1,440 × 13
## order family genus species mass gestation newborn weaning wean_mass afr
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 artioda… antil… anti… americ… 45375 8.13 3246.36 3 8900 13.53
## 2 artioda… bovid… addax nasoma… 1823… 9.39 5480 6.5 -999 27.27
## 3 artioda… bovid… aepy… melamp… 41480 6.35 5093 5.63 15900 16.66
## 4 artioda… bovid… alce… busela… 1500… 7.9 10166.… 6.5 -999 23.02
## 5 artioda… bovid… ammo… clarkei 28500 6.8 -999 -999 -999 -999
## 6 artioda… bovid… ammo… lervia 55500 5.08 3810 4 -999 14.89
## 7 artioda… bovid… anti… marsup… 30000 5.72 3910 4.04 -999 10.23
## 8 artioda… bovid… anti… cervic… 37500 5.5 3846 2.13 -999 20.13
## 9 artioda… bovid… bison bison 4976… 8.93 20000 10.71 157500 29.45
## 10 artioda… bovid… bison bonasus 5000… 9.14 23000.… 6.6 -999 29.99
## # ℹ 1,430 more rows
## # ℹ 3 more variables: max_life <chr>, litter_size <chr>, litters_year <chr>
For this reason, it is better to use where(is.character) so that only
the character columns are changed.
mammals <- mammals %>%
mutate(across(where(is.character), tolower)) #all character columns
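The difference between everything() and where(is.character) can be checked directly on a hypothetical two-column tibble (`toy`, made up for illustration): tolower() turns the numeric column into text, while the where() version leaves it numeric.

```r
library(dplyr)

# Hypothetical data with one character and one numeric column
toy <- tibble(order = c("Artiodactyla", "Carnivora"), mass = c(45375, 182375))

# tolower() coerces its input to character, so numbers become text
all_cols <- toy %>% mutate(across(everything(), tolower))
class(all_cols$mass)  # "character"

# where(is.character) restricts the change to text columns
char_only <- toy %>% mutate(across(where(is.character), tolower))
class(char_only$mass)  # "numeric"
char_only$order        # "artiodactyla" "carnivora"
```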
if_else()
We briefly introduce if_else() here because it lets us use mutate() to
change only the values that meet a condition, rather than transforming
the entire column in the same way. With if_else() (dplyr’s stricter
version of base R’s ifelse()), you first specify a logical statement,
then what should happen if the statement returns TRUE, and lastly what
should happen if it returns FALSE.
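The three arguments of if_else() can be seen on a short hypothetical vector (`newborn`, made up for illustration). if_else() is stricter about types than base R’s ifelse(), so this sketch uses NA_real_, a missing value of type double, to match the numeric column.

```r
library(dplyr)

# Hypothetical newborn masses, with -999 standing in for missing data
newborn <- c(3246.36, -999, 5093)

# if_else(condition, value if TRUE, value if FALSE)
result <- if_else(newborn == -999, NA_real_, newborn)
result  # 3246.36 NA 5093.00
```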
Have a look at the mammals data below. Notice that the values for
newborn include -999.00. This is sometimes used as a placeholder for
NA, but it is a bad idea: R treats -999 as a real number, so it
silently distorts calculations such as means. We can use if_else() to
replace -999.00 with NA.
mammals %>%
select(genus, species, newborn) %>%
arrange(newborn)
## # A tibble: 1,440 × 3
## genus species newborn
## <chr> <chr> <dbl>
## 1 ammodorcas clarkei -999
## 2 bos javanicus -999
## 3 bubalus depressicornis -999
## 4 bubalus mindorensis -999
## 5 capra falconeri -999
## 6 cephalophus niger -999
## 7 cephalophus nigrifrons -999
## 8 cephalophus natalensis -999
## 9 cephalophus leucogaster -999
## 10 cephalophus ogilbyi -999
## # ℹ 1,430 more rows
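To make the cost of a -999 placeholder concrete, this base R sketch uses a hypothetical vector of newborn masses (made up for illustration) with two placeholders and compares the mean before and after they are converted to NA.

```r
# Hypothetical newborn masses (g), two of which are -999 placeholders
newborn <- c(3246.36, 5480, -999, 5093, -999)

# The placeholders are treated as real values and drag the mean down
raw_mean <- mean(newborn)
raw_mean  # 2364.272

# Once they are NA, mean(..., na.rm = TRUE) ignores them
newborn[newborn == -999] <- NA
clean_mean <- mean(newborn, na.rm = TRUE)
clean_mean  # about 4606.45
```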
mammals %>%
select(genus, species, newborn) %>%
mutate(newborn_new = if_else(newborn == -999, NA_real_, newborn)) %>% # NA_real_ is a missing value of type double, matching newborn
arrange(newborn)
## # A tibble: 1,440 × 4
## genus species newborn newborn_new
## <chr> <chr> <dbl> <dbl>
## 1 ammodorcas clarkei -999 NA
## 2 bos javanicus -999 NA
## 3 bubalus depressicornis -999 NA
## 4 bubalus mindorensis -999 NA
## 5 capra falconeri -999 NA
## 6 cephalophus niger -999 NA
## 7 cephalophus nigrifrons -999 NA
## 8 cephalophus natalensis -999 NA
## 9 cephalophus leucogaster -999 NA
## 10 cephalophus ogilbyi -999 NA
## # ℹ 1,430 more rows
mammals <- mammals %>%
mutate(across(c(mass, wean_mass, gestation, max_life, newborn, weaning, litter_size, afr, litters_year),
~ if_else(. == -999, NA_real_, .))) # replace the -999 placeholder with NA in each listed numeric column
summary(mammals)
## order family genus species
## Length:1440 Length:1440 Length:1440 Length:1440
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## mass gestation newborn weaning
## Min. : 2 Min. : 0.4900 Min. : 0.21 Min. : 0.300
## 1st Qu.: 61 1st Qu.: 0.9925 1st Qu.: 4.40 1st Qu.: 0.920
## Median : 606 Median : 2.1100 Median : 43.70 Median : 1.690
## Mean : 407701 Mean : 3.8630 Mean : 12126.55 Mean : 3.967
## 3rd Qu.: 8554 3rd Qu.: 6.0000 3rd Qu.: 542.50 3rd Qu.: 4.840
## Max. :149000000 Max. :21.4600 Max. :2250000.00 Max. :48.000
## NA's :85 NA's :418 NA's :595 NA's :619
## wean_mass afr max_life litter_size
## Min. : 2.1 Min. : 0.70 Min. : 12 Min. : 1.000
## 1st Qu.: 20.1 1st Qu.: 4.50 1st Qu.: 84 1st Qu.: 1.018
## Median : 102.6 Median : 12.00 Median : 192 Median : 2.500
## Mean : 60220.5 Mean : 22.44 Mean : 224 Mean : 2.805
## 3rd Qu.: 2000.0 3rd Qu.: 28.24 3rd Qu.: 288 3rd Qu.: 4.000
## Max. :19075000.0 Max. :210.00 Max. :1368 Max. :14.180
## NA's :1039 NA's :607 NA's :841 NA's :84
## litters_year
## Min. :0.140
## 1st Qu.:1.000
## Median :1.000
## Mean :1.636
## 3rd Qu.:2.000
## Max. :7.500
## NA's :689
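For the specific job of turning one sentinel value into NA, dplyr also provides na_if(). The sketch below is an alternative to the if_else() approach above, applied with across() and where(is.numeric) to a hypothetical two-column tibble (`toy`, made up for illustration).

```r
library(dplyr)

# Hypothetical data with -999 placeholders in two numeric columns
toy <- tibble(
  mass    = c(45375, -999, 41480),
  newborn = c(3246.36, 5480, -999)
)

# na_if(x, y) replaces every value of x equal to y with NA;
# across(where(is.numeric), ...) applies it to every numeric column
cleaned <- toy %>%
  mutate(across(where(is.numeric), ~ na_if(.x, -999)))
cleaned
```

Selecting columns with where(is.numeric) avoids typing every column name, at the cost of also touching any numeric column where -999 might be a legitimate value.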
Create a new column called mass_kg that converts mass from grams to
kilograms. Select the columns genus, species, mass, and mass_kg, and
arrange the data by mass_kg in descending order. What is the common
name for the species with the highest mass?
mammals %>%
mutate(mass_kg = mass/1000) %>%
select(genus, species, mass, mass_kg) %>%
arrange(desc(mass_kg))
## # A tibble: 1,440 × 4
## genus species mass mass_kg
## <chr> <chr> <dbl> <dbl>
## 1 balaenoptera musculus 149000000 149000
## 2 balaena mysticetus 80000000 80000
## 3 balaenoptera physalus 66800000 66800
## 4 megaptera novaeangliae 30000000 30000
## 5 eschrichtius robustus 25066667. 25067.
## 6 eubalaena australis 23000000 23000
## 7 eubalaena glacialis 23000000 23000
## 8 balaenoptera edeni 20000000 20000
## 9 balaenoptera acutorostrata 16266667. 16267.
## 10 physeter catodon 15400000 15400
## # ℹ 1,430 more rows
mammals %>%
mutate(wean_gestation_ratio = log10(newborn/gestation)) %>% # despite its name, this is log10(newborn mass / gestation length)
select(genus, species, wean_gestation_ratio) %>%
arrange(desc(wean_gestation_ratio))
## # A tibble: 1,440 × 3
## genus species wean_gestation_ratio
## <chr> <chr> <dbl>
## 1 balaenoptera musculus 5.32
## 2 balaenoptera physalus 5.23
## 3 megaptera novaeangliae 5.06
## 4 physeter catodon 4.78
## 5 balaenoptera borealis 4.76
## 6 eschrichtius robustus 4.63
## 7 orcinus orca 4.03
## 8 kogia breviceps 3.87
## 9 globicephala melas 3.87
## 10 monodon monoceros 3.73
## # ℹ 1,430 more rows
mammals %>%
ggplot(aes(x=gestation, y=log10(newborn))) +
geom_point()+
geom_smooth(method="lm", se=FALSE)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 673 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 673 rows containing missing values or values outside the scale range
## (`geom_point()`).
mammals %>%
select(family, genus, species, max_life) %>%
mutate(max_life_new = max_life/12) %>% # max_life is recorded in months; dividing by 12 gives years
arrange(desc(max_life_new))
## # A tibble: 1,440 × 5
## family genus species max_life max_life_new
## <chr> <chr> <chr> <dbl> <dbl>
## 1 balaenopteridae balaenoptera physalus 1368 114
## 2 balaenopteridae balaenoptera musculus 1320 110
## 3 balaenidae balaena mysticetus 1200 100
## 4 delphinidae orcinus orca 1080 90
## 5 ziphiidae berardius bairdii 1008 84
## 6 elephantidae elephas maximus 960 80
## 7 balaenopteridae megaptera novaeangliae 924 77
## 8 physeteridae physeter catodon 924 77
## 9 balaenopteridae balaenoptera borealis 888 74
## 10 dugongidae dugong dugon 876 73
## # ℹ 1,430 more rows
mammals %>%
ggplot(aes(x=log10(mass), y=max_life)) +
geom_point()+
geom_smooth(method="lm", se=FALSE)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 848 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 848 rows containing missing values or values outside the scale range
## (`geom_point()`).