At the end of this exercise, you will be able to:
1. Import .csv files as data frames using read_csv().
2. Use summary functions to explore the dimensions, structure, and contents of a data frame.
3. Use the select() command of dplyr to pull columns of interest out of data frames.
library("tidyverse")
In your lab 3 folder there is another folder titled data. Inside the data folder there is a .csv file titled Gaeta_etal_CLC_data.csv. Open these data and store them as an object called fish.
The data are from: Gaeta J., G. Sass, S. Carpenter. 2012. Biocomplexity at North Temperate Lakes LTER: Coordinated Field Studies: Large Mouth Bass Growth 2006. Environmental Data Initiative.
fish <- read_csv("data/Gaeta_etal_CLC_data.csv")
## Rows: 4033 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): lakeid, annnumber
## dbl (4): fish_id, length, radii_length_mm, scalelength
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Notice that when the data are imported, you are presented with a message that tells you how R interpreted the column classes. This is also where error messages will appear if there are problems.
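If you would rather state the column classes yourself than rely on R's guess, read_csv() accepts a col_types argument. The sketch below is only an illustration: it writes a tiny, made-up file (the values are hypothetical, not the real fish data) to a temporary location so that it runs anywhere.

```r
library(readr)

# Write a small, made-up CSV to a temporary file (hypothetical values).
tmp <- tempfile(fileext = ".csv")
writeLines(c("lakeid,fish_id,length",
             "AL,299,167",
             "AL,300,175",
             "BO,301,194"), tmp)

# Default import: readr guesses the column types and reports them in a message.
toy_guessed <- read_csv(tmp)

# Explicit import: the types are supplied, so nothing has to be guessed.
toy_typed <- read_csv(
  tmp,
  col_types = cols(
    lakeid  = col_character(),
    fish_id = col_double(),
    length  = col_double()
  )
)

# spec() retrieves the full column specification that was used.
spec(toy_typed)
```

Supplying col_types is optional; setting show_col_types = FALSE, as the message suggests, is enough if you only want to quiet the output.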
Once data have been imported, you may want to get an idea of their structure, contents, and dimensions. I routinely run one or more of these commands when data are first imported.
We can summarize our data frame with the summary() function.
#summary(fish)
glimpse() is another useful summary function.
#glimpse(fish)
nrow() gives the number of rows.
#nrow(fish)
ncol() gives the number of columns.
#ncol(fish)
dim() gives the dimensions: the number of rows, then the number of columns.
#dim(fish)
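As a quick check on how these three functions relate, here is a minimal sketch on a made-up data frame (toy and its values are hypothetical, not the fish data).

```r
# A small, made-up data frame with 5 rows and 2 columns.
toy <- data.frame(
  lakeid = c("AL", "AL", "BO", "BO", "CR"),
  length = c(167, 175, 194, 88, 102)
)

n_rows <- nrow(toy)  # number of rows
n_cols <- ncol(toy)  # number of columns
dims   <- dim(toy)   # rows first, then columns

dims
```

dim() returns the same two numbers that nrow() and ncol() give separately, so any one of them is a fast sanity check after an import.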
names() gives the column names.
#names(fish)
head() prints the first n rows of the data frame (six by default).
#head(fish)
tail() prints the last n rows of the data frame (six by default).
#tail(fish)
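Both functions take an n argument if six rows is not what you want. A small sketch on a made-up data frame (toy is hypothetical):

```r
# Ten made-up rows so the effect of n is easy to see.
toy <- data.frame(id = 1:10, length = seq(100, 190, by = 10))

default_head <- head(toy)         # first six rows by default
first_three  <- head(toy, n = 3)  # first three rows
last_two     <- tail(toy, n = 2)  # last two rows

first_three
last_two
```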
table() is useful when you have a categorical variable with a limited number of distinct values. It produces fast counts of the number of observations for each value of the variable. We will come back to this later…
#table(fish$lakeid)
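To see what table() returns, here is a minimal sketch on a made-up vector of lake codes (the codes and counts are hypothetical).

```r
# Six made-up observations of a categorical variable.
lakes <- c("AL", "BO", "AL", "CR", "AL", "BO")

# One count per distinct value.
counts <- table(lakes)
counts

# Sorting the counts puts the most common value first.
sorted_counts <- sort(counts, decreasing = TRUE)
sorted_counts
```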
select() is a way of sorting through data frames by pulling out variables (columns) of interest. It does not reorder or remove rows.
#select(fish, lakeid, length)
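The sketch below shows select() on a made-up tibble (toy and its values are hypothetical). The column names mirror three of the fish columns only for familiarity.

```r
library(dplyr)

# A small, made-up tibble with three columns.
toy <- tibble(
  lakeid  = c("AL", "AL", "BO"),
  fish_id = c(299, 300, 301),
  length  = c(167, 175, 194)
)

# Keep only the named columns; every row is retained.
picked <- select(toy, lakeid, length)

# A minus sign drops a column instead of keeping it.
dropped <- select(toy, -fish_id)

picked
```

Both calls return the same two columns here; which form is more convenient depends on whether it is shorter to list what you keep or what you drop.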
Filter is a way of pulling out observations that meet specific criteria in a variable. We will work a lot more with this in the next lab.
#filter(fish, length<=100)
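Where select() works on columns, filter() works on rows. A minimal sketch on a made-up tibble (toy and its values are hypothetical):

```r
library(dplyr)

# Four made-up fish with their lengths.
toy <- tibble(
  lakeid = c("AL", "AL", "BO", "CR"),
  length = c(167, 88, 100, 210)
)

# Keep only the rows where the condition is TRUE; all columns are retained.
small <- filter(toy, length <= 100)

small
```

Note that <= includes the boundary value, so the fish with a length of exactly 100 is kept.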
Load the data mammal_lifehistories_v2.csv and place it into a new object called mammals.
mammals <- read_csv("data/mammal_lifehistories_v2.csv")
## Rows: 1440 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): order, family, Genus, species
## dbl (9): mass, gestation, newborn, weaning, wean mass, AFR, max. life, litte...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dim(mammals)
## [1] 1440 13
names(mammals)
## [1] "order" "family" "Genus" "species" "mass"
## [6] "gestation" "newborn" "weaning" "wean mass" "AFR"
## [11] "max. life" "litter size" "litters/year"
Use str() to show the structure of the data frame and its individual columns; compare this to glimpse().
str(mammals)
## spc_tbl_ [1,440 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ order : chr [1:1440] "Artiodactyla" "Artiodactyla" "Artiodactyla" "Artiodactyla" ...
## $ family : chr [1:1440] "Antilocapridae" "Bovidae" "Bovidae" "Bovidae" ...
## $ Genus : chr [1:1440] "Antilocapra" "Addax" "Aepyceros" "Alcelaphus" ...
## $ species : chr [1:1440] "americana" "nasomaculatus" "melampus" "buselaphus" ...
## $ mass : num [1:1440] 45375 182375 41480 150000 28500 ...
## $ gestation : num [1:1440] 8.13 9.39 6.35 7.9 6.8 5.08 5.72 5.5 8.93 9.14 ...
## $ newborn : num [1:1440] 3246 5480 5093 10167 -999 ...
## $ weaning : num [1:1440] 3 6.5 5.63 6.5 -999 ...
## $ wean mass : num [1:1440] 8900 -999 15900 -999 -999 ...
## $ AFR : num [1:1440] 13.5 27.3 16.7 23 -999 ...
## $ max. life : num [1:1440] 142 308 213 240 -999 251 228 255 300 324 ...
## $ litter size : num [1:1440] 1.85 1 1 1 1 1.37 1 1 1 1 ...
## $ litters/year: num [1:1440] 1 0.99 0.95 -999 -999 2 -999 1.89 1 1 ...
## - attr(*, "spec")=
## .. cols(
## .. order = col_character(),
## .. family = col_character(),
## .. Genus = col_character(),
## .. species = col_character(),
## .. mass = col_double(),
## .. gestation = col_double(),
## .. newborn = col_double(),
## .. weaning = col_double(),
## .. `wean mass` = col_double(),
## .. AFR = col_double(),
## .. `max. life` = col_double(),
## .. `litter size` = col_double(),
## .. `litters/year` = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
glimpse(mammals)
## Rows: 1,440
## Columns: 13
## $ order <chr> "Artiodactyla", "Artiodactyla", "Artiodactyla", "Artiod…
## $ family <chr> "Antilocapridae", "Bovidae", "Bovidae", "Bovidae", "Bov…
## $ Genus <chr> "Antilocapra", "Addax", "Aepyceros", "Alcelaphus", "Amm…
## $ species <chr> "americana", "nasomaculatus", "melampus", "buselaphus",…
## $ mass <dbl> 45375.0, 182375.0, 41480.0, 150000.0, 28500.0, 55500.0,…
## $ gestation <dbl> 8.13, 9.39, 6.35, 7.90, 6.80, 5.08, 5.72, 5.50, 8.93, 9…
## $ newborn <dbl> 3246.36, 5480.00, 5093.00, 10166.67, -999.00, 3810.00, …
## $ weaning <dbl> 3.00, 6.50, 5.63, 6.50, -999.00, 4.00, 4.04, 2.13, 10.7…
## $ `wean mass` <dbl> 8900, -999, 15900, -999, -999, -999, -999, -999, 157500…
## $ AFR <dbl> 13.53, 27.27, 16.66, 23.02, -999.00, 14.89, 10.23, 20.1…
## $ `max. life` <dbl> 142, 308, 213, 240, -999, 251, 228, 255, 300, 324, 300,…
## $ `litter size` <dbl> 1.85, 1.00, 1.00, 1.00, 1.00, 1.37, 1.00, 1.00, 1.00, 1…
## $ `litters/year` <dbl> 1.00, 0.99, 0.95, -999.00, -999.00, 2.00, -999.00, 1.89…
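Notice that glimpse() wraps some names, such as `wean mass`, in backticks. These names contain spaces or other characters that are not allowed in ordinary R names, so they need backticks whenever you refer to them in code. The sketch below uses a made-up tibble (toy is hypothetical) that borrows two of these column names; the new names wean_mass and max_life are likewise only illustrative.

```r
library(dplyr)

# A made-up tibble with two non-syntactic column names.
toy <- tibble(
  Genus       = c("Addax", "Aepyceros"),
  `wean mass` = c(8900, 15900),
  `max. life` = c(308, 213)
)

# Backticks are required to refer to these columns.
picked   <- select(toy, Genus, `wean mass`)
max_life <- toy$`max. life`

# Renaming to syntactic names removes the need for backticks later.
renamed <- rename(toy, wean_mass = `wean mass`, max_life = `max. life`)

names(renamed)
```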
Use the table() command to produce counts of mammal order, family, and genus.
table(mammals$order)
##
## Artiodactyla Carnivora Cetacea Dermoptera Hyracoidea
## 161 197 55 2 4
## Insectivora Lagomorpha Macroscelidea Perissodactyla Pholidota
## 91 42 10 15 7
## Primates Proboscidea Rodentia Scandentia Sirenia
## 156 2 665 7 5
## Tubulidentata Xenarthra
## 1 20
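The same pattern applies to the other two columns. Because R is case-sensitive, the genus column has to be written exactly as it appears in names(), with a capital G. A minimal sketch on made-up rows (toy_mammals is hypothetical and is not a sample of the real file):

```r
# Four made-up rows using the same three column names as the mammals data.
toy_mammals <- data.frame(
  order  = c("Artiodactyla", "Artiodactyla", "Artiodactyla", "Carnivora"),
  family = c("Antilocapridae", "Bovidae", "Bovidae", "Canidae"),
  Genus  = c("Antilocapra", "Addax", "Aepyceros", "Canis")
)

# One table() call per column.
order_counts  <- table(toy_mammals$order)
family_counts <- table(toy_mammals$family)
genus_counts  <- table(toy_mammals$Genus)  # capital G, matching the column name

family_counts
```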
Use the filter() command to pull out mammals that have a gestation period greater than or equal to one year.
filter(mammals, gestation >= 365)
## # A tibble: 0 × 13
## # ℹ 13 variables: order <chr>, family <chr>, Genus <chr>, species <chr>,
## # mass <dbl>, gestation <dbl>, newborn <dbl>, weaning <dbl>, wean mass <dbl>,
## # AFR <dbl>, max. life <dbl>, litter size <dbl>, litters/year <dbl>
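A result with zero rows is worth a second look. The gestation values printed by str() above (8.13, 9.39, 6.35, ...) are far too small to be days, so the column may be recorded in months; that is an assumption, not something stated in the data file. Under that assumption, one year corresponds to 12 rather than 365. The sketch below uses a made-up tibble: the first three gestation values are copied from the output above, and the fourth row is hypothetical.

```r
library(dplyr)

# First three values come from the str() output; the last row is hypothetical.
toy <- tibble(
  Genus     = c("Antilocapra", "Addax", "Aepyceros", "Hypothetical"),
  gestation = c(8.13, 9.39, 6.35, 13.0)
)

# Threshold in days: nothing matches, as in the output above.
in_days <- filter(toy, gestation >= 365)

# Threshold in months (ASSUMED unit): the hypothetical row matches.
in_months <- filter(toy, gestation >= 12)

nrow(in_days)
nrow(in_months)
```

Running summary() on the column before filtering is a quick way to see the range of values and judge which unit is plausible.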
Please review the learning goals and be sure to use the code here as
a reference when completing the homework.