Learning Goals

At the end of this exercise, you will be able to:
1. Import .csv files as data frames using read_csv().
2. Use summary functions to explore the dimensions, structure, and contents of a data frame.
3. Use the select() command of dplyr to pull out columns of interest from a data frame.

Load the tidyverse

library("tidyverse")

Practice

In your lab 3 folder there is another folder titled data. Inside the data folder there is a .csv file titled Gaeta_etal_CLC_data.csv. Import these data and store them as an object called fish.

The data are from: Gaeta J., G. Sass, S. Carpenter. 2012. Biocomplexity at North Temperate Lakes LTER: Coordinated Field Studies: Large Mouth Bass Growth 2006. Environmental Data Initiative.

fish <- read_csv("data/Gaeta_etal_CLC_data.csv")
## Rows: 4033 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): lakeid, annnumber
## dbl (4): fish_id, length, radii_length_mm, scalelength
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Notice that when the data are imported, you are presented with a message that tells you how R interpreted the column classes. This is also where error messages will appear if there are problems.
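The message appears because read_csv() guesses each column's type. Here is a minimal sketch, assuming a small made-up file written to a temporary path (the tibble `toy` and its values are hypothetical, not the Gaeta et al. data). It declares the types up front with the `col_types` argument, so read_csv() does not need to guess and does not print the specification message:

```r
library(tidyverse)

# Hypothetical toy data using two of the fish column names
toy <- tibble(lakeid = c("lake1", "lake1", "lake2"), length = c(167, 175, 98))

# Write it to a temporary .csv so the example does not depend on the lab files
path <- tempfile(fileext = ".csv")
write_csv(toy, path)

# Supplying col_types skips type guessing, so no column specification message
fish_toy <- read_csv(
  path,
  col_types = cols(lakeid = col_character(), length = col_double())
)

spec(fish_toy)  # shows the column specification that was applied
```

Alternatively, `show_col_types = FALSE` silences the message while still letting read_csv() guess the types.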

Summary functions

Once data have been loaded into R, you may want to get an idea of their structure, contents, and dimensions. I routinely run one or more of these commands when data are first imported.

We can summarize our data frame with the summary() function.

summary(fish)
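To show what summary() reports for different kinds of columns, here is a minimal sketch on a small made-up tibble (`toy` and its values are hypothetical, not the fish data). Numeric columns get the minimum, quartiles, mean, and maximum; character columns only report their length, class, and mode:

```r
library(tidyverse)

# Hypothetical toy data standing in for fish
toy <- tibble(lakeid = c("lake1", "lake1", "lake2"), length = c(100, 150, 200))

# Summary of every column in the data frame
summary(toy)

# summary() of a single numeric column is a named vector,
# so individual statistics can be pulled out by name
length_summary <- summary(toy$length)
length_summary[["Mean"]]
length_summary[["Max."]]
```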

glimpse() is another useful summary function; it lists every column with its type and its first few values.

glimpse(fish)

nrow() gives the number of rows.

nrow(fish)

ncol() gives the number of columns.

ncol(fish)

dim() gives the dimensions: the number of rows, then the number of columns.

dim(fish)
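The relationship between these three functions can be checked on a small made-up tibble (`toy` is hypothetical, not the fish data). This is a minimal sketch showing that dim() is nrow() and ncol() combined:

```r
library(tidyverse)

# Hypothetical toy data: 3 rows, 2 columns
toy <- tibble(lakeid = c("lake1", "lake1", "lake2"), length = c(100, 150, 200))

nrow(toy)  # number of rows
ncol(toy)  # number of columns
dim(toy)   # rows first, then columns

# dim() returns the same two numbers as nrow() and ncol()
identical(dim(toy), c(nrow(toy), ncol(toy)))
```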

names() gives the column names.

names(fish)

head() prints the first n rows of the data frame (six by default).

head(fish)

tail() prints the last n rows of the data frame (six by default).

tail(fish)
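The number of rows returned by head() and tail() is set with the `n` argument. A minimal sketch on a made-up ten-row tibble (`toy` and its values are hypothetical):

```r
library(tidyverse)

# Hypothetical toy data with ten rows
toy <- tibble(fish_id = 1:10, length = seq(100, 190, by = 10))

head(toy)         # first 6 rows (the default)
head(toy, n = 3)  # first 3 rows
tail(toy, n = 2)  # last 2 rows
```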

table() is useful for categorical variables with a limited number of categories. It quickly counts the number of observations in each category of a variable. We will come back to this later…

table(fish$lakeid)
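As a minimal sketch of what table() returns, here it is on a made-up vector of lake ids (`toy` and its values are hypothetical). Each category becomes a named count, and a single count can be pulled out by its name:

```r
library(tidyverse)

# Hypothetical toy data: five observations from three lakes
toy <- tibble(lakeid = c("lake1", "lake2", "lake1", "lake3", "lake1"))

# Count the observations in each category of lakeid
lake_counts <- table(toy$lakeid)
lake_counts

# Pull out the count for one category by name
lake_counts[["lake1"]]
```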

Select

select() subsets a data frame by pulling out the variables (columns) of interest; it keeps every row and does not sort the data.

select(fish, lakeid, length)
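A minimal sketch of select() on a made-up tibble (`toy` and its values are hypothetical). Naming columns keeps them in the order listed, and a minus sign drops a column instead; in both cases every row is kept:

```r
library(tidyverse)

# Hypothetical toy data
toy <- tibble(
  lakeid = c("lake1", "lake2"),
  fish_id = c(1, 2),
  length = c(167, 98)
)

# Keep only lakeid and length
select(toy, lakeid, length)

# Keep everything except fish_id
select(toy, -fish_id)
```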

Filter

filter() pulls out the observations (rows) that meet specific criteria in one or more variables. We will work a lot more with this in the next lab.

filter(fish, length <= 100)
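A minimal sketch of filter() on a made-up tibble (`toy` and its values are hypothetical). Conditions separated by commas must all be TRUE for a row to be kept:

```r
library(tidyverse)

# Hypothetical toy data
toy <- tibble(lakeid = c("lake1", "lake1", "lake2"), length = c(90, 150, 100))

# Rows with a length of at most 100
filter(toy, length <= 100)

# Rows from lake1 that also have a length of at most 100
filter(toy, lakeid == "lake1", length <= 100)
```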

Practice

  1. Load the data mammal_lifehistories_v2.csv and place it into a new object called mammals.
mammals <- read_csv("data/mammal_lifehistories_v2.csv")
## Rows: 1440 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): order, family, Genus, species
## dbl (9): mass, gestation, newborn, weaning, wean mass, AFR, max. life, litte...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
  2. Provide the dimensions of the data frame.
dim(mammals)
## [1] 1440   13
  3. Check the column names in the data frame.
names(mammals)
##  [1] "order"        "family"       "Genus"        "species"      "mass"        
##  [6] "gestation"    "newborn"      "weaning"      "wean mass"    "AFR"         
## [11] "max. life"    "litter size"  "litters/year"
  4. Use str() to show the structure of the data frame and its individual columns; compare this to glimpse().
str(mammals)
## spc_tbl_ [1,440 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ order       : chr [1:1440] "Artiodactyla" "Artiodactyla" "Artiodactyla" "Artiodactyla" ...
##  $ family      : chr [1:1440] "Antilocapridae" "Bovidae" "Bovidae" "Bovidae" ...
##  $ Genus       : chr [1:1440] "Antilocapra" "Addax" "Aepyceros" "Alcelaphus" ...
##  $ species     : chr [1:1440] "americana" "nasomaculatus" "melampus" "buselaphus" ...
##  $ mass        : num [1:1440] 45375 182375 41480 150000 28500 ...
##  $ gestation   : num [1:1440] 8.13 9.39 6.35 7.9 6.8 5.08 5.72 5.5 8.93 9.14 ...
##  $ newborn     : num [1:1440] 3246 5480 5093 10167 -999 ...
##  $ weaning     : num [1:1440] 3 6.5 5.63 6.5 -999 ...
##  $ wean mass   : num [1:1440] 8900 -999 15900 -999 -999 ...
##  $ AFR         : num [1:1440] 13.5 27.3 16.7 23 -999 ...
##  $ max. life   : num [1:1440] 142 308 213 240 -999 251 228 255 300 324 ...
##  $ litter size : num [1:1440] 1.85 1 1 1 1 1.37 1 1 1 1 ...
##  $ litters/year: num [1:1440] 1 0.99 0.95 -999 -999 2 -999 1.89 1 1 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   order = col_character(),
##   ..   family = col_character(),
##   ..   Genus = col_character(),
##   ..   species = col_character(),
##   ..   mass = col_double(),
##   ..   gestation = col_double(),
##   ..   newborn = col_double(),
##   ..   weaning = col_double(),
##   ..   `wean mass` = col_double(),
##   ..   AFR = col_double(),
##   ..   `max. life` = col_double(),
##   ..   `litter size` = col_double(),
##   ..   `litters/year` = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
glimpse(mammals)
## Rows: 1,440
## Columns: 13
## $ order          <chr> "Artiodactyla", "Artiodactyla", "Artiodactyla", "Artiod…
## $ family         <chr> "Antilocapridae", "Bovidae", "Bovidae", "Bovidae", "Bov…
## $ Genus          <chr> "Antilocapra", "Addax", "Aepyceros", "Alcelaphus", "Amm…
## $ species        <chr> "americana", "nasomaculatus", "melampus", "buselaphus",…
## $ mass           <dbl> 45375.0, 182375.0, 41480.0, 150000.0, 28500.0, 55500.0,…
## $ gestation      <dbl> 8.13, 9.39, 6.35, 7.90, 6.80, 5.08, 5.72, 5.50, 8.93, 9…
## $ newborn        <dbl> 3246.36, 5480.00, 5093.00, 10166.67, -999.00, 3810.00, …
## $ weaning        <dbl> 3.00, 6.50, 5.63, 6.50, -999.00, 4.00, 4.04, 2.13, 10.7…
## $ `wean mass`    <dbl> 8900, -999, 15900, -999, -999, -999, -999, -999, 157500…
## $ AFR            <dbl> 13.53, 27.27, 16.66, 23.02, -999.00, 14.89, 10.23, 20.1…
## $ `max. life`    <dbl> 142, 308, 213, 240, -999, 251, 228, 255, 300, 324, 300,…
## $ `litter size`  <dbl> 1.85, 1.00, 1.00, 1.00, 1.00, 1.37, 1.00, 1.00, 1.00, 1…
## $ `litters/year` <dbl> 1.00, 0.99, 0.95, -999.00, -999.00, 2.00, -999.00, 1.89…
  5. Try the table() command to produce counts of mammal order, family, and genus.
table(mammals$order)
## 
##   Artiodactyla      Carnivora        Cetacea     Dermoptera     Hyracoidea 
##            161            197             55              2              4 
##    Insectivora     Lagomorpha  Macroscelidea Perissodactyla      Pholidota 
##             91             42             10             15              7 
##       Primates    Proboscidea       Rodentia     Scandentia        Sirenia 
##            156              2            665              7              5 
##  Tubulidentata      Xenarthra 
##              1             20
  6. Use the filter() command to pull out mammals that have a gestation period greater than or equal to one year. Gestation is recorded in months (see the str() output above), so one year corresponds to a value of 12; comparing against 365 returns no rows.
filter(mammals, gestation >= 12)
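Several mammals column names contain spaces or punctuation (for example `wean mass`, `max. life`, and `litters/year`). As a minimal sketch on a made-up tibble that reuses two of those names (`toy` and its values are hypothetical), such names must be wrapped in backticks when used in select() or filter():

```r
library(tidyverse)

# Hypothetical toy data with non-syntactic column names
toy <- tibble(
  species = c("sp_a", "sp_b"),
  `wean mass` = c(100, 200),
  `max. life` = c(150, 300)
)

# Backticks let R read a name containing a space or a period
select(toy, species, `wean mass`)
filter(toy, `max. life` > 200)
```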

Wrap-up

Please review the learning goals and be sure to use the code here as a reference when completing the homework.