Learning Goals

At the end of this exercise, you will be able to:
1. Define a data structure.
2. Build a new vector and call elements within it.
3. Combine a series of vectors into a data frame.
4. Name columns and rows in a data frame.
5. Select columns and rows and use summary functions.
6. Write your data frame to a csv file!

Load the tidyverse

library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Data Structures

In addition to classes of data, R also organizes data in different ways. These are called data structures and include vectors, lists, matrices, data frames, and factors. Here, we will introduce vectors and data frames.

Vectors

Vectors are a common way of organizing data in R. We create vectors using the c() function. The c stands for concatenate. We used this in the first part of today’s lab.

A numeric vector.

my_vector <- c(10, 20, 30)
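As a quick check on what we just built, here is a minimal sketch using the base R functions class() and length(). The name longer_vector is made up for this example and is not used elsewhere in the lab.

```r
# Same vector as above, redefined so this chunk runs on its own
my_vector <- c(10, 20, 30)

class(my_vector)   # the class of data stored in the vector
length(my_vector)  # how many elements the vector holds

# c() can also join an existing vector with new values
longer_vector <- c(my_vector, 40, 50)  # hypothetical name for illustration
longer_vector
```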

A character vector.

days_of_the_week <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

A convenient trick for creating a vector to play around with is to generate a sequence of numbers.

my_vector_sequence <- c(1:100)

Identifying vector elements

We can use [] to pull out elements in a vector. We just need to specify their position in the vector; e.g., element 4 of days_of_the_week is Thursday.

days_of_the_week[4]
## [1] "Thursday"
my_vector_sequence[10]
## [1] 10
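[] is not limited to a single position. The sketch below, which redefines my_vector_sequence so it runs on its own, shows a few other ways to index that you may find useful: several positions at once, a range of positions, and a negative index.

```r
my_vector_sequence <- c(1:100)

my_vector_sequence[c(1, 50, 100)]  # three specific positions
my_vector_sequence[5:8]            # a range of positions
head(my_vector_sequence[-1])       # a negative index drops that position
```

Note that a negative index in R removes the element at that position; it does not count from the end of the vector.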

Practice

  1. Use [] to determine which element in my_vector_sequence has a value of 15.
my_vector_sequence[15]
## [1] 15
  2. We can use operators such as <, >, ==, <=, etc. Show all values in my_vector_sequence that are less than or equal to 10.
my_vector_sequence <= 10
##   [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
##  [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [97] FALSE FALSE FALSE FALSE
  3. If you use [] then you only get the values, not the logical evaluation of the entire vector. Experiment with this by adjusting the chunk below.
my_vector_sequence[my_vector_sequence <= 10]
##  [1]  1  2  3  4  5  6  7  8  9 10
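If you want to experiment further, here is one possible sketch of related base R tools: which() returns positions instead of values, & combines two conditions, and sum() counts how many elements pass a test. The vector is redefined so the chunk runs on its own.

```r
my_vector_sequence <- c(1:100)

# which() reports the positions where the condition is TRUE
which(my_vector_sequence <= 10)

# Conditions can be combined with & (and) or | (or)
my_vector_sequence[my_vector_sequence > 10 & my_vector_sequence <= 15]

# sum() counts the TRUE values, because TRUE is treated as 1
sum(my_vector_sequence <= 10)
```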

Data Frames

The data frame is the most common way to organize data within R. A data frame can store data of many different classes. We usually don’t build data frames in RStudio, but this example will show you how they are structured.

Let’s build separate vectors that include length (in), weight (g), and sex of three ruby-throated hummingbirds.

Sex <- c("male", "female", "male")
Length <- c(3.2, 3.7, 3.4)
Weight <- c(2.9, 4.0, 3.1)

Here we combine our three vectors to create a data frame with the function data.frame().

hbirds <- data.frame(Sex, Length, Weight)

Since we work in the tidyverse, we can also use the tibble() function to create a data frame. A tibble is a modern take on data frames. Tibbles are data frames, but they tweak some older behaviors to make life a little easier.

hbirds <- tibble(Sex, Length, Weight)

Notice that the printed tibble is not only neat and clean looking, it also reports the class of each column beneath the column name. <chr> means character, and <dbl> means double, R's double-precision floating-point numeric type.

hbirds
## # A tibble: 3 × 3
##   Sex    Length Weight
##   <chr>   <dbl>  <dbl>
## 1 male      3.2    2.9
## 2 female    3.7    4  
## 3 male      3.4    3.1

What are the column names of our data frame? Notice that R defaulted to using the names of our vectors, but we could name them something else when creating the data frame, or rename them later.

names(hbirds)
## [1] "Sex"    "Length" "Weight"
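To illustrate renaming after the fact, here is a minimal sketch that assigns new names with names(). It uses data.frame() so it runs in base R without any packages, and hbirds_renamed is a made-up name so the lab's hbirds object is left alone.

```r
Sex <- c("male", "female", "male")
Length <- c(3.2, 3.7, 3.4)
Weight <- c(2.9, 4.0, 3.1)

# hypothetical copy, so the original hbirds is not overwritten
hbirds_renamed <- data.frame(Sex, Length, Weight)

# Assigning to names() replaces all of the column names at once
names(hbirds_renamed) <- c("sex", "length", "weight")
names(hbirds_renamed)
```

The same assignment works on a tibble. In the tidyverse, dplyr's rename() is another option when you only want to change one or two columns.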

What are the dimensions of the hbirds data frame? The dim() and str() commands provide this information.

dim(hbirds)
## [1] 3 3
str(hbirds)
## tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
##  $ Sex   : chr [1:3] "male" "female" "male"
##  $ Length: num [1:3] 3.2 3.7 3.4
##  $ Weight: num [1:3] 2.9 4 3.1
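If you only need one of the two dimensions, base R also provides nrow() and ncol(). A small sketch, rebuilding the data frame with data.frame() so the chunk runs on its own:

```r
Sex <- c("male", "female", "male")
Length <- c(3.2, 3.7, 3.4)
Weight <- c(2.9, 4.0, 3.1)
hbirds <- data.frame(Sex, Length, Weight)

nrow(hbirds)  # number of rows (one per hummingbird)
ncol(hbirds)  # number of columns (one per variable)
```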

Let’s use lowercase names when we create the data frame. Here we switch to lowercase and add the unit to the weight column, but we could use any names we wish.

hbirds <- tibble(sex=Sex, length=Length, weight_g=Weight)
hbirds
## # A tibble: 3 × 3
##   sex    length weight_g
##   <chr>   <dbl>    <dbl>
## 1 male      3.2      2.9
## 2 female    3.7      4  
## 3 male      3.4      3.1

Accessing Data Frame Columns and Rows

The same methods of selecting elements in vectors and matrices apply to data frames. We use [] with two positions separated by a comma: the first selects rows and the second selects columns. Leaving a position blank returns all rows or all columns.

The first row.

hbirds[1,]
## # A tibble: 1 × 3
##   sex   length weight_g
##   <chr>  <dbl>    <dbl>
## 1 male     3.2      2.9

The third column.

hbirds[ ,3]
## # A tibble: 3 × 1
##   weight_g
##      <dbl>
## 1      2.9
## 2      4  
## 3      3.1
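The row and column positions can also be combined, or replaced with vectors of positions or names. The sketch below assumes a base R data.frame with the same columns, so it runs without loading any packages.

```r
hbirds <- data.frame(
  sex = c("male", "female", "male"),
  length = c(3.2, 3.7, 3.4),
  weight_g = c(2.9, 4.0, 3.1)
)

hbirds[2, 3]                    # row 2, column 3: a single value
hbirds[c(1, 3), ]               # rows 1 and 3, all columns
hbirds[, c("sex", "weight_g")]  # columns can be picked by name
```

One difference to be aware of: with a tibble, hbirds[2, 3] comes back as a 1 × 1 tibble rather than a bare number, because subsetting a tibble with [] always returns a tibble.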

Calculations

We can use the $ to access a column (variable) in a data frame. Here we calculate the mean length of the hummingbirds.

mean(hbirds$length)
## [1] 3.433333
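Other summary functions work the same way once a column has been pulled out with $. Here is a short sketch, again assuming a base R data.frame with the same columns so the chunk is self-contained:

```r
hbirds <- data.frame(
  sex = c("male", "female", "male"),
  length = c(3.2, 3.7, 3.4),
  weight_g = c(2.9, 4.0, 3.1)
)

max(hbirds$weight_g)    # heaviest bird
range(hbirds$length)    # shortest and longest bird
summary(hbirds$length)  # minimum, quartiles, mean, and maximum

# $ and [] can be combined: mean length of the males only
mean(hbirds$length[hbirds$sex == "male"])
```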

Writing Data to File

We should save our hbirds data frame so we can use it again later! There are many ways to save data in R; here we write our data frame to a csv file. We use row.names = FALSE so that row numbers are not written to the file as an extra column.

write.csv(hbirds, "hbirds_data.csv", row.names = FALSE)
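To confirm the file was written correctly, you can read it back in. This is only a sketch: it writes to a temporary folder (csv_path is a made-up name) so it does not touch the file in your working directory, and it uses base R's read.csv() rather than any tidyverse function.

```r
hbirds <- data.frame(
  sex = c("male", "female", "male"),
  length = c(3.2, 3.7, 3.4),
  weight_g = c(2.9, 4.0, 3.1)
)

# hypothetical path inside R's temporary folder for this session
csv_path <- file.path(tempdir(), "hbirds_data.csv")

write.csv(hbirds, csv_path, row.names = FALSE)

# Read the file back and check that the columns survived the round trip
hbirds_reloaded <- read.csv(csv_path)
names(hbirds_reloaded)
hbirds_reloaded
```

Because row.names = FALSE was used when writing, the reloaded data frame has the same three columns and no extra column of row numbers.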

Wrap-up

Please review the learning goals and be sure to use the code here as a reference when completing the homework.