Learning Goals

At the end of this exercise, you will be able to:
1. Define an object in R.
2. Use objects to perform calculations.
3. Explain the difference between data classes in R.
4. Use R to identify the class of specific data.
5. Define NA in R.
6. Determine whether or not data have NA values.

Working Directories and Paths

Before we get started with R, it is important to understand the concept of working directories and paths. A working directory is the folder on your computer where R will look for files to read in and where it will save files that you create. You can check your current working directory using the getwd() function.

getwd() #checks your current working directory
## [1] "/Users/switters/Desktop/datascibiol/lab2"

What if you find that you are not in the correct working directory? You can change your working directory in two ways. The first is to look at the Session menu at the top of RStudio. Click on Session -> Set Working Directory -> Choose Directory. Then navigate to the folder you want to use as your working directory.

The second way is to use the setwd() function. You will need to provide the full path to the folder you want to use. For example:

#setwd("/Users/yourname/Documents/yourfolder") #uncomment and change to your path

What is a path? A path is the location of a file or folder on your computer. It tells R where to find files to read in or where to save files you create. Paths can be absolute or relative. An absolute path provides the full location of a file or folder, starting from the root directory. A relative path provides the location of a file or folder in relation to the current working directory.
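Here is a minimal sketch that contrasts the two kinds of path. So that nothing in your own folders changes, it works inside R's temporary directory (tempdir()). The folder names lab2 and data and the file name example.csv are made up for illustration. The sketch changes the working directory briefly and then restores it.

```r
old_wd <- getwd()                             # remember where we started
lab_dir <- file.path(tempdir(), "lab2")       # hypothetical folder; an absolute path built from pieces
dir.create(file.path(lab_dir, "data"), recursive = TRUE, showWarnings = FALSE)
setwd(lab_dir)                                # make lab_dir the working directory

# Save a small file using a RELATIVE path (read from the working directory)
write.csv(data.frame(x = 1:3), "data/example.csv", row.names = FALSE)

# The same file, referred to by its ABSOLUTE path
abs_path <- file.path(lab_dir, "data", "example.csv")
file.exists("data/example.csv")  # TRUE
file.exists(abs_path)            # TRUE

setwd(old_wd)                    # go back to the original working directory
```

The relative path "data/example.csv" only works while lab_dir is the working directory. The absolute path works from anywhere.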

Objects

To take advantage of what R can do, we assign values and other types of data to objects. There is a specific format that I want you to follow, so please pay close attention.

Assign a value to the object ‘x’. The ‘<-’ symbol is read as ‘gets’. In this case, x gets 42. After you run the line, look at the Environment panel; you should see the value associated with ‘x’. On a Mac, you can press Option and - to insert the gets symbol automatically (Alt and - on Windows).

x <- 42

To print the object to the screen, just type x.

x
## [1] 42

Assign a value of 30 to a new object y.

y <- 30

The = symbol also assigns values, but by convention most R programmers use <- instead.

z=10 #do not use

Once objects have been created, you can do things with them.

x+y
## [1] 72
x/y
## [1] 1.4
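Addition and division are only two of R's arithmetic operators. A short sketch using the same x and y shows three more:

```r
x <- 42
y <- 30
x - y    # subtraction: 12
x * y    # multiplication: 1260
x ^ 2    # exponent (x squared): 1764
```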

Make two new objects, treatment and control. The value of treatment is 36 and the value of control is 38.

treatment <- 36
control <- 38

What is the sum of treatment and control?

treatment+control
## [1] 74

Here we make a new object, my_experiment, that is the sum of treatment and control. Notice that I use an underscore (_) instead of a space; object names cannot contain spaces.

my_experiment <- treatment+control
my_experiment
## [1] 74

We can also use the function sum to do the same thing. Notice that if I give a new object the same name as an existing object, the old one is replaced.

my_experiment <- sum(treatment, control)
my_experiment
## [1] 74
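In the example above, the replaced value happens to equal the new one (74 both times), so it is hard to see that anything changed. This small sketch reuses the name with a different calculation to make the replacement visible:

```r
treatment <- 36
control <- 38
my_experiment <- sum(treatment, control)
my_experiment                          # 74
my_experiment <- treatment - control   # same name, new value: the old 74 is gone
my_experiment                          # -2
```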

To learn more about the sum function, use the help command ?.

?sum
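Besides ?, base R has a couple of other tools for learning about a function. In this sketch, help(sum) is shown only as a comment because it opens the same page as ?sum; args() prints the arguments a function accepts.

```r
# help(sum) opens the same help page as ?sum
args(sum)    # the arguments sum() accepts, including na.rm
args(mean)   # the arguments mean() accepts
```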

Nomenclature

We need to be careful about nomenclature when we write code. R allows us to give almost any name we want to an object, but there are exceptions. For example, we can't use one of R's reserved words, such as if, else, TRUE, or function, as the name of an object.

#else <- 12

This line is commented out because running it gives an error: else is a reserved word in R. You also don’t want to give names that might get confused with functions; e.g. you can assign a value to ‘mean’, but this could become confusing because mean() is also a function.

mean <- 20
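Why is this confusing rather than an error? A short sketch: R keeps the object and the function separate, and when you call mean() it skips over objects that are not functions. Typing the name alone, though, shows the object. Removing the object with rm() clears up the ambiguity.

```r
mean <- 20          # an object named mean (legal, but confusing)
mean                # typing the name prints the object: 20
mean(c(2, 8, 2))    # R still finds the mean() function when you call it: 4
rm(mean)            # remove the object so "mean" refers only to the function
```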

What is the mean of 2, 8, and 2? (Do the math in your head first.) Does the following code match your prediction?

2+8+2/3
## [1] 10.66667

R follows the standard order of operations, so 2/3 was calculated first and then added to 2 and 8. We need parentheses to tell R exactly what we want.

(2+8+2)/3
## [1] 4

Here we use the mean() function. Notice that we use c(), which stands for combine (it is often described as concatenate). It combines the three numbers into a single object that mean() can use. This type of data structure is called a vector.

mean(c(2, 8, 2))
## [1] 4
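Once numbers are stored together in a vector, other functions can work on the whole vector at once. This sketch (the name scores is just for illustration) rebuilds the mean from sum() and length() to show that it matches mean():

```r
scores <- c(2, 8, 2)            # a numeric vector with three elements
length(scores)                  # number of elements: 3
sum(scores)                     # 12
sum(scores) / length(scores)    # 4
mean(scores)                    # 4
```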

Why does this not give the answer we expect?

mean(2, 8, 2)
## [1] 2

Practice

  1. Create three new objects, venom_GT, chiron, and veyron. These are among the fastest cars in the world. Assign each object its car's top speed (in mph): venom_GT can go 270, chiron 261, and veyron 268.
venom_GT <- 270
chiron <- 261
veyron <- 268
  2. Use arithmetic to calculate the mean top speed of the cars.
mean_speed <- (venom_GT + chiron + veyron) / 3
mean_speed
## [1] 266.3333
  3. Use the function mean() to calculate the mean top speed of the cars.
mean(c(venom_GT, chiron, veyron))
## [1] 266.3333

Types of Data

There are four frequently used classes of data: 1. numeric, 2. integer, 3. character, 4. logical.

my_numeric <- 42
my_integer <- 2L #the L suffix makes 2 an integer instead of a numeric
my_character <- "universe"
my_logical <- FALSE

To find out what type of data you are working with, use the class() function. This is important because sometimes we will need to change the type of data to perform certain analyses.

class(my_numeric)
## [1] "numeric"
class(my_integer)
## [1] "integer"
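The same check works for the other two classes. The last two lines of this sketch show a point that often surprises people: a whole number typed without L is still numeric.

```r
my_character <- "universe"
my_logical <- FALSE
class(my_character)   # "character"
class(my_logical)     # "logical"
class(42)             # "numeric": no L, so not an integer
class(42L)            # "integer"
```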

You can use the is.*() and as.*() families of functions (for example, is.integer() and as.integer()) to check or change the type of data.

is.integer(my_numeric) #is my_numeric an integer?
## [1] FALSE

Let’s convert my_numeric to an integer.

my_numeric <- as.integer(my_numeric) #overwrite my_numeric with an integer version
is.integer(my_numeric) #is my_numeric an integer now?
## [1] TRUE
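Other as.*() functions convert in other directions. This sketch shows a few conversions and one to watch for: converting decimals to integers drops the decimal part. The last line produces an NA along with a warning, because that text cannot be turned into a number.

```r
as.integer(3.9)          # 3: the decimal part is dropped, not rounded
as.character(42)         # "42": the number stored as text
as.numeric("3.5")        # 3.5: text that looks like a number becomes numeric
as.logical(0)            # FALSE: 0 becomes FALSE, other numbers become TRUE
as.numeric("universe")   # NA, with a warning: this text is not a number
```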

Missing Data

R has a special way to designate missing data: NA. NA values have specific properties that are very useful if your data contain missing values. For example, most calculations that involve an NA return NA, so missing values are hard to overlook. Later this quarter we will have a lab focused on dealing with NAs.

The functions is.na() and anyNA() are useful for finding NAs in data. is.na() checks each value, and anyNA() tells you whether there is at least one NA.

my_missing <- NA
is.na(my_missing)
## [1] TRUE
anyNA(my_missing)
## [1] TRUE
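The difference between the two functions is clearer with a vector that mixes real values and an NA. This sketch uses a made-up vector, heights. It also shows one way to count NAs and how an NA carries through arithmetic.

```r
heights <- c(1.2, NA, 3.4)   # hypothetical data with one missing value
is.na(heights)        # FALSE TRUE FALSE: one answer per element
anyNA(heights)        # TRUE: at least one element is NA
sum(is.na(heights))   # 1: counts the NAs (TRUE counts as 1, FALSE as 0)
heights + 1           # 2.2 NA 4.4: the NA carries through the calculation
```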

Practice

  1. Let’s create a vector that includes some missing data. For now, run the following code chunk.
new_vector <- c(7, 6.2, 5, 9, NA, 4, 9.8, 7, 3, 2)
  2. Calculate the mean of new_vector.
mean(new_vector)
## [1] NA
  3. How do you interpret this result? What does this mean about NAs? Because one of the values is NA, the mean is also NA: by default, a calculation that includes an NA returns NA.

  4. Recalculate the mean using the following code chunk. Why is this useful?

mean(new_vector, na.rm = TRUE) #removes NA values
## [1] 5.888889
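The na.rm argument is not unique to mean(). As a sketch, here are other common summary functions applied to the same new_vector; each returns NA by default and ignores the NA when na.rm = TRUE.

```r
new_vector <- c(7, 6.2, 5, 9, NA, 4, 9.8, 7, 3, 2)
sum(new_vector)                 # NA
sum(new_vector, na.rm = TRUE)   # 53
max(new_vector, na.rm = TRUE)   # 9.8
min(new_vector, na.rm = TRUE)   # 2
```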
