At the end of this exercise, you will be able to:
1. Define an object in R.
2. Use objects to perform calculations.
3. Explain the difference between data classes in R.
4. Use R to identify the class of specific data.
5. Define NA in R.
6. Determine whether or not data have NA values.
Before we get started with R, it is important to understand the
concept of working directories and paths. A working directory is the
folder on your computer where R will look for files to read in and where
it will save files that you create. You can check your current working
directory using the getwd() function.
getwd() #checks your current working directory
## [1] "/Users/switters/Desktop/datascibiol/lab2"
What if you find that you are not in the correct working directory? You can change your working directory in two ways. The first is to look at the Session menu at the top of RStudio. Click on Session -> Set Working Directory -> Choose Directory. Then navigate to the folder you want to use as your working directory.
The second way is to use the setwd() function. You will
need to provide the full path to the folder you want to use. For
example:
#setwd("/Users/yourname/Documents/yourfolder") #uncomment and change to your path
What is a path? A path is the location of a file or folder on your computer. It tells R where to find files to read in or where to save files you create. Paths can be absolute or relative. An absolute path provides the full location of a file or folder, starting from the root directory. A relative path provides the location of a file or folder in relation to the current working directory.
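As a rough illustration of the difference, the sketch below builds a relative path and then turns it into an absolute one. The folder and file names (data, my_file.csv) are hypothetical and do not need to exist on your computer.

```r
# Hypothetical folder and file names, used only for illustration
relative_path <- file.path("data", "my_file.csv")
relative_path   # "data/my_file.csv", read relative to the working directory

# An absolute path starts from the root of the file system;
# here we build one by prepending the current working directory
absolute_path <- file.path(getwd(), relative_path)
absolute_path
```

Relative paths keep working when a project folder is moved to another computer, as long as the working directory is set to that folder.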
In order to access the potential of R we need to assign values or
other types of data to objects. There is a specific format
that I want you to follow, so please pay close attention.
Assign a value to object ‘x’. The ‘<-’ symbol is the assignment operator and is read as ‘gets’.
In this case, x gets 42. Look at the Environment panel in RStudio
and you should see the value associated with ‘x’. On a Mac, you can press
Option and - together to insert the assignment operator
automatically.
x <- 42
To print the object to the screen, just type x.
x
## [1] 42
Assign a value of 30 to a new object y.
y <- 30
The = symbol also works for assignment, but most R programmers follow the convention of using <- instead.
z=10 #do not use
Once objects have been created, you can do things with them.
x+y
## [1] 72
x/y
## [1] 1.4
Make two new objects, treatment and control. The value of treatment is 36 and the value of control is 38.
treatment <- 36
control <- 38
What is the sum of treatment and control?
treatment+control
## [1] 74
Here we make a new object my_experiment that is the sum
of the treatment and control. Notice that I use _ and not
spaces.
my_experiment <- treatment+control
my_experiment
## [1] 74
We can also use the function sum to do the same thing. Notice that if I give a new object the same name as an existing object, the old one is replaced.
my_experiment <- sum(treatment, control)
my_experiment
## [1] 74
To learn more about the sum function, use the help command
?.
?sum
We need to be careful about nomenclature when we write code. R allows us to give almost any name we want to an object, but there are exceptions. For example, we cannot use a name that R reserves for its own syntax.
#else <- 12
If you uncomment the line above, you get an error because else is a reserved word in R.
You also don’t want to give names that might get confused with
functions. For example, you can assign a value to ‘mean’, but this could become
confusing because mean is also a function.
mean <- 20
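A minimal sketch of what happens here: when a name is followed by parentheses, R looks for a function with that name, so mean() still works even though the name mean now also refers to the number 20. Removing the object with rm() clears up the ambiguity.

```r
mean <- 20          # an object that shares its name with a function
mean                # prints 20, the object
mean(c(2, 8, 2))    # still calls the function and returns 4
rm(mean)            # remove our object; the function is untouched
```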
What is the mean of 2, 8, and 2? (Do the math in your head first.) Does the following code match your prediction?
2+8+2/3
## [1] 10.66667
Order of operations applies, so we need to tell R exactly what we want.
(2+8+2)/3
## [1] 4
Here we use the mean function. Notice that we use c
which stands for concatenate. This combines the three numbers into a
single object that the mean function can use. This type of data
structure is called a vector.
mean(c(2, 8, 2))
## [1] 4
Why does this not work?
mean(2, 8, 2)
## [1] 2
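One way to see what is going on, offered as a sketch rather than a full explanation: mean() treats only its first argument as the data. The extra values are matched to other arguments of the function (in the default method, trim and na.rm), so the result is computed from the single number 2.

```r
mean(2, 8, 2)       # only the first value, 2, is treated as data
mean(c(2, 8, 2))    # c() combines all three values into one vector first
```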
Create three new objects: venom_GT, chiron, and veyron. These are some of the fastest cars
in the world. Assign each car its top speed. The venom_GT can go 270,
chiron is 261, and veyron is 268.
venom_GT <- 270
chiron <- 261
veyron <- 268
mean_speed <- (venom_GT + chiron + veyron) / 3
mean_speed
## [1] 266.3333
Use mean() to calculate the mean top speed
for the cars.
mean(c(venom_GT, chiron, veyron))
## [1] 266.3333
There are four frequently used classes of data: 1.
numeric, 2. integer, 3. character, 4. logical.
my_numeric <- 42
my_integer <- 2L #adding an L automatically denotes an integer
my_character <- "universe"
my_logical <- FALSE
To find out what type of data you are working with, use the
class() function. This is important because sometimes we
will need to change the type of data to perform certain analyses.
class(my_numeric)
## [1] "numeric"
class(my_integer)
## [1] "integer"
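The same check works for the other two classes. A short sketch, re-creating the objects defined above so that it runs on its own:

```r
my_character <- "universe"
my_logical <- FALSE

class(my_character)  # "character"
class(my_logical)    # "logical"
```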
You can use functions that start with is. and as., such as is.integer()
and as.integer(), to check or change the type of data.
is.integer(my_numeric) #is my_numeric an integer?
## [1] FALSE
Let’s convert my_numeric to an integer.
my_numeric <- as.integer(my_numeric) #overwrite my_numeric with an integer version
is.integer(my_numeric) #is my_numeric an integer now?
## [1] TRUE
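Conversions can change or lose information, so it is worth checking the result. The sketch below uses made-up values (not from the lab) to show a few common cases.

```r
as.integer(42.9)         # 42: the decimal part is dropped, not rounded
as.numeric("3.5")        # 3.5: text that looks like a number becomes a number
class(as.character(42))  # "character": numbers can also become text
```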
R has a special way to designate missing data, the NA. NA values in R have specific properties which are very useful if your data contains any missing values. Later this quarter we will have a lab focused on dealing with NAs.
The is.na() and anyNA() functions are useful when dealing with NAs in
data: is.na() tests each value, and anyNA() reports whether any value is
missing.
my_missing <- NA
is.na(my_missing)
## [1] TRUE
anyNA(my_missing)
## [1] TRUE
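With more than one value, the two functions behave differently. A small sketch using a made-up vector (the name example_vector and its values are only for illustration):

```r
example_vector <- c(3, NA, 8, NA)  # hypothetical data with two missing values

is.na(example_vector)       # FALSE TRUE FALSE TRUE: one answer per value
anyNA(example_vector)       # TRUE: at least one value is missing
sum(is.na(example_vector))  # 2: counts the missing values
```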
new_vector <- c(7, 6.2, 5, 9, NA, 4, 9.8, 7, 3, 2)
Calculate the mean of new_vector.
mean(new_vector)
## [1] NA
How do you interpret this result? What does this mean about NAs? By default, NAs are not removed before the mean is calculated, so a single NA makes the whole result NA.
Recalculate the mean using the following code chunk. Why is this useful?
mean(new_vector, na.rm = TRUE) #removes NA values
## [1] 5.888889
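The na.rm argument is not specific to mean(). As a sketch, re-creating new_vector so the code runs on its own, other summary functions such as sum() and max() accept it as well:

```r
new_vector <- c(7, 6.2, 5, 9, NA, 4, 9.8, 7, 3, 2)

sum(new_vector)                # NA: the missing value carries through
sum(new_vector, na.rm = TRUE)  # 53
max(new_vector, na.rm = TRUE)  # 9.8
```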