At the end of this exercise, you will be able to:
1. Define an object in R.
2. Use objects to perform calculations.
3. Explain the difference between data classes in R.
4. Use R to identify the class of specific data.
5. Define NA in R.
6. Determine whether or not data have NA values.
To use the full potential of R, we need to assign values or other types of data to objects. There is a specific format that I want you to follow, so please pay close attention.
Assign a value to the object ‘x’. The ‘<-’ symbol is read as ‘gets’, so in this case x gets 42. Look in the Environment panel and you should see the value associated with ‘x’. In RStudio on a Mac, you can press Option and - to generate the gets symbol automatically (Alt and - on Windows).
x <- 42
To print the object to the screen, just type x.
x
## [1] 42
Assign a value of 30 to a new object y.
y <- 30
The = symbol also works for assignment, but most R programmers use <- instead, so please do the same.
z = 10 # works, but do not use
Once objects have been created, you can do things with them.
x+y
## [1] 72
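Addition is only one of the things you can do with objects. The standard arithmetic operators all work on objects exactly as they do on plain numbers. This is a short sketch that re-creates x and y so it runs on its own:

```r
# Re-create the objects from the chunks above so this block is self-contained
x <- 42
y <- 30

x - y   # subtraction: 12
x * y   # multiplication: 1260
x / y   # division: 1.4
x^2     # exponent (42 squared): 1764
```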
Make two new objects, treatment and control. The value of treatment is 36 and the value of control is 38.
treatment <- 36
control <- 38
What is the sum of treatment and control?
treatment+control
## [1] 74
Here we make a new object, my_experiment, that is the sum of treatment and control. Notice that I use _ and not spaces, because object names cannot contain spaces.
my_experiment <- treatment+control
my_experiment
## [1] 74
We can also use the function sum to do the same thing. Notice that if I give a new object the same name as an existing object, the old one is replaced.
my_experiment <- sum(treatment, control)
my_experiment
## [1] 74
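An assignment stores a value, not a formula. If you later change one of the inputs, an object built from it is not updated automatically. This is a minimal sketch using hypothetical objects a, b, and total, which exist only for this illustration:

```r
a <- 5
b <- 3
total <- a + b   # total stores the value 8, not the formula a + b
a <- 100         # overwrite a with a new value
total            # still 8: total is not recalculated on its own
total <- a + b   # re-run the assignment to update total
total            # now 103
```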
We need to be careful about naming when we write code. R allows us to give almost any name we want to an object, but there are exceptions. For example, R has reserved words that are part of the language itself, and these cannot be used as object names.
else <- 12 # error: else is a reserved word
We get an error here because else is a reserved word in R (it is part of if/else statements), not a name we are allowed to assign a value to.
You also don’t want to give names that might get confused with functions. For example, you can assign a value to ‘mean’, but this could become confusing because mean() is also a function.
mean <- 20
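An object named mean does not actually break the mean() function. When a name is being called as a function, R skips objects that are not functions. The confusion is for the person reading the code. This is a short sketch that ends by deleting the object with rm(), so that mean once again refers only to the function:

```r
mean <- 20        # an object named mean (not recommended)
mean              # typing the name prints the object: 20
mean(c(2, 8, 2))  # R still finds the mean() function when mean is called: 4
rm(mean)          # remove the object; mean now refers only to the function
```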
What is the mean of 2, 8, and 2? (Do the math in your head first.) Does the following code match your prediction?
2+8+2/3 # not the mean: R follows the order of operations and divides 2 by 3 before adding
## [1] 10.66667
Because R follows the order of operations, we need to use parentheses to tell R exactly what we want.
(2+8+2)/3
## [1] 4
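Writing the parentheses out explicitly shows how R read the first version. Division happens before addition, so the first two lines below are identical. Only the third line adds first and then divides:

```r
2 + 8 + 2/3      # 10.66667: R computes 2/3 first
2 + 8 + (2/3)    # 10.66667: the same calculation, with the implicit grouping made explicit
(2 + 8 + 2) / 3  # 4: the parentheses force the addition to happen first
```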
Here we use the mean() function. Notice that we had to use c(), which combines the values into a single vector (it is often read as "concatenate"). More on this later.
mean(c(2, 8, 2))
## [1] 4
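The sketch below shows why c() matters. It stores the combined values in a hypothetical object called values and checks that mean() gives the same answer as dividing the sum by the number of values. The last line leaves out c(). In that case, R treats only the first number as data and matches the other numbers to other arguments of mean(), so the result is not the mean:

```r
values <- c(2, 8, 2)          # c() combines the three numbers into one vector
length(values)                # 3 values in the vector
sum(values) / length(values)  # 4: the mean worked out by hand
mean(values)                  # 4: the same answer from mean()
mean(2, 8, 2)                 # 2, not 4: without c(), 8 and 2 are not treated as data
```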
Create three new objects, venom_GT, chiron, and veyron. These are the fastest cars in the world. Assign each object its car's top speed: venom_GT can go 270, chiron 261, and veyron 268.
Use arithmetic to calculate the mean top speed for the cars.
Use the function mean() to calculate the mean top speed for the cars.
There are four frequently used classes of data:
1. numeric
2. integer
3. character
4. logical
my_numeric <- 42
my_integer <- 2L # the L suffix makes this value an integer
my_character <- "universe"
my_logical <- FALSE
To find out what type of data you are working with, use the class() function. This is important because sometimes we will need to change the type of data to perform certain analyses.
class(my_numeric)
## [1] "numeric"
class(my_integer)
## [1] "integer"
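The same check works for the other two classes from the list. This sketch re-creates the two objects so it runs on its own. The last line checks a number typed without an L suffix:

```r
my_character <- "universe"
my_logical <- FALSE

class(my_character)  # "character"
class(my_logical)    # "logical"
class(3.5)           # "numeric": a number without an L suffix is numeric
```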
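You can use the is.*() family of functions (for example, is.integer()) to check the type of data. You can use the as.*() family (for example, as.integer()) to convert data to a specific type.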
is.integer(my_numeric) #is my_numeric an integer?
## [1] FALSE
my_integer <- as.integer(my_numeric) # overwrite my_integer with an integer version of my_numeric
is.integer(my_integer) # is my_integer an integer?
## [1] TRUE
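Conversions do not always behave the way you might expect, so it is worth checking the result. These are a few examples of the as.*() functions. The last line shows that text which cannot be read as a number becomes NA. suppressWarnings() is used there only to hide the warning R would otherwise print:

```r
as.integer(3.9)         # 3: converting to integer drops the decimal part; it does not round
as.character(42)        # "42": now text, not a number
as.numeric("42")        # 42: text that looks like a number can be converted back
as.numeric(TRUE)        # 1: TRUE becomes 1 and FALSE becomes 0
suppressWarnings(as.numeric("universe"))  # NA: this text cannot be read as a number
```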
R has a special way to designate missing data, the NA. NA values in R have specific properties which are very useful if your data contains any missing values. Later this quarter we will have a lab focused on dealing with NAs.
is.na() and anyNA() are useful functions when dealing with NAs in data. is.na() checks each value for NA, and anyNA() reports whether any value at all is missing.
my_missing <- NA
is.na(my_missing)
## [1] TRUE
anyNA(my_missing)
## [1] TRUE
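With a single value, is.na() and anyNA() give the same answer. The difference shows up with a vector. This sketch uses a hypothetical vector, scores, that has two missing values:

```r
scores <- c(12, NA, 15, 9, NA)   # hypothetical data with two missing values

is.na(scores)          # one TRUE/FALSE per element: FALSE TRUE FALSE FALSE TRUE
anyNA(scores)          # a single answer: TRUE, at least one value is missing
sum(is.na(scores))     # 2: each TRUE counts as 1, so this counts the NAs
which(is.na(scores))   # 2 5: the positions of the missing values
```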
new_vector <- c(7, 6.2, 5, 9, NA, 4, 9.8, 7, 3, 2)
new_vector
mean(new_vector)
## [1] NA
How do you interpret this result? What does this mean about NAs? By default, mean() does not skip NA values. One value is unknown, so the mean is also unknown, and R returns NA.
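This behavior is not specific to mean(). Almost any calculation that involves an unknown value gives an unknown result. A few quick examples:

```r
NA + 1             # NA: one unknown term makes the result unknown
NA > 5             # NA: R cannot say whether an unknown value is greater than 5
sum(c(1, 2, NA))   # NA, for the same reason that mean(new_vector) is NA
```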
Recalculate the mean using the following code chunk. Why is this useful?
mean(new_vector, na.rm = TRUE) #removes NA values
## [1] 5.888889
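na.rm = TRUE works in other summary functions such as sum() as well. This sketch also shows that the result above equals the mean of only the non-missing values. The hypothetical object complete is created just for this check:

```r
new_vector <- c(7, 6.2, 5, 9, NA, 4, 9.8, 7, 3, 2)

sum(new_vector, na.rm = TRUE)               # 53: sum() also accepts na.rm
complete <- new_vector[!is.na(new_vector)]  # keep only the non-missing values
length(complete)                            # 9 of the 10 values remain
mean(complete)                              # 5.888889, the same as mean(new_vector, na.rm = TRUE)
```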