Instructions

Answer the following questions and complete the exercises in RMarkdown. Please embed all of your code and push your final work to your repository. Your final lab report should be organized, clean, and run free from errors. Remember, you must remove the # for the included code chunks to run. Be sure to add your name to the author header above.

Make sure to use the formatting conventions of RMarkdown to make your report neat and clean!

Load the libraries

library(tidyverse)
library(janitor)

Data

For this homework, we will use a data set compiled by the Office of Environment and Heritage in New South Wales, Australia. It contains the enterococci counts in water samples obtained from Sydney beaches as part of the Beachwatch Water Quality Program. Enterococci are bacteria common in the intestines of mammals; they are rarely present in clean water. So, enterococci values are a measurement of pollution. cfu stands for colony-forming units, a measure of the number of viable bacteria in a sample.

This homework loosely follows the tutorial of R Ladies Sydney. If you get stuck, check it out!

  1. Start by loading the data sydneybeaches. Do some exploratory analysis to get an idea of the data structure.

  2. Are these data “tidy” per the definitions of the tidyverse? How do you know? Are they in wide or long format?

  3. We are only interested in the variables site, date, and enterococci_cfu_100ml. Make a new object focused on these variables only. Name the object sydneybeaches_long.

  4. Pivot the data such that the dates are column names and each beach only appears once (wide format). Name the object sydneybeaches_wide.

  5. Pivot the data back so that the dates are data and not column names.

  6. We haven’t dealt much with dates yet, but separate the date into columns day, month, and year. Do this on the sydneybeaches_long data.

  7. What is the average enterococci_cfu_100ml by year for each beach? Think about which data you will use: long or wide.

  8. Make the output from question 7 easier to read by pivoting it to wide format.

  9. What was the most polluted beach in 2013?

  10. Explore the data! Do one analysis of your choice that includes a minimum of three lines of code.
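
If the pivoting and date-splitting steps in the questions above feel abstract, the following minimal sketch may help. It uses a small made-up tibble (`toy_long`, with invented beach names and values), not the real sydneybeaches data, and it assumes dates are stored as text in day/month/year form separated by "/". Check your own data before relying on that assumption.

```r
library(tidyverse)

# Hypothetical toy data: invented sites and counts, NOT the real data set
toy_long <- tibble(
  site = c("Beach A", "Beach A", "Beach B", "Beach B"),
  date = c("01/01/2013", "02/01/2013", "01/01/2013", "02/01/2013"),
  enterococci_cfu_100ml = c(10, 20, 30, 40)
)

# Long to wide: each site appears once, each date becomes a column
toy_wide <- toy_long %>%
  pivot_wider(names_from = date, values_from = enterococci_cfu_100ml)

# Wide back to long: the date columns become values in a single column
toy_back <- toy_wide %>%
  pivot_longer(-site, names_to = "date", values_to = "enterococci_cfu_100ml")

# Split the date text into three columns (assumes a "/" separator)
toy_dates <- toy_long %>%
  separate(date, into = c("day", "month", "year"), sep = "/")

# Mean count per site and year, computed on the long data
toy_means <- toy_dates %>%
  group_by(site, year) %>%
  summarize(mean_cfu = mean(enterococci_cfu_100ml, na.rm = TRUE),
            .groups = "drop")
```

The summary is computed on the long data because group_by() works on values held in columns; the wide form is mostly useful for reading the result.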

Knit and Upload

Please knit your work as an .html file and upload to Canvas. Homework is due before the start of the next lab. No late work is accepted.