Answer the following questions and complete the exercises in
RMarkdown. Please embed all of your code and push your final work to
your repository. Your final lab report should be organized, clean, and
run free from errors. Remember, you must remove the `#` for the
included code chunks to run. Be sure to add your name to the author
header above.
Make sure to use the formatting conventions of RMarkdown to make your report neat and clean!
library(tidyverse)
library(janitor)
For this homework, we will use a data set compiled by the Office of
Environment and Heritage in New South Wales, Australia. It contains the
enterococci counts in water samples obtained from Sydney beaches as part
of the Beachwatch Water Quality Program. Enterococci are
bacteria common in the intestines of mammals; they are rarely present in
clean water. Enterococci values are therefore a measure of
pollution. `cfu` stands for colony-forming units and measures
the number of viable bacteria in a sample.
This homework loosely follows the tutorial of R Ladies Sydney. If you get stuck, check it out!
Load the data as `sydneybeaches`. Do some exploratory analysis to get an idea of the data structure.
sydneybeaches <- read_csv("data/sydneybeaches.csv") %>% clean_names()
## Rows: 3690 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Region, Council, Site, Date
## dbl (4): BeachId, Longitude, Latitude, Enterococci (cfu/100ml)
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The data appear to be tidy. Each variable has its own column, each observation has its own row, and each cell has its own value. Because the sites are repeated based on their observation date, the data are in long format.
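As a small illustration of the long-format observation above, the sketch below builds a toy tibble (a few rows copied from the printed output, not the full data set) and checks two things: that sites repeat across rows, and that each site/date pair identifies a single observation. The object names `toy_long`, `n_sites`, `n_rows`, and `duplicates` are hypothetical and used only for this example.

```r
library(dplyr)
library(tibble)

# Toy data in the same long layout: one row per site per sampling date
toy_long <- tribble(
  ~site,            ~date,        ~enterococci_cfu_100ml,
  "Clovelly Beach", "02/01/2013", 19,
  "Clovelly Beach", "06/01/2013", 3,
  "Coogee Beach",   "02/01/2013", 15,
  "Coogee Beach",   "06/01/2013", 4
)

# In long format the sites repeat, so there are fewer distinct sites than rows
n_sites <- n_distinct(toy_long$site)
n_rows <- nrow(toy_long)

# Any site/date pair that occurs more than once would show up here
duplicates <- toy_long %>%
  count(site, date) %>%
  filter(n > 1)
```

If `duplicates` is empty, every site/date pair is unique, which is also what allows the data to be pivoted wider without collapsing values.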
names(sydneybeaches)
## [1] "beach_id" "region" "council"
## [4] "site" "longitude" "latitude"
## [7] "date" "enterococci_cfu_100ml"
Keep only the site, date, and enterococci columns in a new object, `sydneybeaches_long`.
sydneybeaches_long <- sydneybeaches %>%
select(site, date, enterococci_cfu_100ml)
sydneybeaches_long
## # A tibble: 3,690 × 3
## site date enterococci_cfu_100ml
## <chr> <chr> <dbl>
## 1 Clovelly Beach 02/01/2013 19
## 2 Clovelly Beach 06/01/2013 3
## 3 Clovelly Beach 12/01/2013 2
## 4 Clovelly Beach 18/01/2013 13
## 5 Clovelly Beach 30/01/2013 8
## 6 Clovelly Beach 05/02/2013 7
## 7 Clovelly Beach 11/02/2013 11
## 8 Clovelly Beach 23/02/2013 97
## 9 Clovelly Beach 07/03/2013 3
## 10 Clovelly Beach 25/03/2013 0
## # ℹ 3,680 more rows
Pivot the data wider, with one column per date, into a new object, `sydneybeaches_wide`.
sydneybeaches_wide <- sydneybeaches_long %>%
pivot_wider(names_from=date,
values_from=enterococci_cfu_100ml)
sydneybeaches_wide
## # A tibble: 11 × 345
## site `02/01/2013` `06/01/2013` `12/01/2013` `18/01/2013` `30/01/2013`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Clovelly Be… 19 3 2 13 8
## 2 Coogee Beach 15 4 17 18 22
## 3 Gordons Bay… NA NA NA NA NA
## 4 Little Bay … 9 3 72 1 44
## 5 Malabar Bea… 2 4 390 15 13
## 6 Maroubra Be… 1 1 20 2 11
## 7 South Marou… 1 0 33 2 13
## 8 South Marou… 12 2 110 13 100
## 9 Bondi Beach 3 1 2 1 6
## 10 Bronte Beach 4 2 38 3 25
## 11 Tamarama Be… 1 0 7 22 23
## # ℹ 339 more variables: `05/02/2013` <dbl>, `11/02/2013` <dbl>,
## # `23/02/2013` <dbl>, `07/03/2013` <dbl>, `25/03/2013` <dbl>,
## # `02/04/2013` <dbl>, `12/04/2013` <dbl>, `18/04/2013` <dbl>,
## # `24/04/2013` <dbl>, `01/05/2013` <dbl>, `20/05/2013` <dbl>,
## # `31/05/2013` <dbl>, `06/06/2013` <dbl>, `12/06/2013` <dbl>,
## # `24/06/2013` <dbl>, `06/07/2013` <dbl>, `18/07/2013` <dbl>,
## # `24/07/2013` <dbl>, `08/08/2013` <dbl>, `22/08/2013` <dbl>, …
sydneybeaches_wide %>%
pivot_longer(-site,
names_to="date",
values_to="enterococci_cfu_100ml"
)
## # A tibble: 3,784 × 3
## site date enterococci_cfu_100ml
## <chr> <chr> <dbl>
## 1 Clovelly Beach 02/01/2013 19
## 2 Clovelly Beach 06/01/2013 3
## 3 Clovelly Beach 12/01/2013 2
## 4 Clovelly Beach 18/01/2013 13
## 5 Clovelly Beach 30/01/2013 8
## 6 Clovelly Beach 05/02/2013 7
## 7 Clovelly Beach 11/02/2013 11
## 8 Clovelly Beach 23/02/2013 97
## 9 Clovelly Beach 07/03/2013 3
## 10 Clovelly Beach 25/03/2013 0
## # ℹ 3,774 more rows
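The round trip returns 3,784 rows rather than the original 3,690. That count is consistent with 11 sites times 344 date columns: widening creates a cell for every site/date combination and fills the ones that were never sampled with `NA`. The sketch below reproduces the effect on hypothetical toy data ("Beach A" and "Beach B" are made-up sites) and shows one possible way to get back to the original row count, `values_drop_na = TRUE`. Note the assumption: this option would also drop any rows whose value was recorded as missing in the original data.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data: "Beach B" was not sampled on 02/01/2013
toy_long <- tribble(
  ~site,     ~date,        ~enterococci_cfu_100ml,
  "Beach A", "02/01/2013", 19,
  "Beach A", "06/01/2013", 3,
  "Beach B", "06/01/2013", 5
)

# Widening fills the missing site/date combination with NA
toy_wide <- toy_long %>%
  pivot_wider(names_from = date,
              values_from = enterococci_cfu_100ml)

# A plain round trip keeps that NA as an extra row
round_trip <- toy_wide %>%
  pivot_longer(-site,
               names_to = "date",
               values_to = "enterococci_cfu_100ml")

# Dropping NA values while lengthening restores the original row count
round_trip_dropped <- toy_wide %>%
  pivot_longer(-site,
               names_to = "date",
               values_to = "enterococci_cfu_100ml",
               values_drop_na = TRUE)
```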
Separate `date` into day, month, and year columns in the `sydneybeaches_long` data.
sydneybeaches_long %>%
separate(date, into=c("day", "month", "year"), sep="/")
## # A tibble: 3,690 × 5
## site day month year enterococci_cfu_100ml
## <chr> <chr> <chr> <chr> <dbl>
## 1 Clovelly Beach 02 01 2013 19
## 2 Clovelly Beach 06 01 2013 3
## 3 Clovelly Beach 12 01 2013 2
## 4 Clovelly Beach 18 01 2013 13
## 5 Clovelly Beach 30 01 2013 8
## 6 Clovelly Beach 05 02 2013 7
## 7 Clovelly Beach 11 02 2013 11
## 8 Clovelly Beach 23 02 2013 97
## 9 Clovelly Beach 07 03 2013 3
## 10 Clovelly Beach 25 03 2013 0
## # ℹ 3,680 more rows
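`separate()` leaves day, month, and year as character columns. As an alternative sketch, not the approach used in this report, the date string could be parsed with `lubridate::dmy()` and its components extracted as numbers. The toy rows are copied from the output above; `toy_long` and `toy_dates` are hypothetical names.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Two rows copied from the printed data, dates in day/month/year order
toy_long <- tribble(
  ~site,            ~date,        ~enterococci_cfu_100ml,
  "Clovelly Beach", "02/01/2013", 19,
  "Clovelly Beach", "05/02/2013", 7
)

# Parse the string into a Date, then pull out numeric components
toy_dates <- toy_long %>%
  mutate(date = dmy(date),
         day = day(date),
         month = month(date),
         year = year(date))
```

Keeping a real `Date` column makes later sorting and filtering by date behave chronologically rather than alphabetically.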
Calculate the mean `enterococci_cfu_100ml` by year for each beach. Think about which data you will use: long or wide.
mean_entero <-
sydneybeaches_long %>%
separate(date, into=c("day", "month", "year"), sep="/") %>%
group_by(site, year) %>%
summarize(mean_enterococci_cfu_100ml=mean(enterococci_cfu_100ml, na.rm=T))
## `summarise()` has grouped output by 'site'. You can override using the
## `.groups` argument.
mean_entero
## # A tibble: 66 × 3
## # Groups: site [11]
## site year mean_enterococci_cfu_100ml
## <chr> <chr> <dbl>
## 1 Bondi Beach 2013 32.2
## 2 Bondi Beach 2014 11.1
## 3 Bondi Beach 2015 14.3
## 4 Bondi Beach 2016 19.4
## 5 Bondi Beach 2017 13.2
## 6 Bondi Beach 2018 22.9
## 7 Bronte Beach 2013 26.8
## 8 Bronte Beach 2014 17.5
## 9 Bronte Beach 2015 23.6
## 10 Bronte Beach 2016 61.3
## # ℹ 56 more rows
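The `summarise()` message above appears because the result is still grouped by `site`. A minimal sketch of one way to silence it and return an ungrouped tibble is to set `.groups = "drop"`. The data here are hypothetical ("Beach A" and "Beach B" and their values are made up), and `toy_years` and `toy_means` are names used only for this example.

```r
library(dplyr)
library(tibble)

# Hypothetical samples, including one missing measurement
toy_years <- tribble(
  ~site,     ~year,  ~enterococci_cfu_100ml,
  "Beach A", "2013", 10,
  "Beach A", "2013", 30,
  "Beach A", "2014", NA,
  "Beach A", "2014", 8,
  "Beach B", "2013", 4
)

# Mean per site and year; na.rm = TRUE ignores the missing sample,
# and .groups = "drop" returns an ungrouped result without the message
toy_means <- toy_years %>%
  group_by(site, year) %>%
  summarize(
    mean_enterococci_cfu_100ml = mean(enterococci_cfu_100ml, na.rm = TRUE),
    .groups = "drop"
  )
```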
mean_entero %>%
pivot_wider(names_from=site,
values_from=mean_enterococci_cfu_100ml)
## # A tibble: 6 × 12
## year `Bondi Beach` `Bronte Beach` `Clovelly Beach` `Coogee Beach`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2013 32.2 26.8 9.28 39.7
## 2 2014 11.1 17.5 13.8 52.6
## 3 2015 14.3 23.6 8.82 40.3
## 4 2016 19.4 61.3 11.3 59.5
## 5 2017 13.2 16.8 7.93 20.7
## 6 2018 22.9 43.4 10.6 21.6
## # ℹ 7 more variables: `Gordons Bay (East)` <dbl>, `Little Bay Beach` <dbl>,
## # `Malabar Beach` <dbl>, `Maroubra Beach` <dbl>,
## # `South Maroubra Beach` <dbl>, `South Maroubra Rockpool` <dbl>,
## # `Tamarama Beach` <dbl>
mean_entero %>%
pivot_wider(names_from=site,
values_from=mean_enterococci_cfu_100ml) %>%
filter(year==2013) %>%
pivot_longer(-year,
names_to = "site",
values_to = "mean_enterococci_cfu_100ml") %>%
arrange(desc(mean_enterococci_cfu_100ml))
## # A tibble: 11 × 3
## year site mean_enterococci_cfu_100ml
## <chr> <chr> <dbl>
## 1 2013 Little Bay Beach 122.
## 2 2013 Malabar Beach 101.
## 3 2013 South Maroubra Rockpool 96.4
## 4 2013 Maroubra Beach 47.1
## 5 2013 Coogee Beach 39.7
## 6 2013 South Maroubra Beach 39.3
## 7 2013 Bondi Beach 32.2
## 8 2013 Tamarama Beach 29.7
## 9 2013 Bronte Beach 26.8
## 10 2013 Gordons Bay (East) 24.8
## 11 2013 Clovelly Beach 9.28
mean_entero %>%
pivot_wider(names_from=site,
values_from=mean_enterococci_cfu_100ml) %>%
filter(year==2013) %>%
pivot_longer(-year,
names_to = "site",
values_to = "mean_enterococci_cfu_100ml") %>%
slice_max(mean_enterococci_cfu_100ml, n=3)
## # A tibble: 3 × 3
## year site mean_enterococci_cfu_100ml
## <chr> <chr> <dbl>
## 1 2013 Little Bay Beach 122.
## 2 2013 Malabar Beach 101.
## 3 2013 South Maroubra Rockpool 96.4
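The same top three could probably be reached without pivoting wider and back, by filtering the long summary directly. The sketch below uses a toy table built from rounded values printed above (so the numbers are approximate) and groups it by `site` to mimic the grouped `mean_entero`. The `ungroup()` call matters under that assumption, because `slice_max()` works within groups on a grouped tibble. `toy_means` and `top3_2013` are hypothetical names.

```r
library(dplyr)
library(tibble)

# Rounded means copied from the printed output, grouped by site
toy_means <- tribble(
  ~site,                     ~year,  ~mean_enterococci_cfu_100ml,
  "Little Bay Beach",        "2013", 122,
  "Malabar Beach",           "2013", 101,
  "South Maroubra Rockpool", "2013", 96.4,
  "Maroubra Beach",          "2013", 47.1,
  "Coogee Beach",            "2013", 39.7,
  "Bondi Beach",             "2014", 11.1,
  "Bronte Beach",            "2016", 61.3
) %>%
  group_by(site)

# Filter to 2013, remove the grouping, then keep the three largest means
top3_2013 <- toy_means %>%
  filter(year == "2013") %>%
  ungroup() %>%
  slice_max(mean_enterococci_cfu_100ml, n = 3)
```

Because `year` is stored as character after `separate()`, the filter compares against the string `"2013"`.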
Please be sure that you check the `keep md` file in the knit preferences.