Instructions

Answer the following questions and complete the exercises in RMarkdown. Please embed all of your code and push your final work to your repository. Your final lab report should be organized, clean, and run free from errors. Remember, you must remove the # for the included code chunks to run. Be sure to add your name to the author header above.

Make sure to use the formatting conventions of RMarkdown to make your report neat and clean!

Background

In the data folder, you will find data about shark incidents in California from 1950 to 2022. The data are from the State of California Shark Incident Database.

Load the libraries

library("tidyverse")
library("janitor")

Load the data

Run the following code chunk to import the data. In this chunk, I am converting the placeholder strings used for missing values into NAs.

sharks <- 
  read_csv("data/SharkIncidents_1950_2022_220302.csv", 
           na=c("NA", "N/A", "Unknown", "NOT COUNTED", "", " ")) %>% 
  clean_names()
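To see what the `na` argument does at import, here is a minimal sketch on a made-up inline CSV. The object `toy_csv` and its three rows are hypothetical and only stand in for the real file. The placeholder strings become `NA`, but the rows themselves are kept.

```r
library(readr)

# Hypothetical inline CSV standing in for the real data file.
toy_csv <- I("county,injury\nSan Diego,minor\nUnknown,N/A\nMarin,NOT COUNTED\n")

toy <- read_csv(
  toy_csv,
  na = c("NA", "N/A", "Unknown", "NOT COUNTED", "", " "),
  show_col_types = FALSE
)

# Count the missing values in each column after import.
na_counts <- colSums(is.na(toy))
na_counts
```

All three rows survive the import. Only the listed placeholder values are replaced with `NA`.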

Questions

  1. Are there any “hotspots” for shark incidents in California? Show the top five counties with the most incidents.
sharks %>% 
  count(county) %>% 
  arrange(desc(n)) %>% 
  slice_head(n=5)
## # A tibble: 5 × 2
##   county            n
##   <chr>         <int>
## 1 San Diego        25
## 2 Humboldt         20
## 3 San Mateo        19
## 4 Santa Barbara    19
## 5 Marin            16
  2. Make a plot that shows the total number of incidents per county.
sharks %>% 
  group_by(county) %>%
  summarise(n = n()) %>%
  ggplot(aes(x=reorder(county, n), y=n)) +
  geom_col(fill = "#0099f9", alpha=0.8)+
  labs(title="Shark Incidents by County (1950-2022)", 
       x=NULL, 
       y="n") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1),
        plot.title = element_text(size = 14, face="bold"))+
  geom_text(aes(label = n), vjust = -0.2, size = 3, color = "black")

  3. Are there months of the year when incidents are more likely to occur? Make a plot that shows the total number of incidents by month. Which month has the highest number of incidents?

October

sharks %>% 
  group_by(month) %>%
  summarise(total=n()) %>% 
  arrange(-total)
## # A tibble: 12 × 2
##    month total
##    <dbl> <int>
##  1    10    36
##  2     9    34
##  3     8    31
##  4     7    27
##  5     5    16
##  6    11    16
##  7     6    14
##  8    12    13
##  9     1     8
## 10     4     7
## 11     3     5
## 12     2     4
sharks %>% 
  group_by(month) %>%
  summarise(total=n()) %>% 
  ggplot(aes(x=as_factor(month), y=total))+
  geom_col(fill = "#0099f9", alpha=0.8)+
  labs(title="Shark Incidents by Month", 
       x="Month", 
       y="n")+
  theme(plot.title = element_text(size = 14, face="bold"))
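The month axis can also be labeled with month names instead of numbers. This is a minimal sketch, assuming `month` holds the integers 1 to 12 as in the table output. The tibble `toy_months` is a hypothetical stand-in, and `month.abb` is R's built-in vector of month abbreviations.

```r
library(tidyverse)

# Hypothetical monthly totals standing in for the summarised data.
toy_months <- tibble(month = c(10, 9, 2), total = c(36, 34, 4))

# Map month numbers to abbreviations, keeping calendar order in the factor levels.
labelled <- toy_months %>%
  mutate(month_name = factor(month.abb[month], levels = month.abb))

# drop = FALSE keeps months with no rows on the axis.
p <- ggplot(labelled, aes(x = month_name, y = total)) +
  geom_col() +
  scale_x_discrete(drop = FALSE)
```

Setting the factor levels to `month.abb` keeps the bars in calendar order rather than alphabetical order.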

  4. Is there a time of day when most attacks occur? Make a plot that shows the total number of incidents by time of day. Which time of day has the highest number of incidents?

1100

sharks %>% 
  mutate(time=str_remove(time, ":")) %>% 
  filter(!time %in% c("afternoon", "early am", NA)) %>%
  mutate(time=as.numeric(time)) %>%
  count(time) %>% 
  arrange(time) %>% 
  ggplot(aes(x=time, y=n))+
  geom_col()+
  scale_x_continuous(breaks=seq(0, 2400, by=100))+
  theme(axis.text.x = element_text(angle = 60, hjust = 0.85))+
  labs(
    x = "Time of day",
    y = "Number of incidents",
    title = "Shark incidents by time of day"
  ) 
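The plot above counts each exact time separately, so 1100 and 1130 are different bars. One possible alternative is to group the times by hour. This is a sketch only, assuming times are stored as "HH:MM" strings. The tibble `toy_times` is hypothetical, and grouping by hour may give a different answer than counting exact times.

```r
library(tidyverse)

# Hypothetical times standing in for the real `time` column.
toy_times <- tibble(time = c("11:00", "11:30", "14:15", "afternoon", NA))

by_hour <- toy_times %>%
  mutate(time = str_remove(time, ":")) %>%
  # Keep only purely numeric entries; text labels and NA are dropped.
  filter(str_detect(time, "^[0-9]+$")) %>%
  # Integer division by 100 turns 1130 into hour 11.
  mutate(hour = as.numeric(time) %/% 100) %>%
  count(hour)

by_hour
```

Filtering with a regular expression avoids listing every non-numeric label by hand.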

  5. How do the number and types of injuries compare by county? Make a table that shows the number of injury types by county. Which county has the highest number of incidents? (hint: your table should be in wide format)

San Diego

sharks %>% 
  select(county, injury) %>% 
  group_by(county, injury) %>%
  summarise(total=n(), .groups='keep') %>% 
  pivot_wider(names_from = injury, values_from = total)%>% 
  mutate(total=sum(minor, major, fatal, none, na.rm=T)) %>% 
  arrange(desc(total))
## # A tibble: 22 × 9
## # Groups:   county [22]
##    county          minor  none major `minor*` `none*` fatal `major*` total
##    <chr>           <int> <int> <int>    <int>   <int> <int>    <int> <int>
##  1 San Diego           8     9     4        1       1     2       NA    23
##  2 Santa Barbara       6     9     2       NA      NA     2       NA    19
##  3 Humboldt            2     9     7        1       1    NA       NA    18
##  4 San Mateo           4    12     1        1      NA     1       NA    18
##  5 Marin               4     3     9       NA      NA    NA       NA    16
##  6 Monterey            2     3     8       NA      NA     2        1    15
##  7 Santa Cruz          3     8     3       NA      NA     1       NA    15
##  8 Sonoma              1     6     8       NA      NA    NA       NA    15
##  9 San Luis Obispo     1     7     3       NA      NA     3       NA    14
## 10 Los Angeles         6     2    NA       NA      NA     1        1     9
## # ℹ 12 more rows
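Note that the `total` column above leaves out the `minor*`, `none*`, and `major*` columns, so San Diego shows 23 here but 25 in question 1. One possible fix is to strip the trailing asterisk before pivoting. This is a sketch under an assumption: the data do not say what the asterisk means, so merging those rows with the unmarked ones may not be appropriate. The tibble `toy` is hypothetical.

```r
library(tidyverse)

# Hypothetical rows standing in for the real county and injury columns.
toy <- tibble(
  county = c("San Diego", "San Diego", "San Diego", "Marin"),
  injury = c("minor", "minor*", "fatal", "major")
)

wide <- toy %>%
  # Assumption: "minor*" can be treated the same as "minor".
  mutate(injury = str_remove(injury, "\\*$")) %>%
  count(county, injury) %>%
  pivot_wider(names_from = injury, values_from = n, values_fill = 0) %>%
  # Sum every numeric column so no injury type is left out of the total.
  mutate(total = rowSums(across(where(is.numeric)))) %>%
  arrange(desc(total))

wide
```

Summing with `across()` avoids naming each injury column by hand, so a new injury type cannot be missed.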
  6. Use the table from #5 to make a plot that shows the total number of incidents by county.
sharks %>% 
  select(county, injury) %>% 
  group_by(county, injury) %>%
  summarise(total=n(), .groups='keep') %>% 
  pivot_wider(names_from = injury, values_from = total)%>% 
  mutate(total=sum(minor, major, fatal, none, na.rm=T)) %>% 
  arrange(desc(total)) %>% 
  ggplot(aes(x=reorder(county, total), y=total))+
  geom_col(fill = "#0099f9", alpha=0.8)+
  labs(title="Shark Incidents by County", 
       x=NULL, 
       y="n")+
  coord_flip()

  7. In the data, mode refers to a type of activity. Which activity is associated with the highest number of incidents?
sharks %>% 
  count(mode) %>% 
  arrange(desc(n))
## # A tibble: 8 × 2
##   mode                    n
##   <chr>               <int>
## 1 Surfing / Boarding     85
## 2 Freediving             36
## 3 Kayaking / Canoeing    30
## 4 Swimming               22
## 5 Scuba Diving           20
## 6 Hookah Diving          10
## 7 Paddleboarding          7
## 8 Walking in shallow      1
  8. Make a plot that compares the number of incidents by activity.
sharks %>% 
  ggplot(aes(x=mode, fill=mode))+
  geom_bar(alpha=0.8, position="dodge")+
  labs(title="Incidents by Activity", 
       x=NULL, 
       y="Number of Incidents")+
  theme(axis.text.x = element_text(angle = 60, hjust = 1),
        plot.title = element_text(size = 14, face="bold"))

  9. Which shark species is involved in the highest number of incidents?
    Great White
sharks %>% 
  count(species) %>% 
  arrange(desc(n))
## # A tibble: 12 × 2
##    species          n
##    <chr>        <int>
##  1 White          181
##  2 <NA>            14
##  3 Hammerhead       3
##  4 Blue             2
##  5 Killer Whale     2
##  6 Leopard          2
##  7 Salmon           2
##  8 Blue*            1
##  9 Mako             1
## 10 Sevengill        1
## 11 Thresher         1
## 12 blue             1
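The output lists "Blue", "Blue*", and "blue" as three separate species. A minimal sketch of one way to combine them, assuming the asterisk and the lowercase spelling are data-entry variants of the same species (the data do not confirm this). The tibble `toy_species` is hypothetical.

```r
library(tidyverse)

# Hypothetical values standing in for the real `species` column.
toy_species <- tibble(species = c("Blue", "Blue*", "blue", "White", NA))

species_counts <- toy_species %>%
  # Drop a trailing asterisk, then standardise capitalisation.
  mutate(species = species %>% str_remove("\\*$") %>% str_to_title()) %>%
  count(species, sort = TRUE)

species_counts
```

This does not change the answer to the question, since White is far ahead of every other species.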
  10. Are all incidents involving Great Whites fatal? Make a plot that shows the number and types of incidents for Great Whites only.
    No; the largest number of incidents resulted in no injury.
sharks %>% 
  filter(species=="White") %>% 
  ggplot(aes(x=injury))+
  geom_bar(fill = "#0099f9", alpha=0.8)+
  labs(title="Incidents Involving Great White Sharks", 
       x="Injury", 
       y="n")+
  theme(axis.text.x = element_text(angle = 60, hjust = 1),
        plot.title = element_text(size = 14, face="bold"))

Knit and Upload

Please knit your work as an .html file and upload to Canvas. Homework is due before the start of the next lab. No late work is accepted. Make sure to use the formatting conventions of RMarkdown to make your report neat and clean!