Instructions

Answer the following questions and complete the exercises in RMarkdown. Please embed all of your code and push your final work to your repository. Your final lab report should be organized, clean, and run free from errors. Remember, you must remove the # for the included code chunks to run. Be sure to add your name to the author header above.

Make sure to use the formatting conventions of RMarkdown to make your report neat and clean!

Background

In the data folder, you will find data about shark incidents in California between 1950 and 2022. The data are from the State of California Shark Incident Database.

Load the libraries

library("tidyverse")
library("janitor")
library("naniar")

Load the data

Run the following code chunk to import the data.

sharks <- read_csv("data/SharkIncidents_1950_2022_220302.csv") %>% clean_names()

Questions

  1. Start by doing some data exploration using your preferred function(s). What is the structure of the data? Where are the missing values and how are they represented?
glimpse(sharks)
## Rows: 211
## Columns: 16
## $ incident_num     <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "1…
## $ month            <dbl> 10, 5, 12, 2, 8, 4, 10, 5, 6, 7, 10, 11, 4, 5, 5, 8, …
## $ day              <dbl> 8, 27, 7, 6, 14, 28, 12, 7, 14, 28, 4, 10, 24, 19, 21…
## $ year             <dbl> 1950, 1952, 1952, 1955, 1956, 1957, 1958, 1959, 1959,…
## $ time             <chr> "12:00", "14:00", "14:00", "12:00", "16:30", "13:30",…
## $ county           <chr> "San Diego", "San Diego", "Monterey", "Monterey", "Sa…
## $ location         <chr> "Imperial Beach", "Imperial Beach", "Lovers Point", "…
## $ mode             <chr> "Swimming", "Swimming", "Swimming", "Freediving", "Sw…
## $ injury           <chr> "major", "minor", "fatal", "minor", "major", "fatal",…
## $ depth            <chr> "surface", "surface", "surface", "surface", "surface"…
## $ species          <chr> "White", "White", "White", "White", "White", "White",…
## $ comment          <chr> "Body Surfing, bit multiple times on leg, thigh and b…
## $ longitude        <chr> "-117.1466667", "-117.2466667", "-122.05", "-122.15",…
## $ latitude         <dbl> 32.58833, 32.58833, 36.62667, 36.62667, 35.13833, 35.…
## $ confirmed_source <chr> "Miller/Collier, Coronado Paper, Oceanside Paper", "G…
## $ wfl_case_number  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
miss_var_summary(sharks)
## # A tibble: 16 × 3
##    variable         n_miss pct_miss
##    <chr>             <int>    <num>
##  1 wfl_case_number     202   95.7  
##  2 time                  7    3.32 
##  3 latitude              6    2.84 
##  4 longitude             5    2.37 
##  5 confirmed_source      1    0.474
##  6 incident_num          0    0    
##  7 month                 0    0    
##  8 day                   0    0    
##  9 year                  0    0    
## 10 county                0    0    
## 11 location              0    0    
## 12 mode                  0    0    
## 13 injury                0    0    
## 14 depth                 0    0    
## 15 species               0    0    
## 16 comment               0    0
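The summary above counts only explicit NA values. By that measure, wfl_case_number is almost entirely missing (202 of 211 rows), followed by time, latitude, longitude, and confirmed_source. The glimpse() output also shows missing information stored as text: incident 7 has a time of "Unknown", which miss_var_summary() does not count. Similarly, longitude is read as character (<chr>) while latitude is numeric, which suggests that some longitude entries are not plain numbers and are worth inspecting. The sketch below uses a small hypothetical tibble (not the shark data) to show how dplyr's na_if() converts a placeholder string into a real NA. naniar's replace_with_na() is another option.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data mimicking the `time` column; NOT the real shark records
toy <- tibble(
  incident_num = c("1", "2", "3", "4"),
  time = c("12:00", "Unknown", NA, "14:00")
)

# Only the explicit NA is counted as missing
sum(is.na(toy$time))        # 1

# Recode the placeholder string "Unknown" to a true NA
toy_clean <- toy %>%
  mutate(time = na_if(time, "Unknown"))

sum(is.na(toy_clean$time))  # 2
```

On the real data the equivalent call would be mutate(time = na_if(time, "Unknown")). Whether other columns use similar placeholders would still need to be checked.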
  2. Notice that there are some incidents identified as “NOT COUNTED”. These should be removed from the data because they were either not sharks, unverified, or were provoked. It’s OK to replace the sharks object.
sharks %>%
  select(incident_num, injury, comment) %>%
  filter(incident_num == "NOT COUNTED")
## # A tibble: 9 × 3
##   incident_num injury comment                                                   
##   <chr>        <chr>  <chr>                                                     
## 1 NOT COUNTED  minor* "Free diver attacked at surface, dragged under with minor…
## 2 NOT COUNTED  major* "Surfer attacked with major injuries to legs - This is wi…
## 3 NOT COUNTED  major* "Filmaker was intentionally filming blue sharks when one …
## 4 NOT COUNTED  none*  "Shark grabbed surfboard leash with no injury to surfer. …
## 5 NOT COUNTED  major* "Something slammed into board/person causing back/neck in…
## 6 NOT COUNTED  none*  "Unclear that shark hit board or board hit shark. In surf…
## 7 NOT COUNTED  minor* "Woman wading while trying to take photo of dying salmon …
## 8 NOT COUNTED  none*  "Angler claims white shark hit kayak - no other data foun…
## 9 NOT COUNTED  minor* "Surfer claims to have been circled by shark, then possib…
sharks <- sharks %>% 
  filter(incident_num != "NOT COUNTED")
sharks
## # A tibble: 202 × 16
##    incident_num month   day  year time    county     location mode  injury depth
##    <chr>        <dbl> <dbl> <dbl> <chr>   <chr>      <chr>    <chr> <chr>  <chr>
##  1 1               10     8  1950 12:00   San Diego  Imperia… Swim… major  surf…
##  2 2                5    27  1952 14:00   San Diego  Imperia… Swim… minor  surf…
##  3 3               12     7  1952 14:00   Monterey   Lovers … Swim… fatal  surf…
##  4 4                2     6  1955 12:00   Monterey   Pacific… Free… minor  surf…
##  5 5                8    14  1956 16:30   San Luis … Pismo B… Swim… major  surf…
##  6 6                4    28  1957 13:30   San Luis … Morro B… Swim… fatal  surf…
##  7 7               10    12  1958 Unknown San Diego  Coronad… Swim… major  surf…
##  8 8                5     7  1959 17:30   San Franc… Baker B… Swim… fatal  surf…
##  9 9                6    14  1959 17:00   San Diego  La Jolla Free… fatal  surf…
## 10 10               7    28  1959 19:30   San Diego  La Jolla Free… minor  surf…
## # ℹ 192 more rows
## # ℹ 6 more variables: species <chr>, comment <chr>, longitude <chr>,
## #   latitude <dbl>, confirmed_source <chr>, wfl_case_number <chr>
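The row counts are consistent: 211 rows minus the 9 "NOT COUNTED" rows leaves 202. In the earlier table, every "NOT COUNTED" row also has an injury value ending in an asterisk (minor*, major*, none*). Assuming that pattern holds across the whole file, a quick check after filtering can confirm that no flagged rows remain. This sketch runs the check on hypothetical toy rows. On the real data the equivalent test is any(str_detect(sharks$injury, fixed("*"))), which should return FALSE.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical toy rows; the real rows come from the shark incident CSV
toy <- tibble(
  incident_num = c("1", "NOT COUNTED", "2", "NOT COUNTED"),
  injury = c("major", "minor*", "none", "none*")
)

toy_counted <- toy %>%
  filter(incident_num != "NOT COUNTED")

# fixed("*") matches a literal asterisk instead of a regex quantifier
leftover <- sum(str_detect(toy_counted$injury, fixed("*")))
leftover

# Number of rows removed by the filter
nrow(toy) - nrow(toy_counted)
```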
  3. Are there any “hotspots” for shark incidents in California? Make a table and plot that shows the total number of incidents per county. Which county has the highest number of incidents?

San Diego County has the highest number of incidents (23), followed by Santa Barbara County (19).

sharks %>% 
  count(county) %>% 
  arrange(desc(n))
## # A tibble: 21 × 2
##    county              n
##    <chr>           <int>
##  1 San Diego          23
##  2 Santa Barbara      19
##  3 Humboldt           18
##  4 San Mateo          18
##  5 Marin              16
##  6 Monterey           15
##  7 Santa Cruz         15
##  8 Sonoma             15
##  9 San Luis Obispo    14
## 10 Los Angeles         9
## # ℹ 11 more rows
sharks %>% 
  group_by(county) %>%
  summarise(n = n()) %>%
  ggplot(aes(x=reorder(county, n), y=n)) +
  geom_col(fill = "#0099f9", alpha=0.8)+
  labs(title="Shark Incidents by County (1950-2022)", 
       x=NULL, 
       y = "Number of Incidents") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1),
        plot.title = element_text(size = 14, face="bold"))+
  geom_text(aes(label = n), vjust = -0.2, size = 3, color = "black")
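To judge whether a county is a "hotspot", it can help to express counts as a share of all incidents. For example, San Diego's 23 incidents are about 11% of the 202 counted incidents. The sketch below uses hypothetical county labels. count(sort = TRUE) is shorthand for group_by() plus summarise(n = n()) followed by sorting, and slice_max() returns the top county (keeping ties by default).

```r
library(dplyr)
library(tibble)

# Hypothetical county labels; NOT the real incident counts
toy <- tibble(county = c("A", "A", "A", "B", "B", "C"))

county_share <- toy %>%
  count(county, sort = TRUE) %>%            # one row per county, largest first
  mutate(pct = round(100 * n / sum(n), 1))  # percent of all incidents

county_share

# Top county by number of incidents
slice_max(county_share, order_by = n, n = 1)
```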

  4. Are there months of the year when incidents are more likely to occur? Make a table and a plot that shows the total number of incidents by month. Which month has the highest number of incidents?

October has the highest number of incidents (36), followed by August and September (31 each).

sharks %>% 
  group_by(month) %>%
  summarise(total=n()) %>% 
  arrange(-total)
## # A tibble: 12 × 2
##    month total
##    <dbl> <int>
##  1    10    36
##  2     8    31
##  3     9    31
##  4     7    23
##  5     5    16
##  6    11    16
##  7     6    14
##  8    12    13
##  9     1     7
## 10     4     6
## 11     3     5
## 12     2     4
sharks %>% 
  group_by(month) %>%
  summarise(total=n()) %>% 
  ggplot(aes(x=as_factor(month), y=total))+
  geom_col(fill = "#0099f9", alpha=0.8)+
  labs(title="Shark Incidents by Month", 
       x="Month", 
       y = "Number of Incidents") +
  theme(plot.title = element_text(size = 14, face="bold"))
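Because month is stored as a number, the axis shows 1 to 12. One option, sketched here on hypothetical totals, is to convert it to a factor with all twelve levels and the built-in month.abb labels, so the bars appear in calendar order with readable names.

```r
library(dplyr)
library(tibble)

# Hypothetical monthly totals; the real totals are in the table above
toy <- tibble(month = c(10, 8, 9, 2), total = c(5L, 4L, 4L, 1L))

toy_labeled <- toy %>%
  # levels = 1:12 fixes calendar order and keeps months with no incidents as levels
  mutate(month = factor(month, levels = 1:12, labels = month.abb))

levels(toy_labeled$month)        # "Jan" ... "Dec"
as.character(toy_labeled$month)  # "Oct" "Aug" "Sep" "Feb"
```

In the plot, adding scale_x_discrete(drop = FALSE) would keep months with zero incidents on the axis. In the real data every month has at least four incidents, so this matters mainly for subsets.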

  5. How do the number and types of injuries compare by county? Make a table that shows the number of injury types by county. Which county has the highest number of incidents?

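San Diego has the highest number of incidents (23). Most of them caused minor injuries (8) or no injury (9); 4 were major and 2 were fatal.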

sharks %>% 
  select(county, injury) %>% 
  group_by(county, injury) %>%
  summarise(total=n(), .groups='keep') %>% 
  pivot_wider(names_from = injury, values_from = total)%>% 
  mutate(total = sum(minor, major, fatal, none, na.rm = TRUE)) %>% 
  arrange(desc(total))
## # A tibble: 21 × 6
## # Groups:   county [21]
##    county          minor  none major fatal total
##    <chr>           <int> <int> <int> <int> <int>
##  1 San Diego           8     9     4     2    23
##  2 Santa Barbara       6     9     2     2    19
##  3 Humboldt            2     9     7    NA    18
##  4 San Mateo           4    12     1     1    18
##  5 Marin               4     3     9    NA    16
##  6 Monterey            2     3     8     2    15
##  7 Santa Cruz          3     8     3     1    15
##  8 Sonoma              1     6     8    NA    15
##  9 San Luis Obispo     1     7     3     3    14
## 10 Los Angeles         6     2    NA     1     9
## # ℹ 11 more rows
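The table above relies on the grouping left by summarise(.groups = 'keep') so that sum() adds across each county's row; without that grouping, sum() would total the entire columns and repeat one number on every row. Below is a sketch of an alternative that does not depend on grouping, using hypothetical records: count() the county-injury pairs, fill absent combinations with 0 instead of NA, and add the columns with +. It assumes all four injury types (minor, none, major, fatal) occur somewhere in the data.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical county/injury records; NOT the real data
toy <- tibble(
  county = c("A", "A", "A", "B", "B"),
  injury = c("minor", "none", "minor", "major", "fatal")
)

injury_by_county <- toy %>%
  count(county, injury) %>%                       # one row per county-injury pair
  pivot_wider(names_from = injury, values_from = n,
              values_fill = 0L) %>%               # missing combinations become 0, not NA
  mutate(total = minor + none + major + fatal) %>%
  arrange(desc(total))

injury_by_county
```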
  6. Use the table from #5 to make a plot that shows the total number of incidents by county.
sharks %>% 
  select(county, injury) %>% 
  group_by(county, injury) %>%
  summarise(total=n(), .groups='keep') %>% 
  pivot_wider(names_from = injury, values_from = total)%>% 
  mutate(total = sum(minor, major, fatal, none, na.rm = TRUE)) %>% 
  arrange(desc(total)) %>% 
  ggplot(aes(x=reorder(county, total), y=total))+
  geom_col(fill = "#0099f9", alpha=0.8)+
  labs(title="Shark Incidents by County", 
       x=NULL, 
       y = "Number of Incidents") +
  coord_flip()
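Question #5 also asks how injury types compare across counties. A stacked bar chart can show each county's total and its injury composition in a single plot. This sketch uses hypothetical records: fct_infreq() orders counties by frequency, and fct_rev() places the largest at the top once coord_flip() is applied. layer_data() is used only to inspect the counts that ggplot2 computed.

```r
library(ggplot2)
library(forcats)
library(tibble)

# Hypothetical county/injury records; NOT the real data
toy <- tibble(
  county = c("A", "A", "A", "B", "B", "C"),
  injury = c("minor", "none", "minor", "major", "fatal", "none")
)

p <- ggplot(toy, aes(x = fct_rev(fct_infreq(county)), fill = injury)) +
  geom_bar() +   # stat_count() counts rows; injury types are stacked within each county
  labs(x = NULL, y = "Number of Incidents", fill = "Injury") +
  coord_flip()

# Counts ggplot2 computed for each stacked segment
counts <- layer_data(p)$count
sum(counts)
```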

  7. In the data, mode refers to a type of activity. Which activity is associated with the highest number of incidents?
    Surfing / Boarding, with 80 incidents; freediving is next with 35.
sharks %>% 
  count(mode) %>% 
  arrange(desc(n))
## # A tibble: 7 × 2
##   mode                    n
##   <chr>               <int>
## 1 Surfing / Boarding     80
## 2 Freediving             35
## 3 Kayaking / Canoeing    29
## 4 Swimming               22
## 5 Scuba Diving           19
## 6 Hookah Diving          10
## 7 Paddleboarding          7
  8. Make a plot that compares the number of incidents by activity.
sharks %>% 
  # order bars from most to least frequent; fill color repeats the x label, so hide the legend
  ggplot(aes(x = fct_infreq(mode), fill = mode)) +
  geom_bar(alpha = 0.8, show.legend = FALSE) +
  labs(title = "Incidents by Activity", 
       x = NULL, 
       y = "Number of Incidents") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1),
        plot.title = element_text(size = 14, face = "bold"))

  9. Which shark species is involved in the highest number of incidents?
    The great white shark (coded as "White"), involved in 179 of the 202 counted incidents.
sharks %>% 
  count(species) %>% 
  arrange(desc(n))
## # A tibble: 8 × 2
##   species        n
##   <chr>      <int>
## 1 White        179
## 2 Unknown       13
## 3 Hammerhead     3
## 4 Blue           2
## 5 Leopard        2
## 6 Salmon         1
## 7 Sevengill      1
## 8 Thresher       1
  10. Are all incidents involving Great Whites fatal? Make a plot that shows the number and types of incidents for Great Whites only.
    No. The most common outcome in white shark incidents is no injury.
sharks %>% 
  filter(species=="White") %>% 
  ggplot(aes(x=injury))+
  geom_bar(fill = "#0099f9", alpha=0.8)+
  labs(title="Incidents Involving Great White Sharks", 
       x="Injury", 
       y = "Number of Incidents") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1),
        plot.title = element_text(size = 14, face="bold"))
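The plot answers the question visually. A direct check is all(injury == "fatal") on the white shark subset, and mean() of the same comparison gives the fatal share. This sketch runs on hypothetical rows; the real data code the species as "White".

```r
library(dplyr)
library(tibble)

# Hypothetical rows; NOT the real shark records
toy <- tibble(
  species = c("White", "White", "White", "Blue", "White"),
  injury  = c("none", "fatal", "minor", "minor", "none")
)

white <- toy %>% filter(species == "White")

all(white$injury == "fatal")    # FALSE: not every white shark incident is fatal
mean(white$injury == "fatal")   # 0.25: share of white shark incidents that were fatal
white %>% count(injury, sort = TRUE)
```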

Knit and Upload

Please knit your work as a .pdf or .html file and upload to Canvas. Homework is due before the start of the next lab. No late work is accepted. Make sure to use the formatting conventions of RMarkdown to make your report neat and clean!