Answer the following questions and complete the exercises in
RMarkdown. Please embed all of your code and push your final work to
your repository. Your final lab report should be organized, clean, and
run free from errors. Remember, you must remove the # for the included
code chunks to run. Be sure to add your name to the author header above.
Make sure to use the formatting conventions of RMarkdown to make your report neat and clean!
In the data folder, you will find data about shark incidents in
California between 1950 and 2022. The data are from: State of California - Shark Incident Database.
library("tidyverse")
library("janitor")
library("naniar")
Run the following code chunk to import the data.
sharks <- read_csv("data/SharkIncidents_1950_2022_220302.csv") %>% clean_names()
glimpse(sharks)
## Rows: 211
## Columns: 16
## $ incident_num <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "1…
## $ month <dbl> 10, 5, 12, 2, 8, 4, 10, 5, 6, 7, 10, 11, 4, 5, 5, 8, …
## $ day <dbl> 8, 27, 7, 6, 14, 28, 12, 7, 14, 28, 4, 10, 24, 19, 21…
## $ year <dbl> 1950, 1952, 1952, 1955, 1956, 1957, 1958, 1959, 1959,…
## $ time <chr> "12:00", "14:00", "14:00", "12:00", "16:30", "13:30",…
## $ county <chr> "San Diego", "San Diego", "Monterey", "Monterey", "Sa…
## $ location <chr> "Imperial Beach", "Imperial Beach", "Lovers Point", "…
## $ mode <chr> "Swimming", "Swimming", "Swimming", "Freediving", "Sw…
## $ injury <chr> "major", "minor", "fatal", "minor", "major", "fatal",…
## $ depth <chr> "surface", "surface", "surface", "surface", "surface"…
## $ species <chr> "White", "White", "White", "White", "White", "White",…
## $ comment <chr> "Body Surfing, bit multiple times on leg, thigh and b…
## $ longitude <chr> "-117.1466667", "-117.2466667", "-122.05", "-122.15",…
## $ latitude <dbl> 32.58833, 32.58833, 36.62667, 36.62667, 35.13833, 35.…
## $ confirmed_source <chr> "Miller/Collier, Coronado Paper, Oceanside Paper", "G…
## $ wfl_case_number <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
miss_var_summary(sharks)
## # A tibble: 16 × 3
## variable n_miss pct_miss
## <chr> <int> <num>
## 1 wfl_case_number 202 95.7
## 2 time 7 3.32
## 3 latitude 6 2.84
## 4 longitude 5 2.37
## 5 confirmed_source 1 0.474
## 6 incident_num 0 0
## 7 month 0 0
## 8 day 0 0
## 9 year 0 0
## 10 county 0 0
## 11 location 0 0
## 12 mode 0 0
## 13 injury 0 0
## 14 depth 0 0
## 15 species 0 0
## 16 comment 0 0
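The summary above counts only true NA values. The rows printed further down show time recorded as the string "Unknown" for incident 7, so some missing values may be hidden behind placeholders. A minimal sketch of recoding such placeholders with dplyr's na_if(), assuming a small made-up tibble named toy (hypothetical, not the real file):

```r
library(dplyr)

# Hypothetical toy data; values are illustrative, not taken from the file
toy <- tibble(
  incident_num = c("1", "2", "3"),
  time = c("12:00", "Unknown", NA)
)

# na_if() converts every "Unknown" into a real NA so it is counted as missing
toy_clean <- toy %>%
  mutate(time = na_if(time, "Unknown"))

sum(is.na(toy$time))        # 1 missing value before recoding
sum(is.na(toy_clean$time))  # 2 missing values after recoding
```

Whether "Unknown" should be treated as missing in the real data is a judgment call; this only shows the mechanics.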
Some incidents in the sharks object are labeled "NOT COUNTED". List them first, then remove them.
sharks %>%
select(incident_num, injury, comment) %>%
filter(incident_num == "NOT COUNTED")
## # A tibble: 9 × 3
## incident_num injury comment
## <chr> <chr> <chr>
## 1 NOT COUNTED minor* "Free diver attacked at surface, dragged under with minor…
## 2 NOT COUNTED major* "Surfer attacked with major injuries to legs - This is wi…
## 3 NOT COUNTED major* "Filmaker was intentionally filming blue sharks when one …
## 4 NOT COUNTED none* "Shark grabbed surfboard leash with no injury to surfer. …
## 5 NOT COUNTED major* "Something slammed into board/person causing back/neck in…
## 6 NOT COUNTED none* "Unclear that shark hit board or board hit shark. In surf…
## 7 NOT COUNTED minor* "Woman wading while trying to take photo of dying salmon …
## 8 NOT COUNTED none* "Angler claims white shark hit kayak - no other data foun…
## 9 NOT COUNTED minor* "Surfer claims to have been circled by shark, then possib…
sharks <- sharks %>%
filter(incident_num != "NOT COUNTED")
sharks %>%
filter(!str_detect(incident_num, "NOT COUNTED"))
## # A tibble: 202 × 16
## incident_num month day year time county location mode injury depth
## <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 10 8 1950 12:00 San Diego Imperia… Swim… major surf…
## 2 2 5 27 1952 14:00 San Diego Imperia… Swim… minor surf…
## 3 3 12 7 1952 14:00 Monterey Lovers … Swim… fatal surf…
## 4 4 2 6 1955 12:00 Monterey Pacific… Free… minor surf…
## 5 5 8 14 1956 16:30 San Luis … Pismo B… Swim… major surf…
## 6 6 4 28 1957 13:30 San Luis … Morro B… Swim… fatal surf…
## 7 7 10 12 1958 Unknown San Diego Coronad… Swim… major surf…
## 8 8 5 7 1959 17:30 San Franc… Baker B… Swim… fatal surf…
## 9 9 6 14 1959 17:00 San Diego La Jolla Free… fatal surf…
## 10 10 7 28 1959 19:30 San Diego La Jolla Free… minor surf…
## # ℹ 192 more rows
## # ℹ 6 more variables: species <chr>, comment <chr>, longitude <chr>,
## # latitude <dbl>, confirmed_source <chr>, wfl_case_number <chr>
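The two filters above give the same result here: 211 rows minus the 9 flagged incidents leaves 202. A small sketch comparing exact matching with pattern matching, assuming a made-up tibble named toy (hypothetical):

```r
library(dplyr)
library(stringr)

# Hypothetical toy data with two flagged rows
toy <- tibble(
  incident_num = c("1", "NOT COUNTED", "2", "NOT COUNTED"),
  injury = c("major", "minor*", "fatal", "none*")
)

# Exact comparison: drops rows whose value is exactly "NOT COUNTED"
by_equality <- toy %>%
  filter(incident_num != "NOT COUNTED")

# Pattern match: drops rows whose value contains "NOT COUNTED" anywhere
by_pattern <- toy %>%
  filter(!str_detect(incident_num, "NOT COUNTED"))

all.equal(by_equality, by_pattern)
nrow(by_equality)
```

The pattern version would also drop values that merely contain the phrase, so the exact comparison is the stricter choice when the label is known.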
San Diego County has the highest number of incidents.
sharks %>%
count(county) %>%
arrange(desc(n))
## # A tibble: 21 × 2
## county n
## <chr> <int>
## 1 San Diego 23
## 2 Santa Barbara 19
## 3 Humboldt 18
## 4 San Mateo 18
## 5 Marin 16
## 6 Monterey 15
## 7 Santa Cruz 15
## 8 Sonoma 15
## 9 San Luis Obispo 14
## 10 Los Angeles 9
## # ℹ 11 more rows
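The table above can also be produced in one step, since count() accepts a sort argument. A minimal sketch on a made-up tibble named toy (hypothetical), which also shows that count() matches the group_by() plus summarise() form:

```r
library(dplyr)

# Hypothetical toy data; county names are reused only for illustration
toy <- tibble(
  county = c("San Diego", "Marin", "San Diego", "Humboldt", "Marin", "San Diego")
)

# count() with sort = TRUE orders the result from most to least frequent
by_count <- toy %>%
  count(county, sort = TRUE)

# Longer equivalent using group_by() and summarise()
by_summarise <- toy %>%
  group_by(county) %>%
  summarise(n = n()) %>%
  arrange(desc(n))

by_count
```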
sharks %>%
group_by(county) %>%
summarise(n = n()) %>%
ggplot(aes(x=reorder(county, n), y=n)) +
geom_col(fill = "#0099f9", alpha=0.8)+
labs(title="Shark Incidents by County (1950-2022)",
x=NULL,
y="n") +
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.title = element_text(size = 14, face="bold"))+
geom_text(aes(label = n), vjust = -0.2, size = 3, color = "black")
October has the highest number of incidents (36).
sharks %>%
group_by(month) %>%
summarise(total=n()) %>%
arrange(-total)
## # A tibble: 12 × 2
## month total
## <dbl> <int>
## 1 10 36
## 2 8 31
## 3 9 31
## 4 7 23
## 5 5 16
## 6 11 16
## 7 6 14
## 8 12 13
## 9 1 7
## 10 4 6
## 11 3 5
## 12 2 4
sharks %>%
group_by(month) %>%
summarise(total=n()) %>%
ggplot(aes(x=as_factor(month), y=total))+
geom_col(fill = "#0099f9", alpha=0.8)+
labs(title="Shark Incidents by Month",
x="Month",
y="n")+
theme(plot.title = element_text(size = 14, face="bold"))
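The month axis above shows numbers 1 to 12. One possible way to label months by name is base R's built-in month.abb vector. A sketch on a made-up tibble named toy (hypothetical):

```r
library(dplyr)

# Hypothetical toy data: month stored as a number, as in the sharks data
toy <- tibble(month = c(10, 8, 10, 2))

# month.abb is a built-in vector c("Jan", "Feb", ..., "Dec");
# setting levels = month.abb keeps calendar order instead of alphabetical
toy_labeled <- toy %>%
  mutate(month_name = factor(month.abb[month], levels = month.abb))

toy_labeled %>%
  count(month_name)
```

Mapping month_name to the x aesthetic would then label the bars in calendar order.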
San Diego County has the highest total across the injury categories (23).
sharks %>%
select(county, injury) %>%
group_by(county, injury) %>%
summarise(total=n(), .groups='keep') %>%
pivot_wider(names_from = injury, values_from = total)%>%
mutate(total=sum(minor, major, fatal, none, na.rm=TRUE)) %>%
arrange(desc(total))
## # A tibble: 21 × 6
## # Groups: county [21]
## county minor none major fatal total
## <chr> <int> <int> <int> <int> <int>
## 1 San Diego 8 9 4 2 23
## 2 Santa Barbara 6 9 2 2 19
## 3 Humboldt 2 9 7 NA 18
## 4 San Mateo 4 12 1 1 18
## 5 Marin 4 3 9 NA 16
## 6 Monterey 2 3 8 2 15
## 7 Santa Cruz 3 8 3 1 15
## 8 Sonoma 1 6 8 NA 15
## 9 San Luis Obispo 1 7 3 3 14
## 10 Los Angeles 6 2 NA 1 9
## # ℹ 11 more rows
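The NA entries in the table above mean that no incident of that injury type was recorded for the county. If zeros are preferred, pivot_wider() has a values_fill argument. A minimal sketch using a few of the counts shown above, stored in a made-up tibble named toy (hypothetical):

```r
library(dplyr)
library(tidyr)

# A small subset of the county/injury counts from the table above
toy <- tibble(
  county = c("Humboldt", "Humboldt", "Monterey", "Monterey"),
  injury = c("minor", "major", "major", "fatal"),
  total  = c(2L, 7L, 8L, 2L)
)

# values_fill = 0L writes 0 where a county/injury combination is absent
toy_wide <- toy %>%
  pivot_wider(names_from = injury, values_from = total, values_fill = 0L)

toy_wide
```

With zeros in place of NA, totals can be computed without na.rm.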
sharks %>%
select(county, injury) %>%
group_by(county, injury) %>%
summarise(total=n(), .groups='keep') %>%
pivot_wider(names_from = injury, values_from = total)%>%
mutate(total=sum(minor, major, fatal, none, na.rm=TRUE)) %>%
arrange(desc(total)) %>%
ggplot(aes(x=reorder(county, total), y=total))+
geom_col(fill = "#0099f9", alpha=0.8)+
labs(title="Shark Incidents by County",
x=NULL,
y="n")+
coord_flip()
mode refers to a type of activity. Which activity is associated with the highest number of incidents?
sharks %>%
count(mode) %>%
arrange(desc(n))
## # A tibble: 7 × 2
## mode n
## <chr> <int>
## 1 Surfing / Boarding 80
## 2 Freediving 35
## 3 Kayaking / Canoeing 29
## 4 Swimming 22
## 5 Scuba Diving 19
## 6 Hookah Diving 10
## 7 Paddleboarding 7
sharks %>%
ggplot(aes(x=mode, fill=mode))+
geom_bar(alpha=0.8, position="dodge")+
labs(title="Incidents by Activity",
x=NULL,
y="Number of Incidents")+
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.title = element_text(size = 14, face="bold"))
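The bar plot above orders activities alphabetically. One option for ordering bars by frequency is fct_infreq() from forcats, which is loaded with the tidyverse. A sketch on a made-up character vector named activity (hypothetical):

```r
library(forcats)

# Hypothetical toy vector of activities
activity <- c("Swimming", "Surfing / Boarding", "Freediving",
              "Surfing / Boarding", "Surfing / Boarding", "Freediving")

# fct_infreq() orders factor levels from most to least frequent
activity_ordered <- fct_infreq(activity)

levels(activity_ordered)
```

Using fct_infreq(mode) as the x aesthetic would apply the same ordering in the plot.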
sharks %>%
count(species) %>%
arrange(desc(n))
## # A tibble: 8 × 2
## species n
## <chr> <int>
## 1 White 179
## 2 Unknown 13
## 3 Hammerhead 3
## 4 Blue 2
## 5 Leopard 2
## 6 Salmon 1
## 7 Sevengill 1
## 8 Thresher 1
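The counts above can be turned into percentages to show how strongly white sharks dominate the record. A minimal sketch that retypes the printed counts into a tibble named species_counts (a hypothetical name) instead of reading the file:

```r
library(dplyr)

# Counts copied from the table above
species_counts <- tibble(
  species = c("White", "Unknown", "Hammerhead", "Blue",
              "Leopard", "Salmon", "Sevengill", "Thresher"),
  n = c(179, 13, 3, 2, 2, 1, 1, 1)
)

# Share of all 202 incidents for each species, as a percentage
species_share <- species_counts %>%
  mutate(pct = round(100 * n / sum(n), 1))

species_share
```

White sharks account for 179 of the 202 incidents, or about 88.6%.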
sharks %>%
filter(species=="White") %>%
ggplot(aes(x=injury))+
geom_bar(fill = "#0099f9", alpha=0.8)+
labs(title="Incidents Involving Great White Sharks",
x="Injury",
y="n")+
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.title = element_text(size = 14, face="bold"))
Please knit your work as a .pdf or .html file and upload to Canvas. Homework is due before the start of the next lab. No late work is accepted.