Answer the following questions and complete the exercises in
RMarkdown. Please embed all of your code and push your final work to
your repository. Your final lab report should be organized, clean, and
run free from errors. Remember, you must remove the # for
the included code chunks to run. Be sure to add your name to the author
header above.
Make sure to use the formatting conventions of RMarkdown to make your report neat and clean!
In the data folder, you will find data about shark
incidents in California between 1950 and 2022. The data
are from: State of California - Shark Incident Database.
library("tidyverse")
library("janitor")
Run the following code chunk to import the data. In this chunk, I am recoding placeholder values such as "N/A" and "Unknown" as NA.
sharks <-
read_csv("data/SharkIncidents_1950_2022_220302.csv",
na=c("NA", "N/A", "Unknown", "NOT COUNTED", "", " ")) %>%
clean_names()
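To see what the `na` argument is doing, here is a minimal sketch on a made-up three-row CSV. The `toy_csv` string and its values are hypothetical and are not taken from the shark data. Any cell that matches one of the listed strings should be read in as `NA`, and `clean_names()` converts the headers to snake_case.

```r
library("tidyverse")
library("janitor")

# Hypothetical inline CSV used only for illustration (not the real data)
toy_csv <- I("County Name,Injury\nMarin,major\nSonoma,Unknown\nN/A,minor\n")

toy <- read_csv(toy_csv,
                na = c("NA", "N/A", "Unknown", "NOT COUNTED", "", " "),
                show_col_types = FALSE) %>%
  clean_names()

toy          # "Unknown" and "N/A" are read in as NA
names(toy)   # headers are converted to snake_case
```

Note that nothing is dropped here: the rows stay in the data, and only the placeholder values become missing.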
sharks %>%
count(county) %>%
arrange(desc(n)) %>%
slice_head(n=5)
## # A tibble: 5 × 2
## county n
## <chr> <int>
## 1 San Diego 25
## 2 Humboldt 20
## 3 San Mateo 19
## 4 Santa Barbara 19
## 5 Marin 16
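San Mateo and Santa Barbara are tied at 19 in the output above, so a fixed cut-off with `slice_head()` could split a tie depending on where it falls. One possible alternative, sketched here on made-up county labels, is `slice_max()`, which keeps tied rows by default.

```r
library("tidyverse")

# Made-up data: B and C are tied with two incidents each
toy <- tibble(county = c("A", "A", "A", "B", "B", "C", "C", "D"))

counts <- toy %>% count(county, sort = TRUE)

top_head <- counts %>% slice_head(n = 2)     # returns exactly two rows, cutting the tie
top_max  <- counts %>% slice_max(n, n = 2)   # keeps every county tied for second place
```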
sharks %>%
group_by(county) %>%
summarise(n = n()) %>%
ggplot(aes(x=reorder(county, n), y=n)) +
geom_col(fill = "#0099f9", alpha=0.8)+
labs(title="Shark Incidents by County (1950-2022)",
x=NULL,
y="n") +
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.title = element_text(size = 14, face="bold"))+
geom_text(aes(label = n), vjust = -0.2, size = 3, color = "black")
October
sharks %>%
group_by(month) %>%
summarise(total=n()) %>%
arrange(-total)
## # A tibble: 12 × 2
## month total
## <dbl> <int>
## 1 10 36
## 2 9 34
## 3 8 31
## 4 7 27
## 5 5 16
## 6 11 16
## 7 6 14
## 8 12 13
## 9 1 8
## 10 4 7
## 11 3 5
## 12 2 4
sharks %>%
group_by(month) %>%
summarise(total=n()) %>%
ggplot(aes(x=as_factor(month), y=total))+
geom_col(fill = "#0099f9", alpha=0.8)+
labs(title="Shark Incidents by Month",
x="Month",
y="n")+
theme(plot.title = element_text(size = 14, face="bold"))
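The month column is stored as a number, so the axis shows 1 through 12. If month names are preferred, one option is to index into R's built-in `month.abb` vector. This is a sketch on made-up values, and `month_label` is a hypothetical column name.

```r
library("tidyverse")

# Made-up month numbers, not the real data
toy <- tibble(month = c(10, 9, 10, 2, 9, 10))

monthly <- toy %>%
  count(month, name = "total") %>%
  # month.abb is built into R; setting the levels keeps calendar order
  mutate(month_label = factor(month.abb[month], levels = month.abb))

p <- ggplot(monthly, aes(x = month_label, y = total)) +
  geom_col() +
  scale_x_discrete(drop = FALSE) +   # also show months with no incidents
  labs(x = "Month", y = "n")
```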
1100
sharks %>%
mutate(time=str_remove(time, ":")) %>%
filter(!time %in% c("afternoon", "early am", NA)) %>%
mutate(time=as.numeric(time)) %>%
count(time) %>%
arrange(time) %>%
ggplot(aes(x=time, y=n))+
geom_col()+
scale_x_continuous(breaks=seq(0, 2400, by=100))+
theme(axis.text.x = element_text(angle = 60, hjust = 0.85))+
labs(
x = "Time of day",
y = "Number of incidents",
title = "Shark incidents by time of day"
)
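The chunk above counts every exact clock time separately, so 1100 and 1130 would be drawn as different bars. If an hourly summary is wanted instead, one possible approach is to bin by hour with integer division. The sketch below uses made-up times, and `hour` is a hypothetical column name.

```r
library("tidyverse")

# Made-up times, including the kinds of text entries filtered out above
toy_times <- tibble(time = c("11:00", "11:30", "09:15", "afternoon", NA))

hourly <- toy_times %>%
  mutate(time = str_remove(time, ":")) %>%        # "11:30" -> "1130"
  filter(str_detect(time, "^[0-9]+$")) %>%        # keep digit-only entries; drops text and NA
  mutate(hour = as.numeric(time) %/% 100) %>%     # 1130 -> 11
  count(hour)
```

Filtering on a digits-only pattern avoids having to list each text entry by hand, at the cost of silently dropping anything that is not a clock time.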
San Diego
sharks %>%
select(county, injury) %>%
group_by(county, injury) %>%
summarise(total=n(), .groups='keep') %>%
pivot_wider(names_from = injury, values_from = total)%>%
mutate(total=sum(minor, major, fatal, none, na.rm=T)) %>%
arrange(desc(total))
## # A tibble: 22 × 9
## # Groups: county [22]
## county minor none major `minor*` `none*` fatal `major*` total
## <chr> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 San Diego 8 9 4 1 1 2 NA 23
## 2 Santa Barbara 6 9 2 NA NA 2 NA 19
## 3 Humboldt 2 9 7 1 1 NA NA 18
## 4 San Mateo 4 12 1 1 NA 1 NA 18
## 5 Marin 4 3 9 NA NA NA NA 16
## 6 Monterey 2 3 8 NA NA 2 1 15
## 7 Santa Cruz 3 8 3 NA NA 1 NA 15
## 8 Sonoma 1 6 8 NA NA NA NA 15
## 9 San Luis Obispo 1 7 3 NA NA 3 NA 14
## 10 Los Angeles 6 2 NA NA NA 1 1 9
## # ℹ 12 more rows
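The totals above are lower than the earlier county counts (San Diego shows 23 here versus 25 before) because `sum()` only adds the four columns named in the call and skips the starred columns `minor*`, `none*`, and `major*`. The sketch below shows one way to fold them in, assuming the asterisk is a footnote marker rather than a separate injury category. That assumption is not confirmed by the data description. The rows are made up.

```r
library("tidyverse")

# Made-up rows; the "*" suffix is ASSUMED to be a footnote marker
toy <- tibble(
  county = c("X", "X", "X", "Y", "Y"),
  injury = c("minor", "minor*", "fatal", "none", "major")
)

wide <- toy %>%
  mutate(injury = str_remove(injury, "\\*$")) %>%   # "minor*" -> "minor"
  count(county, injury) %>%
  pivot_wider(names_from = injury, values_from = n, values_fill = 0) %>%
  mutate(total = rowSums(across(-county))) %>%      # add every injury column
  arrange(desc(total))
```

Summing across all non-county columns also means the total does not depend on spelling out each category by name.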
sharks %>%
select(county, injury) %>%
group_by(county, injury) %>%
summarise(total=n(), .groups='keep') %>%
pivot_wider(names_from = injury, values_from = total)%>%
mutate(total=sum(minor, major, fatal, none, na.rm=T)) %>%
arrange(desc(total)) %>%
ggplot(aes(x=reorder(county, total), y=total))+
geom_col(fill = "#0099f9", alpha=0.8)+
labs(title="Shark Incidents by County",
x=NULL,
y="n")+
coord_flip()
mode refers to a type of activity. Which activity is associated with the highest number of incidents?
sharks %>%
count(mode) %>%
arrange(desc(n))
## # A tibble: 8 × 2
## mode n
## <chr> <int>
## 1 Surfing / Boarding 85
## 2 Freediving 36
## 3 Kayaking / Canoeing 30
## 4 Swimming 22
## 5 Scuba Diving 20
## 6 Hookah Diving 10
## 7 Paddleboarding 7
## 8 Walking in shallow 1
sharks %>%
ggplot(aes(x=mode, fill=mode))+
geom_bar(alpha=0.8)+
labs(title="Incidents by Activity",
x=NULL,
y="Number of Incidents")+
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.title = element_text(size = 14, face="bold"))
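The bars above are drawn in alphabetical order. If ordering by frequency would make the answer easier to read off the plot, one option is `fct_infreq()` from forcats, sketched here on made-up activity labels.

```r
library("tidyverse")

# Made-up activities, not the real data
toy <- tibble(mode = c("Swimming", "Surfing", "Surfing",
                       "Kayaking", "Surfing", "Swimming"))

# fct_infreq() orders the levels from most to least frequent
ordered_levels <- levels(fct_infreq(toy$mode))

p <- toy %>%
  ggplot(aes(x = fct_infreq(mode))) +
  geom_bar(alpha = 0.8) +
  labs(x = NULL, y = "Number of Incidents")
```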
sharks %>%
count(species) %>%
arrange(desc(n))
## # A tibble: 12 × 2
## species n
## <chr> <int>
## 1 White 181
## 2 <NA> 14
## 3 Hammerhead 3
## 4 Blue 2
## 5 Killer Whale 2
## 6 Leopard 2
## 7 Salmon 2
## 8 Blue* 1
## 9 Mako 1
## 10 Sevengill 1
## 11 Thresher 1
## 12 blue 1
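The output above lists `Blue`, `Blue*`, and `blue` as three separate species. If these are meant to be the same shark, the labels could be normalized before counting. The sketch below assumes the asterisk is a footnote marker and that capitalization differences are data-entry noise. Both are assumptions, and the rows are made up.

```r
library("tidyverse")

# Made-up labels mirroring the inconsistent spellings
toy <- tibble(species = c("White", "Blue", "Blue*", "blue", NA))

cleaned <- toy %>%
  mutate(species = species %>%
           str_remove("\\*$") %>%   # ASSUMPTION: "*" is a footnote marker
           str_to_title()) %>%      # "blue" -> "Blue"
  count(species, sort = TRUE)
```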
sharks %>%
filter(species=="White") %>%
ggplot(aes(x=injury))+
geom_bar(fill = "#0099f9", alpha=0.8)+
labs(title="Incidents Involving Great White Sharks",
x="Injury",
y="n")+
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.title = element_text(size = 14, face="bold"))
Please knit your work as an .html file and upload to Canvas. Homework is due before the start of the next lab. No late work is accepted.