Answer the following questions and/or complete the exercises in
RMarkdown. Please embed all of your code and push the final work to your
repository. Your report should be organized, clean, and run free from
errors. Remember, you must remove the # from any included
code chunks before they will run.
library("tidyverse")
library("janitor")
library("naniar")
options(scipen = 999)
For this assignment we are going to work with a data set from the United Nations
Food and Agriculture Organization on world fisheries. These data
were downloaded and cleaned using the fisheries_clean.Rmd
script.
Load the data fisheries_clean.csv as a new object titled
fisheries_clean.
fisheries_clean <- read_csv("data/fisheries_clean.csv")
glimpse(fisheries_clean)
## Rows: 1,055,015
## Columns: 9
## $ period <dbl> 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, …
## $ continent <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia"…
## $ geo_region <chr> "Southern Asia", "Southern Asia", "Southern Asia", "So…
## $ country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanis…
## $ scientific_name <chr> "Osteichthyes", "Osteichthyes", "Osteichthyes", "Ostei…
## $ common_name <chr> "Freshwater fishes NEI", "Freshwater fishes NEI", "Fre…
## $ taxonomic_code <chr> "1990XXXXXXXX106", "1990XXXXXXXX106", "1990XXXXXXXX106…
## $ catch <dbl> 100, 100, 100, 100, 100, 200, 200, 200, 200, 200, 200,…
## $ status <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
Convert the following variables to factors: period,
continent, geo_region, country,
scientific_name, common_name,
taxonomic_code, and status.
fisheries_clean <- fisheries_clean %>%
  mutate(across(c(period,
                  continent,
                  geo_region,
                  country,
                  scientific_name,
                  common_name,
                  taxonomic_code,
                  status),
                as.factor))
fisheries_clean %>%
  miss_var_summary()
## # A tibble: 9 × 3
## variable n_miss pct_miss
## <chr> <int> <num>
## 1 continent 23811 2.26
## 2 geo_region 23811 2.26
## 3 common_name 2846 0.270
## 4 period 0 0
## 5 country 0 0
## 6 scientific_name 0 0
## 7 taxonomic_code 0 0
## 8 catch 0 0
## 9 status 0 0
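As a cross-check on the summary above, the same per-column missing counts can be computed with dplyr alone. This is a minimal sketch on a small made-up tibble; `toy` and its values are hypothetical and are not taken from the fisheries data.

```r
library(dplyr)

# Hypothetical toy data standing in for fisheries_clean
toy <- tibble(
  continent   = c("Asia", NA, "Europe", NA),
  common_name = c("Skipjack tuna", "Scads NEI", NA, "Atlantic herring"),
  catch       = c(100, 200, 50, 75)
)

# Count the missing values in every column
na_counts <- toy %>%
  summarize(across(everything(), ~ sum(is.na(.x))))

na_counts
```

Swapping `toy` for `fisheries_clean` should give counts comparable to the `n_miss` column, without the percentage.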
fisheries_clean %>%
  summarize(n_countries = n_distinct(country))
## # A tibble: 1 × 1
## n_countries
## <int>
## 1 249
common_name and
taxonomic_code both refer to species. How many unique
species are represented in the data based on each of these variables?
Are the numbers the same or different?
fisheries_clean %>%
  summarize(common_name = n_distinct(common_name),
            taxonomic_codes = n_distinct(taxonomic_code))
## # A tibble: 1 × 2
## common_name taxonomic_codes
## <int> <int>
## 1 3390 3722
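One caveat when comparing these two counts: `n_distinct()` treats `NA` as its own value by default, and the missing-data summary earlier shows that common_name has missing entries, so its count likely includes one extra "missing" category. A small sketch on a made-up vector (`toy_names` is hypothetical) shows the difference:

```r
library(dplyr)

# Hypothetical toy vector of common names with one missing value
toy_names <- c("Skipjack tuna", "Scads NEI", NA, "Skipjack tuna")

with_na    <- n_distinct(toy_names)                # NA is counted as a distinct value
without_na <- n_distinct(toy_names, na.rm = TRUE)  # NA is dropped before counting

c(with_na = with_na, without_na = without_na)
```

Either way the two variables disagree, so the conclusion that the numbers differ should not change; only the exact count for common_name would.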
fisheries_clean %>%
  filter(period == "2023") %>%
  group_by(country) %>%
  summarize(catch_total = sum(catch, na.rm = TRUE)) %>%
  arrange(desc(catch_total)) %>%
  slice_head(n = 5)
## # A tibble: 5 × 2
## country catch_total
## <fct> <dbl>
## 1 China 13424705.
## 2 Indonesia 7820833.
## 3 India 6177985.
## 4 Russian Federation 5398032
## 5 United States of America 4623694
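The `arrange(desc(...))` plus `slice_head()` pattern used above can also be written with `slice_max()`, which orders and truncates in one step. The sketch below uses a hypothetical tibble `toy_catch` with made-up countries and values, purely to illustrate the pattern.

```r
library(dplyr)

# Hypothetical toy data: several catch records per country
toy_catch <- tibble(
  country = c("A", "A", "B", "C", "C"),
  catch   = c(10, 5, 30, 2, 1)
)

# Total catch per country, then keep the two largest totals
top2 <- toy_catch %>%
  group_by(country) %>%
  summarize(catch_total = sum(catch, na.rm = TRUE)) %>%
  slice_max(catch_total, n = 2)

top2
```

Note that `slice_max()` keeps ties by default, so it can return more than `n` rows when totals are equal, whereas `slice_head()` never does.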
Assume common_name is sufficient to identify
species. What does NEI stand for in some of the common
names? How might this be concerning from a fisheries management
perspective?
fisheries_clean %>%
  filter(period == "2023") %>%
  group_by(common_name) %>%
  summarize(catch_total = sum(catch, na.rm = TRUE)) %>%
  arrange(desc(catch_total)) %>%
  slice_head(n = 10)
## # A tibble: 10 × 2
## common_name catch_total
## <fct> <dbl>
## 1 Marine fishes NEI 8553907.
## 2 Freshwater fishes NEI 5880104.
## 3 Alaska pollock(=Walleye poll.) 3543411.
## 4 Skipjack tuna 2954736.
## 5 Anchoveta(=Peruvian anchovy) 2415709.
## 6 Blue whiting(=Poutassou) 1739484.
## 7 Pacific sardine 1678237.
## 8 Yellowfin tuna 1601369.
## 9 Atlantic herring 1432807.
## 10 Scads NEI 1344190.
fisheries_clean %>%
  filter(common_name == "Alaska pollock(=Walleye poll.)",
         period == "2023") %>%
  group_by(country) %>%
  summarize(catch_total = sum(catch, na.rm = TRUE)) %>%
  arrange(desc(catch_total))
## # A tibble: 6 × 2
## country catch_total
## <fct> <dbl>
## 1 Russian Federation 1893924
## 2 United States of America 1433538
## 3 Japan 122900
## 4 Democratic People's Republic of Korea 58730
## 5 Republic of Korea 28432.
## 6 Canada 5887.
fisheries_clean %>%
  filter(period %in% 2013:2023,
         common_name == "Alaska pollock(=Walleye poll.)") %>%
  group_by(period) %>%
  summarize(catch_total = sum(catch, na.rm = TRUE)) %>%
  ggplot(aes(x = period, y = catch_total)) +
  geom_col()
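Because period is stored as a factor in this report, the plot treats each year as a discrete category, and the `%in% 2013:2023` filter works only because factor labels are compared as text. If a numeric year is ever needed, the factor has to be converted through character first. A short sketch with a hypothetical factor `yr`:

```r
# Hypothetical factor of years, mirroring how period is stored after as.factor()
yr <- factor(c("2013", "2014", "2023"))

codes <- as.numeric(yr)                # internal level codes, not the years
years <- as.numeric(as.character(yr))  # the actual year values

# %in% matches on the factor labels, so a numeric range still works
in_range <- yr %in% 2014:2023
```

Here `codes` holds the level positions while `years` holds the real values, which is why `as.numeric()` alone is not a safe way to recover years from a factor.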
Please knit your work as an .html file and upload to Canvas. Homework is due before the start of the next lab. No late work is accepted. Make sure to use the formatting conventions of RMarkdown to make your report neat and clean!