Instructions

Answer the following questions and/or complete the exercises in RMarkdown. Please embed all of your code and push the final work to your repository. Your report should be organized, clean, and run free from errors. Remember, you must remove the # for any included code chunks to run.

Load the libraries

library("tidyverse")
library("janitor")
library("naniar")
options(scipen = 999)

About the Data

For this assignment we are going to work with a data set from the United Nations Food and Agriculture Organization on world fisheries. These data were downloaded and cleaned using the fisheries_clean.Rmd script.

Load the data fisheries_clean.csv as a new object titled fisheries_clean.

fisheries_clean <- read_csv("data/fisheries_clean.csv")
  1. Explore the data. What are the names of the variables, what are the dimensions, are there any NAs, and what are the classes of the variables? You may use whichever functions you prefer.
glimpse(fisheries_clean)
## Rows: 1,055,015
## Columns: 9
## $ period          <dbl> 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, …
## $ continent       <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia"…
## $ geo_region      <chr> "Southern Asia", "Southern Asia", "Southern Asia", "So…
## $ country         <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanis…
## $ scientific_name <chr> "Osteichthyes", "Osteichthyes", "Osteichthyes", "Ostei…
## $ common_name     <chr> "Freshwater fishes NEI", "Freshwater fishes NEI", "Fre…
## $ taxonomic_code  <chr> "1990XXXXXXXX106", "1990XXXXXXXX106", "1990XXXXXXXX106…
## $ catch           <dbl> 100, 100, 100, 100, 100, 200, 200, 200, 200, 200, 200,…
## $ status          <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
  2. Convert the following variables to factors: period, continent, geo_region, country, scientific_name, common_name, taxonomic_code, and status.
fisheries_clean <- fisheries_clean %>%
  # convert every identifier-like column to a factor in one step
  mutate(across(c(period,
                  continent,
                  geo_region,
                  country,
                  scientific_name,
                  common_name,
                  taxonomic_code,
                  status),
                as.factor))
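
One consequence of storing period as a factor is worth keeping in mind for the later questions: calling as.numeric() directly on a factor returns its internal level codes, not the years. The following is a minimal base R sketch of that behavior; the vector `period_demo` is made up for illustration and is not part of the data set.

```r
# Hypothetical toy vector standing in for the `period` column
period_demo <- as.factor(c(2021, 2022, 2023, 2022))

# as.numeric() on a factor returns the integer level codes, not the years
codes <- as.numeric(period_demo)

# Converting through character first recovers the original year values
years <- as.numeric(as.character(period_demo))

# Comparing against a string works because factors compare by label
is_2023 <- period_demo == "2023"
```

This is why the filters below compare period against the string "2023" rather than against a number converted from the factor.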
  3. Are there any missing values in the data? If so, which variables contain missing values and how many are missing for each variable?
fisheries_clean %>%
  miss_var_summary()
## # A tibble: 9 × 3
##   variable        n_miss pct_miss
##   <chr>            <int>    <num>
## 1 continent        23811    2.26 
## 2 geo_region       23811    2.26 
## 3 common_name       2846    0.270
## 4 period               0    0    
## 5 country              0    0    
## 6 scientific_name      0    0    
## 7 taxonomic_code       0    0    
## 8 catch                0    0    
## 9 status               0    0
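
As a cross-check on miss_var_summary(), the same per-variable counts can be reproduced with dplyr alone. This is a minimal sketch on a made-up tibble; `toy` and its values are hypothetical and only illustrate the pattern.

```r
library(dplyr)

# Made-up toy data with a few missing values (not the fisheries data)
toy <- tibble(
  country   = c("A", "B", "C", "D"),
  continent = c("Asia", NA, "Europe", NA),
  catch     = c(10, 20, NA, 40)
)

# Count the missing values in every column
na_counts <- toy %>%
  summarize(across(everything(), ~ sum(is.na(.x))))
```

Applied to fisheries_clean, the same summarize() call should agree with the n_miss column above.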
  4. How many countries are represented in the data?
fisheries_clean %>% 
  summarize(n_countries = n_distinct(country))
## # A tibble: 1 × 1
##   n_countries
##         <int>
## 1         249
  5. The variables common_name and taxonomic_code both refer to species. How many unique species are represented in the data based on each of these variables? Are the numbers the same or different?
fisheries_clean %>%
  summarize(common_name = n_distinct(common_name),
            taxonomic_codes = n_distinct(taxonomic_code))
## # A tibble: 1 × 2
##   common_name taxonomic_codes
##         <int>           <int>
## 1        3390            3722
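
When comparing these two counts, note that n_distinct() treats NA as a value of its own by default, and common_name contains missing entries according to the missing-value summary. The common-name count may therefore be one higher than the number of actual names. A minimal sketch of the difference, using a made-up vector `names_demo` (hypothetical, not from the data set):

```r
library(dplyr)

# Made-up names with one missing entry (not from the data set)
names_demo <- c("Skipjack tuna", NA, "Yellowfin tuna", "Skipjack tuna")

# By default NA is counted as a distinct value
with_na <- n_distinct(names_demo)

# na.rm = TRUE drops missing entries before counting
without_na <- n_distinct(names_demo, na.rm = TRUE)
```

Either way, the two variables give different totals here, so the answer to the question does not change.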
  6. In 2023, which five countries had the highest overall catch?
fisheries_clean %>%
  filter(period == "2023") %>%                          # period is a factor, so compare against its label
  group_by(country) %>%
  summarize(catch_total = sum(catch, na.rm = TRUE)) %>% # total catch per country
  arrange(desc(catch_total)) %>%
  slice_head(n = 5)
## # A tibble: 5 × 2
##   country                  catch_total
##   <fct>                          <dbl>
## 1 China                      13424705.
## 2 Indonesia                   7820833.
## 3 India                       6177985.
## 4 Russian Federation          5398032 
## 5 United States of America    4623694
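
The arrange() plus slice_head() pattern returns exactly n rows, so a tie at the cutoff would be dropped silently. An alternative is dplyr's slice_max(), which keeps ties by default. The sketch below uses a made-up tibble `toy_totals` (hypothetical values) to show the difference.

```r
library(dplyr)

# Made-up totals with a tie for second place (not real catch figures)
toy_totals <- tibble(
  country     = c("A", "B", "C", "D"),
  catch_total = c(5, 4, 4, 3)
)

# Keeps every row tied at the cutoff, so this can return more than n rows
top_with_ties <- toy_totals %>%
  slice_max(catch_total, n = 2)

# with_ties = FALSE returns exactly n rows, like arrange() + slice_head()
top_exact <- toy_totals %>%
  slice_max(catch_total, n = 2, with_ties = FALSE)
```

With no ties in the 2023 totals, both approaches should give the same five countries.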
  7. In 2023, what were the top 10 most caught species? To keep things simple, assume common_name is sufficient to identify species. What does NEI stand for in some of the common names? How might this be concerning from a fisheries management perspective?
fisheries_clean %>%
  filter(period == "2023") %>%
  group_by(common_name) %>%
  summarize(catch_total = sum(catch, na.rm = TRUE)) %>% # total catch per common name
  arrange(desc(catch_total)) %>%
  slice_head(n = 10)
## # A tibble: 10 × 2
##    common_name                    catch_total
##    <fct>                                <dbl>
##  1 Marine fishes NEI                 8553907.
##  2 Freshwater fishes NEI             5880104.
##  3 Alaska pollock(=Walleye poll.)    3543411.
##  4 Skipjack tuna                     2954736.
##  5 Anchoveta(=Peruvian anchovy)      2415709.
##  6 Blue whiting(=Poutassou)          1739484.
##  7 Pacific sardine                   1678237.
##  8 Yellowfin tuna                    1601369.
##  9 Atlantic herring                  1432807.
## 10 Scads NEI                         1344190.
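
In FAO fisheries statistics, NEI is generally expanded as "not elsewhere included": catch reported only to a broad group rather than to an individual species. One way to gauge how much catch falls into these categories is to flag names containing "NEI" and compute their share of the total. This is a minimal sketch on made-up numbers; `toy_catch`, `is_nei`, and the values are hypothetical.

```r
library(dplyr)

# Made-up catch values (not real figures)
toy_catch <- tibble(
  common_name = c("Marine fishes NEI", "Skipjack tuna", "Scads NEI", "Atlantic herring"),
  catch       = c(50, 30, 10, 10)
)

nei_share <- toy_catch %>%
  # flag names that contain the literal string "NEI"
  mutate(is_nei = grepl("NEI", common_name, fixed = TRUE)) %>%
  group_by(is_nei) %>%
  summarize(catch_total = sum(catch, na.rm = TRUE)) %>%
  # share of the overall total in each group
  mutate(share = catch_total / sum(catch_total))
```

If a large share of the catch is recorded only at this coarse level, species-level trends are hard to assess, which is the management concern the question points at.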
  8. For the most-caught species above (excluding the NEI categories), which country had the highest catch in 2023?
fisheries_clean %>%
  filter(common_name == "Alaska pollock(=Walleye poll.)", period == "2023") %>%
  group_by(country) %>%
  summarize(catch_total = sum(catch, na.rm = TRUE)) %>% # total catch per country
  arrange(desc(catch_total))
## # A tibble: 6 × 2
##   country                               catch_total
##   <fct>                                       <dbl>
## 1 Russian Federation                       1893924 
## 2 United States of America                 1433538 
## 3 Japan                                     122900 
## 4 Democratic People's Republic of Korea      58730 
## 5 Republic of Korea                          28432.
## 6 Canada                                      5887.
  9. How has fishing of this species changed over the last decade (2013-2023)? Create a plot showing total catch by year for this species.
fisheries_clean %>%
  # period is a factor; %in% matches on its labels, so 2013:2023 works here
  filter(period %in% 2013:2023, common_name == "Alaska pollock(=Walleye poll.)") %>%
  group_by(period) %>%
  summarize(catch_total = sum(catch, na.rm = TRUE)) %>%
  ggplot(aes(x = period, y = catch_total)) +
  geom_col() +
  labs(x = "Year", y = "Total catch")
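
To put a number on the change the plot shows, the first and last years of the yearly totals can be compared directly. The sketch below is one way to do that, assuming a summary table shaped like the one passed to ggplot() above; the tibble `yearly` and its values are made up for illustration.

```r
library(dplyr)

# Made-up yearly totals (not real catch figures)
yearly <- tibble(
  period      = factor(c(2013, 2018, 2023)),
  catch_total = c(100, 120, 110)
)

change <- yearly %>%
  # convert the factor labels back to numeric years before sorting
  mutate(year = as.numeric(as.character(period))) %>%
  arrange(year) %>%
  summarize(
    first_year = first(year),
    last_year  = last(year),
    # percent change from the first to the last year
    pct_change = 100 * (last(catch_total) - first(catch_total)) / first(catch_total)
  )
```

Comparing only the endpoints ignores what happens in between, so it complements the plot rather than replacing it.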

  10. Perform one exploratory analysis of your choice. Make sure to clearly state the question you are asking before writing any code.

Knit and Upload

Please knit your work as an .html file and upload to Canvas. Homework is due before the start of the next lab. No late work is accepted. Make sure to use the formatting conventions of RMarkdown to make your report neat and clean!