Instructions

Answer the following questions and/or complete the exercises in RMarkdown. Please embed all of your code and push the final work to your repository. Your report should be organized, clean, and run free from errors. Remember, you must remove the # for any included code chunks to run.

Load the tidyverse

library("tidyverse")
library("janitor")

Load the superhero data

Let’s have a little fun with this one! We are going to explore data on superheroes. These data were taken from comic books and assembled by devoted fans. They include a good mix of categorical and continuous data. Data taken from: https://www.kaggle.com/claudiodavi/superhero-set

Load the heroes_information.csv and super_hero_powers.csv data. Make sure the columns are cleanly named.

superhero_info <- read_csv("data/heroes_information.csv", na = c("", "-99", "-")) %>% clean_names()
## Rows: 734 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): name, Gender, Eye color, Race, Hair color, Publisher, Skin color, A...
## dbl (2): Height, Weight
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
superhero_powers <- read_csv("data/super_hero_powers.csv", na = c("", "-99", "-")) %>% clean_names()
## Rows: 667 Columns: 168
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (1): hero_names
## lgl (167): Agility, Accelerated Healing, Lantern Power Ring, Dimensional Awa...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
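To see what the `na` argument and `clean_names()` are doing in the two calls above, here is a minimal sketch on made-up data. The two heroes and their values are hypothetical and do not come from the Kaggle files. It assumes readr 2.0 or later, where `I()` marks a string as literal CSV text.

```r
library(readr)
library(janitor)

# Hypothetical two-row CSV: "-99" and "-" stand in for missing values
toy_csv <- I("name,Eye color,Height\nHero A,blue,-99\nHero B,-,180\n")

# Treat "", "-99", and "-" as NA while reading
toy_raw <- read_csv(toy_csv, na = c("", "-99", "-"), show_col_types = FALSE)

# clean_names() converts the headers to lowercase snake_case
toy_info <- clean_names(toy_raw)

names(toy_info)
toy_info
```

The placeholder values are read as `NA`, and headers such as `Eye color` become `eye_color`, which is why the later chunks can refer to columns like `hair_color` without backticks.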
  1. For the superhero_info data, how many bad, good, and neutral superheroes are there? Try using count() or tabyl().
table(superhero_info$alignment)
## 
##     bad    good neutral 
##     207     496      24
superhero_info %>% 
  count(alignment)
## # A tibble: 4 × 2
##   alignment     n
##   <chr>     <int>
## 1 bad         207
## 2 good        496
## 3 neutral      24
## 4 <NA>          7
tabyl(superhero_info, alignment)
##  alignment   n     percent valid_percent
##        bad 207 0.282016349    0.28473177
##       good 496 0.675749319    0.68225585
##    neutral  24 0.032697548    0.03301238
##       <NA>   7 0.009536785            NA
  1. Notice that we have some bad superheroes! Who are they? List their names below.
bad <- superhero_info %>% 
  filter(alignment=="bad")
bad$name
##   [1] "Abomination"       "Abraxas"           "Absorbing Man"    
##   [4] "Air-Walker"        "Ajax"              "Alex Mercer"      
##   [7] "Alien"             "Amazo"             "Ammo"             
##  [10] "Angela"            "Annihilus"         "Anti-Monitor"     
##  [13] "Anti-Spawn"        "Apocalypse"        "Arclight"         
##  [16] "Atlas"             "Azazel"            "Bane"             
##  [19] "Beetle"            "Big Barda"         "Big Man"          
##  [22] "Billy Kincaid"     "Bird-Man"          "Bird-Man II"      
##  [25] "Black Abbott"      "Black Adam"        "Black Mamba"      
##  [28] "Black Manta"       "Blackout"          "Blackwing"        
##  [31] "Blizzard"          "Blizzard"          "Blizzard II"      
##  [34] "Blob"              "Bloodaxe"          "Bloodwraith"      
##  [37] "Boba Fett"         "Bomb Queen"        "Brainiac"         
##  [40] "Bullseye"          "Callisto"          "Carnage"          
##  [43] "Chameleon"         "Changeling"        "Cheetah"          
##  [46] "Cheetah II"        "Cheetah III"       "Chromos"          
##  [49] "Clock King"        "Cogliostro"        "Cottonmouth"      
##  [52] "Curse"             "Cy-Gor"            "Cyborg Superman"  
##  [55] "Darkseid"          "Darkside"          "Darth Maul"       
##  [58] "Darth Vader"       "Deadshot"          "Demogoblin"       
##  [61] "Destroyer"         "Diamondback"       "Doctor Doom"      
##  [64] "Doctor Doom II"    "Doctor Octopus"    "Doomsday"         
##  [67] "Doppelganger"      "Dormammu"          "Ego"              
##  [70] "Electro"           "Elle Bishop"       "Evil Deadpool"    
##  [73] "Evilhawk"          "Exodus"            "Fabian Cortez"    
##  [76] "Fallen One II"     "Faora"             "Fixer"            
##  [79] "Frenzy"            "General Zod"       "Giganta"          
##  [82] "Goblin Queen"      "Godzilla"          "Gog"              
##  [85] "Gorilla Grodd"     "Granny Goodness"   "Greedo"           
##  [88] "Green Goblin"      "Green Goblin II"   "Harley Quinn"     
##  [91] "Heat Wave"         "Hela"              "Hobgoblin"        
##  [94] "Hydro-Man"         "Iron Monger"       "Jigsaw"           
##  [97] "Joker"             "Junkpile"          "Kang"             
## [100] "Killer Croc"       "Killer Frost"      "King Shark"       
## [103] "Kingpin"           "Klaw"              "Kraven II"        
## [106] "Kraven the Hunter" "Kylo Ren"          "Lady Bullseye"    
## [109] "Lady Deathstrike"  "Leader"            "Lex Luthor"       
## [112] "Lightning Lord"    "Living Brain"      "Lizard"           
## [115] "Loki"              "Luke Campbell"     "Mach-IV"          
## [118] "Magneto"           "Magus"             "Mandarin"         
## [121] "Match"             "Maxima"            "Mephisto"         
## [124] "Metallo"           "Mister Freeze"     "Mister Knife"     
## [127] "Mister Mxyzptlk"   "Mister Sinister"   "Mister Zsasz"     
## [130] "MODOK"             "Moloch"            "Molten Man"       
## [133] "Moonstone"         "Morlun"            "Moses Magnum"     
## [136] "Mysterio"          "Mystique"          "Nebula"           
## [139] "Omega Red"         "Onslaught"         "Overtkill"        
## [142] "Ozymandias"        "Parademon"         "Penguin"          
## [145] "Plantman"          "Plastique"         "Poison Ivy"       
## [148] "Predator"          "Professor Zoom"    "Proto-Goblin"     
## [151] "Purple Man"        "Pyro"              "Ra's Al Ghul"     
## [154] "Razor-Fist II"     "Red Mist"          "Red Skull"        
## [157] "Redeemer II"       "Redeemer III"      "Rhino"            
## [160] "Rick Flag"         "Riddler"           "Sabretooth"       
## [163] "Sauron"            "Scarecrow"         "Scarlet Witch"    
## [166] "Scorpia"           "Scorpion"          "Sebastian Shaw"   
## [169] "Shocker"           "Siren"             "Siren II"         
## [172] "Siryn"             "Snake-Eyes"        "Solomon Grundy"   
## [175] "Spider-Carnage"    "Spider-Woman IV"   "Steppenwolf"      
## [178] "Stormtrooper"      "Superboy-Prime"    "Swamp Thing"      
## [181] "Swarm"             "Sylar"             "T-1000"           
## [184] "T-800"             "T-850"             "T-X"              
## [187] "Taskmaster"        "Thanos"            "Tiger Shark"      
## [190] "Tinkerer"          "Trigon"            "Two-Face"         
## [193] "Ultron"            "Utgard-Loki"       "Vanisher"         
## [196] "Vegeta"            "Venom"             "Venom II"         
## [199] "Venom III"         "Violator"          "Vulture"          
## [202] "Walrus"            "Warp"              "Weapon XI"        
## [205] "White Canary"      "Yellow Claw"       "Zoom"
  1. How many distinct “races” are represented in superhero_info?
superhero_info %>%
  select(race) %>% 
  n_distinct()
## [1] 62
superhero_info %>% 
  group_by(race) %>% 
  summarize(n=n()) %>% 
  arrange(-n)
## # A tibble: 62 × 2
##    race                  n
##    <chr>             <int>
##  1 <NA>                304
##  2 Human               208
##  3 Mutant               63
##  4 God / Eternal        14
##  5 Cyborg               11
##  6 Human / Radiation    11
##  7 Android               9
##  8 Symbiote              9
##  9 Alien                 7
## 10 Kryptonian            7
## # ℹ 52 more rows
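Note that the count of 62 includes `NA` as its own category (it is row 1 of the table above), so there are 61 named races. A minimal sketch of the difference, using a hypothetical stand-in for `superhero_info`:

```r
library(dplyr)

# Hypothetical stand-in for superhero_info with one missing race
toy_info <- tibble(
  name = c("Hero A", "Hero B", "Hero C", "Hero D"),
  race = c("Human", "Mutant", NA, "Human")
)

# n_distinct() counts NA as a distinct value unless na.rm = TRUE
races_with_na <- n_distinct(toy_info$race)
races_without_na <- n_distinct(toy_info$race, na.rm = TRUE)

races_with_na
races_without_na
```

On the real data, the same idea would be `n_distinct(superhero_info$race, na.rm = TRUE)`.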

Good and Bad

  1. Let’s make two different data frames, one focused on the “good guys” and another focused on the “bad guys”.
good_guys <- 
  superhero_info %>% 
  filter(alignment=="good")
bad_guys <- 
  superhero_info %>% 
  filter(alignment=="bad")
  1. Who are the good Vampires?
good_guys %>% filter(race=="Vampire")
## # A tibble: 2 × 10
##   name  gender eye_color race   hair_color height publisher skin_color alignment
##   <chr> <chr>  <chr>     <chr>  <chr>       <dbl> <chr>     <chr>      <chr>    
## 1 Angel Male   <NA>      Vampi… <NA>           NA Dark Hor… <NA>       good     
## 2 Blade Male   brown     Vampi… Black         188 Marvel C… <NA>       good     
## # ℹ 1 more variable: weight <dbl>
  1. Who has the height advantage: bad guys or good guys? Convert their height to meters and sort from tallest to shortest.
bad_guys %>%
  select(name, height) %>% 
  mutate(height_meters=height/100) %>%  # height is recorded in centimeters
  arrange(-height_meters)
## # A tibble: 207 × 3
##    name           height height_meters
##    <chr>           <dbl>         <dbl>
##  1 MODOK             366          3.66
##  2 Onslaught         305          3.05
##  3 Sauron            279          2.79
##  4 Solomon Grundy    279          2.79
##  5 Darkseid          267          2.67
##  6 Amazo             257          2.57
##  7 Alien             244          2.44
##  8 Doomsday          244          2.44
##  9 Killer Croc       244          2.44
## 10 Venom III         229          2.29
## # ℹ 197 more rows
good_guys %>% 
  select(name, height) %>% 
  mutate(height_meters=height/100) %>%  # height is recorded in centimeters
  arrange(-height_meters)
## # A tibble: 496 × 3
##    name          height height_meters
##    <chr>          <dbl>         <dbl>
##  1 Fin Fang Foom   975           9.75
##  2 Groot           701           7.01
##  3 Wolfsbane       366           3.66
##  4 Sasquatch       305           3.05
##  5 Ymir            305.          3.05
##  6 Rey             297           2.97
##  7 Hellboy         259           2.59
##  8 Hulk            244           2.44
##  9 Kilowog         234           2.34
## 10 Cloak           226           2.26
## # ℹ 486 more rows
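The sorted tables show the tallest individuals, but they do not directly say which group is taller on average. One possible way to compare the groups is a grouped summary. This is a sketch on hypothetical data, and it assumes the `height` column is in centimeters:

```r
library(dplyr)

# Hypothetical stand-in for superhero_info; height assumed to be in centimeters
toy_info <- tibble(
  name = c("Hero A", "Hero B", "Villain A", "Villain B", "Villain C"),
  alignment = c("good", "good", "bad", "bad", "bad"),
  height = c(180, 190, 200, NA, 180)
)

height_summary <- toy_info %>%
  filter(alignment %in% c("good", "bad")) %>%
  mutate(height_meters = height / 100) %>%
  group_by(alignment) %>%
  summarize(
    mean_height_m = mean(height_meters, na.rm = TRUE),     # ignore missing heights
    median_height_m = median(height_meters, na.rm = TRUE),
    n_measured = sum(!is.na(height_meters))                # heroes with a recorded height
  )

height_summary
```

Reporting the median alongside the mean is a deliberate choice here, since a few very tall characters can pull the mean upward.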

superhero_powers

Have a quick look at the superhero_powers data frame.

  1. How many superheroes have a combination of agility, stealth, super_strength, and stamina?
superhero_powers %>%
  filter(agility & stealth & super_strength & stamina) %>%
  select(hero_names, agility, stealth, super_strength, stamina)
## # A tibble: 40 × 5
##    hero_names  agility stealth super_strength stamina
##    <chr>       <lgl>   <lgl>   <lgl>          <lgl>  
##  1 Alex Mercer TRUE    TRUE    TRUE           TRUE   
##  2 Angel       TRUE    TRUE    TRUE           TRUE   
##  3 Ant-Man II  TRUE    TRUE    TRUE           TRUE   
##  4 Aquaman     TRUE    TRUE    TRUE           TRUE   
##  5 Batman      TRUE    TRUE    TRUE           TRUE   
##  6 Black Flash TRUE    TRUE    TRUE           TRUE   
##  7 Black Manta TRUE    TRUE    TRUE           TRUE   
##  8 Brundlefly  TRUE    TRUE    TRUE           TRUE   
##  9 Buffy       TRUE    TRUE    TRUE           TRUE   
## 10 Cable       TRUE    TRUE    TRUE           TRUE   
## # ℹ 30 more rows
superhero_powers %>% 
  select(hero_names, agility, stealth, super_strength, stamina) %>% 
  filter(agility==TRUE & stealth==TRUE & super_strength==TRUE & stamina==TRUE)
## # A tibble: 40 × 5
##    hero_names  agility stealth super_strength stamina
##    <chr>       <lgl>   <lgl>   <lgl>          <lgl>  
##  1 Alex Mercer TRUE    TRUE    TRUE           TRUE   
##  2 Angel       TRUE    TRUE    TRUE           TRUE   
##  3 Ant-Man II  TRUE    TRUE    TRUE           TRUE   
##  4 Aquaman     TRUE    TRUE    TRUE           TRUE   
##  5 Batman      TRUE    TRUE    TRUE           TRUE   
##  6 Black Flash TRUE    TRUE    TRUE           TRUE   
##  7 Black Manta TRUE    TRUE    TRUE           TRUE   
##  8 Brundlefly  TRUE    TRUE    TRUE           TRUE   
##  9 Buffy       TRUE    TRUE    TRUE           TRUE   
## 10 Cable       TRUE    TRUE    TRUE           TRUE   
## # ℹ 30 more rows
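Both versions return the matching rows, and the answer (40) is read from the tibble header. If only the number is wanted, one option is to sum the logical condition directly. A minimal sketch on a hypothetical three-hero table:

```r
library(dplyr)

# Hypothetical stand-in for superhero_powers
toy_powers <- tibble(
  hero_names = c("Hero A", "Hero B", "Hero C"),
  agility = c(TRUE, TRUE, FALSE),
  stealth = c(TRUE, FALSE, FALSE),
  super_strength = c(TRUE, TRUE, TRUE),
  stamina = c(TRUE, TRUE, NA)
)

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of matching heroes
n_all_four <- toy_powers %>%
  summarize(n = sum(agility & stealth & super_strength & stamina, na.rm = TRUE)) %>%
  pull(n)

n_all_four
```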
  1. Who is the most powerful superhero? Have a look at the code chunk below. Use the internet to annotate each line of code so you know how it works. It’s OK to use AI to help you with this task.
superhero_powers %>%
  mutate(across(-1, ~ ifelse(. == TRUE, 1, 0))) %>% 
  mutate(total_powers = rowSums(across(-1))) %>% 
  select(hero_names, total_powers) %>% 
  arrange(-total_powers)
## # A tibble: 667 × 2
##    hero_names        total_powers
##    <chr>                    <dbl>
##  1 Spectre                     49
##  2 Amazo                       44
##  3 Living Tribunal             35
##  4 Martian Manhunter           35
##  5 Man of Miracles             34
##  6 Captain Marvel              33
##  7 T-X                         33
##  8 Galactus                    32
##  9 T-1000                      32
## 10 Mister Mxyzptlk             31
## # ℹ 657 more rows
# Start with the `superhero_powers` data frame and pipe it into the next step.
superhero_powers %>% 
  # Transform every column except the first (`-1`) with `across()`:
  # `TRUE` becomes 1 and `FALSE` becomes 0 (an `NA` would stay `NA`).
  mutate(across(-1, ~ ifelse(. == TRUE, 1, 0))) %>%
  # Add a `total_powers` column holding the row-wise sum of every column except the first.
  mutate(total_powers = rowSums(across(-1))) %>%
  # Keep only `hero_names` (the first column) and the new `total_powers` column.
  select(hero_names, total_powers) %>%
  # Sort in descending order of `total_powers` (highest first).
  arrange(-total_powers)
## # A tibble: 667 × 2
##    hero_names        total_powers
##    <chr>                    <dbl>
##  1 Spectre                     49
##  2 Amazo                       44
##  3 Living Tribunal             35
##  4 Martian Manhunter           35
##  5 Man of Miracles             34
##  6 Captain Marvel              33
##  7 T-X                         33
##  8 Galactus                    32
##  9 T-1000                      32
## 10 Mister Mxyzptlk             31
## # ℹ 657 more rows
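Because `rowSums()` already treats `TRUE` as 1 and `FALSE` as 0, the conversion step is arguably optional. The sketch below shows a shorter variant on hypothetical data. It assumes dplyr 1.1.0 or later for `pick()`, and it is only one possible alternative, not the approach the assignment asks you to annotate.

```r
library(dplyr)

# Hypothetical stand-in for superhero_powers
toy_powers <- tibble(
  hero_names = c("Hero A", "Hero B", "Hero C"),
  agility = c(TRUE, FALSE, TRUE),
  stealth = c(TRUE, FALSE, FALSE),
  flight = c(TRUE, TRUE, FALSE)
)

power_totals <- toy_powers %>%
  # pick() returns the selected columns as a data frame; rowSums() counts the TRUEs
  mutate(total_powers = rowSums(pick(where(is.logical)))) %>%
  select(hero_names, total_powers) %>%
  arrange(desc(total_powers))

power_totals
```

Selecting with `where(is.logical)` instead of `-1` avoids relying on the name column being first.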

Your Favorite

  1. Pick your favorite superhero and let’s see their powers!
superhero_powers %>%
  filter(hero_names == "Darth Vader") %>%
  select(where(isTRUE))  # Keeps only the power columns that are TRUE for this hero
## # A tibble: 1 × 26
##   agility accelerated_healing durability stealth danger_sense marksmanship
##   <lgl>   <lgl>               <lgl>      <lgl>   <lgl>        <lgl>       
## 1 TRUE    TRUE                TRUE       TRUE    TRUE         TRUE        
## # ℹ 20 more variables: weapons_master <lgl>, intelligence <lgl>,
## #   telepathy <lgl>, energy_blasts <lgl>, super_speed <lgl>,
## #   electrokinesis <lgl>, enhanced_senses <lgl>, telekinesis <lgl>, jump <lgl>,
## #   astral_projection <lgl>, reflexes <lgl>, force_fields <lgl>,
## #   psionic_powers <lgl>, precognition <lgl>, enhanced_hearing <lgl>,
## #   hypnokinesis <lgl>, light_control <lgl>, illusions <lgl>, cloaking <lgl>,
## #   the_force <lgl>
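With 26 columns, the wide result is truncated when printed. One way to see every power is to reshape the row into long format and keep the `TRUE` entries. A minimal sketch on hypothetical data (the hero and power names below are made up):

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for superhero_powers
toy_powers <- tibble(
  hero_names = c("Hero A", "Hero B"),
  agility = c(TRUE, FALSE),
  stealth = c(FALSE, TRUE),
  flight = c(TRUE, TRUE)
)

hero_a_powers <- toy_powers %>%
  filter(hero_names == "Hero A") %>%
  # One row per power, with a logical column saying whether the hero has it
  pivot_longer(-hero_names, names_to = "power", values_to = "has_power") %>%
  filter(has_power) %>%
  pull(power)

hero_a_powers
```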
  1. Can you find your hero in the superhero_info data? Show their info!
superhero_info %>% 
  filter(name=="Darth Vader")
## # A tibble: 1 × 10
##   name   gender eye_color race  hair_color height publisher skin_color alignment
##   <chr>  <chr>  <chr>     <chr> <chr>       <dbl> <chr>     <chr>      <chr>    
## 1 Darth… Male   yellow    Cybo… No Hair       198 George L… <NA>       bad      
## # ℹ 1 more variable: weight <dbl>

Knit and Upload

Please knit your work as a .pdf or .html file and upload to Canvas. Homework is due before the start of the next lab. No late work is accepted. Make sure to use the formatting conventions of RMarkdown to make your report neat and clean!