Welcome back to another semester of our UseR group.
Weekly challenge
The data
We will work with the ecology dataset from Data Carpentry. An explanation of the dataset can be found here.
library(tidyverse)
mydata <- read_csv("https://ndownloader.figshare.com/files/2292169")
glimpse(mydata)
## Observations: 34,786
## Variables: 13
## $ record_id <int> 1, 72, 224, 266, 349, 363, 435, 506, 588, 661,...
## $ month <int> 7, 8, 9, 10, 11, 11, 12, 1, 2, 3, 4, 5, 6, 8, ...
## $ day <int> 16, 19, 13, 16, 12, 12, 10, 8, 18, 11, 8, 6, 9...
## $ year <int> 1977, 1977, 1977, 1977, 1977, 1977, 1977, 1978...
## $ plot_id <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2...
## $ species_id <chr> "NL", "NL", "NL", "NL", "NL", "NL", "NL", "NL"...
## $ sex <chr> "M", "M", NA, NA, NA, NA, NA, NA, "M", NA, NA,...
## $ hindfoot_length <int> 32, 31, NA, NA, NA, NA, NA, NA, NA, NA, NA, 32...
## $ weight <int> NA, NA, NA, NA, NA, NA, NA, NA, 218, NA, NA, 2...
## $ genus <chr> "Neotoma", "Neotoma", "Neotoma", "Neotoma", "N...
## $ species <chr> "albigula", "albigula", "albigula", "albigula"...
## $ taxa <chr> "Rodent", "Rodent", "Rodent", "Rodent", "Roden...
## $ plot_type <chr> "Control", "Control", "Control", "Control", "C...
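The `glimpse()` output already shows `NA` values in `sex`, `hindfoot_length`, and `weight`. Counting missing values per column before computing any averages is a useful first check. The sketch below uses a small made-up data frame (`toy`, hypothetical values) with some of the same column names. On the real data the equivalent call would be `colSums(is.na(mydata))`.

```r
# Hypothetical toy data frame that mimics a few columns of `mydata`;
# the values are invented for illustration only.
toy <- data.frame(
  record_id       = 1:6,
  sex             = c("M", "M", NA, NA, "F", NA),
  hindfoot_length = c(32L, 31L, NA, NA, 35L, 30L),
  weight          = c(NA, NA, 40L, NA, 218L, 25L)
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(toy))
na_counts
```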
In-class challenges
1. Find the average hindfoot_length
mean(mydata$hindfoot_length)
## [1] NA
# The result is NA because the column contains missing values (NAs), which must be removed first. There are two ways to do this:
# 1. Remove the NAs, store the result in a new object, and then take the mean
hindfoot_length.rev <- mydata$hindfoot_length[!is.na(mydata$hindfoot_length)]
avg_foot <- mean(hindfoot_length.rev)
avg_foot
## [1] 29.28793
# 2. Use the na.rm argument of mean()
?mean # open the help page for mean() to see its arguments
avg_foot <- mean(mydata$hindfoot_length, na.rm = TRUE)
avg_foot
## [1] 29.28793
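Both approaches print the same value above. The small sketch below shows why. A single `NA` makes `mean()` return `NA`. Dropping the missing values yourself or passing `na.rm = TRUE` gives identical results. The vector `hfl` is a hypothetical stand-in for `mydata$hindfoot_length`.

```r
# Hypothetical toy vector standing in for mydata$hindfoot_length
hfl <- c(32L, 31L, NA, NA, 35L, 30L)

mean(hfl)                        # NA: one missing value makes the whole mean NA
avg1 <- mean(hfl[!is.na(hfl)])   # approach 1: drop the NAs first
avg2 <- mean(hfl, na.rm = TRUE)  # approach 2: let mean() drop them
identical(avg1, avg2)            # TRUE: both approaches agree
```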
2. How many hindfoot lengths are above and below the average?
# Step 1: find the average
avg <- mean(mydata$hindfoot_length, na.rm = TRUE)
# Step 2: compare every value with the average to get a logical vector (TRUE = below average, NA where the length is missing)
lessthan <- mydata$hindfoot_length < avg
head(lessthan, 25) # display the first 25 elements
## [1] FALSE FALSE NA NA NA NA NA NA NA NA NA
## [12] FALSE NA FALSE FALSE NA NA FALSE FALSE FALSE FALSE FALSE
## [23] FALSE NA FALSE
greaterthan <- mydata$hindfoot_length > avg
# Step 3: count the rows; sum() treats TRUE as 1 and FALSE as 0, and na.rm = TRUE skips the NAs
sum(lessthan, na.rm = TRUE)
## [1] 15371
sum(greaterthan, na.rm = TRUE)
## [1] 16067
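The two counts, 15371 + 16067 = 31438, fall short of the 34,786 rows in the data. Every row lands in exactly one of four groups: below the average, above it, exactly equal to it, or missing. `hindfoot_length` is an integer column and the average (29.28793) is not a whole number, so no value can equal it. The remaining rows should therefore be the missing ones. The sketch below checks this bookkeeping on a hypothetical toy vector, `hfl`, chosen so that some values tie with the average.

```r
# Hypothetical toy vector standing in for mydata$hindfoot_length
hfl <- c(32L, 31L, NA, NA, 35L, 30L, 32L)
avg <- mean(hfl, na.rm = TRUE)             # (32 + 31 + 35 + 30 + 32) / 5 = 32

below   <- sum(hfl < avg, na.rm = TRUE)    # 31 and 30
above   <- sum(hfl > avg, na.rm = TRUE)    # 35
equal   <- sum(hfl == avg, na.rm = TRUE)   # the two 32s
missing <- sum(is.na(hfl))                 # the two NAs

# Each element belongs to exactly one group, so the counts add up to the length
below + above + equal + missing == length(hfl)
```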
Take-home challenges
1. What are the names of the plot types (treatments) in the experiment?
2. How many species were caught?
3. How many species of birds? Rodents?
4. What is the average weight of male rodents?
5. What is the average weight of female rodents?
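If you get stuck, the base-R sketch below shows one possible approach to each take-home question. It runs on a small invented data frame (`toy`) that reuses column names from `mydata`, so its results are not the answers for the real data. Two assumptions apply. First, counting distinct `species_id` codes is one reasonable definition of "species"; counting distinct `genus`/`species` pairs would be another. Second, birds are assumed to be labelled `"Bird"` in `taxa`, so check `unique(mydata$taxa)` first.

```r
# Hypothetical toy data frame with column names taken from mydata;
# all values are invented, so the results are NOT the real answers.
toy <- data.frame(
  plot_type  = c("Control", "Rodent Exclosure", "Control", "Control", "Rodent Exclosure"),
  species_id = c("NL", "DM", "DM", "AB", "NL"),
  taxa       = c("Rodent", "Rodent", "Rodent", "Bird", "Rodent"),
  sex        = c("M", "F", "M", NA, "F"),
  weight     = c(218L, 40L, NA, 30L, 200L)
)

# 1. Plot types: the distinct values of the column
plot_types <- unique(toy$plot_type)

# 2. Number of species: count the distinct species_id codes
n_species <- length(unique(toy$species_id))

# 3. Species per taxon: split species_id by taxa, then count distinct codes in each group
species_by_taxa <- sapply(split(toy$species_id, toy$taxa),
                          function(x) length(unique(x)))

# 4 and 5. Average weight of male / female rodents.
# which() drops the NAs produced by comparing a missing sex with "M" or "F";
# na.rm = TRUE then skips any missing weights.
is_rodent  <- toy$taxa == "Rodent"
male_avg   <- mean(toy$weight[which(is_rodent & toy$sex == "M")], na.rm = TRUE)
female_avg <- mean(toy$weight[which(is_rodent & toy$sex == "F")], na.rm = TRUE)
```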