For a list of the questions for the activity, go here: https://docs.google.com/document/d/1R31H1g3Ll_UdizIz3MW_9IBnxZXYFicxXytioUtKCzQ/edit?usp=sharing
When you are done, please add your plot(s) to the Jamboard and/or send us an email with your plots!! https://jamboard.google.com/d/1xp1yFPHMp501QYKVeQ-j1jlmBMytRcf3qO0PnKaz704/edit?usp=sharing
Open an R script and load the covid_attitudes_clean.csv data.
Make sure to load the tidyverse library at the start of your script with library(tidyverse), and set options(stringsAsFactors = FALSE).
## Load packages and set options ##
library(tidyverse)
options(stringsAsFactors = FALSE)
## Read in data ##
covid_attitudes <- read.csv('covid_attitudes.csv')
First, we need to get our data ready to plot:
1. “Definitely” is misspelled in the response option “defintely not”, so we need to correct that spelling.
2. We need to change “probably_not” to say “probably not”.
3. We also need to make Q35 into a factor and order the levels from “definitely not” to “definitely”. If we don’t set the levels, R will default to alphabetical order and they won’t be in the correct order.
# Set up data frame for plot
df_q35.plot <- covid_attitudes %>%
#remove NA's
drop_na() %>% # Note: this removes rows that have NA in ANY column. If I wanted to remove rows that have NA only in the Q35 column, for instance, this would be drop_na(Q35.take_vaccine.)
# Change probably_not and spell definitely not correctly
mutate(
Q35.take_vaccine. = case_when(
Q35.take_vaccine. == "probably_not" ~ "probably not",
Q35.take_vaccine. == "defintely not" ~ "definitely not" ,
TRUE ~ Q35.take_vaccine.),
# Make Q35 into a factor
Q35.take_vaccine. = factor(
Q35.take_vaccine.,
levels = c("definitely not",
"probably not",
"unsure",
"probably",
"definitely")
)
)
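To make the note about drop_na() above concrete, here is a minimal sketch on a made-up tibble. The toy values and the column name other_question are invented for illustration and are not part of the survey data.

```r
library(tidyr)
library(tibble)

# Toy data (made up for illustration, not the survey data)
toy <- tibble(
  Q35.take_vaccine. = c("definitely", NA, "unsure", "probably"),
  other_question    = c(1, 2, NA, 4)  # hypothetical column
)

# Drops every row that has an NA in ANY column
all_cols <- drop_na(toy)

# Drops only the rows where Q35.take_vaccine. is NA
one_col <- drop_na(toy, Q35.take_vaccine.)

nrow(all_cols)
nrow(one_col)
```

Comparing the two row counts on your own data is a quick way to see how many rows the broader drop_na() call is discarding.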
# Setting up our plot
ggplot(df_q35.plot, aes(x = Q35.take_vaccine.,
y = Q101.confidence_in_authority,
fill = Q35.take_vaccine.)) +
# Add the boxplots
geom_boxplot() +
# Change the theme to classic for a simplistic and clean look
theme_classic() +
# Change the x-axis label
xlab("willingness to take a vaccine") +
# Change the y-axis label
ylab("confidence in authority") +
# Change the fill of the boxplots
scale_fill_brewer(palette = "Dark2") +
# Get rid of the legend on the plot (set legend.position to "none")
theme(legend.position = "none")
ggsave("q35plot.jpg", width = 5, height = 3) #This saves my plot to my files!
And voila! We have recreated the plot!
We want to note that the final product of beautiful, clean code that you see here was not written perfectly the first time or even in this order! We would bet that NO ONE would write this code perfectly on the first go.
The key is to iterate! Write some code, run it, see if it works, and then add something else, run it, and then keep going!
Another tip is to break down the big problem into little ones to solve one step at a time.
Here is an example of our thought process as we worked on recreating the plot:
ggplot(covid_attitudes, aes(x = Q35.take_vaccine.,
y = Q101.confidence_in_authority)) +
geom_boxplot()
## Warning: Removed 157 rows containing non-finite values (stat_boxplot).
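The warning above appears when some rows have a missing (or otherwise non-finite) value in the variable being plotted. As a rough sketch of how one might track that down, the example below counts missing values per column on a small made-up tibble; the values are invented, and only the column names are borrowed from the activity.

```r
library(dplyr)
library(tibble)

# Toy data (made up for illustration)
toy <- tibble(
  Q35.take_vaccine.            = c("definitely", "unsure", "probably", "unsure"),
  Q101.confidence_in_authority = c(5, NA, 3, NA)
)

# How many rows are missing the y variable?
n_missing <- sum(is.na(toy$Q101.confidence_in_authority))

# Missing values in every column at once
missing_by_col <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

n_missing
missing_by_col
```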
Full disclosure, I built this up incrementally as well! It took me a little while to realize that “defintely not” was spelled wrong, and I kept getting weird NAs in my data from setting the levels to a name that didn’t match the actual data. Eventually, I figured that out!
df_q35.plot <- covid_attitudes %>%
drop_na() %>%
mutate(
Q35.take_vaccine. = case_when(
Q35.take_vaccine. == "probably_not" ~ "probably not",
Q35.take_vaccine. == "defintely not" ~ "definitely not" ,
TRUE ~ Q35.take_vaccine.
),
Q35.take_vaccine. = factor(
Q35.take_vaccine.,
levels = c("definitely not",
"probably not",
"unsure",
"probably",
"definitely")
)
)
If you are getting NAs after you make something a factor, it is almost always because something went wrong in that process. And often the error is spelling something wrong when setting the levels of the factor.
This is a good lesson in making sure your code is doing what you think it is doing! Also, after you filter something, check to be sure it is actually filtered out!
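One possible way to run that check before converting to a factor is to compare the values in the column against the levels you plan to use. This is a sketch on made-up response values, and the object names are ours:

```r
# Values as they might appear in the raw data (toy example)
responses <- c("definitely", "defintely not", "probably_not", "unsure")

# Levels we intend to use
wanted_levels <- c("definitely not", "probably not", "unsure",
                   "probably", "definitely")

# Anything listed here would silently become NA inside factor()
unmatched <- setdiff(unique(responses), wanted_levels)

# Confirm: converting without fixing the labels produces NAs
as_factor <- factor(responses, levels = wanted_levels)
n_na <- sum(is.na(as_factor))

unmatched
n_na
```

If unmatched is empty, every value in the column has a matching level and no NAs will be introduced by factor().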
ggplot(df_q35.plot, aes(x = Q35.take_vaccine.,
y = Q101.confidence_in_authority)) +
geom_boxplot()
Excellent, now we are getting somewhere!
ggplot(df_q35.plot, aes(x = Q35.take_vaccine.,
y = Q101.confidence_in_authority)) +
# Add the boxplots
geom_boxplot() +
# Change the theme to classic for a simplistic and clean look
theme_classic() +
# Change the x-axis label
xlab("willingness to take a vaccine") +
# Change the y-axis label
ylab("confidence in authority")
ggplot(df_q35.plot, aes(x = Q35.take_vaccine.,
y = Q101.confidence_in_authority)) +
# Add the boxplots
geom_boxplot() +
# Change the theme to classic for a simplistic and clean look
theme_classic() +
# Change the x-axis label
xlab("willingness to take a vaccine") +
# Change the y-axis label
ylab("confidence in authority") +
# Change the fill of the boxplots
scale_fill_brewer(palette = "Dark2")
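Why wasn’t it working? I thought maybe I hadn’t typed in the argument correctly, so I looked up the help file for scale_fill_brewer. Some geoms are colored through the color aesthetic rather than fill, in which case scale_color_brewer is the one to use, so I tried that, too. In the end, the problem was that nothing was mapped to fill inside aes(), so the fill scale had nothing to act on.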
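A small check on made-up data shows the difference a fill mapping makes. This sketch uses layer_data() from ggplot2 to look at the fill color computed for each box; the toy data frame and object names are ours.

```r
library(ggplot2)

# Toy data (made up for illustration)
toy <- data.frame(
  group = rep(c("a", "b", "c"), each = 5),
  value = c(1:5, 2:6, 3:7)
)

# No fill mapping: the brewer scale is added but never used
p_unmapped <- ggplot(toy, aes(x = group, y = value)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Dark2")

# fill mapped to group: the scale assigns one palette color per group
p_mapped <- ggplot(toy, aes(x = group, y = value, fill = group)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Dark2")

# Fill colors actually computed for the boxes in each plot
fills_unmapped <- unique(layer_data(p_unmapped)$fill)
fills_mapped   <- unique(layer_data(p_mapped)$fill)

length(fills_unmapped)
length(fills_mapped)
```

Without the mapping, every box gets the same default fill; with it, each group gets its own color from the palette.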
ggplot(df_q35.plot, aes(x = Q35.take_vaccine.,
y = Q101.confidence_in_authority,
fill = Q35.take_vaccine.)) +
# Add the boxplots
geom_boxplot() +
# Change the theme to classic for a simplistic and clean look
theme_classic() +
# Change the x-axis label
xlab("willingness to take a vaccine") +
# Change the y-axis label
ylab("confidence in authority") +
# Change the fill of the boxplots
scale_fill_brewer(palette = "Dark2")
It worked, yay!
ggplot(df_q35.plot, aes(x = Q35.take_vaccine.,
y = Q101.confidence_in_authority,
fill = Q35.take_vaccine.)) +
# Add the boxplots
geom_boxplot() +
# Change the theme to classic for a simplistic and clean look
theme_classic() +
# Change the x-axis label
xlab("willingness to take a vaccine") +
# Change the y-axis label
ylab("confidence in authority") +
# Change the fill of the boxplots
scale_fill_brewer(palette = "Dark2") +
# Get rid of the legend on the plot (set legend.position to "none")
theme(legend.position = "none")
Here are some ways you could explore the example questions with graphs.
# Creating a covid_clean data frame that drops the rows that have NA values
covid_clean <- covid_attitudes %>% drop_na()
q84.plot <- covid_attitudes %>%
drop_na() %>%
mutate(Q84.community = case_when(Q84.community == "ruralArea" ~ "rural area",
TRUE ~ Q84.community),
Q84.community.factor = factor(Q84.community,
levels = c("rural area",
"suburb",
"small city/town",
"large city")))
ggplot(q84.plot, aes(x = Q84.community.factor,
y = Q16.Belief_scientists_understand_covid,
fill = Q84.community.factor)) +
geom_violin() +
geom_boxplot(color = "black", width = .1) +
theme_classic() +
xlab("community") +
ylab("Belief that scientists understand covid") +
theme(legend.position = "none")
Again, this was made in an iterative fashion! For example, at the end we added boxplots, and also decided to order the levels of community in order to go from smallest to largest.
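To illustrate why the levels had to be set explicitly, here is a short base R sketch comparing the default level order with an explicit one. It reuses the community labels from the code above; the object names are ours.

```r
community <- c("suburb", "large city", "rural area", "small city/town")

# Default: levels are sorted alphabetically
default_levels <- levels(factor(community))

# Explicit: levels follow the order we give, here smallest to largest
ordered_levels <- levels(factor(
  community,
  levels = c("rural area", "suburb", "small city/town", "large city")
))

default_levels
ordered_levels
```

ggplot2 draws a factor's categories in level order, so the second version is what puts the communities in the intended order on the x-axis.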
ggplot(covid_clean, aes(x = Q10.rank_attention_to_news,
y = Q14.confidence_us_government)) +
geom_point() +
geom_smooth(method = "lm") +
facet_wrap(~Q40.age) +
theme_bw() +
xlab("Attention to news (1-10)") +
ylab("Confidence in U.S. Gov't (1-10)")
## `geom_smooth()` using formula 'y ~ x'
df_plot_takeVax <- covid_clean %>%
# Fix the known issues with the Q35 column
mutate(Q35.take_vaccine. =
case_when(Q35.take_vaccine. == "defintely not" ~ "definitely not",
Q35.take_vaccine. == "probably_not" ~ "probably not",
TRUE ~ Q35.take_vaccine.),
# Make Q35 into a factor and order the levels
Q35.take_vaccine. = factor(Q35.take_vaccine.,
levels = c("definitely",
"probably",
"unsure",
"probably not",
"definitely not")),
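# Fix the "ruralArea" label so it matches a level, then make Q84 into a factor and order the levels
Q84.community = case_when(Q84.community == "ruralArea" ~ "rural area",
TRUE ~ Q84.community),
Q84.community = factor(Q84.community,
levels = c("rural area",
"suburb",
"small city/town",
"large city"))) %>%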
# Remove age groups that have 10 or fewer participants
group_by(Q40.age) %>%
filter(n() > 10) %>%
# Remove communities that have 50 or fewer participants
group_by(Q84.community) %>%
filter(n() > 50) %>%
ungroup()
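The group_by() plus filter(n() > ...) pattern used above keeps only the groups that have more rows than the threshold. Here is a minimal sketch on made-up data, with the threshold lowered to 1 so the effect is visible; the age labels are invented for illustration.

```r
library(dplyr)
library(tibble)

# Toy data: three age groups with 3, 1, and 2 rows (labels are made up)
toy <- tibble(
  Q40.age = c(rep("18-24", 3), rep("25-34", 1), rep("35-44", 2))
)

# Keep only the age groups with more than 1 row
kept <- toy %>%
  group_by(Q40.age) %>%
  filter(n() > 1) %>%
  ungroup()

# Check which groups remain and how many rows each one has
group_sizes <- count(kept, Q40.age)
group_sizes
```

Running count() afterwards is a simple way to confirm that the small groups really were filtered out.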
First, I tried it this way, with just community to see what it looked like.
ggplot(df_plot_takeVax, aes(x = Q35.take_vaccine., fill = Q84.community)) +
geom_bar(position = "dodge")
Then, I decided I wanted to facet by community, and have side by side bars for age.
ggplot(df_plot_takeVax, aes(x = Q35.take_vaccine., fill = Q40.age)) +
geom_bar(position = "dodge") +
facet_wrap(~Q84.community) +
theme_classic() +
xlab("Willingness to take vaccine")
This is not the most beautiful plot (i.e., I wouldn’t put it in a publication), but for a cursory exploration of your data, it will do! I thought this layout would be the easiest for comparison. However, depending on what comparisons you are interested in, you could have plotted community side-by-side and faceted by age range. Or you could have picked only a few key age ranges of interest to look at.
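As a rough sketch of those two alternatives, the example below swaps the fill and facet variables, and separately filters down to a couple of age ranges. It runs on a small made-up data frame; the values and age labels are invented, and only the column names are taken from the activity.

```r
library(ggplot2)
library(dplyr)

# Toy data (made-up values and age labels, for illustration only)
toy <- data.frame(
  Q35.take_vaccine. = c("definitely", "probably", "unsure",
                        "definitely", "probably", "definitely"),
  Q84.community     = c("suburb", "suburb", "large city",
                        "large city", "suburb", "large city"),
  Q40.age           = c("18-24", "25-34", "18-24", "65+", "25-34", "65+")
)

# Option 1: community side by side, one panel per age range
p_by_age <- ggplot(toy, aes(x = Q35.take_vaccine., fill = Q84.community)) +
  geom_bar(position = "dodge") +
  facet_wrap(~Q40.age) +
  theme_classic() +
  xlab("Willingness to take vaccine")

# Option 2: keep only a few age ranges of interest before plotting
key_ages <- toy %>%
  filter(Q40.age %in% c("18-24", "65+"))
```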
df_plot_ed <- covid_clean %>%
# Remove education levels that have 10 or fewer people
group_by(Q74.education) %>%
filter(n() > 10) %>%
ungroup() %>%
# Make education into a factor, and order the levels by ed level
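# Levels must match the values in the data exactly, including the leading space in " 4 year degree";
# if a level shows up as NA in the plot, compare these strings against unique(covid_clean$Q74.education)
mutate(Q74.education = factor(Q74.education,
levels = c("highschool graduate",
"some college",
"2 year degree",
" 4 year degree",
"professional degree",
"doctorate")))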
ggplot(df_plot_ed, aes(x = Q74.education,
y = Q16.Belief_scientists_understand_covid,
fill = Q74.education)) +
geom_violin() +
theme_classic() +
xlab("Education") +
ylab("Belief that scientists understand covid") +
theme(legend.position = "none")