Open an R script and load the covid_attitudes_clean.csv data.
Make sure to load the tidyverse library at the start of your script: library(tidyverse), and set options(stringsAsFactors = FALSE).
## Load packages and set options ##
library(tidyverse)
options(stringsAsFactors = FALSE)
## Read in data ##
covid_clean <- read.csv("covid_attitudes_clean.csv")
First, we need to get our data ready to plot:
1. We need to remove participants who answered “definitely not” to Q35. (Notice that the data say “defintely not”, which is misspelled, so we have to spell it the same way when we filter the data!)
2. We need to change “probably_not” to say “probably not”.
3. We also need to make Q35 into a factor and order the levels from definitely to probably not. If we don’t set the levels, R will default to alphabetical order, which is not the order we want.
# Set up data frame for plot
df_q35.plot <- covid_clean %>%
# Remove "definitely not" answers (defintely spelled wrong!)
filter(Q35.take_vaccine. != "defintely not") %>%
# Change probably_not
mutate(
Q35.take_vaccine. = case_when(
Q35.take_vaccine. == "probably_not" ~ "probably not",
TRUE ~ Q35.take_vaccine.
),
# Make Q35 into a factor
Q35.take_vaccine. = factor(
Q35.take_vaccine.,
levels = c("definitely",
"probably",
"unsure",
"probably not")
)
)
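Before plotting, it can help to confirm that the filter and recode did what we expect. The sketch below uses a small made-up data frame (`toy` and its rows are hypothetical, not the real survey data; only the column name comes from the tutorial), runs the same steps, and then counts the remaining answers.

```r
library(dplyr)

# Hypothetical stand-in for covid_clean; the values are made up for illustration
toy <- data.frame(
  Q35.take_vaccine. = c("definitely", "probably", "unsure",
                        "probably_not", "defintely not", "definitely"),
  stringsAsFactors = FALSE
)

toy_plot <- toy %>%
  # Same filter as above, using the misspelling found in the data
  filter(Q35.take_vaccine. != "defintely not") %>%
  mutate(
    Q35.take_vaccine. = case_when(
      Q35.take_vaccine. == "probably_not" ~ "probably not",
      TRUE ~ Q35.take_vaccine.
    ),
    Q35.take_vaccine. = factor(
      Q35.take_vaccine.,
      levels = c("definitely", "probably", "unsure", "probably not")
    )
  )

# One row per remaining answer, listed in the level order set above
q35_counts <- count(toy_plot, Q35.take_vaccine.)
print(q35_counts)

# FALSE here means every remaining value matched one of the levels
any_na <- any(is.na(toy_plot$Q35.take_vaccine.))
print(any_na)
```

Running the same `count()` on the real `df_q35.plot` should show four answers and no `NA` row if everything matched.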
Next, set up our ggplot with the data frame and aesthetics. To color the boxplots, we need to add a fill aesthetic inside aes(). Since we want each box filled according to its level of Q35, we map Q35 to fill.
ggplot(df_q35.plot, aes(x = Q35.take_vaccine.,
y = Q101.confidence_in_authority,
fill = Q35.take_vaccine.)) +
# Add the boxplots
geom_boxplot() +
# Change the theme to classic for a simple and clean look
theme_classic() +
# Change the x-axis label
xlab("willingness to take a vaccine") +
# Change the y-axis label
ylab("confidence in authority") +
# Change the fill of the boxplots
scale_fill_brewer(palette = "Dark2") +
# Get rid of the legend on the plot (set legend.position to "none")
theme(legend.position = "none")
And voila! We have recreated the plot!
We want to note that the final product of beautiful, clean code that you see here was not written perfectly the first time or even in this order! We would bet that NO ONE would write this code perfectly on the first go.
The key is to iterate! Write some code, run it, see if it works, and then add something else, run it, and then keep going!
Another tip is to break down the big problem into little ones to solve one step at a time.
Here is an example of Elena’s thought process as she worked on recreating Willa’s plot:
ggplot(covid_clean, aes(x = Q35.take_vaccine.,
y = Q101.confidence_in_authority)) +
geom_boxplot()
df_q35.plot <- covid_clean %>%
filter(Q35.take_vaccine. != "defintely not") %>%
drop_na(Q35.take_vaccine.) %>%
mutate(
Q35.take_vaccine. = case_when(
Q35.take_vaccine. == "probably_not" ~ "probably not",
TRUE ~ Q35.take_vaccine.
),
Q35.take_vaccine. = factor(
Q35.take_vaccine.,
levels = c("definitely",
"probably",
"unsure",
"probably not")
)
)
*** See below for more on what happens if you spell “definitely” correctly and it doesn’t match what is in the data.
ggplot(df_q35.plot, aes(x = Q35.take_vaccine.,
y = Q101.confidence_in_authority)) +
geom_boxplot()
Excellent, now we are getting somewhere!
ggplot(df_q35.plot, aes(x = Q35.take_vaccine.,
y = Q101.confidence_in_authority)) +
# Add the boxplots
geom_boxplot() +
# Change the theme to classic for a simple and clean look
theme_classic() +
# Change the x-axis label
xlab("willingness to take a vaccine") +
# Change the y-axis label
ylab("confidence in authority")
ggplot(df_q35.plot, aes(x = Q35.take_vaccine.,
y = Q101.confidence_in_authority)) +
# Add the boxplots
geom_boxplot() +
# Change the theme to classic for a simple and clean look
theme_classic() +
# Change the x-axis label
xlab("willingness to take a vaccine") +
# Change the y-axis label
ylab("confidence in authority") +
# Change the fill of the boxplots
scale_fill_brewer(palette = "Dark2")
Urgggg. I had to think: why wasn’t it working? I thought maybe I hadn’t typed the argument correctly, so I looked up the help file for scale_fill_brewer. Sometimes a plot needs scale_color_brewer rather than the fill version, so I tried that, too. In the end, the problem was that nothing was mapped to fill, so I added fill = Q35.take_vaccine. to aes().
ggplot(df_q35.plot, aes(x = Q35.take_vaccine.,
y = Q101.confidence_in_authority,
fill = Q35.take_vaccine.)) +
# Add the boxplots
geom_boxplot() +
# Change the theme to classic for a simple and clean look
theme_classic() +
# Change the x-axis label
xlab("willingness to take a vaccine") +
# Change the y-axis label
ylab("confidence in authority") +
# Change the fill of the boxplots
scale_fill_brewer(palette = "Dark2")
It worked, yay!
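Why this fixes it: a fill scale only changes colors for something that is mapped to fill. The sketch below illustrates the difference with `ToothGrowth`, a data set that ships with R (it is a stand-in here, not part of the survey data), and uses `layer_data()` to look at the fill values ggplot2 computed for each box.

```r
library(ggplot2)

# ToothGrowth ships with R; dose is turned into a factor so it can act as a group
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# No fill aesthetic: the brewer scale has nothing to apply its colors to
p_no_fill <- ggplot(tg, aes(x = dose, y = len)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Dark2")

# fill mapped to the grouping variable: each box gets its own palette color
p_fill <- ggplot(tg, aes(x = dose, y = len, fill = dose)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Dark2")

# layer_data() returns the computed values for each box, including its fill
fills_no_map <- unique(layer_data(p_no_fill)$fill)
fills_mapped <- unique(layer_data(p_fill)$fill)

print(fills_no_map)  # a single default fill shared by every box
print(fills_mapped)  # one palette color per dose level
```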
ggplot(df_q35.plot, aes(x = Q35.take_vaccine.,
y = Q101.confidence_in_authority,
fill = Q35.take_vaccine.)) +
# Add the boxplots
geom_boxplot() +
# Change the theme to classic for a simple and clean look
theme_classic() +
# Change the x-axis label
xlab("willingness to take a vaccine") +
# Change the y-axis label
ylab("confidence in authority") +
# Change the fill of the boxplots
scale_fill_brewer(palette = "Dark2") +
# Get rid of the legend on the plot (set legend.position to "none")
theme(legend.position = "none")
*** Following up from step 2: What happens if you spell “definitely” correctly and it doesn’t match what is in the data?
df_q35.plot_badDef <- covid_clean %>%
# I spelled definitely correctly here, but it doesn't match what's in the data,
# so nothing gets filtered out (this is why it is important to check your code
# line by line and make sure when you filter something out, it is really
# filtering!)
filter(Q35.take_vaccine. != "definitely not") %>%
drop_na(Q35.take_vaccine.) %>%
mutate(Q35.take_vaccine. =
case_when(Q35.take_vaccine. == "probably_not" ~ "probably not",
TRUE ~ Q35.take_vaccine.),
# I don't have a factor level for "defintely not" which is still in the data, so it just becomes NA
Q35.take_vaccine. = factor(Q35.take_vaccine.,
levels = c("definitely",
"probably",
"unsure",
"probably not")))
# If we try to plot it, look what happens now, there are NAs!
ggplot(df_q35.plot_badDef, aes(x = Q35.take_vaccine.,
y = Q101.confidence_in_authority)) +
# Add the boxplots
geom_boxplot()
If you are getting NAs after you make something a factor, it is almost always because something went wrong in that process. Often the cause is a value in the data that does not exactly match any of the levels you set, for example because of a spelling difference.
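A minimal sketch of that behavior in base R, using a made-up vector of answers (`answers` is hypothetical): any value that is not listed in `levels` silently becomes `NA`, and `setdiff()` is one way to find which values were missed.

```r
# Made-up answers; the last one carries the misspelling from the data
answers <- c("definitely", "probably", "defintely not")

# The levels use the correct spelling, so the misspelled value has no match
f <- factor(answers, levels = c("definitely", "probably", "definitely not"))

# The unmatched value turns into NA without any warning
na_flags <- is.na(f)
print(na_flags)

# Values present in the data but missing from the levels
unmatched <- setdiff(unique(answers), levels(f))
print(unmatched)
```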
This is a good lesson in making sure your code is doing what you think it is doing! After you filter something, check to be sure it is actually filtered out!
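One simple way to do that check is to compare row counts before and after the filter and to look for the value you meant to remove. The sketch below uses a small made-up data frame (`toy` is hypothetical; only the column name comes from the tutorial).

```r
library(dplyr)

# Hypothetical stand-in for covid_clean
toy <- data.frame(
  Q35.take_vaccine. = c("definitely", "defintely not", "unsure", "defintely not"),
  stringsAsFactors = FALSE
)
n_before <- nrow(toy)

# Correct spelling, but it does not match the data, so no rows are removed
wrong <- filter(toy, Q35.take_vaccine. != "definitely not")

# Spelling copied from the data, so the rows are actually removed
right <- filter(toy, Q35.take_vaccine. != "defintely not")

# Check 1: did the number of rows change?
print(c(before = n_before, wrong = nrow(wrong), right = nrow(right)))

# Check 2: is the unwanted value still present?
still_there_wrong <- "defintely not" %in% wrong$Q35.take_vaccine.
still_there_right <- "defintely not" %in% right$Q35.take_vaccine.
```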
Here are some ways you could explore the example questions with graphs.
ggplot(covid_clean, aes(x = factor(Q84.community,
levels = c("rural area",
"suburb",
"small city/town",
"large city")),
y = Q16.Belief_scientists_understand_covid,
fill = Q84.community)) +
geom_violin() +
geom_boxplot(color = "black", width = .1) +
theme_classic() +
xlab("community") +
ylab("Belief that scientists understand covid") +
theme(legend.position = "none")
Again, this was made in an iterative fashion! For example, at the end we added boxplots, and also decided to order the levels of community in order to go from smallest to largest.
ggplot(covid_clean, aes(x = Q10.rank_attention_to_news,
y = Q14.confidence_us_government)) +
geom_point() +
geom_smooth(method = "lm") +
facet_wrap(~Q40.age) +
theme_bw() +
xlab("Attention to news (1-10)") +
ylab("Confidence in U.S. Gov't (1-10)")
## `geom_smooth()` using formula 'y ~ x'
df_plot_takeVax <- covid_clean %>%
# Fix the known issues with the Q35 column
mutate(Q35.take_vaccine. =
case_when(Q35.take_vaccine. == "defintely not" ~ "definitely not",
Q35.take_vaccine. == "probably_not" ~ "probably not",
TRUE ~ Q35.take_vaccine.),
# Make Q35 into a factor and order the levels
Q35.take_vaccine. = factor(Q35.take_vaccine.,
levels = c("definitely",
"probably",
"unsure",
"probably not",
"definitely not")),
# Make Q84 into a factor and order the levels
Q84.community = factor(Q84.community,
levels = c("rural area",
"suburb",
"small city/town",
"large city"))) %>%
# Remove age groups that have 10 or fewer participants!
group_by(Q40.age) %>%
filter(n() > 10) %>%
# Remove communities that have 50 or fewer participants
group_by(Q84.community) %>%
filter(n() > 50) %>%
ungroup()
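The group_by() plus filter(n() > ...) pattern keeps or drops whole groups based on their size. Here is a minimal sketch on made-up data (the `toy` data frame and its `group` column are hypothetical) showing that `>` also drops a group that sits exactly at the threshold, while `>=` keeps it.

```r
library(dplyr)

# Made-up groups of size 5 (a), 2 (b), and 3 (c)
toy <- data.frame(
  group = c(rep("a", 5), rep("b", 2), rep("c", 3)),
  stringsAsFactors = FALSE
)

# n() is the number of rows in the current group, so this keeps groups with more than 3 rows
kept_gt <- toy %>%
  group_by(group) %>%
  filter(n() > 3) %>%
  ungroup()

# With >=, the group with exactly 3 rows is kept as well
kept_gte <- toy %>%
  group_by(group) %>%
  filter(n() >= 3) %>%
  ungroup()

print(count(kept_gt, group))   # only group a remains
print(count(kept_gte, group))  # groups a and c remain
```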
First, I tried it this way, with just community to see what it looked like.
ggplot(df_plot_takeVax, aes(x = Q35.take_vaccine., fill = Q84.community)) +
geom_bar(position = "dodge")
Then, I decided I wanted to facet by community, and have side by side bars for age.
ggplot(df_plot_takeVax, aes(x = Q35.take_vaccine., fill = Q40.age)) +
geom_bar(position = "dodge") +
facet_wrap(~Q84.community) +
theme_classic() +
xlab("Willingness to take vaccine")
This is not the most beautiful plot (i.e., I wouldn’t put it in a publication), but for a cursory exploration of your data, it will do! I thought this layout would be the easiest for comparison. However, depending on what comparisons you are interested in, you could instead plot community side by side and facet by age range. Or you could pick only a few key age ranges of interest to look at.
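As a rough sketch of the first alternative, the roles of the two variables can simply be swapped: community goes to fill and age goes to facet_wrap(). The data frame below is made up so the example runs on its own (the column names follow the tutorial, but the rows and the age labels are hypothetical); with the real data you would use df_plot_takeVax instead.

```r
library(ggplot2)

# Hypothetical mini data set; values and age labels are invented for illustration
toy <- data.frame(
  Q35.take_vaccine. = c("definitely", "probably", "unsure", "definitely",
                        "probably", "definitely", "unsure", "probably"),
  Q84.community = c("suburb", "suburb", "large city", "large city",
                    "suburb", "large city", "suburb", "large city"),
  Q40.age = c("18-24", "18-24", "18-24", "18-24",
              "25-34", "25-34", "25-34", "25-34"),
  stringsAsFactors = FALSE
)

# Side-by-side bars for community, one panel per age range
p_alt <- ggplot(toy, aes(x = Q35.take_vaccine., fill = Q84.community)) +
  geom_bar(position = "dodge") +
  facet_wrap(~Q40.age) +
  theme_classic() +
  xlab("Willingness to take vaccine")

print(p_alt)
```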
df_plot_ed <- covid_clean %>%
# Remove education levels that have 10 or fewer people
group_by(Q74.education) %>%
filter(n() > 10) %>%
ungroup() %>%
# Make education into a factor, and order the levels by ed level
mutate(Q74.education = factor(Q74.education,
                              levels = c("highschool graduate",
                                         "some college",
                                         "2 year degree",
                                         " 4 year degree",
                                         "professional degree",
                                         "doctorate")))
ggplot(df_plot_ed, aes(x = Q74.education,
y = Q16.Belief_scientists_understand_covid,
fill = Q74.education)) +
geom_violin() +
theme_classic() +
xlab("Education") +
ylab("Belief that scientists understand covid") +
theme(legend.position = "none")