Welcome to Week 12! Previously, we learned how to compare two groups using t-tests. But what happens when we have three, four, or ten groups?
If we simply ran multiple t-tests for every pair of groups, we would run into the Familywise Error Rate (FWER) problem. Every test carries a 5% risk of a False Positive (Type I Error, \(\alpha = 0.05\)).
Mathematically, the probability of not making a Type I error in a single test is \(1 - 0.05 = 0.95\). If we run \(m\) independent tests, the probability of not making any error across all of them is \((0.95)^m\).
Therefore, the probability of making at least one error (the Familywise Error Rate) is:
\[ P(\text{Familywise Error}) = 1 - (0.95)^m \]
If you run 3 tests (\(m=3\)), your risk isn’t 5% anymore—it’s nearly 14%:
\[ 1 - (0.95)^3 = 1 - 0.857 \approx 0.143 \quad (14.3\%) \]
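We can verify this formula quickly in R (a small sketch; the vector of test counts below is arbitrary):
alpha <- 0.05                      # per-test Type I error rate
m <- c(1, 3, 5, 10, 20)            # number of independent tests
fwer <- 1 - (1 - alpha)^m          # familywise error rate
round(fwer, 3)
## [1] 0.050 0.143 0.226 0.401 0.642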
Let’s look at a real-world example of how probabilities accumulate: The H1B Visa Lottery.
In 2024 (for the FY 2025 cap), the selection rate was approximately 25.6%. This means in any single year, you have a roughly 74.4% chance of not getting selected.
If you apply 3 times (e.g., over your 3-year OPT period), what is the chance you get selected at least once? It is not simply \(25.6\% \times 3 = 76.8\%\).
Using the same probability logic as above:
\[ 1 - (1 - 0.256)^3 = 1 - (0.744)^3 \approx 1 - 0.412 = 0.588 \quad (58.8\%) \]
So, even with 3 attempts, your overall chance of being selected at least once is only about 58.8%.
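The same one-line calculation works in R (assuming the 25.6% selection rate holds every year and the draws are independent):
p_select <- 0.256                  # single-year selection probability
1 - (1 - p_select)^3               # chance of at least one selection in 3 tries (~0.588)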
This is the inverse of the FWER problem. In FWER, we want to avoid making an error (a “false success”), but the more tests we run, the more likely we are to make one. In the lottery, we want a success, and the more tries we make, the more likely we are to get one—but it never reaches 100%.
ANOVA (Analysis of Variance) solves the FWER problem by comparing all means simultaneously in a single “Omnibus” test. It relies on the F-Ratio [1].
We will use rio for data import, ez for simple ANOVA execution, MOTE for effect sizes, and ggplot2 for visualization.
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, rio, ez, MOTE, ggplot2, gridExtra)
We will use a classic dataset from Andy Field’s textbook [1]. The research question is simple: Does Viagra affect libido?
# Import data
master <- import("data/viagra.sav")
# Convert 'dose' to a factor with labels
master$dose <- factor(master$dose,
levels = c(1, 2, 3),
labels = c("Placebo", "Low Dose", "High Dose"))
# Create a participant ID (required for ezANOVA)
master$partno <- 1:nrow(master)
# Calculate Grand Mean and Group Means for visualization
grand_mean <- mean(master$libido)
group_means <- master %>%
group_by(dose) %>%
summarize(mean_libido = mean(libido))
# Merge group means back into master for plotting
master <- master %>%
left_join(group_means, by = "dose")
head(master)
To understand ANOVA, we need to understand Variance. Variance is just the average squared distance of data points from the mean. ANOVA splits this variance into two parts: “Good Variance” (explained by our experiment) and “Bad Variance” (unexplained noise).
We calculate three types of “Sum of Squares” (SS) to measure this [1].
First, we ignore the groups. We just ask: “How spread out is the data overall?” We measure the distance from Every Data Point to the Grand Mean (the average of everyone).
\[ SST = \sum (x_i - \bar{x}_{grand})^2 \]
Now we use our model (the groups). If our experiment worked, the groups should be different. We measure the distance from the Group Means to the Grand Mean. This represents the Signal: How much better is our model (using groups) than just guessing the average?
\[ SSM = \sum n_k (\bar{x}_{k} - \bar{x}_{grand})^2 \]
Finally, we look at what we couldn’t explain. Even within the “High Dose” group, not everyone has the same libido. We measure the distance from Every Data Point to its Group Mean. This represents the Noise or Error.
\[ SSR = \sum (x_{ik} - \bar{x}_{k})^2 \]
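To make these formulas concrete, we can compute all three Sums of Squares by hand using the grand_mean and mean_libido columns we created during data preparation (a quick sketch to check our understanding):
# Total SS: every data point vs. the grand mean
ss_t <- sum((master$libido - grand_mean)^2)
# Model SS: each point's group mean vs. the grand mean (equivalent to weighting by group size)
ss_m <- sum((master$mean_libido - grand_mean)^2)
# Residual SS: every data point vs. its own group mean
ss_r <- sum((master$libido - master$mean_libido)^2)
c(SST = ss_t, SSM = ss_m, SSR = ss_r)   # SST should equal SSM + SSR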
The F-Ratio is simply the ratio of the “Good Variance” (Signal) to the “Bad Variance” (Noise), after each Sum of Squares has been averaged by its degrees of freedom to give a Mean Square (MS) [1].
\[ F = \frac{MS_M \text{ (Signal)}}{MS_R \text{ (Noise)}} = \frac{SSM / df_M}{SSR / df_R} \]
If \(F > 1\), it means the experimental effect is bigger than random noise.
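Continuing the sketch above (it assumes ss_m and ss_r are already defined), dividing each Sum of Squares by its degrees of freedom gives a Mean Square, and their ratio reproduces the F-value that ezANOVA will report below:
k <- length(levels(master$dose))     # number of groups (3)
df_m <- k - 1                        # model degrees of freedom
df_r <- nrow(master) - k             # residual degrees of freedom
ms_m <- ss_m / df_m                  # mean square for the model (signal)
ms_r <- ss_r / df_r                  # mean square for the residuals (noise)
ms_m / ms_r                          # ~5.12, matching the ezANOVA output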
We use the ezANOVA function from the ez package. It is popular because it automatically provides assumption checks and formats the output nicely.
Key Arguments:
* dv: Dependent Variable (Outcome)
* between: Between-subjects predictor (Group)
* wid: Within-subject ID (Participant ID)
* type: Sum of Squares type (Type 3 is standard for social sciences)
options(scipen = 20) # Turn off scientific notation
anova_model <- ezANOVA(data = master,
dv = libido,
between = dose,
wid = partno,
type = 3,
detailed = TRUE)
print(anova_model)
## $ANOVA
## Effect DFn DFd SSn SSd F p p<.05 ges
## 1 (Intercept) 1 12 180.26667 23.6 91.661017 0.0000005720565 * 0.8842381
## 2 dose 2 12 20.13333 23.6 5.118644 0.0246942895382 * 0.4603659
##
## $`Levene's Test for Homogeneity of Variance`
## DFn DFd SSn SSd F p p<.05
## 1 2 12 0.1333333 6.8 0.1176471 0.8900225
Before looking at the main result, we must check whether the groups have equal variances (Homogeneity of Variance). A non-significant Levene's test (\(p > .05\)) means this assumption is met; here \(p \approx .89\), so we can proceed to interpret the ANOVA.
levene_results <- anova_model$`Levene's Test for Homogeneity of Variance`
print(levene_results)
## DFn DFd SSn SSd F p p<.05
## 1 2 12 0.1333333 6.8 0.1176471 0.8900225
levene_p <- levene_results$p
Now we look at the ANOVA table.
anova_table <- anova_model$ANOVA
print(anova_table)
## Effect DFn DFd SSn SSd F p p<.05 ges
## 1 (Intercept) 1 12 180.26667 23.6 91.661017 0.0000005720565 * 0.8842381
## 2 dose 2 12 20.13333 23.6 5.118644 0.0246942895382 * 0.4603659
# Extract values for reporting
# ezANOVA detailed=TRUE includes Intercept, so we filter for "dose"
dose_row <- anova_table[anova_table$Effect == "dose", ]
df_model <- as.numeric(dose_row$DFn)
df_error <- as.numeric(dose_row$DFd)
f_value <- as.numeric(dose_row$F)
p_value <- as.numeric(dose_row$p)
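With those values in hand, we can assemble the statistical report programmatically rather than typing the numbers by hand (a small sketch; the exact wording is just an example):
# Build an APA-style summary string from the extracted statistics
sprintf("The effect of dose was significant, F(%.0f, %.0f) = %.2f, p = %.3f.",
        df_model, df_error, f_value, p_value)
## [1] "The effect of dose was significant, F(2, 12) = 5.12, p = 0.025."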
The ANOVA is an “Omnibus” test—it tells us that there is a difference, but not where. To find out which specific groups differ (e.g., High Dose vs. Placebo), we run Post Hoc Tests.
We use the Bonferroni correction to adjust our p-values and keep the Familywise Error Rate at 5%.
posthoc <- pairwise.t.test(master$libido,
master$dose,
p.adjust.method = "bonferroni",
paired = FALSE,
var.equal = TRUE)
print(posthoc)
##
## Pairwise comparisons using t tests with pooled SD
##
## data: master$libido and master$dose
##
## Placebo Low Dose
## Low Dose 0.845 -
## High Dose 0.025 0.196
##
## P value adjustment method: bonferroni
# Extract p-values (The matrix has rows: Low Dose, High Dose; cols: Placebo, Low Dose)
p_high_vs_placebo <- posthoc$p.value["High Dose", "Placebo"]
p_low_vs_placebo <- posthoc$p.value["Low Dose", "Placebo"]
p_high_vs_low <- posthoc$p.value["High Dose", "Low Dose"]
Interpretation:
* High Dose vs. Placebo: \(p = 0.025\) (Significant!)
* Low Dose vs. Placebo: \(p = 0.845\) (Not significant)
* High Dose vs. Low Dose: \(p = 0.196\) (Not significant)
Conclusion: High Dose Viagra significantly increases libido compared to Placebo, but Low Dose does not.
Statistical significance (\(p < .05\)) depends on sample size. To know how large the effect is, we calculate Omega Squared (\(\omega^2\)).
# Calculate Omega Squared using the values extracted from the ANOVA table
# (df_model, df_error, and f_value were created in the previous chunk)
effect <- omega.F(dfm = df_model, dfe = df_error, Fvalue = f_value,
                  n = nrow(master), a = .05)   # n = total sample size
omega_sq <- effect$omega
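As a cross-check, Omega Squared can also be computed by hand from the Sums of Squares in the detailed ANOVA table (this sketch reuses dose_row, df_model, and df_error from earlier):
# Manual omega squared: (SSM - dfM * MSR) / (SST + MSR)
ms_resid <- dose_row$SSd / df_error                      # mean square residual
omega_manual <- (dose_row$SSn - df_model * ms_resid) /
  (dose_row$SSn + dose_row$SSd + ms_resid)
omega_manual                                             # ~0.35, should match MOTE's estimate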
When our groups have a natural order (e.g., Placebo < Low Dose < High Dose), we can ask a more specific question than just “Are they different?” We can ask: “What is the shape of the relationship?”
This is called Trend Analysis. We typically test for two main shapes:
* Linear Trend: libido changes steadily in one direction as the dose increases (Placebo < Low Dose < High Dose).
* Quadratic Trend: the pattern changes direction across the ordered groups (e.g., libido rises from Placebo to Low Dose but drops again at High Dose).
# Create a copy for trend analysis
master$dose_trend <- master$dose
# Set polynomial contrasts (Linear and Quadratic)
contrasts(master$dose_trend) <- contr.poly(3)
# Run Analysis of Variance (aov is base R function)
trend_model <- aov(libido ~ dose_trend, data = master)
trend_summary <- summary.lm(trend_model)
trend_summary
##
## Call:
## aov(formula = libido ~ dose_trend, data = master)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0 -1.2 -0.2 0.9 2.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.4667 0.3621 9.574 0.000000572 ***
## dose_trend.L 1.9799 0.6272 3.157 0.00827 **
## dose_trend.Q 0.3266 0.6272 0.521 0.61201
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.402 on 12 degrees of freedom
## Multiple R-squared: 0.4604, Adjusted R-squared: 0.3704
## F-statistic: 5.119 on 2 and 12 DF, p-value: 0.02469
# Extract P-values for dynamic reporting
p_linear <- trend_summary$coefficients["dose_trend.L", "Pr(>|t|)"]
p_quad <- trend_summary$coefficients["dose_trend.Q", "Pr(>|t|)"]
Why does this matter? If the Quadratic Trend was significant, it would tell us something important: maybe “High Dose” isn’t actually better than “Low Dose” (maybe it causes side effects that lower libido). But here, the Linear Trend wins: More is Better (statistically speaking, for this dataset!).
Finally, we will create a publication-quality visualization. Instead of just showing the means (bars), we should show the distribution of the data.
ggplot(master, aes(x = dose, y = libido, fill = dose)) +
# 1. Boxplot with transparency (alpha) so we can see points
geom_boxplot(alpha = 0.4, outlier.shape = NA, width = 0.5) +
# 2. Add Jittered Points (Individual Participants)
# Jitter adds random noise to x-axis so points don't overlap
geom_jitter(width = 0.1, size = 3, alpha = 0.7) +
# 3. Add Mean Points (White Diamonds)
stat_summary(fun = mean, geom = "point", shape = 23, size = 4,
fill = "white", color = "black", stroke = 1.5) +
# 4. Make it colorful and clean
scale_fill_viridis_d(option = "plasma", begin = 0.3, end = 0.9) +
theme_minimal() +
labs(x = "Dosage Level",
y = "Libido Score",
title = "Impact of Viagra on Libido",
subtitle = "Comparison of Placebo, Low Dose, and High Dose groups",
caption = "White diamonds = Group Means; Dots = Individual Participants") +
theme(legend.position = "none",
plot.title = element_text(size = 16, face = "bold"),
axis.title = element_text(size = 12))
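If you need the figure as a standalone file for a paper or slide deck, ggsave exports the most recent plot (the file name and dimensions below are just placeholders):
# Save the last plot to disk (file name is a placeholder)
ggsave("viagra_anova_boxplot.png", width = 7, height = 5, dpi = 300)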
This plot tells the whole story of our analysis: the white diamonds (group means) step upward from Placebo to High Dose, while the boxplots and jittered points show the spread of individual scores and how much the groups overlap.
In this lab, we performed a complete One-Way ANOVA workflow:
* Imported the Viagra dataset and converted dose to a labeled factor
* Checked Homogeneity of Variance with Levene's test
* Ran the omnibus F-test with ezANOVA
* Followed up with Bonferroni-corrected post hoc tests
* Estimated the effect size with Omega Squared
* Tested Linear and Quadratic trends with polynomial contrasts
* Visualized the results with boxplots, jittered points, and group means
[1] A. Field, J. Miles, and Z. Field, “Comparing Several Means: ANOVA (GLM 1),” in Discovering Statistics Using R. London: SAGE Publications, 2012, ch. 10.