Welcome to Week 10! In previous weeks, we explored how to predict an outcome from one or more variables (regression). This week, we dive deeper into the nature of those relationships. Specifically, we will answer two advanced questions:

- How or why does \(X\) affect \(Y\)? (Mediation)
- When, or for whom, does \(X\) affect \(Y\)? (Moderation)
These concepts allow us to build much more sophisticated models of the world, moving beyond simple “X affects Y” statements to understanding mechanisms and boundary conditions.
We will use a specialized package, MeMoBootR, created for educational purposes to streamline mediation and moderation analyses. We will also use standard packages such as rio and ggplot2.
# Ensure CRAN mirror is set
if(is.null(getOption("repos"))) options(repos = c(CRAN = "https://cloud.r-project.org"))
# Install/Load Packages
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, rio, car, ggplot2, devtools)
# Install MeMoBootR from GitHub if not present
if (!requireNamespace("MeMoBootR", quietly = TRUE)) {
  message("Installing MeMoBootR from GitHub...")
  tryCatch({
    devtools::install_github("doomlab/MeMoBootR", force = TRUE, upgrade = "never", quiet = TRUE)
  }, error = function(e) {
    warning("Failed to install MeMoBootR. Some advanced functions may not work.")
  })
}
# Load MeMoBootR if available
if (require("MeMoBootR", quietly = TRUE)) {
  message("MeMoBootR loaded successfully.")
} else {
  message("Using fallback methods where possible.")
}
Mediation refers to a situation where the relationship between a predictor variable (\(X\)) and an outcome variable (\(Y\)) can be explained by their relationship to a third variable, the Mediator (\(M\)).
Plain English: Mediation is like a “middle man” or a relay race. The predictor passes the baton to the mediator, who then runs to the finish line (the outcome). If the predictor (X) affects the outcome (Y), mediation explains how or why that happens.
Example: Does stress cause illness? Maybe stress (\(X\)) lowers your immune system (\(M\)), which then leads to illness (\(Y\)). The immune system mediates the relationship.
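In regression terms, the paths of a mediation model correspond to three equations (a sketch using the conventional path labels \(a\), \(b\), \(c\), and \(c'\)):

\[
\begin{aligned}
Y &= b_{0_1} + c\,X + e_1 \quad &\text{(total effect: path } c\text{)}\\
M &= b_{0_2} + a\,X + e_2 \quad &\text{(path } a\text{)}\\
Y &= b_{0_3} + c'\,X + b\,M + e_3 \quad &\text{(direct effect } c' \text{ and path } b\text{)}
\end{aligned}
\]

For ordinary least squares, the total effect decomposes as \(c = c' + a \times b\), so the indirect (mediated) effect is the product \(a \times b\).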
Historically, mediation was tested using a 4-step process known as the Baron & Kenny method:

1. Show that \(X\) predicts \(Y\) (the total effect, path c).
2. Show that \(X\) predicts \(M\) (path a).
3. Show that \(M\) predicts \(Y\) while controlling for \(X\) (path b).
4. Show that the effect of \(X\) on \(Y\) shrinks (partial mediation) or disappears (full mediation) once \(M\) is in the model (path c').

While modern statistics have moved on to bootstrapping, understanding these steps is crucial for grasping the concept.
State of the Art (2026): While Baron & Kenny’s steps are great for understanding the concept, they are no longer considered sufficient for testing mediation in modern analytics. They have low statistical power. Today, we focus on the Indirect Effect itself, usually tested via Bootstrapping.
Let’s test if Facebook use mediates the relationship between Previous Knowledge and Exam Scores. The theory: People with less previous knowledge might use Facebook more (distraction), which in turn lowers their exam scores.
# Import data
med_data <- import("data/mediation.sav")
# Clean missing data (Mediation requires complete data for comparison)
med_data <- na.omit(med_data)
head(med_data)
Does Previous Knowledge (\(X\)) predict Exam Scores (\(Y\))?
model1 <- lm(exam ~ previous, data = med_data)
summary(model1)
##
## Call:
## lm(formula = exam ~ previous, data = med_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9092 -0.6477 -0.6477 0.3523 8.4062
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.33228 0.16749 7.955 7.37e-14 ***
## previous 0.31539 0.08689 3.630 0.000348 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.496 on 237 degrees of freedom
## Multiple R-squared: 0.05266, Adjusted R-squared: 0.04867
## F-statistic: 13.17 on 1 and 237 DF, p-value: 0.0003476
Result: Yes, \(b = 0.32, p < .001\).
Does Previous Knowledge (\(X\)) predict Facebook Use (\(M\))?
model2 <- lm(facebook ~ previous, data = med_data)
summary(model2)
##
## Call:
## lm(formula = facebook ~ previous, data = med_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.44609 -0.44609 0.05391 0.55391 1.35640
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.28817 0.08195 52.330 <2e-16 ***
## previous -0.09208 0.04251 -2.166 0.0313 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.732 on 237 degrees of freedom
## Multiple R-squared: 0.01941, Adjusted R-squared: 0.01527
## F-statistic: 4.692 on 1 and 237 DF, p-value: 0.03131
Result: Yes, \(b = -0.09, p = .031\).
Does Facebook Use (\(M\)) predict Exam Scores (\(Y\)), controlling for Knowledge? And does Knowledge still predict Exam Scores?
model3 <- lm(exam ~ previous + facebook, data = med_data)
summary(model3)
##
## Call:
## lm(formula = exam ~ previous + facebook, data = med_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6065 -0.7666 -0.3117 0.3850 7.9999
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.93294 0.56791 6.925 4.09e-11 ***
## previous 0.25954 0.08397 3.091 0.00224 **
## facebook -0.60647 0.12705 -4.773 3.18e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.432 on 236 degrees of freedom
## Multiple R-squared: 0.1361, Adjusted R-squared: 0.1288
## F-statistic: 18.59 on 2 and 236 DF, p-value: 3.194e-08
Result:

- Path b (Facebook): Significant (\(b = -0.61, p < .001\)).
- Path c’ (Knowledge): Still significant, but the coefficient dropped from 0.32 to 0.26.
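The pieces fit together arithmetically: the indirect effect is path a times path b, and the total effect decomposes into direct plus indirect. A quick sketch using the models fitted above (model1, model2, model3 as defined earlier):

``` r
# Path a: knowledge -> Facebook use (from Step 2)
a <- unname(coef(model2)["previous"])
# Path b: Facebook use -> exam score, controlling for knowledge (from Step 3)
b <- unname(coef(model3)["facebook"])

# Indirect effect: a * b (two negatives, so the product is positive)
a * b             # ~ 0.056

# Decomposition check: total effect c = direct effect c' + indirect effect
c_prime <- unname(coef(model3)["previous"])
c_prime + a * b   # ~ 0.315, matching the "previous" coefficient in model1
```

This shows exactly how much of the total effect (0.32) flows through Facebook use: the "drop" to 0.26 is the indirect effect.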
Instead of guessing if the drop from 0.32 to 0.26 is “big enough”, we test the Indirect Effect (\(a \times b\)) directly. We use Bootstrapping, which resamples our data thousands of times to build a confidence interval. This method does not assume normality, making it the gold standard (Field et al., 2012).
if (exists("mediation1")) {
  med_results <- mediation1(y = "exam",
                            x = "previous",
                            m = "facebook",
                            df = med_data)
  # Print bootstrapped results
  print(med_results$boot.results)
  print(med_results$boot.ci)
} else {
  message("MeMoBootR needed for automated bootstrapping.")
}
Interpretation: If the 95% Confidence Interval (CI) for the indirect effect does not include zero, we have significant mediation.

- Example Result: Effect = 0.06, 95% CI [0.01, 0.12].
- Since 0 is not in the interval, mediation is significant.
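If MeMoBootR is not available, the percentile bootstrap can be done by hand in base R. A minimal sketch (assumes med_data from above; boot_n and boot_indirect are illustrative names):

``` r
set.seed(4591)
boot_n <- 1000  # number of bootstrap resamples

boot_indirect <- replicate(boot_n, {
  # Resample rows with replacement
  idx <- sample(nrow(med_data), replace = TRUE)
  d   <- med_data[idx, ]
  # Re-fit paths a and b on the resampled data
  a <- coef(lm(facebook ~ previous, data = d))["previous"]
  b <- coef(lm(exam ~ previous + facebook, data = d))["facebook"]
  a * b
})

# 95% percentile confidence interval for the indirect effect
quantile(boot_indirect, c(0.025, 0.975))
```

If that interval excludes zero, the conclusion matches the automated MeMoBootR output.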
Moderation occurs when the relationship between two variables depends on a third variable. In statistics, this is called an Interaction Effect.
Plain English: Moderation is the “It Depends” variable. If I ask, “Does eating pizza make you happy?”, you might say, “It depends on if I’m hungry.” Hunger is the moderator.
- In statistics, we say the relationship between \(X\) and \(Y\) changes depending on the level of \(Z\).
- Graphically, this looks like non-parallel lines. If the lines cross or fan out, you have an interaction.
Do violent video games (\(X\)) make people aggressive (\(Y\))? Maybe it depends on their level of Callous-Unemotional traits (\(Z\)).
mod_data <- import("data/moderation.sav")
head(mod_data)
Before running a moderation analysis, we must Center our continuous predictors.
Plain English: Why do we center?
- Interpretation: In regression, the Intercept is the value of \(Y\) when \(X\) is 0. If \(X\) is “Heart Rate”, 0 is dead. That’s not useful. If we center \(X\) (subtract the mean), then 0 becomes the “Average Heart Rate”. Now the intercept is the score for the average person.
- Multicollinearity: \(X\) and the interaction term (\(X \times Z\)) are highly correlated because \(X\) is inside the interaction. Centering breaks this correlation, making the math more stable.
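The multicollinearity point is easy to demonstrate on simulated data (a toy sketch; x and z here are made-up variables, not our moderation data):

``` r
set.seed(123)
x <- rnorm(100, mean = 50, sd = 10)  # raw predictor, far from 0
z <- rnorm(100, mean = 30, sd = 5)   # raw moderator, far from 0

# Raw predictor vs. its interaction term: strongly correlated
cor(x, x * z)

# Centered versions: the correlation largely disappears
xc <- x - mean(x)
zc <- z - mean(z)
cor(xc, xc * zc)
```

When both variables sit far from zero, the product term is almost a linear function of each predictor; centering removes that artificial overlap without changing the interaction test itself.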
# Center predictors (Scale = FALSE keeps original units, just shifts mean to 0)
mod_data$zvid <- scale(mod_data$Vid_Games, scale = FALSE)
mod_data$zcal <- scale(mod_data$CaUnTs, scale = FALSE)
We test moderation by multiplying the predictors:
Y ~ X * Z
In R, the * operator expands to both main effects plus their interaction, so zvid * zcal is shorthand for zvid + zcal + zvid:zcal.
# R automatically creates the interaction term with the * operator
mod_model <- lm(Aggression ~ zvid * zcal, data = mod_data)
summary(mod_model)
##
## Call:
## lm(formula = Aggression ~ zvid * zcal, data = mod_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.7144 -6.9087 -0.1923 6.9036 29.2290
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.967108 0.475057 84.131 < 2e-16 ***
## zvid 0.169625 0.068473 2.477 0.013616 *
## zcal 0.760093 0.049458 15.368 < 2e-16 ***
## zvid:zcal 0.027062 0.006981 3.877 0.000122 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.976 on 438 degrees of freedom
## Multiple R-squared: 0.3773, Adjusted R-squared: 0.373
## F-statistic: 88.46 on 3 and 438 DF, p-value: < 2.2e-16
Interpretation:

- Look at the zvid:zcal interaction term.
- If \(p < .05\), the relationship between Video Games and Aggression significantly changes depending on Callousness.
- In our example, the interaction is significant (\(b = 0.03, p < .001\)), so we proceed to Simple Slopes Analysis.
To understand how the relationship changes, we look at the slope of \(X \rightarrow Y\) at three levels of the moderator:

1. Low Callousness (-1 SD)
2. Average Callousness (Mean)
3. High Callousness (+1 SD)
# Create High and Low versions of the moderator
# Note: To test "Low" (-1SD), we ADD 1SD to bring low people to 0 (the intercept).
# To test "High" (+1SD), we SUBTRACT 1SD to bring high people to 0.
sd_cal <- sd(mod_data$zcal)
mod_data$zcal_low <- mod_data$zcal + sd_cal
mod_data$zcal_high <- mod_data$zcal - sd_cal
# Run models for simple slopes
model_low <- lm(Aggression ~ zvid * zcal_low, data = mod_data)
model_high <- lm(Aggression ~ zvid * zcal_high, data = mod_data)
# Extract slopes (Coefficient for zvid)
cat("Slope at Low Callousness:\n")
## Slope at Low Callousness:
print(coef(summary(model_low))["zvid", ])
## Estimate Std. Error t value Pr(>|t|)
## -0.09065113 0.09910510 -0.91469685 0.36085407
cat("\nSlope at Average Callousness:\n")
##
## Slope at Average Callousness:
print(coef(summary(mod_model))["zvid", ])
## Estimate Std. Error t value Pr(>|t|)
## 0.16962470 0.06847268 2.47726105 0.01361583
cat("\nSlope at High Callousness:\n")
##
## Slope at High Callousness:
print(coef(summary(model_high))["zvid", ])
## Estimate Std. Error t value Pr(>|t|)
## 4.299005e-01 9.257749e-02 4.643683e+00 4.531261e-06
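A useful sanity check: the simple slope at any moderator value \(w\) is just \(b_{zvid} + b_{zvid:zcal} \times w\), so the re-centered regressions are really only needed to get standard errors and p-values for each slope. A quick check using mod_model and sd_cal from above:

``` r
b <- coef(mod_model)

# Simple slopes at -1 SD, the mean, and +1 SD of (centered) callousness
slopes <- b["zvid"] + b["zvid:zcal"] * c(-sd_cal, 0, sd_cal)
names(slopes) <- c("Low (-1 SD)", "Average (Mean)", "High (+1 SD)")
slopes
# These should reproduce the zvid coefficients from model_low, mod_model,
# and model_high printed above.
```

In words: video games barely matter for low-callousness participants (flat, non-significant slope), but the slope steepens as callousness increases.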
The best way to communicate moderation is a graph.
# Create a base plot with one regression line per level of the moderator
ggplot(mod_data, aes(x = zvid, y = Aggression)) +
  geom_point(color = "grey", alpha = 0.5) +
  theme_minimal() +
  labs(title = "Interaction of Video Games and Callousness on Aggression",
       x = "Video Game Hours (Centered)", y = "Aggression Score") +
  # Add lines for each level (linewidth replaces the deprecated size argument)
  geom_abline(intercept = coef(model_low)[1], slope = coef(model_low)[2],
              color = "blue", linetype = "dashed", linewidth = 1) +
  geom_abline(intercept = coef(mod_model)[1], slope = coef(mod_model)[2],
              color = "black", linewidth = 1) +
  geom_abline(intercept = coef(model_high)[1], slope = coef(model_high)[2],
              color = "red", linetype = "dotted", linewidth = 1) +
  # Add annotations
  annotate("text", x = 10, y = 50, label = "High Callousness (Red)", color = "red") +
  annotate("text", x = 10, y = 30, label = "Low Callousness (Blue)", color = "blue")
In this tutorial, we moved beyond simple prediction to understanding mechanisms and contexts: mediation tells us how or why \(X\) affects \(Y\) (through an intervening variable \(M\)), while moderation tells us when, or for whom, the effect holds (depending on the level of \(Z\)).
Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. London: SAGE Publications.