Welcome to Week 10! In previous weeks, we explored how to predict an outcome from one or more variables (regression). This week, we dive deeper into the nature of those relationships. Specifically, we will answer two advanced questions:

- How or why does \(X\) affect \(Y\)? (Mediation)
- When, or for whom, does \(X\) affect \(Y\)? (Moderation)
These concepts allow us to build much more sophisticated models of the world, moving beyond simple “X affects Y” statements to understanding mechanisms and boundary conditions.
We will use a specialized package, MeMoBootR, created for educational purposes to streamline mediation and moderation analyses. We will also use standard packages such as rio and ggplot2.
# Ensure CRAN mirror is set
if(is.null(getOption("repos"))) options(repos = c(CRAN = "https://cloud.r-project.org"))
# Install/Load Packages
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, rio, car, ggplot2, devtools)
# Install MeMoBootR from GitHub if not present
if (!requireNamespace("MeMoBootR", quietly = TRUE)) {
  message("Installing MeMoBootR from GitHub...")
  tryCatch({
    devtools::install_github("doomlab/MeMoBootR", force = TRUE, upgrade = "never", quiet = TRUE)
  }, error = function(e) {
    warning("Failed to install MeMoBootR. Some advanced functions may not work.")
  })
}
# Load MeMoBootR if available
if (require("MeMoBootR", quietly = TRUE)) {
  message("MeMoBootR loaded successfully.")
} else {
  message("Using fallback methods where possible.")
}
Mediation refers to a situation where the relationship between a predictor variable (\(X\)) and an outcome variable (\(Y\)) can be explained by their relationship to a third variable, the Mediator (\(M\)).
Plain English: Mediation is like a “middle man” or a relay race. The predictor passes the baton to the mediator, who then runs to the finish line (the outcome). If the predictor (X) affects the outcome (Y), mediation explains how or why that happens.
Example: Does stress cause illness? Maybe stress (\(X\)) lowers your immune system (\(M\)), which then leads to illness (\(Y\)). The immune system mediates the relationship.
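In regression terms, the paths of a mediation model correspond to three equations (a sketch using the conventional path labels \(a\), \(b\), \(c\), and \(c'\)):

\[
\begin{aligned}
Y &= b_{0_1} + c\,X + e_1 \quad &\text{(total effect: path } c\text{)}\\
M &= b_{0_2} + a\,X + e_2 \quad &\text{(path } a\text{)}\\
Y &= b_{0_3} + c'\,X + b\,M + e_3 \quad &\text{(direct effect } c' \text{ and path } b\text{)}
\end{aligned}
\]

For ordinary least squares, the total effect decomposes as \(c = c' + a \times b\), so the indirect (mediated) effect is the product \(a \times b\).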
Historically, mediation was tested using a 4-step process known as the Baron & Kenny method:

1. Show that \(X\) predicts \(Y\) (the total effect, path c).
2. Show that \(X\) predicts \(M\) (path a).
3. Show that \(M\) predicts \(Y\) while controlling for \(X\) (path b).
4. Show that the effect of \(X\) on \(Y\) shrinks (partial mediation) or disappears (full mediation) once \(M\) is in the model (path c').

While modern statistics have moved on to bootstrapping, understanding these steps is crucial for grasping the concept.
State of the Art (2026): While Baron & Kenny’s steps are great for understanding the concept, they are no longer considered sufficient for testing mediation in modern analytics. They have low statistical power. Today, we focus on the Indirect Effect itself, usually tested via Bootstrapping.
Let’s test if Facebook use mediates the relationship between Previous Knowledge and Exam Scores. The theory: People with less previous knowledge might use Facebook more (distraction), which in turn lowers their exam scores.
# Import data
med_data <- import("data/mediation.sav")
# Clean missing data (Mediation requires complete data for comparison)
med_data <- na.omit(med_data)
head(med_data)
Does Previous Knowledge (\(X\)) predict Exam Scores (\(Y\))?
model1 <- lm(exam ~ previous, data = med_data)
summary(model1)
##
## Call:
## lm(formula = exam ~ previous, data = med_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9092 -0.6477 -0.6477 0.3523 8.4062
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.33228 0.16749 7.955 7.37e-14 ***
## previous 0.31539 0.08689 3.630 0.000348 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.496 on 237 degrees of freedom
## Multiple R-squared: 0.05266, Adjusted R-squared: 0.04867
## F-statistic: 13.17 on 1 and 237 DF, p-value: 0.0003476
Result: Yes, \(b = 0.32, p < .001\).
Does Previous Knowledge (\(X\)) predict Facebook Use (\(M\))?
model2 <- lm(facebook ~ previous, data = med_data)
summary(model2)
##
## Call:
## lm(formula = facebook ~ previous, data = med_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.44609 -0.44609 0.05391 0.55391 1.35640
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.28817 0.08195 52.330 <2e-16 ***
## previous -0.09208 0.04251 -2.166 0.0313 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.732 on 237 degrees of freedom
## Multiple R-squared: 0.01941, Adjusted R-squared: 0.01527
## F-statistic: 4.692 on 1 and 237 DF, p-value: 0.03131
Result: Yes, \(b = -0.09, p = .031\).
Does Facebook Use (\(M\)) predict Exam Scores (\(Y\)), controlling for Knowledge? And does Knowledge still predict Exam Scores?
model3 <- lm(exam ~ previous + facebook, data = med_data)
summary(model3)
##
## Call:
## lm(formula = exam ~ previous + facebook, data = med_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6065 -0.7666 -0.3117 0.3850 7.9999
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.93294 0.56791 6.925 4.09e-11 ***
## previous 0.25954 0.08397 3.091 0.00224 **
## facebook -0.60647 0.12705 -4.773 3.18e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.432 on 236 degrees of freedom
## Multiple R-squared: 0.1361, Adjusted R-squared: 0.1288
## F-statistic: 18.59 on 2 and 236 DF, p-value: 3.194e-08
Result:

- Path b (Facebook): Significant (\(b = -0.61, p < .001\)).
- Path c’ (Knowledge): Still significant, but the coefficient dropped from 0.32 to 0.26.
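The pieces fit together arithmetically: the indirect effect is path a times path b, and the total effect decomposes into direct plus indirect. A quick sketch using the models fitted above (model1, model2, model3 as defined earlier):

``` r
# Path a: knowledge -> Facebook use (from Step 2)
a <- unname(coef(model2)["previous"])
# Path b: Facebook use -> exam score, controlling for knowledge (from Step 3)
b <- unname(coef(model3)["facebook"])

# Indirect effect: a * b (two negatives, so the product is positive)
a * b             # ~ 0.056

# Decomposition check: total effect c = direct effect c' + indirect effect
c_prime <- unname(coef(model3)["previous"])
c_prime + a * b   # ~ 0.315, matching the "previous" coefficient in model1
```

This shows exactly how much of the total effect (0.32) flows through Facebook use: the "drop" to 0.26 is the indirect effect.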
Instead of guessing if the drop from 0.32 to 0.26 is “big enough”, we test the Indirect Effect (\(a \times b\)) directly. We use Bootstrapping, which resamples our data thousands of times to build a confidence interval. This method does not assume normality, making it the gold standard (Field et al., 2012).
if (exists("mediation1")) {
  med_results <- mediation1(y = "exam",
                            x = "previous",
                            m = "facebook",
                            df = med_data)
  # Print bootstrapped results
  print(med_results$boot.results)
  print(med_results$boot.ci)
} else {
  message("MeMoBootR needed for automated bootstrapping.")
}
Interpretation: If the 95% Confidence Interval (CI) for the indirect effect does not include zero, we have significant mediation.

- Example Result: Effect = 0.06, 95% CI [0.01, 0.12].
- Since 0 is not in the interval, mediation is significant.
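If MeMoBootR is not available, the percentile bootstrap can be done by hand in base R. A minimal sketch (assumes med_data from above; boot_n and boot_indirect are illustrative names):

``` r
set.seed(4591)
boot_n <- 1000  # number of bootstrap resamples

boot_indirect <- replicate(boot_n, {
  # Resample rows with replacement
  idx <- sample(nrow(med_data), replace = TRUE)
  d   <- med_data[idx, ]
  # Re-fit paths a and b on the resampled data
  a <- coef(lm(facebook ~ previous, data = d))["previous"]
  b <- coef(lm(exam ~ previous + facebook, data = d))["facebook"]
  a * b
})

# 95% percentile confidence interval for the indirect effect
quantile(boot_indirect, c(0.025, 0.975))
```

If that interval excludes zero, the conclusion matches the automated MeMoBootR output.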
Moderation occurs when the relationship between two variables depends on a third variable. In statistics, this is called an Interaction Effect.
Plain English: Moderation is the “It Depends” variable. If I ask, “Does eating pizza make you happy?”, you might say, “It depends on if I’m hungry.” Hunger is the moderator.
- In statistics, we say the relationship between \(X\) and \(Y\) changes depending on the level of \(Z\).
- Graphically, this looks like non-parallel lines. If the lines cross or fan out, you have an interaction.
Do violent video games (\(X\)) make people aggressive (\(Y\))? Maybe it depends on their level of Callous-Unemotional traits (\(Z\)).
mod_data <- import("data/moderation.sav")
head(mod_data)
Before running a moderation analysis, we must Center our continuous predictors.
Plain English: Why do we center?
- Interpretation: In regression, the Intercept is the value of \(Y\) when \(X\) is 0. If \(X\) is “Heart Rate”, 0 is dead. That’s not useful. If we center \(X\) (subtract the mean), then 0 becomes the “Average Heart Rate”. Now the intercept is the score for the average person.
- Multicollinearity: \(X\) and the interaction term (\(X \times Z\)) are highly correlated because \(X\) is inside the interaction. Centering breaks this correlation, making the math more stable.
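The multicollinearity point is easy to demonstrate on simulated data (a toy sketch; x and z here are made-up variables, not our moderation data):

``` r
set.seed(123)
x <- rnorm(100, mean = 50, sd = 10)  # raw predictor, far from 0
z <- rnorm(100, mean = 30, sd = 5)   # raw moderator, far from 0

# Raw predictor vs. its interaction term: strongly correlated
cor(x, x * z)

# Centered versions: the correlation largely disappears
xc <- x - mean(x)
zc <- z - mean(z)
cor(xc, xc * zc)
```

When both variables sit far from zero, the product term is almost a linear function of each predictor; centering removes that artificial overlap without changing the interaction test itself.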
# Center predictors (Scale = FALSE keeps original units, just shifts mean to 0)
mod_data$zvid <- scale(mod_data$Vid_Games, scale = FALSE)
mod_data$zcal <- scale(mod_data$CaUnTs, scale = FALSE)
We test moderation by multiplying the predictors:
Y ~ X * Z
In R, the * operator expands to both main effects plus their interaction, so zvid * zcal is shorthand for zvid + zcal + zvid:zcal.
# R automatically creates the interaction term with the * operator
mod_model <- lm(Aggression ~ zvid * zcal, data = mod_data)
summary(mod_model)
##
## Call:
## lm(formula = Aggression ~ zvid * zcal, data = mod_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.7144 -6.9087 -0.1923 6.9036 29.2290
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.967108 0.475057 84.131 < 2e-16 ***
## zvid 0.169625 0.068473 2.477 0.013616 *
## zcal 0.760093 0.049458 15.368 < 2e-16 ***
## zvid:zcal 0.027062 0.006981 3.877 0.000122 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.976 on 438 degrees of freedom
## Multiple R-squared: 0.3773, Adjusted R-squared: 0.373
## F-statistic: 88.46 on 3 and 438 DF, p-value: < 2.2e-16
Interpretation:

- Look at the zvid:zcal interaction term.
- If \(p < .05\), the relationship between Video Games and Aggression significantly changes depending on Callousness.
- In our example, the interaction is significant (\(b = 0.03, p < .001\)), so we proceed to Simple Slopes Analysis.
To understand how the relationship changes, we look at the slope of \(X \rightarrow Y\) at three levels of the moderator:

1. Low Callousness (-1 SD)
2. Average Callousness (Mean)
3. High Callousness (+1 SD)
# Create High and Low versions of the moderator
# Note: To test "Low" (-1SD), we ADD 1SD to bring low people to 0 (the intercept).
# To test "High" (+1SD), we SUBTRACT 1SD to bring high people to 0.
sd_cal <- sd(mod_data$zcal)
mod_data$zcal_low <- mod_data$zcal + sd_cal
mod_data$zcal_high <- mod_data$zcal - sd_cal
# Run models for simple slopes
model_low <- lm(Aggression ~ zvid * zcal_low, data = mod_data)
model_high <- lm(Aggression ~ zvid * zcal_high, data = mod_data)
# Extract slopes (Coefficient for zvid)
cat("Slope at Low Callousness:\n")
## Slope at Low Callousness:
print(coef(summary(model_low))["zvid", ])
## Estimate Std. Error t value Pr(>|t|)
## -0.09065113 0.09910510 -0.91469685 0.36085407
cat("\nSlope at Average Callousness:\n")
##
## Slope at Average Callousness:
print(coef(summary(mod_model))["zvid", ])
## Estimate Std. Error t value Pr(>|t|)
## 0.16962470 0.06847268 2.47726105 0.01361583
cat("\nSlope at High Callousness:\n")
##
## Slope at High Callousness:
print(coef(summary(model_high))["zvid", ])
## Estimate Std. Error t value Pr(>|t|)
## 4.299005e-01 9.257749e-02 4.643683e+00 4.531261e-06
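A useful sanity check: the simple slope at any moderator value \(w\) is just \(b_{zvid} + b_{zvid:zcal} \times w\), so the re-centered regressions are really only needed to get standard errors and p-values for each slope. A quick check using mod_model and sd_cal from above:

``` r
b <- coef(mod_model)

# Simple slopes at -1 SD, the mean, and +1 SD of (centered) callousness
slopes <- b["zvid"] + b["zvid:zcal"] * c(-sd_cal, 0, sd_cal)
names(slopes) <- c("Low (-1 SD)", "Average (Mean)", "High (+1 SD)")
slopes
# These should reproduce the zvid coefficients from model_low, mod_model,
# and model_high printed above.
```

In words: video games barely matter for low-callousness participants (flat, non-significant slope), but the slope steepens as callousness increases.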
The best way to communicate moderation is a graph.
# Create a base plot with one regression line per level of the moderator
ggplot(mod_data, aes(x = zvid, y = Aggression)) +
  geom_point(color = "grey", alpha = 0.5) +
  theme_minimal() +
  labs(title = "Interaction of Video Games and Callousness on Aggression",
       x = "Video Game Hours (Centered)", y = "Aggression Score") +
  # Add lines for each level (linewidth replaces the deprecated size argument)
  geom_abline(intercept = coef(model_low)[1], slope = coef(model_low)[2],
              color = "blue", linetype = "dashed", linewidth = 1) +
  geom_abline(intercept = coef(mod_model)[1], slope = coef(mod_model)[2],
              color = "black", linewidth = 1) +
  geom_abline(intercept = coef(model_high)[1], slope = coef(model_high)[2],
              color = "red", linetype = "dotted", linewidth = 1) +
  # Add annotations
  annotate("text", x = 10, y = 50, label = "High Callousness (Red)", color = "red") +
  annotate("text", x = 10, y = 30, label = "Low Callousness (Blue)", color = "blue")
In this tutorial, we moved beyond simple prediction to understanding mechanisms and contexts: mediation tells us how or why \(X\) affects \(Y\) (through an intervening variable \(M\)), while moderation tells us when, or for whom, the effect holds (depending on the level of \(Z\)).
Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. London: SAGE Publications.