Ziyuan Huang
Last Updated: 2026-01-29
Plain English: Mediation is like a “middle man” or a relay race. The predictor passes the baton to the mediator, who then runs to the finish line (the outcome). If the predictor (X) affects the outcome (Y), mediation explains how or why that happens.
Example: Does stress cause illness? Maybe stress (X) lowers your immune system (Mediator), which then leads to illness (Y). The immune system mediates the relationship.
Mediation was historically tested through three regression models:
- Model 1: Y predicted from X (the c path, or total effect)
- Model 2: M predicted from X (the a path)
- Model 3: Y predicted from X and M together (the b path, plus the c' path, or direct effect)
State of the Art (2026): While Baron & Kenny’s steps are great for understanding the concept, they are no longer considered sufficient for testing mediation in modern analytics. They have low statistical power (high chance of missing a real effect). Today, we focus on the Indirect Effect itself, usually tested via Bootstrapping.
Four conditions of mediation (Classic View):
- X significantly predicts Y (the c path is significant).
- X significantly predicts M (the a path is significant).
- M significantly predicts Y while controlling for X (the b path is significant).
- The effect of X on Y shrinks when M is added to the model (c' is smaller than c). If c' becomes non-significant, we call it full mediation; otherwise, partial mediation.
How much of a reduction in the relationship between the predictor and outcome is necessary to infer mediation?
People tend to look for a change in significance, which can lead to the ‘all or nothing’ thinking that p-values encourage.
Two solutions:
- The Sobel test (or its Aroian variant), which converts the indirect effect ab into a z-score.
- Bootstrapping the indirect effect, described below.
Note: The Sobel test assumes that the indirect effect (ab) is normally distributed. This is often not true, especially in smaller samples, which makes the Sobel test “conservative” (harder to find significant results).
What is bootstrapping? Bootstrapping resamples your data with replacement many times (often 1,000 or more), calculates the indirect effect in each resample, and uses the resulting empirical distribution to build a confidence interval. If the 95% confidence interval excludes zero, the indirect effect is significant.
Why it’s better: Bootstrapping does not assume the data are normally distributed. It builds a distribution based on your specific data. This makes it the most accurate and powerful method for testing mediation today.
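As an illustration, here is a minimal percentile bootstrap of the indirect effect on simulated data. The variable names (x, m, y) and effect sizes are made up for this sketch, not taken from the course data:

```r
# Minimal percentile bootstrap of the indirect effect (a * b).
# Simulated data: names and effect sizes are illustrative only.
set.seed(42)
n <- 200
x <- rnorm(n)                      # predictor
m <- 0.5 * x + rnorm(n)            # mediator (a path built in)
y <- 0.4 * m + 0.2 * x + rnorm(n)  # outcome (b and c' paths built in)
dat <- data.frame(x, m, y)

boot_ab <- replicate(5000, {
  d <- dat[sample(nrow(dat), replace = TRUE), ]  # resample rows
  a <- coef(lm(m ~ x, data = d))["x"]            # a path
  b <- coef(lm(y ~ x + m, data = d))["m"]        # b path
  a * b                                          # indirect effect
})

# 95% percentile CI: mediation is supported if it excludes zero
quantile(boot_ab, c(0.025, 0.975))
```

Packages like MeMoBootR automate exactly this kind of resampling loop for you.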
Mediation models usually occur in stages - meaning separate models are run with different IV / DV combinations.
Sometimes people call these “steps”, but do not confuse this with hierarchical regression.
So, what does that do for data screening? Each regression model could be screened separately, but in practice we screen using the largest model (the one with all predictors) and proceed as with regular regression.
## 'data.frame': 240 obs. of 3 variables:
## $ previous: num 3 2 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "format.spss")= chr "F1.0"
## ..- attr(*, "display_width")= int 14
## $ facebook: num 3.67 5 4 4.5 4.5 ...
## ..- attr(*, "format.spss")= chr "F8.2"
## ..- attr(*, "display_width")= int 14
## $ exam : num 1 1 2 7 6 1 2 1 1 1 ...
## ..- attr(*, "format.spss")= chr "F8.2"
## ..- attr(*, "display_width")= int 10
## previous facebook exam
## Min. :1.000 Min. :1.750 Min. : 1.000
## 1st Qu.:1.000 1st Qu.:3.750 1st Qu.: 1.000
## Median :1.000 Median :4.250 Median : 1.000
## Mean :1.571 Mean :4.143 Mean : 1.825
## 3rd Qu.:2.000 3rd Qu.:4.750 3rd Qu.: 2.000
## Max. :7.000 Max. :5.000 Max. :11.000
## NA's :1
##
## Call:
## lm(formula = exam ~ previous, data = master)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9092 -0.6477 -0.6477 0.3523 8.4062
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.33228 0.16749 7.955 7.37e-14 ***
## previous 0.31539 0.08689 3.630 0.000348 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.496 on 237 degrees of freedom
## Multiple R-squared: 0.05266, Adjusted R-squared: 0.04867
## F-statistic: 13.17 on 1 and 237 DF, p-value: 0.0003476
##
## Call:
## lm(formula = facebook ~ previous, data = master)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.44609 -0.44609 0.05391 0.55391 1.35640
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.28817 0.08195 52.330 <2e-16 ***
## previous -0.09208 0.04251 -2.166 0.0313 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.732 on 237 degrees of freedom
## Multiple R-squared: 0.01941, Adjusted R-squared: 0.01527
## F-statistic: 4.692 on 1 and 237 DF, p-value: 0.03131
##
## Call:
## lm(formula = exam ~ previous + facebook, data = master)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6065 -0.7666 -0.3117 0.3850 7.9999
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.93294 0.56791 6.925 4.09e-11 ***
## previous 0.25954 0.08397 3.091 0.00224 **
## facebook -0.60647 0.12705 -4.773 3.18e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.432 on 236 degrees of freedom
## Multiple R-squared: 0.1361, Adjusted R-squared: 0.1288
## F-statistic: 18.59 on 2 and 236 DF, p-value: 3.194e-08
The Aroian Test: \(Z = \frac{a \times b}{\sqrt{b^2 \times SE_a^2 + a^2 \times SE_b^2 + SE_a^2 \times SE_b^2}}\), where \(SE_a\) and \(SE_b\) are the standard errors of the a and b coefficients.
If the indirect effect is larger than the error, we would conclude that the addition of the M variable changed the c path.
#aroian sobel
a <- coef(model2)[2]
b <- coef(model3)[3]
SEa <- summary(model2)$coefficients[2,2]
SEb <- summary(model3)$coefficients[3,2]
zscore <- (a*b)/(sqrt((b^2*SEa^2)+(a^2*SEb^2)+(SEa^2*SEb^2)))
zscore
## previous
## 1.937494
pnorm(abs(zscore), lower.tail = FALSE) * 2 #two-tailed p value
## previous
## 0.05268497
State of the Art (2026): In professional data science settings, while MeMoBootR is excellent for learning and quick “all-in-one” analysis, you will often see the lavaan package (for Structural Equation Modeling) or the mediation package used for complex mediation models. These are the industry standards for publication-quality analysis.
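For reference, the same exam/previous/facebook mediation could be specified in lavaan roughly as follows. This is a hedged sketch: it assumes the lavaan package is installed and the master data frame from above is loaded, and the choice of 1,000 bootstraps is illustrative:

```r
# Sketch: bootstrapped mediation in lavaan (assumes lavaan is installed
# and `master` contains previous, facebook, and exam).
library(lavaan)

med_syntax <- '
  facebook ~ a * previous               # a path
  exam ~ b * facebook + cp * previous   # b and c-prime paths
  indirect := a * b                     # defined indirect effect
  total := cp + (a * b)                 # total effect
'

fit <- sem(med_syntax, data = master,
           se = "bootstrap", bootstrap = 1000)
parameterEstimates(fit, boot.ci.type = "perc")  # bootstrapped CIs
```

The `:=` operator defines the indirect effect as a parameter, so lavaan bootstraps its confidence interval directly.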
# Ensure a CRAN mirror is set
if(is.null(getOption("repos"))) {
options(repos = c(CRAN = "https://cloud.r-project.org"))
}
# Install devtools if missing
if (!requireNamespace("devtools", quietly = TRUE)) {
install.packages("devtools")
}
# Install MeMoBootR if missing
if (!requireNamespace("MeMoBootR", quietly = TRUE)) {
message("Installing MeMoBootR from GitHub...")
tryCatch({
devtools::install_github("doomlab/MeMoBootR", force = TRUE, upgrade = "never", quiet = TRUE)
}, error = function(e) {
warning("Failed to install MeMoBootR: ", e$message)
})
}
## Installing MeMoBootR from GitHub...
# Load the package
if (require("MeMoBootR")) {
#no missing data allowed
med_results <- mediation1(y = "exam",
x = "previous",
m = "facebook",
df = master)
} else {
message("MeMoBootR package is not available. Mediation results cannot be generated.")
# Create a dummy object to prevent future errors
med_results <- list(
datascreening = list(fulldata = NULL, correl = NULL, linearity = NULL, normality = NULL, homogen = NULL),
model1 = NULL, model2 = NULL, model3 = NULL,
indirect.effect = NULL, z.score = NULL, p.value = NULL,
boot.results = NULL, boot.ci = NULL, diagram = NULL
)
}
## Loading required package: MeMoBootR
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'MeMoBootR'
## MeMoBootR package is not available. Mediation results cannot be generated.
## NULL
## NULL
## NULL
## NULL
## NULL
## Length Class Mode
## 0 NULL NULL
## Length Class Mode
## 0 NULL NULL
## Length Class Mode
## 0 NULL NULL
## NULL
## NULL
## NULL
Last, let’s get the bootstrapped results:
## NULL
## NULL
## NULL
Plain English: Moderation is the “It Depends” variable. If I ask, “Does eating pizza make you happy?”, you might say, “It depends on if I’m hungry.” Hunger is the moderator.
- In statistics, we say the relationship between X and Y changes depending on the level of Z.
- Graphically, this looks like non-parallel lines. If the lines cross or fan out, you have an interaction.
Do violent video games make people aggressive?
DV: Aggression
IV:
- Video game play (hours per week)
- Callous unemotional traits
- The interaction of video game play and callous unemotional traits
The interaction term makes the b values for the individual predictors uninterpretable in many situations.
Centering refers to the process of transforming a variable into deviations around a fixed point.
Centering solves two problems:
Plain English: Why do we center?
- Interpretation: In regression, the Intercept is the value of Y when X is 0. If X is “Heart Rate”, 0 is dead. That’s not useful. If we center X (subtract the mean), then 0 becomes the “Average Heart Rate”. Now the intercept is the score for the average person.
- Multicollinearity: X and XZ are highly correlated because X is inside XZ. Centering breaks this correlation, making the math more stable.
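To see the multicollinearity point concretely, here is a quick simulated check (the numbers are arbitrary) showing that centering shrinks the correlation between a predictor and its product term:

```r
# Demonstration: centering reduces the correlation between a predictor
# and its interaction (product) term. Simulated, arbitrary numbers.
set.seed(7)
x <- rnorm(100, mean = 50, sd = 10)  # raw predictor, far from zero
z <- rnorm(100, mean = 30, sd = 5)   # raw moderator

cor(x, x * z)        # substantial correlation for raw scores

xc <- x - mean(x)    # centered versions
zc <- z - mean(z)
cor(xc, xc * zc)     # much closer to zero
```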
The easiest way to center is to subtract the mean from each score, which the scale() function can do for you.
Centered variables for main effects have two interpretations: the overall (average) effect of that predictor, and the simple slope for that predictor when the other predictor is at its mean (since zero now means average).
For continuous variables, you “create” low, average, and high groups.
## 'data.frame': 442 obs. of 4 variables:
## $ ID : num 69 55 7 96 130 124 72 139 102 179 ...
## ..- attr(*, "label")= chr "Case ID"
## ..- attr(*, "format.spss")= chr "F3.0"
## $ Aggression: num 13 38 30 23 25 46 41 22 35 23 ...
## ..- attr(*, "label")= chr "Agression"
## ..- attr(*, "format.spss")= chr "F2.0"
## $ Vid_Games : num 16 12 32 10 11 29 23 15 20 20 ...
## ..- attr(*, "label")= chr "Video Games (Hours per week)"
## ..- attr(*, "format.spss")= chr "F2.0"
## $ CaUnTs : num 0 0 0 1 1 1 2 3 3 3 ...
## ..- attr(*, "label")= chr "Callous Unemotional Traits"
## ..- attr(*, "format.spss")= chr "F2.0"
Center your variables using scale()
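The zvid and zcal columns used in the models below appear to be mean-centered (not fully z-scored) versions of Vid_Games and CaUnTs; their ranges in the summary output match centering without dividing by the SD. A sketch of that step:

```r
# Mean-center the predictors (scale = FALSE skips dividing by the SD).
# scale() returns a one-column matrix, which is why summary() shows a
# .V1 suffix for these columns.
master$zvid <- scale(master$Vid_Games, scale = FALSE)
master$zcal <- scale(master$CaUnTs, scale = FALSE)
```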
For data screening, we would screen the model with the interaction, and all other considerations are the same as regression.
Run the model:
##
## Call:
## lm(formula = Aggression ~ zvid * zcal, data = master)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.7144 -6.9087 -0.1923 6.9036 29.2290
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.967108 0.475057 84.131 < 2e-16 ***
## zvid 0.169625 0.068473 2.477 0.013616 *
## zcal 0.760093 0.049458 15.368 < 2e-16 ***
## zvid:zcal 0.027062 0.006981 3.877 0.000122 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.976 on 438 degrees of freedom
## Multiple R-squared: 0.3773, Adjusted R-squared: 0.373
## F-statistic: 88.46 on 3 and 438 DF, p-value: < 2.2e-16
So, how do we create groups when we have a continuous variable?
Create the low and high versions of the M variable (callousness in this example)
#create the low and high z score variables
master$zcallow <- master$zcal + sd(master$zcal) #bring them up
master$zcalhigh <- master$zcal - sd(master$zcal) #bring them down
summary(master)
## ID Aggression Vid_Games CaUnTs
## Min. : 1.0 Min. : 9.00 Min. : 1.00 Min. : 0.0
## 1st Qu.:111.2 1st Qu.:32.00 1st Qu.:17.00 1st Qu.:11.0
## Median :221.5 Median :40.00 Median :22.00 Median :18.0
## Mean :221.5 Mean :40.05 Mean :21.84 Mean :18.6
## 3rd Qu.:331.8 3rd Qu.:48.00 3rd Qu.:26.00 3rd Qu.:26.0
## Max. :442.0 Max. :82.00 Max. :38.00 Max. :43.0
## zvid.V1 zcal.V1
## Min. :-2.08438914027e+01 Min. :-18.595022624400
## 1st Qu.:-4.84389140271e+00 1st Qu.: -7.595022624430
## Median : 1.56108597285e-01 Median : -0.595022624434
## Mean : 1.00000000000e-15 Mean : 0.000000000000
## 3rd Qu.: 4.15610859729e+00 3rd Qu.: 7.404977375570
## Max. : 1.61561085973e+01 Max. : 24.404977375600
## zcallow.V1 zcalhigh.V1
## Min. :-8.97732952676 Min. :-28.21271572210
## 1st Qu.: 2.02267047324 1st Qu.:-17.21271572210
## Median : 9.02267047324 Median :-10.21271572210
## Mean : 9.61769309767 Mean : -9.61769309767
## 3rd Qu.:17.02267047320 3rd Qu.: -2.21271572211
## Max. :34.02267047320 Max. : 14.78728427790
##
## Call:
## lm(formula = Aggression ~ zvid * zcallow, data = master)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.7144 -6.9087 -0.1923 6.9036 29.2290
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32.656771 0.672018 48.595 < 2e-16 ***
## zvid -0.090651 0.099105 -0.915 0.360854
## zcallow 0.760093 0.049458 15.368 < 2e-16 ***
## zvid:zcallow 0.027062 0.006981 3.877 0.000122 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.976 on 438 degrees of freedom
## Multiple R-squared: 0.3773, Adjusted R-squared: 0.373
## F-statistic: 88.46 on 3 and 438 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = Aggression ~ zvid * zcal, data = master)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.7144 -6.9087 -0.1923 6.9036 29.2290
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.967108 0.475057 84.131 < 2e-16 ***
## zvid 0.169625 0.068473 2.477 0.013616 *
## zcal 0.760093 0.049458 15.368 < 2e-16 ***
## zvid:zcal 0.027062 0.006981 3.877 0.000122 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.976 on 438 degrees of freedom
## Multiple R-squared: 0.3773, Adjusted R-squared: 0.373
## F-statistic: 88.46 on 3 and 438 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = Aggression ~ zvid * zcalhigh, data = master)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.7144 -6.9087 -0.1923 6.9036 29.2290
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.277446 0.672520 70.299 < 2e-16 ***
## zvid 0.429901 0.092577 4.644 4.53e-06 ***
## zcalhigh 0.760093 0.049458 15.368 < 2e-16 ***
## zvid:zcalhigh 0.027062 0.006981 3.877 0.000122 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.976 on 438 degrees of freedom
## Multiple R-squared: 0.3773, Adjusted R-squared: 0.373
## F-statistic: 88.46 on 3 and 438 DF, p-value: < 2.2e-16
library(ggplot2)
cleanup <- theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black"),
legend.key = element_rect(fill = "white"),
text = element_text(size = 15))
modgraph <- ggplot(master, aes(zvid, Aggression))
##change Cal to the new moderator label
##change xlab for the new X label
modgraph +
xlab("Centered Video Games") +
geom_point(color = "gray") +
geom_abline(aes(intercept = modmodellow$coefficients[1],
slope = modmodellow$coefficients[2],
linetype = "-1SD Cal"), size = 1) +
geom_abline(aes(intercept = modmodel$coefficients[1],
slope = modmodel$coefficients[2],
linetype = "Average Cal"), size = 1) +
geom_abline(aes(intercept = modmodelhigh$coefficients[1],
slope = modmodelhigh$coefficients[2],
linetype = "+1SD Cal"), size = 1) +
scale_linetype_manual(values = c("dotted", "dashed", "solid"),
breaks = c("-1SD Cal", "Average Cal", "+1SD Cal"),
name = "Simple Slope") +
cleanup
We can also use MeMoBootR to complete the entire process, including the data screening, for us!
## NULL
#mod_model$datascreening$correl
#mod_model$datascreening$linearity
#mod_model$datascreening$normality
#mod_model$datascreening$homogen
#models
summary(mod_model$model1)
## Length Class Mode
## 0 NULL NULL
## [1] "MeMoBootR not available."
## NULL
In this lecture, we have covered:
- Mediation: the classic Baron & Kenny conditions, the Sobel/Aroian test, and bootstrapping the indirect effect.
- Moderation: centering predictors, testing the interaction term, and graphing simple slopes at low, average, and high levels of the moderator.