Regression: Mediation and Moderation

Ziyuan Huang

Last Updated: 2026-01-29

What is Mediation?

Plain English: Mediation is like a “middle man” or a relay race. The predictor passes the baton to the mediator, who then runs to the finish line (the outcome). If the predictor (X) affects the outcome (Y), mediation explains how or why that happens.

Example: Does stress cause illness? Maybe stress (X) lowers your immune system (Mediator), which then leads to illness (Y). The immune system mediates the relationship.

Baron & Kenny (1986)

State of the Art (2026): While Baron & Kenny’s steps are great for understanding the concept, they are no longer considered sufficient for testing mediation in modern analytics. They have low statistical power (high chance of missing a real effect). Today, we focus on the Indirect Effect itself, usually tested via Bootstrapping.

Baron & Kenny (1986)

Baron & Kenny (1986): Limitations

Sobel Test

Note: The Sobel test assumes that the indirect effect (ab) is normally distributed. This is often not* true, especially in smaller samples. This makes the Sobel test “conservative” (harder to find significant results).

Bootstrapping (The Gold Standard)

Why it’s better: Bootstrapping does not assume the data are normally distributed. It builds a distribution based on your specific data. This makes it the most accurate and powerful method for testing mediation today.

Mediation: Data Screening

Mediation: Example

library(rio)
master <- import("data/mediation.sav")
str(master)
## 'data.frame':    240 obs. of  3 variables:
##  $ previous: num  3 2 1 1 1 1 1 1 1 1 ...
##   ..- attr(*, "format.spss")= chr "F1.0"
##   ..- attr(*, "display_width")= int 14
##  $ facebook: num  3.67 5 4 4.5 4.5 ...
##   ..- attr(*, "format.spss")= chr "F8.2"
##   ..- attr(*, "display_width")= int 14
##  $ exam    : num  1 1 2 7 6 1 2 1 1 1 ...
##   ..- attr(*, "format.spss")= chr "F8.2"
##   ..- attr(*, "display_width")= int 10
summary(master)
##     previous        facebook          exam       
##  Min.   :1.000   Min.   :1.750   Min.   : 1.000  
##  1st Qu.:1.000   1st Qu.:3.750   1st Qu.: 1.000  
##  Median :1.000   Median :4.250   Median : 1.000  
##  Mean   :1.571   Mean   :4.143   Mean   : 1.825  
##  3rd Qu.:2.000   3rd Qu.:4.750   3rd Qu.: 2.000  
##  Max.   :7.000   Max.   :5.000   Max.   :11.000  
##                  NA's   :1
master <- na.omit(master)

Mediation: c Path

model1 <- lm(exam ~ previous, data = master)
summary(model1)
## 
## Call:
## lm(formula = exam ~ previous, data = master)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9092 -0.6477 -0.6477  0.3523  8.4062 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.33228    0.16749   7.955 7.37e-14 ***
## previous     0.31539    0.08689   3.630 0.000348 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.496 on 237 degrees of freedom
## Multiple R-squared:  0.05266,    Adjusted R-squared:  0.04867 
## F-statistic: 13.17 on 1 and 237 DF,  p-value: 0.0003476

Mediation: a Path

model2 <- lm(facebook ~ previous, data = master)
summary(model2)
## 
## Call:
## lm(formula = facebook ~ previous, data = master)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.44609 -0.44609  0.05391  0.55391  1.35640 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.28817    0.08195  52.330   <2e-16 ***
## previous    -0.09208    0.04251  -2.166   0.0313 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.732 on 237 degrees of freedom
## Multiple R-squared:  0.01941,    Adjusted R-squared:  0.01527 
## F-statistic: 4.692 on 1 and 237 DF,  p-value: 0.03131

Mediation: b, c’ Path

model3 <- lm(exam ~ previous + facebook, data = master)
summary(model3)
## 
## Call:
## lm(formula = exam ~ previous + facebook, data = master)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6065 -0.7666 -0.3117  0.3850  7.9999 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.93294    0.56791   6.925 4.09e-11 ***
## previous     0.25954    0.08397   3.091  0.00224 ** 
## facebook    -0.60647    0.12705  -4.773 3.18e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.432 on 236 degrees of freedom
## Multiple R-squared:  0.1361, Adjusted R-squared:  0.1288 
## F-statistic: 18.59 on 2 and 236 DF,  p-value: 3.194e-08

Mediation: Interpretation

Mediation: Sobel Test

#aroian sobel
a <- coef(model2)[2]
b <- coef(model3)[3]
SEa <- summary(model2)$coefficients[2,2]
SEb <- summary(model3)$coefficients[3,2]
zscore <- (a*b)/(sqrt((b^2*SEa^2)+(a^2*SEb^2)+(SEa^2*SEb^2)))
zscore
## previous 
## 1.937494
#two tailed test 
pnorm(abs(zscore), lower.tail = F)*2
##   previous 
## 0.05268497

Mediation: Sobel Test

Mediation: All Together + Bootstrapping

State of the Art (2026): In professional data science settings, while MeMoBootR is excellent for learning and quick “all-in-one” analysis, you will often see the lavaan package (for Structural Equation Modeling) or the mediation package used for complex mediation models. These are the industry standards for publication-quality analysis.

# Ensure a CRAN mirror is set
if(is.null(getOption("repos"))) {
  options(repos = c(CRAN = "https://cloud.r-project.org"))
}

# Install devtools if missing
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install MeMoBootR if missing
if (!requireNamespace("MeMoBootR", quietly = TRUE)) {
  message("Installing MeMoBootR from GitHub...")
  tryCatch({
    devtools::install_github("doomlab/MeMoBootR", force = TRUE, upgrade = "never", quiet = TRUE)
  }, error = function(e) {
    warning("Failed to install MeMoBootR: ", e$message)
  })
}
## Installing MeMoBootR from GitHub...
# Load the package
if (require("MeMoBootR")) {
  #no missing data allowed
  med_results <- mediation1(y = "exam",
                            x = "previous", 
                            m = "facebook", 
                            df = master)
} else {
  message("MeMoBootR package is not available. Mediation results cannot be generated.")
  # Create a dummy object to prevent future errors
  med_results <- list(
    datascreening = list(fulldata = NULL, correl = NULL, linearity = NULL, normality = NULL, homogen = NULL),
    model1 = NULL, model2 = NULL, model3 = NULL,
    indirect.effect = NULL, z.score = NULL, p.value = NULL,
    boot.results = NULL, boot.ci = NULL, diagram = NULL
  )
}
## Loading required package: MeMoBootR
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'MeMoBootR'
## MeMoBootR package is not available. Mediation results cannot be generated.

Mediation: MeMoBootR

head(med_results$datascreening$fulldata)
## NULL
med_results$datascreening$correl
## NULL
med_results$datascreening$linearity
## NULL
med_results$datascreening$normality
## NULL
med_results$datascreening$homogen
## NULL

Mediation: MeMoBootR

summary(med_results$model1)
## Length  Class   Mode 
##      0   NULL   NULL
summary(med_results$model2)
## Length  Class   Mode 
##      0   NULL   NULL
summary(med_results$model3)
## Length  Class   Mode 
##      0   NULL   NULL

Mediation: MeMoBootR

med_results$indirect.effect
## NULL
med_results$z.score
## NULL
med_results$p.value
## NULL

Mediation: MeMoBootR

med_results$boot.results
## NULL
med_results$boot.ci
## NULL
med_results$diagram
## NULL

Moderation

Plain English: Moderation is the “It Depends” variable. If I ask, “Does eating pizza make you happy?”, you might say, “It depends on if I’m hungry.” Hunger is the moderator.

Conceptual Moderation Model

Conceptual Moderation Model

Conceptual Moderation Model

Statistical Moderation Model

Centering variables

Plain English: Why do we center?

  1. Interpretation: In regression, the Intercept is the value of Y when X is 0. If X is “Heart Rate”, 0 is dead. That’s not useful. If we center X (subtract the mean), then 0 becomes the “Average Heart Rate”. Now the intercept is the score for the average person.
  2. Multicollinearity: X and XZ are highly correlated because X is inside XZ. Centering breaks this correlation, making the math more stable.

Centering variables

Moderation: Simple Slopes

Moderation: Simple Slopes

Moderation: Simple Slopes

master <- import("data/moderation.sav")
str(master)
## 'data.frame':    442 obs. of  4 variables:
##  $ ID        : num  69 55 7 96 130 124 72 139 102 179 ...
##   ..- attr(*, "label")= chr "Case ID"
##   ..- attr(*, "format.spss")= chr "F3.0"
##  $ Aggression: num  13 38 30 23 25 46 41 22 35 23 ...
##   ..- attr(*, "label")= chr "Agression"
##   ..- attr(*, "format.spss")= chr "F2.0"
##  $ Vid_Games : num  16 12 32 10 11 29 23 15 20 20 ...
##   ..- attr(*, "label")= chr "Video Games (Hours per week)"
##   ..- attr(*, "format.spss")= chr "F2.0"
##  $ CaUnTs    : num  0 0 0 1 1 1 2 3 3 3 ...
##   ..- attr(*, "label")= chr "Callous Unemotional Traits"
##   ..- attr(*, "format.spss")= chr "F2.0"

Moderation: Centering

#center the X and M variable (NOT THE DV)
#when you create these use the final dataset 
master$zvid = scale(master$Vid_Games, scale = F) #mean center, not z score
master$zcal = scale(master$CaUnTs, scale = F)

Moderation: Running

#run the model to see if the moderation is significant
#use the z score variables!
#be sure to do X*M so the graph is right 
modmodel <- lm(Aggression ~ zvid*zcal, data = master)

Moderation: Interpretation

summary(modmodel)
## 
## Call:
## lm(formula = Aggression ~ zvid * zcal, data = master)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -29.7144  -6.9087  -0.1923   6.9036  29.2290 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.967108   0.475057  84.131  < 2e-16 ***
## zvid         0.169625   0.068473   2.477 0.013616 *  
## zcal         0.760093   0.049458  15.368  < 2e-16 ***
## zvid:zcal    0.027062   0.006981   3.877 0.000122 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.976 on 438 degrees of freedom
## Multiple R-squared:  0.3773, Adjusted R-squared:  0.373 
## F-statistic: 88.46 on 3 and 438 DF,  p-value: < 2.2e-16

Moderation: Simple Slopes

Moderation: Simple Slopes

#create the low and high z score variables 
master$zcallow <- master$zcal + sd(master$zcal) #bring them up
master$zcalhigh <- master$zcal - sd(master$zcal) #bring them down
summary(master)
##        ID          Aggression      Vid_Games         CaUnTs    
##  Min.   :  1.0   Min.   : 9.00   Min.   : 1.00   Min.   : 0.0  
##  1st Qu.:111.2   1st Qu.:32.00   1st Qu.:17.00   1st Qu.:11.0  
##  Median :221.5   Median :40.00   Median :22.00   Median :18.0  
##  Mean   :221.5   Mean   :40.05   Mean   :21.84   Mean   :18.6  
##  3rd Qu.:331.8   3rd Qu.:48.00   3rd Qu.:26.00   3rd Qu.:26.0  
##  Max.   :442.0   Max.   :82.00   Max.   :38.00   Max.   :43.0  
##            zvid.V1                     zcal.V1          
##  Min.   :-2.08438914027e+01   Min.   :-18.595022624400  
##  1st Qu.:-4.84389140271e+00   1st Qu.: -7.595022624430  
##  Median : 1.56108597285e-01   Median : -0.595022624434  
##  Mean   : 1.00000000000e-15   Mean   :  0.000000000000  
##  3rd Qu.: 4.15610859729e+00   3rd Qu.:  7.404977375570  
##  Max.   : 1.61561085973e+01   Max.   : 24.404977375600  
##         zcallow.V1               zcalhigh.V1       
##  Min.   :-8.97732952676   Min.   :-28.21271572210  
##  1st Qu.: 2.02267047324   1st Qu.:-17.21271572210  
##  Median : 9.02267047324   Median :-10.21271572210  
##  Mean   : 9.61769309767   Mean   : -9.61769309767  
##  3rd Qu.:17.02267047320   3rd Qu.: -2.21271572211  
##  Max.   :34.02267047320   Max.   : 14.78728427790

Moderation: Simple Slopes

#run the models
#be sure to X*M
modmodellow <- lm(Aggression ~ zvid*zcallow, data = master)
modmodelhigh <- lm(Aggression ~ zvid*zcalhigh, data = master)

Moderation: Simple Slopes

summary(modmodellow) #low slope
## 
## Call:
## lm(formula = Aggression ~ zvid * zcallow, data = master)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -29.7144  -6.9087  -0.1923   6.9036  29.2290 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  32.656771   0.672018  48.595  < 2e-16 ***
## zvid         -0.090651   0.099105  -0.915 0.360854    
## zcallow       0.760093   0.049458  15.368  < 2e-16 ***
## zvid:zcallow  0.027062   0.006981   3.877 0.000122 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.976 on 438 degrees of freedom
## Multiple R-squared:  0.3773, Adjusted R-squared:  0.373 
## F-statistic: 88.46 on 3 and 438 DF,  p-value: < 2.2e-16

Moderation: Simple Slopes

summary(modmodel) #average slope
## 
## Call:
## lm(formula = Aggression ~ zvid * zcal, data = master)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -29.7144  -6.9087  -0.1923   6.9036  29.2290 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.967108   0.475057  84.131  < 2e-16 ***
## zvid         0.169625   0.068473   2.477 0.013616 *  
## zcal         0.760093   0.049458  15.368  < 2e-16 ***
## zvid:zcal    0.027062   0.006981   3.877 0.000122 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.976 on 438 degrees of freedom
## Multiple R-squared:  0.3773, Adjusted R-squared:  0.373 
## F-statistic: 88.46 on 3 and 438 DF,  p-value: < 2.2e-16

Moderation: Simple Slopes

summary(modmodelhigh) #high slope
## 
## Call:
## lm(formula = Aggression ~ zvid * zcalhigh, data = master)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -29.7144  -6.9087  -0.1923   6.9036  29.2290 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   47.277446   0.672520  70.299  < 2e-16 ***
## zvid           0.429901   0.092577   4.644 4.53e-06 ***
## zcalhigh       0.760093   0.049458  15.368  < 2e-16 ***
## zvid:zcalhigh  0.027062   0.006981   3.877 0.000122 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.976 on 438 degrees of freedom
## Multiple R-squared:  0.3773, Adjusted R-squared:  0.373 
## F-statistic: 88.46 on 3 and 438 DF,  p-value: < 2.2e-16

Moderation: Simple Slopes

Moderation: Graphing

library(ggplot2)
cleanup <- theme(panel.grid.major = element_blank(), 
                panel.grid.minor = element_blank(), 
                panel.background = element_blank(), 
                axis.line.x = element_line(color = "black"),
                axis.line.y = element_line(color = "black"),
                legend.key = element_rect(fill = "white"),
                text = element_text(size = 15))

modgraph <- ggplot(master, aes(zvid, Aggression))

##change Cal to the new moderator label
##change xlab for the new X label
modgraph + 
  xlab("Centered Video Games") + 
  geom_point(color = "gray") +
  geom_abline(aes(intercept = modmodellow$coefficients[1],
                  slope = modmodellow$coefficients[2], 
                  linetype = "-1SD Cal"), size = 1) +
  geom_abline(aes(intercept = modmodel$coefficients[1],
                  slope = modmodel$coefficients[2], 
                  linetype = "Average Cal"), size = 1) +
  geom_abline(aes(intercept = modmodelhigh$coefficients[1],
                  slope = modmodelhigh$coefficients[2], 
                  linetype = "+1SD Cal"), size = 1) +
  scale_linetype_manual(values = c("dotted", "dashed", "solid"),
                        breaks = c("-1SD Cal", "Average Cal", "+1SD Cal"),
                        name = "Simple Slope") +
  cleanup 

Moderation: MeMoBootR

if (exists("moderation1")) {
  mod_model <- moderation1(y = "Aggression",
                           x = "Vid_Games",
                           m = "CaUnTs",
                           df = master)
} else {
  mod_model <- list(
    datascreening = list(fulldata = NULL),
    model1 = NULL,
    interpretation = "MeMoBootR not available.",
    graphslopes = NULL
  )
}

Moderation: MeMoBootR

#data screening
head(mod_model$datascreening$fulldata)
## NULL
#mod_model$datascreening$correl
#mod_model$datascreening$linearity
#mod_model$datascreening$normality
#mod_model$datascreening$homogen

#models
summary(mod_model$model1)
## Length  Class   Mode 
##      0   NULL   NULL
#summary(mod_model$model1low)
#summary(mod_model$model1high)
mod_model$interpretation
## [1] "MeMoBootR not available."
#graphs
mod_model$graphslopes
## NULL

Summary