Ziyuan Huang
Last Updated: 2026-01-29
Regression is a way of predicting the value of one variable from other variables.
Field et al. (2012) define regression as fitting a model to data and using it to predict values of the dependent variable (DV) from one or more independent variables (IVs).
We extend beyond the data we collected to answer predictive questions.
\[Y_i = b_0 + b_1X_i + \varepsilon_i\]
Key Components:
Regression coefficients (\(b\)) are sometimes called:
- Gradient (slope) of the regression line
- Direction/strength of the relationship
- Unstandardized coefficients (original scale)
Simple Linear Regression (SLR): one X variable (IV)
Multiple Linear Regression (MLR): two or more X variables (IVs)
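As a quick sketch of the distinction, here are a simple and a multiple regression fit with lm(), using simulated data (all variable names here are hypothetical, not from the lecture dataset):

```r
# SLR vs. MLR with lm(), on simulated data
set.seed(42)
x1 <- rnorm(100)
x2 <- rnorm(100)
y  <- 2 + 0.5 * x1 - 0.3 * x2 + rnorm(100)

slr <- lm(y ~ x1)        # simple linear regression: one IV
mlr <- lm(y ~ x1 + x2)   # multiple linear regression: two IVs

coef(slr)   # b0 (intercept) and b1 (slope)
coef(mlr)   # b0, b1, b2
```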
MLR types include:
Is my overall model (i.e., the regression equation) useful at predicting the outcome variable?
How useful are each of the individual predictors for my model?
Our overall model uses an F-test, which tests whether the regression model significantly improves prediction compared to the null model (using just the mean).
Hypotheses for the overall test: H0: the model does not improve prediction over the mean-only model; H1: the model significantly improves prediction.
The F-test is always one-tailed because the test statistic is built from squared terms and can only be positive: \[F = \frac{MSM}{MSR}\] (systematic variance / unsystematic variance)
MSM = Mean Squares Model (improvement from regression line)
MSR = Mean Squares Residual (unexplained error)
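As a sketch on simulated data, the F statistic can be reconstructed by hand from MSM and MSR and checked against what summary() reports:

```r
# Reconstructing F = MSM / MSR by hand (simulated data)
set.seed(1)
x <- rnorm(50)
y <- 1 + 0.8 * x + rnorm(50)
fit <- lm(y ~ x)

ssm <- sum((fitted(fit) - mean(y))^2)   # model sum of squares
ssr <- sum(resid(fit)^2)                # residual sum of squares
msm <- ssm / 1                          # k = 1 predictor
msr <- ssr / df.residual(fit)           # df = N - k - 1

f_stat <- msm / msr
f_stat
# the same value summary(fit) reports as the F-statistic
unname(summary(fit)$fstatistic["value"])
```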
The General Linear Model: \[\text{Outcome} = \text{(Model)} + \text{Error}\]
In regression, we predict each person’s score \(Y_i\) by:
H0 Model (No Relationship): Prediction using only the mean, resulting in a flat line (\(b_1 = 0\))
H1 Model (Relationship Exists): Prediction using the regression line, incorporating predictor slopes
Total Sum of Squares (SST):
- Sum of squared differences between the observed values and the mean of Y
- Represents total variability in the outcome with no predictors (H0 model)
- Formula: \(SST = \sum(Y_i - \bar{Y})^2\)

Residual Sum of Squares (SSR):
- Sum of squared differences between the observed values and the values predicted by the regression line
- Represents variability unexplained by the model (error remaining)
- Formula: \(SSR = \sum(Y_i - \hat{Y_i})^2\)

Model Sum of Squares (SSM):
- Sum of squared differences between the predicted values and the mean of Y
- Represents variability explained by the model (improvement from adding predictors)
- Formula: \(SSM = SST - SSR\) or \(SSM = \sum(\hat{Y_i} - \bar{Y})^2\)

Relationship: \(SST = SSM + SSR\) (total variation = explained + unexplained)
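A quick check of this identity on a fitted model (simulated data):

```r
# Verifying SST = SSM + SSR on a toy model
set.seed(7)
x <- rnorm(40)
y <- 3 - 0.6 * x + rnorm(40)
fit <- lm(y ~ x)

sst <- sum((y - mean(y))^2)             # total variability (H0 model)
ssr <- sum((y - fitted(fit))^2)         # unexplained error
ssm <- sum((fitted(fit) - mean(y))^2)   # improvement from the model

all.equal(sst, ssm + ssr)   # explained + unexplained = total
```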
We test the individual predictors with a t-test:
Therefore, we might use the following hypotheses:
Or, we could use a directional test, since the test statistic t can be negative:
Unlike correlation, these statistics are reported with the t statistic and its degrees of freedom, t(df), where \(df = N - k - 1\) (k = number of predictors).
Unstandardized Regression Coefficient (\(b\)):
- Regression coefficient in the original scale of the variables
- Interpretation: for every one-unit increase in X, Y increases by \(b\) units
- Advantage: more interpretable for your specific problem context
- Disadvantage: hard to compare across predictors with different scales
- Formula: predicted \(Y = b_0 + b_1X\)

Standardized Regression Coefficient (\(\beta\) or "beta"):
- Regression coefficient after converting variables to standard deviation units
- Interpretation: for every one-SD increase in X, Y increases by \(\beta\) SDs
- Advantage: comparable across predictors with different scales; indicates relative importance
- Disadvantage: less interpretable for real-world problem context

When to Use Each:
- Use \(b\) for answering applied questions ("How many more sales per $1000 of advertising?")
- Use \(\beta\) for comparing predictor importance across variables with different scales
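One way to see the relationship between the two, sketched on simulated data: fitting the model on z-scored variables yields the \(\beta\)s, and each \(\beta_j\) equals \(b_j \times SD_{X_j} / SD_Y\):

```r
# Unstandardized b vs. standardized beta (simulated data)
set.seed(3)
x1 <- rnorm(60)
x2 <- rnorm(60)
y  <- 1 + 2 * x1 + 0.5 * x2 + rnorm(60)

b_fit    <- lm(y ~ x1 + x2)                        # b: original units
beta_fit <- lm(scale(y) ~ scale(x1) + scale(x2))   # beta: SD units

coef(b_fit)
coef(beta_fit)[-1]   # the betas; the intercept is ~0 after standardizing

# beta_1 is just b_1 rescaled by the SDs
coef(b_fit)["x1"] * sd(x1) / sd(y)
```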
Mental Health and Drug Use:
## 'data.frame': 267 obs. of 4 variables:
## $ PIL_total : num 121 76 98 122 99 134 102 124 126 112 ...
## ..- attr(*, "format.spss")= chr "F8.2"
## ..- attr(*, "display_width")= int 11
## $ CESD_total : num 28 37 20 15 7 7 27 10 9 8 ...
## ..- attr(*, "format.spss")= chr "F8.2"
## ..- attr(*, "display_width")= int 12
## $ AUDIT_TOTAL_NEW: num 1 5 3 3 2 3 2 1 1 7 ...
## ..- attr(*, "format.spss")= chr "F8.2"
## ..- attr(*, "display_width")= int 17
## $ DAST_TOTAL_NEW : num 0 0 0 1 0 0 1 0 0 1 ...
## ..- attr(*, "format.spss")= chr "F8.2"
## ..- attr(*, "display_width")= int 16
## PIL_total CESD_total AUDIT_TOTAL_NEW DAST_TOTAL_NEW
## Min. : 60.0 Min. : 0.0 Min. : 0.000 Min. :0.000
## 1st Qu.:103.0 1st Qu.: 7.0 1st Qu.: 2.000 1st Qu.:0.000
## Median :111.0 Median :11.0 Median : 5.000 Median :0.000
## Mean :110.7 Mean :13.2 Mean : 6.807 Mean :0.906
## 3rd Qu.:121.0 3rd Qu.:17.0 3rd Qu.:11.000 3rd Qu.:1.000
## Max. :138.0 Max. :47.0 Max. :31.000 Max. :9.000
## NA's :1
## [1] 267
## [1] 266
In this section, we will add a few new outlier checks:
Because we are using regression as our model, we may consider using multiple checks before excluding outliers.
Mahalanobis distance uses the mahalanobis() function we have used previously.

mahal <- mahalanobis(nomiss,
                     colMeans(nomiss),
                     cov(nomiss))
cutmahal <- qchisq(1-.001, ncol(nomiss))
badmahal <- as.numeric(mahal > cutmahal) ##note the direction of the >
table(badmahal)

## badmahal
##   0   1 
## 261   5
We create the model with the lm() function and our regression formula. Y ~ X + X: Y is approximated by X plus X.

Leverage:
- Definition: the influence of that data point on the slope
- Each score is the change in slope if you exclude that data point
How do we calculate how much change is bad?
k <- 3 ##number of IVs
leverage <- hatvalues(model1)
cutleverage <- (2*k+2) / nrow(nomiss)
badleverage <- as.numeric(leverage > cutleverage)
table(badleverage)

## badleverage
##   0   1 
## 247  19
Influence (Cook’s Distance) - a measure of how much of an effect that single case has on the whole model
Often described as leverage + discrepancy
How do we calculate how much change is bad?
cooks <- cooks.distance(model1)
cutcooks <- 4 / (nrow(nomiss) - k - 1)
badcooks <- as.numeric(cooks > cutcooks)
table(badcooks)

## badcooks
##   0   1 
## 251  15
## totalout
## 0 1 2 3
## 239 17 8 2
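The total-outlier table above comes from summing the three flags. Here is a self-contained sketch under the same cutoff rules, with mtcars standing in for the lecture data (the "fewer than two checks" exclusion rule is one common choice, consistent with the counts shown above):

```r
# Combining Mahalanobis, leverage, and Cook's flags into one count
df <- mtcars[, c("mpg", "hp", "wt", "qsec")]
model <- lm(mpg ~ hp + wt + qsec, data = df)
k <- 3   # number of IVs

mahal <- mahalanobis(df, colMeans(df), cov(df))
badmahal <- as.numeric(mahal > qchisq(1 - .001, ncol(df)))
badleverage <- as.numeric(hatvalues(model) > (2 * k + 2) / nrow(df))
badcooks <- as.numeric(cooks.distance(model) > 4 / (nrow(df) - k - 1))

totalout <- badmahal + badleverage + badcooks
table(totalout)

# keep cases flagged by fewer than two checks
noout <- df[totalout < 2, ]
```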
##
## Call:
## lm(formula = CESD_total ~ PIL_total + AUDIT_TOTAL_NEW + DAST_TOTAL_NEW,
## data = noout)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.904 -5.086 -1.161 3.405 29.342
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54.19317 4.20489 12.888 <2e-16 ***
## PIL_total -0.37272 0.03629 -10.271 <2e-16 ***
## AUDIT_TOTAL_NEW -0.07774 0.09548 -0.814 0.416
## DAST_TOTAL_NEW 0.72741 0.50953 1.428 0.155
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.357 on 252 degrees of freedom
## Multiple R-squared: 0.3157, Adjusted R-squared: 0.3076
## F-statistic: 38.76 on 3 and 252 DF, p-value: < 2.2e-16
##
## Correlation of Coefficients:
## (Intercept) PIL_total AUDIT_TOTAL_NEW
## PIL_total -0.99
## AUDIT_TOTAL_NEW -0.16 0.06
## DAST_TOTAL_NEW -0.17 0.15 -0.47
If your assumptions go wrong:
## $r2
## [1] "$R^2 = .32$, 90\\% CI $[0.23, 0.39]$, $F(3, 252) = 38.76$, $p < .001$"
## [1] "$b = -0.37$, 95\\% CI $[-0.44, -0.30]$, $t(252) = -10.27$, $p < .001$"
## [1] "$b = -0.08$, 95\\% CI $[-0.27, 0.11]$, $t(252) = -0.81$, $p = .416$"
## [1] "$b = 0.73$, 95\\% CI $[-0.28, 1.73]$, $t(252) = 1.43$, $p = .155$"
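The 95% CIs around each \(b\) can be obtained with confint(); a minimal sketch on a toy model (variable names hypothetical):

```r
# Confidence intervals for regression coefficients (simulated data)
set.seed(9)
x <- rnorm(30)
y <- 2 + x + rnorm(30)
fit <- lm(y ~ x)

confint(fit, level = 0.95)   # 2.5% and 97.5% bounds for b0 and b1
```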
Two concerns:
We can use the QuantPsyc package for \(\beta\) values.

##       PIL_total AUDIT_TOTAL_NEW  DAST_TOTAL_NEW 
##     -0.54695645     -0.04844095      0.08573100
Multiple Correlation (R):
- The correlation between observed Y and predicted Y values
- Ranges from 0 to 1; indicates overall model strength
- In simple regression: \(R = |r_{XY}|\) (absolute value of the correlation)

\(R^2\) (Coefficient of Determination):
- Proportion of variance in Y explained by the model
- Formula: \(R^2 = \frac{SSM}{SST} = 1 - \frac{SSR}{SST}\)
- Interpretation: "The model explains \(R^2 \times 100\) percent of the variance in Y"
- All overlap with Y; used for the overall model
- \(R^2 = \frac{A+B+C}{A+B+C+D}\) (explained / total)

Semipartial Correlation Squared (\(sr^2\)):
- Unique contribution of this IV to \(R^2\) (variance in Y explained only by this predictor, after accounting for the other predictors)
- The increase in \(R^2\) when this X is added to the model
- Formula: \(sr^2 = \frac{A}{A+B+C+D}\) (unique variance / total variance)
- Interpretation: "Adding this predictor increases \(R^2\) by \(sr^2 \times 100\) percentage points"
- Used in hierarchical regression to show the incremental value of each predictor

Partial Correlation Squared (\(pr^2\)):
- Proportion of the remaining variance in Y (after removing the other predictors' influence) that is explained by this X
- Formula: \(pr^2 = \frac{A}{A+D}\) (unique variance / variance not explained by others)
- Interpretation: "Among the variance in Y not explained by the other predictors, this predictor explains \(pr^2 \times 100\) percent"
- Always at least as large as \(sr^2\) because the denominator excludes shared variance: \(pr^2 \geq sr^2\)
- Often reported alongside \(sr^2\) for a complete picture of predictor importance
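Both can be recovered from each predictor's t value; a sketch on simulated data, using \(pr^2 = t^2/(t^2 + df)\) (the formula applied later in these notes) and \(sr^2 = t^2(1 - R^2)/df\):

```r
# pr^2 and sr^2 from t values (simulated data)
set.seed(5)
x1 <- rnorm(80)
x2 <- rnorm(80)
y  <- 1 + x1 + 0.4 * x2 + rnorm(80)
s  <- summary(lm(y ~ x1 + x2))

t_vals <- s$coefficients[-1, "t value"]   # drop the intercept
df_res <- s$df[2]                         # N - k - 1

pr2 <- t_vals^2 / (t_vals^2 + df_res)          # partial r squared
sr2 <- t_vals^2 * (1 - s$r.squared) / df_res   # semipartial r squared
pr2
sr2   # pr2 >= sr2 for every predictor
```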
We would add these to our other reports:
## PIL_total CESD_total AUDIT_TOTAL_NEW DAST_TOTAL_NEW
## PIL_total 1.000000000 0.295101597 0.005899378 0.005606820
## CESD_total 0.295101597 1.000000000 0.002623799 0.008022779
## AUDIT_TOTAL_NEW 0.005899378 0.002623799 1.000000000 0.218315640
## DAST_TOTAL_NEW 0.005606820 0.008022779 0.218315640 1.000000000
Method:
- Known predictors (based on past research) are entered first
- New predictors are entered in separate steps
- Tests the significance of each step addition and of the individual predictors

Answers the Following Questions:
- Is my overall model significant? (F-test for the final model)
- Is the addition of each step significant? (Comparison of \(R^2\) between models via \(\Delta F\))
- Are the individual predictors significant? (t-tests for each coefficient)

When to Use:
- Control for known/nuisance variables first before testing new predictors
- See the incremental value of adding new variables to an existing model
- Discuss groups of variables together as a conceptual set
- Based on a priori theory (NOT exploratory/stepwise selection)
Dummy Coding (Contrast Coding):
- Converts a categorical variable into multiple binary indicators
- Each dummy variable compares one category against a reference category
- Advantage: allows interpretation as comparisons (e.g., "Treatment vs. Control")
- The reference category gets value 0 on all dummies
- Each non-reference category gets value 1 on its corresponding dummy
- If there are k categories, create k-1 dummy variables
- Interpretation of \(b\): difference in Y between this category and the reference category

Other Coding Systems:
- Deviation (effect) coding, orthogonal coding, Helmert coding: https://stats.idre.ucla.edu/spss/faq/coding-systems-for-categorical-variables-in-regression-analysis/
R handles dummy coding automatically:
- When you use factor() to convert a variable to a categorical type, R creates k-1 dummy variables and uses the first level as the reference category
- Each dummy variable shows 1 if the observation is in that category, 0 otherwise
- Example coding table:
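A sketch of what that coding looks like, using a hypothetical three-level factor and model.matrix() to expose the indicator columns R creates:

```r
# R's automatic dummy coding: first level = reference
group <- factor(c("Control", "Control", "TreatmentA", "TreatmentB"))
levels(group)            # "Control" sorts first, so it is the reference

model.matrix(~ group)    # k - 1 = 2 dummy columns plus the intercept
```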
Change Reference Category in R:
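A minimal sketch with relevel() (the level names are hypothetical):

```r
# Changing the reference category with relevel()
group  <- factor(c("Control", "TreatmentA", "TreatmentB"))
group2 <- relevel(group, ref = "TreatmentA")

levels(group2)   # "TreatmentA" is now first, so it becomes the reference
```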
Research Question: Do different depression treatments reduce depression ratings after controlling for family history?
Variables:
- IVs:
  - Family history of depression (continuous predictor)
  - Treatment for depression (categorical: No Treatment, Placebo, Paxil, Effexor, Cheerup)
- DV: Depression rating after treatment (continuous outcome)
## 'data.frame': 50 obs. of 3 variables:
## $ treat : num 0 0 0 0 0 0 0 0 0 0 ...
## ..- attr(*, "label")= chr "Treatment"
## ..- attr(*, "format.spss")= chr "F8.0"
## ..- attr(*, "display_width")= int 18
## ..- attr(*, "labels")= Named num [1:5] 0 1 2 3 4
## .. ..- attr(*, "names")= chr [1:5] "No Treatment" "Placebo" "Seroxat (Paxil)" "Effexor" ...
## $ familyhistory: num 6.79 6.88 19.65 10.8 32.27 ...
## ..- attr(*, "label")= chr "Family History"
## ..- attr(*, "format.spss")= chr "F8.0"
## $ after : num 16 18 13 15 18 16 18 19 9 16 ...
## ..- attr(*, "label")= chr "After Treatment"
## ..- attr(*, "format.spss")= chr "F8.0"
## $label
## [1] "Treatment"
##
## $format.spss
## [1] "F8.0"
##
## $display_width
## [1] 18
##
## $labels
## No Treatment Placebo Seroxat (Paxil) Effexor Cheerup
## 0 1 2 3 4
Data Screening: Should be done on the LAST model (skipped here for brevity)
Model 1: Base Model with Control Variable
- Enter family history alone to establish a baseline
- Tests if family history alone predicts depression rating
- Overall fit: \(F(1,48) = 8.50\), \(p = .005\), \(R^2 = .15\)
- Interpretation: Family history accounts for 15% of the variance in post-treatment depression
- Family history predictor: \(b = 0.15\), \(t(48) = 2.92\), \(p = .005\), \(pr^2 = .15\)
- Interpretation: For each unit increase in family history, depression rating increases by 0.15 units
##
## Call:
## lm(formula = after ~ familyhistory, data = hdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.5120 -1.9028 -0.2193 2.0544 6.7958
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.00363 0.84477 13.026 <2e-16 ***
## familyhistory 0.15313 0.05254 2.915 0.0054 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.133 on 48 degrees of freedom
## Multiple R-squared: 0.1504, Adjusted R-squared: 0.1327
## F-statistic: 8.495 on 1 and 48 DF, p-value: 0.005396
Model 2: Full Model with Treatment Added
- Add the treatment category to Model 1 (keep family history to maintain control)
- Tests if treatment incrementally predicts depression rating after controlling for family history
- The overall model is significant, but focus on the change between models (not just overall significance)
  - Why? If Model 1 was already significant, the overall significance might just reflect Model 1's contribution
- We need \(\Delta R^2\) and \(\Delta F\) to show treatment adds predictive value beyond family history
##
## Call:
## lm(formula = after ~ familyhistory + treat, data = hdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7908 -1.6690 0.0508 1.6674 5.4108
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.98816 1.09637 12.759 < 2e-16 ***
## familyhistory 0.13513 0.05088 2.656 0.010973 *
## treatPlacebo -4.09905 1.21381 -3.377 0.001542 **
## treatPaxil -2.03744 1.22146 -1.668 0.102411
## treatEffexor -2.59078 1.26984 -2.040 0.047356 *
## treatCheerup -4.96339 1.22489 -4.052 0.000203 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.714 on 44 degrees of freedom
## Multiple R-squared: 0.4154, Adjusted R-squared: 0.349
## F-statistic: 6.254 on 5 and 44 DF, p-value: 0.0001832
Model Comparison via ANOVA:
- Use anova(model1, model2) to test: does treatment add significant predictive value?
- Key statistics:
  - \(\Delta R^2\): increase in \(R^2\) from Model 1 to Model 2 (variance explained by treatment after family history)
  - \(\Delta F\): F-test comparing the models' fit
- Interpretation: the addition of the treatment set was significant, \(\Delta F(4, 44) = 4.99, p = .002, \Delta R^2 = .27\)
  - Treatment explains an additional 27% of the variance in depression ratings
  - This improvement is statistically significant (p = .002)
## Analysis of Variance Table
##
## Model 1: after ~ familyhistory
## Model 2: after ~ familyhistory + treat
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 48 471.12
## 2 44 324.13 4 146.99 4.9883 0.002102 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
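The same comparison sketched end to end, with mtcars standing in for the lecture data: \(\Delta R^2\) comes from the two summaries, \(\Delta F\) from anova():

```r
# Hierarchical comparison: delta R^2 and delta F (mtcars as stand-in)
m1 <- lm(mpg ~ wt, data = mtcars)             # step 1: control variable
m2 <- lm(mpg ~ wt + hp + qsec, data = mtcars) # step 2: added set

delta_r2 <- summary(m2)$r.squared - summary(m1)$r.squared
cmp <- anova(m1, m2)           # delta F test for the added predictors
delta_f <- cmp$F[2]
delta_p <- cmp$`Pr(>F)`[2]

c(delta_r2 = delta_r2, delta_f = delta_f, p = delta_p)
```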
Interpreting Dummy-Coded Coefficients:
- Each dummy coefficient (\(b\)) = difference between that category and the reference category
- Reference category (usually the first level): automatically set to 0, acts as the baseline
- Positive \(b\): that category has a higher outcome than the reference
- Negative \(b\): that category has a lower outcome than the reference
- \(b\) = difference in means, controlling for (holding constant) the other predictors
Visualizing Results with emmeans:
- Raw \(b\) values can be hard to interpret
- Estimated Marginal Means (EMMs): predicted mean Y for each group, given the other predictors' values
- Advantage: shows group means on the original scale (not as comparisons)
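What emmeans computes can be sketched in base R: predicted group means with the covariate held at its overall mean (here mtcars' cyl stands in for the treatment factor; emmeans::emmeans(model, "treat") reports the same idea directly):

```r
# Estimated marginal means by hand: predict at the covariate mean
dat <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ wt + cyl, data = dat)

grid <- data.frame(wt  = mean(dat$wt),
                   cyl = factor(levels(dat$cyl), levels = levels(dat$cyl)))
emm <- predict(fit, newdata = grid)   # one adjusted mean per group
setNames(round(emm, 2), levels(dat$cyl))
```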
## treat emmean SE df lower.CL upper.CL
## No Treatment 15.8 0.858 44 14.11 17.6
## Placebo 11.7 0.858 44 10.01 13.5
## Paxil 13.8 0.871 44 12.04 15.6
## Effexor 13.2 0.930 44 11.37 15.1
## Cheerup 10.9 0.877 44 9.11 12.6
##
## Confidence level used: 0.95
We cannot use our pcor code on our categorical variables. What can we do to calculate? We can work from the t values instead, using \(pr^2 = t^2 / (t^2 + df)\):

model_summary <- summary(model2)
t_values <- model_summary$coefficients[ , 3]
df_t <- model_summary$df[2]
t_values^2 / (t_values^2 + df_t)

##   (Intercept) familyhistory  treatPlacebo    treatPaxil  treatEffexor 
##    0.78721534    0.13817150    0.20583673    0.05947423    0.08642805 
##  treatCheerup 
##    0.27175918
We can use the pwr library to calculate the required sample size for a desired power and effect size. First we take the model \(R^2\) and convert it to Cohen's \(f^2 = \frac{R^2}{1 - R^2}\):

## [1] 0.4154487

## [1] 0.7107138
Function Arguments:
- u = degrees of freedom for the model (numerator df, the first value in the F-statistic)
- v = degrees of freedom for error (denominator df); leave blank (NULL) when solving for sample size
- f2 = Cohen's \(f^2\) (converted effect size)
- sig.level = alpha level (typically .05)
- power = desired statistical power (typically .80)

Final Sample Size Calculation:
- The output provides the v (error df) needed
- Actual N = \(v + k + 1\), where k = number of predictors
#f2 is cohen f squared
pwr.f2.test(u = model_summary$df[1],
            v = NULL, f2 = f2,
            sig.level = .05, power = .80)

## 
##      Multiple regression power calculation 
## 
##               u = 6
##               v = 19.20439
##              f2 = 0.7107138
##       sig.level = 0.05
##           power = 0.8
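Turning that output into a sample size, following the \(N = v + k + 1\) rule above (v rounded up; k = 5 predictors in model2, family history plus four treatment dummies):

```r
# From error df (v) to required N
v <- 19.20439            # error df returned by pwr.f2.test() above
k <- 5                   # predictors in the final model
N <- ceiling(v) + k + 1
N                        # about 26 participants for 80% power
```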
In this lecture, we’ve covered:
Foundations:
- Regression equation and interpretation of coefficients (\(b_0\), \(b_1\), \(\varepsilon\))
- Method of least squares for finding the best-fit line
- Sums of squares (SST, SSR, SSM) and their meaning

Model Evaluation:
- F-test for overall model significance
- \(R^2\) and \(R\) as effect sizes for model fit
- Comparison of H0 (mean-only) vs. H1 (regression) models

Individual Predictors:
- t-tests and hypothesis testing for coefficients
- Unstandardized (\(b\)) vs. standardized (\(\beta\)) coefficients
- Partial and semipartial correlations for relative importance

Advanced Topics:
- Regression assumptions and data screening
- Outlier detection (Mahalanobis, leverage, Cook's D)
- Hierarchical regression with model comparison
- Categorical predictors and dummy coding
- Power analysis for sample size planning
Field et al. (2012) reference: Chapter 7, Discovering Statistics Using R