Linear Regression

Ziyuan Huang

Last Updated: 2026-01-29

What is Regression?

Describing a Straight Line

\[Y_i = b_0 + b_1X_i + \varepsilon_i\]

Key Components:

Regression coefficients (\(b\)) go by several names: - Gradient (slope) of the regression line - Direction/strength of the relationship - Unstandardized coefficients (original scale)
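
These components are easy to see in a minimal R sketch (simulated data; variable names are my own, not the lecture dataset): fit a straight line and read off \(b_0\) and \(b_1\).

```r
## Simulated example: true intercept b0 = 3, true slope b1 = 0.5
set.seed(42)
x <- 1:50
y <- 3 + 0.5 * x + rnorm(50, sd = 2)   # epsilon ~ N(0, sd = 2)
fit <- lm(y ~ x)
coef(fit)   # b0 (Intercept) and b1 (slope), on the original scale
```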

Intercepts and Gradients

Types of Regression

Analyzing a Regression

Overall Model: Understand the NHST

Overall Model: What Do We Mean by “Predict”?

Comparing H0 vs H1 Models

Method of Least Squares

Sums of Squares in Regression

Total Sum of Squares (SST): - Sum of squared differences between observed values and the mean of Y - Represents total variability in the outcome with no predictors (H0 model) - Formula: \(SST = \sum(Y_i - \bar{Y})^2\)

Residual Sum of Squares (SSR): - Sum of squared differences between observed values and predicted values from regression line - Represents variability unexplained by the model (error remaining) - Formula: \(SSR = \sum(Y_i - \hat{Y_i})^2\)

Model Sum of Squares (SSM): - Sum of squared differences between predicted values and the mean of Y - Represents variability explained by the model (improvement from adding predictors) - Formula: \(SSM = SST - SSR\) or \(SSM = \sum(\hat{Y_i} - \bar{Y})^2\)

Relationship: \(SST = SSM + SSR\) (total variation = explained + unexplained)
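
These identities can be verified directly in R. A self-contained sketch (simulated data; variable names are my own):

```r
## Verify SST = SSM + SSR and R^2 = SSM / SST on simulated data
set.seed(1)
x <- rnorm(100)
y <- 2 + 1.5 * x + rnorm(100)
fit <- lm(y ~ x)

SST <- sum((y - mean(y))^2)            # total variability (H0 model)
SSR <- sum(residuals(fit)^2)           # unexplained (residual) variability
SSM <- sum((fitted(fit) - mean(y))^2)  # explained (model) variability

all.equal(SST, SSM + SSR)                     # TRUE
all.equal(SSM / SST, summary(fit)$r.squared)  # TRUE
```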

Individual Predictors: Understand the NHST

Individual Predictors: Standardization

Unstandardized Regression Coefficient (\(b\)): - Regression coefficient in the original scale of the variables - Interpretation: For every one unit increase in X, Y increases by \(b\) units - Advantage: More interpretable for your specific problem context - Disadvantage: Hard to compare across predictors with different scales - Formula: Predicted \(Y = b_0 + b_1X\)

Standardized Regression Coefficient (\(\beta\) or “beta”): - Regression coefficient after converting variables to standard deviation units - Interpretation: For every one SD increase in X, Y increases by \(\beta\) SDs - Advantage: Comparable across predictors with different scales; indicates relative importance - Disadvantage: Less interpretable for real-world problem context

When to Use Each: - Use \(b\) for answering applied questions (“How many more sales per $1000 advertising?”) - Use \(\beta\) for comparing predictor importance across variables with different scales
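
The two scales are related by \(\beta = b \times \frac{SD_X}{SD_Y}\); equivalently, \(\beta\) is the slope you get after z-scoring both variables. A quick sketch (simulated data, my own names):

```r
## beta = b * sd(X) / sd(Y); same as regressing z-scores on z-scores
set.seed(2)
x <- rnorm(80, mean = 50, sd = 10)
y <- 10 + 0.3 * x + rnorm(80, sd = 5)

b <- coef(lm(y ~ x))["x"]                        # unstandardized slope
beta_manual <- unname(b * sd(x) / sd(y))         # rescaled by hand
beta_scaled <- unname(coef(lm(scale(y) ~ scale(x)))[2])
all.equal(beta_manual, beta_scaled)   # TRUE
```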

Data Screening

Example: Mental Health

library(rio)
master <- import("data/regression_data.sav")
master <- master[ , c(8:11)]
str(master)
## 'data.frame':    267 obs. of  4 variables:
##  $ PIL_total      : num  121 76 98 122 99 134 102 124 126 112 ...
##   ..- attr(*, "format.spss")= chr "F8.2"
##   ..- attr(*, "display_width")= int 11
##  $ CESD_total     : num  28 37 20 15 7 7 27 10 9 8 ...
##   ..- attr(*, "format.spss")= chr "F8.2"
##   ..- attr(*, "display_width")= int 12
##  $ AUDIT_TOTAL_NEW: num  1 5 3 3 2 3 2 1 1 7 ...
##   ..- attr(*, "format.spss")= chr "F8.2"
##   ..- attr(*, "display_width")= int 17
##  $ DAST_TOTAL_NEW : num  0 0 0 1 0 0 1 0 0 1 ...
##   ..- attr(*, "format.spss")= chr "F8.2"
##   ..- attr(*, "display_width")= int 16

Example: Accuracy, Missing Data

summary(master)
##    PIL_total       CESD_total   AUDIT_TOTAL_NEW  DAST_TOTAL_NEW 
##  Min.   : 60.0   Min.   : 0.0   Min.   : 0.000   Min.   :0.000  
##  1st Qu.:103.0   1st Qu.: 7.0   1st Qu.: 2.000   1st Qu.:0.000  
##  Median :111.0   Median :11.0   Median : 5.000   Median :0.000  
##  Mean   :110.7   Mean   :13.2   Mean   : 6.807   Mean   :0.906  
##  3rd Qu.:121.0   3rd Qu.:17.0   3rd Qu.:11.000   3rd Qu.:1.000  
##  Max.   :138.0   Max.   :47.0   Max.   :31.000   Max.   :9.000  
##                                                  NA's   :1
nomiss <- na.omit(master)
nrow(master)
## [1] 267
nrow(nomiss)
## [1] 266

Example: Outliers

Example: Mahalanobis

mahal <- mahalanobis(nomiss, 
                    colMeans(nomiss), 
                    cov(nomiss))
cutmahal <- qchisq(1-.001, ncol(nomiss))
badmahal <- as.numeric(mahal > cutmahal) ##note the direction of the > 
table(badmahal)
## badmahal
##   0   1 
## 261   5

Example: Other Outliers

model1 <- lm(CESD_total ~ PIL_total + AUDIT_TOTAL_NEW + DAST_TOTAL_NEW, 
             data = nomiss)

Example: Leverage

k <- 3 ##number of IVs
leverage <- hatvalues(model1)
cutleverage <- (2*k+2) / nrow(nomiss)
badleverage <- as.numeric(leverage > cutleverage)
table(badleverage)
## badleverage
##   0   1 
## 247  19

Example: Cook’s Distance

cooks <- cooks.distance(model1)
cutcooks <- 4 / (nrow(nomiss) - k - 1)
badcooks <- as.numeric(cooks > cutcooks)
table(badcooks)
## badcooks
##   0   1 
## 251  15

Example: Outliers Combined

##add them up!
totalout <- badmahal + badleverage + badcooks
table(totalout)
## totalout
##   0   1   2   3 
## 239  17   8   2
noout <- subset(nomiss, totalout < 2)

Example: Assumptions

model2 <- lm(CESD_total ~ PIL_total + AUDIT_TOTAL_NEW + DAST_TOTAL_NEW, 
             data = noout)

Example: Additivity

summary(model2, correlation = TRUE)
## 
## Call:
## lm(formula = CESD_total ~ PIL_total + AUDIT_TOTAL_NEW + DAST_TOTAL_NEW, 
##     data = noout)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.904  -5.086  -1.161   3.405  29.342 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     54.19317    4.20489  12.888   <2e-16 ***
## PIL_total       -0.37272    0.03629 -10.271   <2e-16 ***
## AUDIT_TOTAL_NEW -0.07774    0.09548  -0.814    0.416    
## DAST_TOTAL_NEW   0.72741    0.50953   1.428    0.155    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.357 on 252 degrees of freedom
## Multiple R-squared:  0.3157, Adjusted R-squared:  0.3076 
## F-statistic: 38.76 on 3 and 252 DF,  p-value: < 2.2e-16
## 
## Correlation of Coefficients:
##                 (Intercept) PIL_total AUDIT_TOTAL_NEW
## PIL_total       -0.99                                
## AUDIT_TOTAL_NEW -0.16        0.06                    
## DAST_TOTAL_NEW  -0.17        0.15     -0.47

Example: Assumption Set Up

standardized <- rstudent(model2)
fitted <- scale(model2$fitted.values)

Example: Linearity

{qqnorm(standardized)
abline(0,1)}

Example: Normality

hist(standardized)

Example: Homogeneity & Homoscedasticity

{plot(fitted, standardized)
abline(0,0)
abline(v = 0)}

Example: Assumption Alternatives

Example: Overall Model

summary(model2)
## 
## Call:
## lm(formula = CESD_total ~ PIL_total + AUDIT_TOTAL_NEW + DAST_TOTAL_NEW, 
##     data = noout)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.904  -5.086  -1.161   3.405  29.342 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     54.19317    4.20489  12.888   <2e-16 ***
## PIL_total       -0.37272    0.03629 -10.271   <2e-16 ***
## AUDIT_TOTAL_NEW -0.07774    0.09548  -0.814    0.416    
## DAST_TOTAL_NEW   0.72741    0.50953   1.428    0.155    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.357 on 252 degrees of freedom
## Multiple R-squared:  0.3157, Adjusted R-squared:  0.3076 
## F-statistic: 38.76 on 3 and 252 DF,  p-value: < 2.2e-16
library(papaja)
## Loading required package: tinylabels
apa_style <- apa_print(model2)
apa_style$full_result$modelfit
## $r2
## [1] "$R^2 = .32$, 90\\% CI $[0.23, 0.39]$, $F(3, 252) = 38.76$, $p < .001$"

Example: Predictors

summary(model2)
## 
## Call:
## lm(formula = CESD_total ~ PIL_total + AUDIT_TOTAL_NEW + DAST_TOTAL_NEW, 
##     data = noout)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.904  -5.086  -1.161   3.405  29.342 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     54.19317    4.20489  12.888   <2e-16 ***
## PIL_total       -0.37272    0.03629 -10.271   <2e-16 ***
## AUDIT_TOTAL_NEW -0.07774    0.09548  -0.814    0.416    
## DAST_TOTAL_NEW   0.72741    0.50953   1.428    0.155    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.357 on 252 degrees of freedom
## Multiple R-squared:  0.3157, Adjusted R-squared:  0.3076 
## F-statistic: 38.76 on 3 and 252 DF,  p-value: < 2.2e-16

Example: Predictors

apa_style$full_result$PIL_total
## [1] "$b = -0.37$, 95\\% CI $[-0.44, -0.30]$, $t(252) = -10.27$, $p < .001$"
apa_style$full_result$AUDIT_TOTAL_NEW
## [1] "$b = -0.08$, 95\\% CI $[-0.27, 0.11]$, $t(252) = -0.81$, $p = .416$"
apa_style$full_result$DAST_TOTAL_NEW
## [1] "$b = 0.73$, 95\\% CI $[-0.28, 1.73]$, $t(252) = 1.43$, $p = .155$"

Example: Predictors

Example: Beta

library(QuantPsyc)
lm.beta(model2)
##       PIL_total AUDIT_TOTAL_NEW  DAST_TOTAL_NEW 
##     -0.54695645     -0.04844095      0.08573100

Example: Effect Size

Multiple Correlation (R): - The correlation between observed Y and predicted Y values - Ranges from 0 to 1; indicates overall model strength - In simple regression: \(R = |r_{XY}|\) (absolute correlation)

\(R^2\) (Coefficient of Determination): - Proportion of variance in Y explained by the model - Formula: \(R^2 = \frac{SSM}{SST} = 1 - \frac{SSR}{SST}\) - Interpretation: “The model explains \(R^2 \times 100\) percent of the variance in Y” - Reflects all overlap with Y; used for the overall model - \(R^2 = \frac{A+B+C}{A+B+C+D}\) (explained / total), where A–D label regions of a Venn diagram of shared variance between Y and the predictors
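
In R, \(R\) can be recovered as the correlation between observed and fitted values, and squaring it reproduces the \(R^2\) reported by summary(). A self-contained sketch (simulated data, my own names):

```r
## Multiple R = cor(observed Y, predicted Y); R^2 matches summary()
set.seed(3)
x1 <- rnorm(60)
x2 <- rnorm(60)
y  <- x1 - 0.5 * x2 + rnorm(60)
fit <- lm(y ~ x1 + x2)

R <- cor(y, fitted(fit))
all.equal(R^2, summary(fit)$r.squared)   # TRUE
```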

Example: Effect Size - Semipartial Correlation

Semipartial Correlation Squared (\(sr^2\)): - Unique contribution of this IV to \(R^2\) (variance in Y explained only by this predictor, after accounting for other predictors) - The increase in \(R^2\) when this X is added to the model - Formula: \(sr^2 = \frac{A}{A+B+C+D}\) (unique variance / total variance) - Interpretation: “Adding this predictor increases \(R^2\) by \(sr^2 \times 100\) percentage points” - Used in hierarchical regression to show incremental value of each predictor

Example: Effect Size - Partial Correlation

Partial Correlation Squared (\(pr^2\)): - Proportion of the remaining variance in Y (after removing other predictors’ influence) that is explained by this X - Formula: \(pr^2 = \frac{A}{A+D}\) (unique variance / variance not explained by others) - Interpretation: “Among the variance in Y not explained by other predictors, this predictor explains \(pr^2 \times 100\) percent” - Never smaller than \(sr^2\), because the denominator excludes variance shared with other predictors: \(pr^2 \ge sr^2\) - Often reported alongside \(sr^2\) for a complete picture of predictor importance
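
Both quantities can be computed from a fitted model without extra packages: \(sr^2\) is the increase in \(R^2\) when the predictor enters last, and \(pr^2\) follows the \(t^2 / (t^2 + df_{residual})\) identity used later in this lecture. A self-contained sketch (simulated data, my own names):

```r
## sr^2 = increase in R^2 when x2 enters last; pr^2 = t^2 / (t^2 + df_res)
set.seed(4)
x1 <- rnorm(100)
x2 <- 0.5 * x1 + rnorm(100)      # correlated predictors
y  <- x1 + 0.4 * x2 + rnorm(100)

full    <- lm(y ~ x1 + x2)
reduced <- lm(y ~ x1)            # model without x2

sr2 <- summary(full)$r.squared - summary(reduced)$r.squared
t2  <- summary(full)$coefficients["x2", "t value"]^2
pr2 <- t2 / (t2 + full$df.residual)
c(sr2 = sr2, pr2 = pr2)          # pr2 is never smaller than sr2
```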

Example: Partials

library(ppcor)
partials <- pcor(noout)
partials$estimate^2
##                   PIL_total  CESD_total AUDIT_TOTAL_NEW DAST_TOTAL_NEW
## PIL_total       1.000000000 0.295101597     0.005899378    0.005606820
## CESD_total      0.295101597 1.000000000     0.002623799    0.008022779
## AUDIT_TOTAL_NEW 0.005899378 0.002623799     1.000000000    0.218315640
## DAST_TOTAL_NEW  0.005606820 0.008022779     0.218315640    1.000000000

Example: Hierarchical Regression

Hierarchical Regression: Understand the NHST

Method: - Known predictors (based on past research) entered first - New predictors entered in separate steps - Tests significance of each step addition and individual predictors

Answers the Following Questions: - Is my overall model significant? (F-test for final model) - Is the addition of each step significant? (Comparison of \(R^2\) between models via \(\Delta F\)) - Are the individual predictors significant? (t-tests for each coefficient)

When to Use: - Control for known/nuisance variables first before testing new predictors - See the incremental value of adding new variables to existing model - Discuss groups of variables together as a conceptual set - Based on a priori theory (NOT exploratory/stepwise selection)

Categorical Predictors

Dummy Coding (Contrast Coding): - Converts categorical variable into multiple binary indicators - Each dummy variable compares one category against a reference category - Advantage: Allows interpretation as comparisons (e.g., “Treatment vs. Control”) - Reference category gets value 0 on all dummies - Each non-reference category gets value 1 on its corresponding dummy - If k categories, create k-1 dummy variables - Interpretation of \(b\): Difference in Y between this category and reference category

Other Coding Systems: - Deviation (effect) coding, orthogonal coding, Helmert coding: https://stats.idre.ucla.edu/spss/faq/coding-systems-for-categorical-variables-in-regression-analysis/

Dummy Coding in R

R handles dummy coding automatically: - When you use factor() to convert a variable to a categorical type - R creates k-1 dummy variables and uses the first level as the reference category - Each dummy variable is 1 if the observation is in that category, 0 otherwise
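
The coding table R builds can be inspected directly with model.matrix() (a toy three-level factor of my own, not the lecture data):

```r
## Three categories -> two dummy columns; "A" (first level) is the reference
grp <- factor(c("A", "B", "C", "B"))
model.matrix(~ grp)   # columns grpB and grpC; reference rows are all 0s
```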

Change Reference Category in R:

data$variable <- factor(data$variable, 
                       levels = c("ref_category", "cat2", "cat3"))

Example: Hierarchical Regression + Dummy Coding

Research Question: Do different depression treatments reduce depression ratings after controlling for family history?

Variables: - IVs: - Family history of depression (continuous predictor) - Treatment for depression (categorical: No Treatment, Placebo, Paxil, Effexor, Cheerup) - DV: Depression rating after treatment (continuous outcome)

hdata <- import("data/dummy_code.sav")
str(hdata)
## 'data.frame':    50 obs. of  3 variables:
##  $ treat        : num  0 0 0 0 0 0 0 0 0 0 ...
##   ..- attr(*, "label")= chr "Treatment"
##   ..- attr(*, "format.spss")= chr "F8.0"
##   ..- attr(*, "display_width")= int 18
##   ..- attr(*, "labels")= Named num [1:5] 0 1 2 3 4
##   .. ..- attr(*, "names")= chr [1:5] "No Treatment" "Placebo" "Seroxat (Paxil)" "Effexor" ...
##  $ familyhistory: num  6.79 6.88 19.65 10.8 32.27 ...
##   ..- attr(*, "label")= chr "Family History"
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ after        : num  16 18 13 15 18 16 18 19 9 16 ...
##   ..- attr(*, "label")= chr "After Treatment"
##   ..- attr(*, "format.spss")= chr "F8.0"

Example: Hierarchical Regression + Dummy Coding

attributes(hdata$treat)
## $label
## [1] "Treatment"
## 
## $format.spss
## [1] "F8.0"
## 
## $display_width
## [1] 18
## 
## $labels
##    No Treatment         Placebo Seroxat (Paxil)         Effexor         Cheerup 
##               0               1               2               3               4
hdata$treat <- factor(hdata$treat,
                     levels = 0:4,
                     labels = c("No Treatment", "Placebo", "Paxil",
                                "Effexor", "Cheerup"))

Example: Hierarchical Regression + Dummy Coding

Data Screening: Should be done on the LAST model (skipped here for brevity)

Model 1: Base Model with Control Variable - Enter family history alone to establish baseline - Tests if family history alone predicts depression rating

- Overall fit: \(F(1, 48) = 8.50\), \(p = .005\), \(R^2 = .15\)
- Interpretation: Family history accounts for 15% of the variance in post-treatment depression
- Family history predictor: \(b = 0.15\), \(t(48) = 2.92\), \(p = .005\), \(pr^2 = .15\)
- Interpretation: For each one-unit increase in family history, the depression rating increases by 0.15 units

model1 <- lm(after ~ familyhistory, data = hdata)
summary(model1)
## 
## Call:
## lm(formula = after ~ familyhistory, data = hdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.5120 -1.9028 -0.2193  2.0544  6.7958 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   11.00363    0.84477  13.026   <2e-16 ***
## familyhistory  0.15313    0.05254   2.915   0.0054 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.133 on 48 degrees of freedom
## Multiple R-squared:  0.1504, Adjusted R-squared:  0.1327 
## F-statistic: 8.495 on 1 and 48 DF,  p-value: 0.005396

Example: Hierarchical Regression + Dummy Coding

Model 2: Full Model with Treatment Added - Add treatment category to Model 1 (keep family history to maintain control) - Tests if treatment incrementally predicts depression rating after controlling for family history - The overall model is significant, but focus on the change between models (not just overall significance) - Why? If Model 1 was already significant, the overall significance might just reflect Model 1’s contribution - We need \(\Delta R^2\) and \(\Delta F\) to show treatment adds predictive value beyond family history

model2 <- lm(after ~ familyhistory + treat, data = hdata)
summary(model2)
## 
## Call:
## lm(formula = after ~ familyhistory + treat, data = hdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7908 -1.6690  0.0508  1.6674  5.4108 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   13.98816    1.09637  12.759  < 2e-16 ***
## familyhistory  0.13513    0.05088   2.656 0.010973 *  
## treatPlacebo  -4.09905    1.21381  -3.377 0.001542 ** 
## treatPaxil    -2.03744    1.22146  -1.668 0.102411    
## treatEffexor  -2.59078    1.26984  -2.040 0.047356 *  
## treatCheerup  -4.96339    1.22489  -4.052 0.000203 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.714 on 44 degrees of freedom
## Multiple R-squared:  0.4154, Adjusted R-squared:  0.349 
## F-statistic: 6.254 on 5 and 44 DF,  p-value: 0.0001832

Example: Hierarchical Regression + Dummy Coding

Model Comparison via ANOVA: - Use anova(model1, model2) to test: Does treatment add significant predictive value? - Key statistics: - \(\Delta R^2\): Increase in R² from Model 1 to Model 2 (variance explained by treatment after family history) - \(\Delta F\): F-test comparing models’ fit improvements - Interpretation: The addition of the treatment set was significant: \(\Delta F(4, 44) = 4.99, p = .002, \Delta R^2 = .27\) - Treatment explains an additional 27% of depression rating variance - This improvement is statistically significant (p = .002)

anova(model1, model2)
## Analysis of Variance Table
## 
## Model 1: after ~ familyhistory
## Model 2: after ~ familyhistory + treat
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
## 1     48 471.12                                
## 2     44 324.13  4    146.99 4.9883 0.002102 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
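
anova() reports \(\Delta F\) but not \(\Delta R^2\); the latter is simply the difference in \(R^2\) between the two model summaries. A self-contained sketch mirroring the model1/model2 comparison (simulated data, my own names):

```r
## Delta R^2 = R^2(full) - R^2(reduced); anova() supplies the Delta F test
set.seed(5)
ctrl <- rnorm(50)                                  # control variable
grp  <- factor(sample(c("a", "b", "c"), 50, replace = TRUE))
y    <- ctrl + (grp == "b") + rnorm(50)

m1 <- lm(y ~ ctrl)          # step 1: control only
m2 <- lm(y ~ ctrl + grp)    # step 2: add the categorical predictor

anova(m1, m2)                                    # Delta F and its p-value
summary(m2)$r.squared - summary(m1)$r.squared    # Delta R^2 by hand
```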

Example: Hierarchical Regression + Dummy Coding

Interpreting Dummy-Coded Coefficients: - Each dummy coefficient (\(b\)) = difference between that category and reference category - Reference category (usually first level): automatically set to 0, acts as baseline - Positive \(b\): That category has higher outcome than reference - Negative \(b\): That category has lower outcome than reference - \(b\) = difference in means, controlling for (holding constant) other predictors

Visualizing Results with emmeans: - Raw \(b\) values can be hard to interpret - Estimated Marginal Means (EMMs): Predicted mean Y for each group, given other predictors’ values - Advantage: Shows group means on original scale (not as comparisons)

Example: Hierarchical Regression + Dummy Coding

summary(model2)
## 
## Call:
## lm(formula = after ~ familyhistory + treat, data = hdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7908 -1.6690  0.0508  1.6674  5.4108 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   13.98816    1.09637  12.759  < 2e-16 ***
## familyhistory  0.13513    0.05088   2.656 0.010973 *  
## treatPlacebo  -4.09905    1.21381  -3.377 0.001542 ** 
## treatPaxil    -2.03744    1.22146  -1.668 0.102411    
## treatEffexor  -2.59078    1.26984  -2.040 0.047356 *  
## treatCheerup  -4.96339    1.22489  -4.052 0.000203 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.714 on 44 degrees of freedom
## Multiple R-squared:  0.4154, Adjusted R-squared:  0.349 
## F-statistic: 6.254 on 5 and 44 DF,  p-value: 0.0001832
library(emmeans)
## Welcome to emmeans.
## Caution: You lose important information if you filter this package's results.
## See '? untidy'
emmeans(model2, "treat")
##  treat        emmean    SE df lower.CL upper.CL
##  No Treatment   15.8 0.858 44    14.11     17.6
##  Placebo        11.7 0.858 44    10.01     13.5
##  Paxil          13.8 0.871 44    12.04     15.6
##  Effexor        13.2 0.930 44    11.37     15.1
##  Cheerup        10.9 0.877 44     9.11     12.6
## 
## Confidence level used: 0.95

Example: Hierarchical Regression + Dummy Coding

model_summary <- summary(model2)
t_values <- model_summary$coefficients[ , 3] 
df_t <- model_summary$df[2]

t_values^2 / (t_values^2+df_t)
##   (Intercept) familyhistory  treatPlacebo    treatPaxil  treatEffexor 
##    0.78721534    0.13817150    0.20583673    0.05947423    0.08642805 
##  treatCheerup 
##    0.27175918

Hierarchical Regression: Power Analysis

library(pwr)
R2 <- model_summary$r.squared
f2 <- R2 / (1-R2)

R2
## [1] 0.4154487
f2
## [1] 0.7107138

Hierarchical Regression: Power

Hierarchical Regression: Power

Function Arguments: - u = degrees of freedom for the model (numerator df, first value in F-statistic) - v = degrees of freedom for error (denominator df); leave blank (NULL) when solving for sample size - f2 = Cohen’s \(f^2\) (converted effect size) - sig.level = alpha level (typically .05) - power = desired statistical power (typically .80)

Final Sample Size Calculation: - Output provides v (error df) needed - Actual N = \(v + k + 1\) where k = number of predictors

##f2 is Cohen's f squared
pwr.f2.test(u = model_summary$df[1], 
            v = NULL, f2 = f2, 
            sig.level = .05, power = .80)
## 
##      Multiple regression power calculation 
## 
##               u = 6
##               v = 19.20439
##              f2 = 0.7107138
##       sig.level = 0.05
##           power = 0.8
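
Converting the printed \(v\) to a planned sample size uses the rule above, \(N = v + k + 1\). One caveat: summary()'s df[1] counts the intercept, so the u = 6 passed above corresponds to \(k = 5\) predictor terms (family history plus four treatment dummies). A sketch of the arithmetic:

```r
## Planned N from the pwr.f2.test output above: N = v + k + 1, rounded up
v <- 19.20439   # error df printed by pwr.f2.test
k <- 5          # predictor terms: family history + 4 treatment dummies
ceiling(v + k + 1)   # -> 26 participants needed for 80% power
```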

Summary

In this lecture, we’ve covered:

Foundations: - Regression equation and interpretation of coefficients (\(b_0\), \(b_1\), \(\varepsilon\)) - Method of least squares for finding best-fit line - Sums of squares (SST, SSR, SSM) and their meaning

Model Evaluation: - F-test for overall model significance - \(R^2\) and \(R\) as effect sizes for model fit - Comparison of H0 (mean-only) vs. H1 (regression) models

Individual Predictors: - t-tests and hypothesis testing for coefficients - Unstandardized (\(b\)) vs. standardized (\(\beta\)) coefficients - Partial and semipartial correlations for relative importance

Advanced Topics: - Regression assumptions and data screening - Outlier detection (Mahalanobis, leverage, Cook’s D) - Hierarchical regression with model comparison - Categorical predictors and dummy coding - Power analysis for sample size planning

Field et al. (2012) reference: Chapter 7, Discovering Statistics Using R