Erin M. Buchanan
Last Updated: 2026-01-08
In the last lecture, we discussed:
What else should we consider for checking our data?
For parametric statistics, we should think about:
The procedure:
If you have several variables, then their combined effect is best described by adding their individual effects together.
In plain English: Think of additivity like ingredients in a recipe. If you’re predicting cake quality from flour and sugar, each ingredient should contribute its own unique effect. If flour and sugar were essentially measuring the same thing (like two different brands of the same ingredient), you’d be counting the same effect twice [2].
If two variables are not additive, they are too closely related (multicollinear): each one adds little unique information beyond the other, which reduces power.
This check is only needed when you have multiple continuous variables. With a single continuous variable, there is nothing to correlate, so the check does not apply.
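A common way to screen for additivity problems is to inspect the correlation matrix of the continuous variables and flag very high correlations (a frequently used cutoff is |r| > .9). A minimal sketch with made-up data (the data frame `df`, its column names, and the cutoff are all illustrative, not from the lecture dataset):

```r
# Sketch: screen numeric variables for additivity / multicollinearity.
# `df` is a hypothetical data frame of continuous variables.
set.seed(42)
df <- data.frame(x1 = rnorm(100))
df$x2 <- df$x1 + rnorm(100, sd = 0.1)  # nearly redundant with x1
df$x3 <- rnorm(100)                    # unrelated variable

cors <- cor(df, use = "pairwise.complete.obs")

# Flag pairs correlated above the cutoff, looking only at the upper triangle
high <- which(abs(cors) > .9 & upper.tri(cors), arr.ind = TRUE)
data.frame(var1 = rownames(cors)[high[, 1]],
           var2 = colnames(cors)[high[, 2]],
           r    = cors[high])
```

Here `x1` and `x2` should be flagged as too related, while `x3` should not appear in the output.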
##
## iter imp variable
## 1 1 RS3 RS6 RS8 RS11 RS13 RS14
## 1 2 RS3 RS6 RS8 RS11 RS13 RS14
## ... (the same six variables are imputed at every iteration 1-5 and imputation 1-5)
## 5 5 RS3 RS6 RS8 RS11 RS13 RS14
## 'data.frame': 118 obs. of 20 variables:
## $ Sex : Factor w/ 2 levels "Women","Men": 1 1 1 1 1 1 1 1 1 1 ...
## $ Age : int 17 16 15 16 15 14 13 15 16 17 ...
## $ Grade : int 11 7 6 11 7 6 4 6 8 11 ...
## $ SES : Factor w/ 3 levels "Low","Medium",..: 2 3 3 2 3 3 1 2 2 3 ...
## $ Absences: int 2 2 2 6 2 2 2 2 1 6 ...
## $ RS1 : int 6 7 5 7 7 2 6 4 3 4 ...
## $ RS2 : int 4 1 5 4 7 3 6 6 5 3 ...
## $ RS3 : int 2 1 5 4 7 2 7 6 3 4 ...
## $ RS4 : int 2 5 7 7 7 3 6 6 2 4 ...
## $ RS5 : int 4 7 5 4 7 2 1 6 1 4 ...
## $ RS6 : int 7 7 6 4 7 3 4 6 7 4 ...
## $ RS7 : int 7 7 5 7 7 3 6 6 4 4 ...
## $ RS8 : int 4 7 7 7 7 3 6 6 3 7 ...
## $ RS9 : int 5 4 6 4 7 2 2 6 3 4 ...
## $ RS10 : int 7 7 7 7 7 2 5 6 3 4 ...
## $ RS11 : int 4 1 6 7 7 3 6 6 3 6 ...
## $ RS12 : int 7 7 6 4 7 3 6 6 2 5 ...
## $ RS13 : int 4 4 6 7 7 2 6 6 5 6 ...
## $ RS14 : int 7 7 6 4 7 3 2 6 3 4 ...
## $ Health : int 6 6 2 6 4 6 1 2 3 1 ...
## Age Grade Absences RS1 RS2
## Age 1.00000000 0.49129879 0.259278455 0.05557066 -0.02500303
## Grade 0.49129879 1.00000000 -0.106825686 0.13576618 0.04492324
## Absences 0.25927845 -0.10682569 1.000000000 0.08062971 0.12545466
## RS1 0.05557066 0.13576618 0.080629712 1.00000000 0.30387778
## RS2 -0.02500303 0.04492324 0.125454661 0.30387778 1.00000000
## RS3 0.12731787 0.11110507 0.214147935 0.39435773 0.60392742
## RS4 -0.09530848 -0.01762404 0.095790980 0.33308664 0.41917969
## RS5 0.16312777 0.10613350 0.089601326 0.31633125 0.32639523
## RS6 0.15873088 0.21894625 0.006016586 0.28979227 0.51680621
## RS7 0.24685001 0.24172671 0.166350702 0.49315246 0.57762184
## RS8 0.16732512 0.19607386 0.083607390 0.30270485 0.41999692
## RS9 0.02368352 0.01918898 -0.110881821 0.32269858 0.48383577
## RS10 0.05703377 0.19189629 0.122778324 0.42665728 0.62064605
## RS11 0.03964118 0.08136441 0.191811833 0.32656293 0.59752445
## RS12 0.18399663 0.07424952 0.004201959 0.32427895 0.27635326
## RS13 0.10128837 0.10547175 0.140220973 0.30619685 0.62718380
## RS14 0.18292725 0.17612502 -0.081311740 0.22975797 0.33264694
## Health -0.16644080 -0.12436242 0.023720562 -0.01407696 -0.13978793
## RS3 RS4 RS5 RS6 RS7
## Age 0.1273179 -0.09530848 0.1631277685 0.158730882 0.2468500
## Grade 0.1111051 -0.01762404 0.1061335021 0.218946250 0.2417267
## Absences 0.2141479 0.09579098 0.0896013265 0.006016586 0.1663507
## RS1 0.3943577 0.33308664 0.3163312549 0.289792266 0.4931525
## RS2 0.6039274 0.41917969 0.3263952295 0.516806207 0.5776218
## RS3 1.0000000 0.53124323 0.4308641895 0.428769218 0.4680652
## RS4 0.5312432 1.00000000 0.3187108026 0.219281403 0.3019374
## RS5 0.4308642 0.31871080 1.0000000000 0.445824875 0.3880713
## RS6 0.4287692 0.21928140 0.4458248751 1.000000000 0.6038055
## RS7 0.4680652 0.30193741 0.3880712508 0.603805512 1.0000000
## RS8 0.4476441 0.21250714 0.4131928044 0.437741777 0.4925186
## RS9 0.3245957 0.34644102 0.4529777465 0.567339117 0.4441627
## RS10 0.4327729 0.45494068 0.4445222275 0.504325912 0.6034629
## RS11 0.5457643 0.39694108 0.5084182361 0.581456379 0.5317407
## RS12 0.3255026 0.10577219 0.5219541822 0.440801532 0.4183208
## RS13 0.6120622 0.38919009 0.4178701180 0.623020688 0.5549517
## RS14 0.3830965 0.19221471 0.5284791103 0.535873491 0.4631298
## Health -0.1821844 -0.03851484 0.0009663785 -0.066956532 -0.1091089
## RS8 RS9 RS10 RS11 RS12
## Age 0.167325123 0.02368352 0.05703377 0.03964118 0.183996626
## Grade 0.196073861 0.01918898 0.19189629 0.08136441 0.074249525
## Absences 0.083607390 -0.11088182 0.12277832 0.19181183 0.004201959
## RS1 0.302704849 0.32269858 0.42665728 0.32656293 0.324278946
## RS2 0.419996923 0.48383577 0.62064605 0.59752445 0.276353262
## RS3 0.447644056 0.32459572 0.43277287 0.54576426 0.325502575
## RS4 0.212507139 0.34644102 0.45494068 0.39694108 0.105772190
## RS5 0.413192804 0.45297775 0.44452223 0.50841824 0.521954182
## RS6 0.437741777 0.56733912 0.50432591 0.58145638 0.440801532
## RS7 0.492518639 0.44416274 0.60346287 0.53174073 0.418320781
## RS8 1.000000000 0.45762845 0.47063305 0.44848862 0.486307902
## RS9 0.457628455 1.00000000 0.62362701 0.60201244 0.486867126
## RS10 0.470633053 0.62362701 1.00000000 0.60685642 0.420750948
## RS11 0.448488621 0.60201244 0.60685642 1.00000000 0.467235410
## RS12 0.486307902 0.48686713 0.42075095 0.46723541 1.000000000
## RS13 0.577940692 0.55246064 0.56010960 0.73465105 0.562519549
## RS14 0.400478077 0.55362235 0.48853773 0.36588780 0.529659279
## Health 0.005123918 0.02091008 -0.05983438 -0.07828823 -0.021540432
## RS13 RS14 Health
## Age 0.10128837 0.18292725 -0.1664407976
## Grade 0.10547175 0.17612502 -0.1243624181
## Absences 0.14022097 -0.08131174 0.0237205616
## RS1 0.30619685 0.22975797 -0.0140769622
## RS2 0.62718380 0.33264694 -0.1397879302
## RS3 0.61206222 0.38309648 -0.1821843842
## RS4 0.38919009 0.19221471 -0.0385148397
## RS5 0.41787012 0.52847911 0.0009663785
## RS6 0.62302069 0.53587349 -0.0669565316
## RS7 0.55495169 0.46312976 -0.1091089078
## RS8 0.57794069 0.40047808 0.0051239184
## RS9 0.55246064 0.55362235 0.0209100770
## RS10 0.56010960 0.48853773 -0.0598343773
## RS11 0.73465105 0.36588780 -0.0782882278
## RS12 0.56251955 0.52965928 -0.0215404321
## RS13 1.00000000 0.52257091 -0.0911749490
## RS14 0.52257091 1.00000000 -0.0923315613
## Health -0.09117495 -0.09233156 1.0000000000
This assumption tends to be incorrectly translated as "your raw data need to be normally distributed."
The actual assumption is that the sampling distribution is normally distributed [6].
In plain English: We don’t need every individual score to be perfectly normal. What matters is that if we took many samples and calculated their means, those means would form a normal distribution [6].
Remember the Central Limit Theorem - at what point is the sample size large enough to assume normality?
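The Central Limit Theorem can be illustrated by simulation: draw many samples from a clearly non-normal population and look at the distribution of the sample means. A small sketch (the population, sample size, and number of replications are all illustrative choices):

```r
# Sketch: sampling distribution of the mean from a skewed population.
set.seed(123)
population <- rexp(1e5, rate = 1)   # strongly right-skewed raw scores

# Draw 5000 samples of n = 40 and keep each sample's mean
sample_means <- replicate(5000, mean(sample(population, size = 40)))

# The raw scores are skewed, but the means pile up symmetrically around
# the population mean (1 for this exponential distribution)
hist(sample_means, breaks = 50,
     main = "Sampling distribution of the mean (n = 40)")
```

Even with n = 40 per sample, the histogram of means is approximately normal despite the skewed population.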
Check the distribution of the residuals as an approximation of multivariate normality.
## Age Grade Absences RS1 RS2 RS3
## -0.20880151 0.14109617 0.50473399 -0.64442864 -0.39612649 0.07144744
## RS4 RS5 RS6 RS7 RS8 RS9
## -0.34114107 -0.07196785 -0.36542224 -0.70320051 -0.22125349 -0.29978265
## RS10 RS11 RS12 RS13 RS14 Health
## -0.63408916 -0.25701748 -1.09556422 -0.56047487 -0.44083590 0.16354123
## Age Grade Absences RS1 RS2 RS3 RS4
## -1.2139538 -0.6513750 -1.0873422 -0.4179216 -0.7539534 -1.0075623 -1.0546091
## RS5 RS6 RS7 RS8 RS9 RS10 RS11
## -1.1301711 -0.9259363 -0.3086471 -1.2802140 -0.7513962 -0.6078609 -1.0478794
## RS12 RS13 RS14 Health
## 0.3950031 -0.4969303 -0.8046555 -1.2753754
## [1] 117
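One way to produce residuals for this kind of screening is a "fake regression": regress a random variable on all of the continuous variables, so the residuals reflect the structure of the data rather than any real hypothesis. A sketch under that assumption (the data frame and its columns are made up for illustration):

```r
# Sketch: fake regression for multivariate screening.
# `df` stands in for a data frame of continuous variables.
set.seed(1)
df <- data.frame(a = rnorm(118), b = rnorm(118), c = rnorm(118))

random <- rchisq(nrow(df), df = 7)   # random DV with no real relationship
fake   <- lm(random ~ ., data = df)  # regress it on everything

standardized <- rstudent(fake)       # studentized residuals
fitted_z     <- scale(fake$fitted.values)

hist(standardized, breaks = 15)      # should look roughly normal
```

The residuals can then be summarized (skewness, kurtosis) or plotted to screen several assumptions at once.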
Homogeneity of variance is the assumption that the variances of the variables are roughly equal [7].
In plain English: Imagine comparing test scores from three classes. Homogeneity of variance means that the spread of scores (how much they vary) should be similar in all three classes. If one class has scores all between 80-90 (small variance) and another has scores between 40-100 (large variance), this assumption is violated [7].
Ways to check (e.g., Levene's test): you do NOT want p < .001, which would indicate that the variances are significantly unequal.
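As a sketch of such a test, here is the class-scores scenario from above with made-up data. Bartlett's test ships with base R; `car::leveneTest()` is a more robust alternative if the `car` package is installed (all numbers here are illustrative):

```r
# Sketch: testing equal variances across groups.
set.seed(7)
scores <- c(rnorm(40, mean = 85, sd = 3),    # class with a small spread
            rnorm(40, mean = 70, sd = 15))   # class with a large spread
class  <- factor(rep(c("A", "B"), each = 40))

bartlett.test(scores ~ class)   # tiny p-value = variances are unequal
# car::leveneTest(scores ~ class)  # robust alternative, needs `car`
```

With spreads this different, the test returns p < .001, flagging a violation of the assumption.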
Sphericity - the assumption that the differences between measurements in repeated measures have approximately the same variance and correlations.
Create a scatterplot of the standardized residuals against the standardized fitted values from the fake regression.
In theory, the residuals should be randomly distributed (hence why we created a random variable to test with) [3].
What you want to see: The plot should look like a bunch of random dots scattered evenly around zero, with no clear pattern [8]. Think of it like stars randomly scattered across the sky - no clusters, no shapes, just randomness.
Homogeneity - is the spread above the 0 line the same as the spread below it (both directions)? [8]
Homoscedasticity - is the spread equal all the way across the x-axis? [8]
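A residual plot like the one described can be drawn by plotting standardized residuals against standardized fitted values with reference lines at zero. A minimal sketch continuing the fake-regression idea (the data are made up for illustration):

```r
# Sketch: residuals vs. fitted plot for homogeneity / homoscedasticity.
set.seed(2)
df <- data.frame(a = rnorm(100), b = rnorm(100))
random <- rchisq(nrow(df), df = 7)   # random DV, no real hypothesis
fake   <- lm(random ~ ., data = df)

fitted_z   <- scale(fake$fitted.values)  # standardized fitted values
residual_z <- rstudent(fake)             # standardized residuals

plot(fitted_z, residual_z)
abline(h = 0)   # homogeneity: similar spread above and below this line
abline(v = 0)   # homoscedasticity: similar spread across the x-axis
```

With random data, the points should scatter evenly around (0, 0) with no funnel or curve.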
In this lecture, we have covered:
[1] A. Field, J. Miles, and Z. Field, Discovering Statistics Using R. London, UK: SAGE Publications, 2012, pp. 170-171. [Assumption of independence]
[2] A. Field, J. Miles, and Z. Field, Discovering Statistics Using R. London, UK: SAGE Publications, 2012, pp. 274-276. [Multicollinearity and additivity]
[3] A. Field, J. Miles, and Z. Field, Discovering Statistics Using R. London, UK: SAGE Publications, 2012, pp. 293-295. [Linearity assumption and residual plots]
[4] A. Field, J. Miles, and Z. Field, Discovering Statistics Using R. London, UK: SAGE Publications, 2012, pp. 269-271. [Standardized residuals and error distributions]
[5] A. Field, J. Miles, and Z. Field, Discovering Statistics Using R. London, UK: SAGE Publications, 2012, pp. 179-182. [Q-Q plots and assessing normality]
[6] A. Field, J. Miles, and Z. Field, Discovering Statistics Using R. London, UK: SAGE Publications, 2012, pp. 168-169. [Central Limit Theorem and normality assumption]
[7] A. Field, J. Miles, and Z. Field, Discovering Statistics Using R. London, UK: SAGE Publications, 2012, pp. 185-188. [Homogeneity of variance and Levene’s test]
[8] A. Field, J. Miles, and Z. Field, Discovering Statistics Using R. London, UK: SAGE Publications, 2012, pp. 272-273, 293. [Homoscedasticity and residual plots]