Ziyuan Huang
Last Updated: 2026-02-05
Simple experiments:
For example: manipulation of the independent variable involves having an experimental condition and a control.
Don’t make a continuous variable categorical just so you can do a t-test.
People used to dichotomize continuous variables into high versus low groups, often with a median split. Doing so discards information and reduces power.
Reminder:
Between subjects / Independent designs
Repeated measures / within subjects / dependent designs
Independent t-test:
Dependent t-test:
Are invisible people mischievous?
Manipulation
Outcome: We measured how many mischievous acts participants performed in a week.
## 'data.frame': 24 obs. of 2 variables:
## $ Cloak : chr "No Cloak" "No Cloak" "No Cloak" "No Cloak" ...
## $ Mischief: int 3 1 5 4 6 4 6 2 0 5 ...
M <- tapply(longdata$Mischief, longdata$Cloak, mean)
STDEV <- tapply(longdata$Mischief, longdata$Cloak, sd)
N <- tapply(longdata$Mischief, longdata$Cloak, length)
M;STDEV;N
## Cloak No Cloak
## 5.00 3.75
## Cloak No Cloak
## 1.651446 1.912875
## Cloak No Cloak
## 12 12
Our means appear slightly different. What might have caused those differences?
\[t = \frac{\text{Signal}}{\text{Noise}} = \frac{\text{Observed Difference} - \text{Expected Difference}}{\text{Standard Error of Difference}}\]
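We use the standard error as a gauge of the variability between sample means.
If the difference between the samples we have collected is larger than what we would expect based on the standard error, then we can assume one of two interpretations:
- There is no effect, and we have by chance collected two samples that are atypical of the same population.
- The two samples come from different populations, so the difference reflects a genuine effect.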
\[ t = \frac {\bar{X}_1 - \bar{X}_2}{\sqrt {\frac {s^2_p}{n_1} + \frac {s^2_p}{n_2}}}\]
\[ s^2_p = \frac {(n_1-1)s^2_1 + (n_2-1)s^2_2} {n_1+n_2-2}\]
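The two formulas above can be checked by hand. The sketch below is a minimal illustration, assuming only the group means, standard deviations, and sample sizes printed earlier; the variable names (`m`, `s`, `n`, `sp2`, `se_diff`) are chosen here for illustration and are not part of the lecture code.

```r
# Summary statistics copied from the tapply() output above
m <- c(Cloak = 5.00, NoCloak = 3.75)
s <- c(Cloak = 1.651446, NoCloak = 1.912875)
n <- c(Cloak = 12, NoCloak = 12)

# Pooled variance: each group's variance weighted by its degrees of freedom
sp2 <- ((n[1] - 1) * s[1]^2 + (n[2] - 1) * s[2]^2) / (n[1] + n[2] - 2)

# Standard error of the difference between the two means
se_diff <- sqrt(sp2 / n[1] + sp2 / n[2])

# t statistic and two-tailed p-value on n1 + n2 - 2 degrees of freedom
t_stat <- unname((m[1] - m[2]) / se_diff)
df     <- unname(n[1] + n[2] - 2)
p_val  <- 2 * pt(-abs(t_stat), df)

round(c(t = t_stat, df = df, p = p_val), 4)
```

If the arithmetic is right, these values should agree with the t, df, and p-value that `t.test()` reports for the equal-variance test.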
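Assumptions: the outcome is approximately normally distributed within each group (check with shapiro.test()), and the group variances are homogeneous (check with car::leveneTest()).
##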
## Two Sample t-test
##
## data: Mischief by Cloak
## t = 1.7135, df = 22, p-value = 0.1007
## alternative hypothesis: true difference in means between group Cloak and group No Cloak is not equal to 0
## 95 percent confidence interval:
## -0.2629284 2.7629284
## sample estimates:
## mean in group Cloak mean in group No Cloak
## 5.00 3.75
When the group variances are unequal, Welch’s correction adjusts the degrees of freedom. Set var.equal = FALSE to use this adjustment.
##
## Welch Two Sample t-test
##
## data: Mischief by Cloak
## t = 1.7135, df = 21.541, p-value = 0.101
## alternative hypothesis: true difference in means between group Cloak and group No Cloak is not equal to 0
## 95 percent confidence interval:
## -0.264798 2.764798
## sample estimates:
## mean in group Cloak mean in group No Cloak
## 5.00 3.75
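To see where the fractional degrees of freedom in the Welch output come from, the following sketch applies the Welch-Satterthwaite approximation to the summary statistics printed earlier. It is an illustration under that assumption, with variable names (`v`, `t_welch`, `df_welch`) invented here.

```r
# Group standard deviations and sample sizes from the descriptive output
s <- c(1.651446, 1.912875)
n <- c(12, 12)

# Variance of each group mean (no pooling across groups)
v <- s^2 / n

# Welch t: same mean difference, unpooled standard error
t_welch <- (5.00 - 3.75) / sqrt(sum(v))

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- sum(v)^2 / sum(v^2 / (n - 1))

round(c(t = t_welch, df = df_welch), 3)
```

Because the two groups have equal sample sizes, the Welch t equals the pooled t; only the degrees of freedom shrink slightly.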
Effect size options:
library(MOTE)
effect <- d.ind.t(m1 = M[1], m2 = M[2],
sd1 = STDEV[1], sd2 = STDEV[2],
n1 = N[1], n2 = N[2], a = .05)
effect$d
## Cloak
## 0.6995169
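The value above is consistent with Cohen's d computed as the mean difference divided by the pooled standard deviation. The sketch below reproduces it in base R from the summary statistics; it is a hand check, not the MOTE implementation, and the names `sp` and `d_pooled` are illustrative.

```r
# Summary statistics from the descriptive output
m1 <- 5.00; m2 <- 3.75
sd1 <- 1.651446; sd2 <- 1.912875
n1 <- 12; n2 <- 12

# Pooled standard deviation across the two groups
sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))

# Cohen's d: mean difference expressed in pooled-SD units
d_pooled <- (m1 - m2) / sp
round(d_pooled, 4)
```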
library(pwr)
pwr.t.test(n = NULL, #leave NULL
d = effect$d, #effect size
sig.level = .05, #alpha
power = .80, #power
type = "two.sample", #independent
alternative = "two.sided") #two tailed test
##
## Two-sample t test power calculation
##
## n = 33.0688
## d = 0.6995169
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
Are invisible people mischievous?
Manipulation
Outcome: We measured how many mischievous acts participants performed in week 1 and week 2.
Note: These are the same data, but now we treat the study as a dependent (repeated measures) design. Let’s see what happens to our t-test. You would not switch analyses like this in practice, but the example shows how the type of design and statistical test can affect power and results.
\[t = \frac {\bar{D} - \mu_D}{S_D/\sqrt N}\]
# Note: Formula method doesn't support paired=TRUE
t.test(longdata$Mischief[longdata$Cloak == "Cloak"],
longdata$Mischief[longdata$Cloak == "No Cloak"],
paired = TRUE) #dependent t
##
## Paired t-test
##
## data: longdata$Mischief[longdata$Cloak == "Cloak"] and longdata$Mischief[longdata$Cloak == "No Cloak"]
## t = 3.8044, df = 11, p-value = 0.002921
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 0.5268347 1.9731653
## sample estimates:
## mean difference
## 1.25
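The mean difference (the signal) is 1.25 in both analyses, so the larger t must come from a smaller standard error (the noise). As a rough check, the sketch below back-solves each standard error from the t values reported in the two outputs; the variable names are illustrative, and the results carry the rounding of the printed t values.

```r
# Values copied from the independent and paired t-test outputs
mean_diff <- 1.25
t_ind <- 1.7135   # independent t, df = 22
t_dep <- 3.8044   # paired t, df = 11
n_pairs <- 12

# t = mean difference / standard error, so SE = mean difference / t
se_ind <- mean_diff / t_ind
se_dep <- mean_diff / t_dep

# For the paired test SE = S_D / sqrt(N), so S_D = SE * sqrt(N)
sd_diff <- se_dep * sqrt(n_pairs)

round(c(se_independent = se_ind, se_paired = se_dep, sd_of_differences = sd_diff), 3)
```

The paired standard error is less than half the independent one, which is why the same data yield a much larger t when treated as repeated measures.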
Cohen’s d: Based on averages
Cohen’s d: Based on differences \(d_z\)
Which one should I use?
effect2 <- d.dep.t.avg(m1 = M[1], m2 = M[2],
sd1 = STDEV[1], sd2 = STDEV[2],
n = N[1], a = .05)
effect2$d
## Cloak
## 0.7013959
## Cloak
## 0.6995169
If we were to use \(d_z\), we would overestimate the effect size.
You do not normally have to calculate both; this example just shows how they differ.
Create the difference scores, then compute their mean and standard deviation.
diff <- longdata$Mischief[longdata$Cloak == "Cloak"] - longdata$Mischief[longdata$Cloak == "No Cloak"]
effect2.1 <- d.dep.t.diff(mdiff = mean(diff, na.rm = TRUE),
                          sddiff = sd(diff, na.rm = TRUE),
                          n = length(diff), a = .05)
effect2.1$d
## [1] 1.098244
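Both dependent effect sizes can be approximated by hand from numbers already printed. The sketch below assumes that the averages version divides the mean difference by the average of the two standard deviations, and that \(d_z\) equals the paired t divided by \(\sqrt{N}\); these formulas reproduce the MOTE values here, but the code is a base R check rather than the package's own implementation.

```r
# Summary statistics and paired t value copied from the outputs above
m1 <- 5.00; m2 <- 3.75
sd1 <- 1.651446; sd2 <- 1.912875
t_dep <- 3.8044
n_pairs <- 12

# d based on averages: mean difference over the average of the two SDs
d_avg <- (m1 - m2) / mean(c(sd1, sd2))

# d_z: mean difference over the SD of the differences, equivalently t / sqrt(N)
d_z <- t_dep / sqrt(n_pairs)

round(c(d_avg = d_avg, d_z = d_z), 3)
```

The difference scores here have a smaller standard deviation than the raw scores, so \(d_z\) comes out much larger than the averages version.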
pwr.t.test(n = NULL,
d = effect2$d,
sig.level = .05,
power = .80,
type = "paired",
alternative = "two.sided")
##
## Paired t test power calculation
##
## n = 17.96948
## d = 0.7013959
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number of *pairs*
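Sample sizes from a power analysis must be rounded up to whole participants, and the two designs differ in how many people that implies in total. The sketch below uses base R's `power.t.test()` in place of `pwr.t.test()` so that it runs without extra packages; setting `sd = 1` puts `delta` on the Cohen's d scale. The results should closely match the pwr output but may differ in the later decimals.

```r
# Effect sizes reported earlier in the MOTE output
d_ind <- 0.6995169
d_avg <- 0.7013959

# sd = 1 so that delta is interpreted as a standardized effect size
pw_ind <- power.t.test(delta = d_ind, sd = 1, sig.level = .05, power = .80,
                       type = "two.sample", alternative = "two.sided")
pw_dep <- power.t.test(delta = d_avg, sd = 1, sig.level = .05, power = .80,
                       type = "paired", alternative = "two.sided")

# Round up: you cannot recruit a fraction of a participant
n_per_group <- ceiling(pw_ind$n)
n_pairs     <- ceiling(pw_dep$n)

# Independent design needs two groups; dependent design reuses the same people
c(total_independent = 2 * n_per_group, total_dependent = n_pairs)
```

With a similar effect size, the dependent design needs far fewer participants in total to reach the same power.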
library(ggplot2)
library(Hmisc)
cleanup <- theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black"),
legend.key = element_rect(fill = "white"),
text = element_text(size = 15))
bargraph <- ggplot(longdata, aes(Cloak, Mischief))
bargraph +
cleanup +
stat_summary(fun = "mean",
geom = "bar",
fill = "White",
color = "Black") +
stat_summary(fun.data = mean_cl_normal,
geom = "errorbar",
width = .2,
position = "dodge") +
xlab("Invisible Cloak Group") +
ylab("Average Mischief Acts")
In this lecture, you’ve learned: