# one-sided
qnorm(0.95) # 95 %
## [1] 1.644854
qnorm(0.99) # 99 %
## [1] 2.326348
# two-sided
qnorm(0.975) # 95 %
## [1] 1.959964
qnorm(0.995) # 99 %
## [1] 2.575829
Comparison of two samples
Recall that for two large samples, \[Z = \frac {(\bar X_1 - \bar X_2) - D_0}{\sqrt {\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)\] and the confidence interval for \(\mu_1 - \mu_2\) is \[\bar X_1 - \bar X_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}\] or, with the sample variances substituted for the unknown population variances, \[\bar X_1 - \bar X_2 \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\]
For two small samples with equal variance,
\[s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 -2}}\]
\[T = \frac {(\bar X_1 - \bar X_2) - D_0}{s_p \sqrt {\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2) \] \[\bar X_1 - \bar X_2 \pm t_{\alpha/2} (n_1 + n_2 -2) \, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\]
For two small samples with unequal variances (Welch's approximation), \[T = \frac {(\bar X_1 - \bar X_2) - D_0}{\sqrt {\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t(\nu^*)\] where \[\nu^* = \frac{\left(\frac{s_1^2}{n_1} + \frac {s_2^2}{n_2}\right)^2}{\frac{(s_1^2)^2}{n_1^2(n_1 - 1)} + \frac{(s_2^2)^2}{n_2^2(n_2-1)}},\] rounded down to the nearest integer; the simpler choice \(\nu = \min(n_1, n_2)-1\) is also valid but less powerful.
\[\bar X_1 - \bar X_2 \pm t_{\alpha/2} (\nu^*) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\] where \(D_0\) is the hypothesized value of \(\mu_1 - \mu_2\).
Let’s consider the example below.
Psychological study example (Wackerly, p. 500)
A psychological study was conducted to compare the reaction times of men and women to a stimulus. Independent random samples of 50 men and 50 women were employed in the experiment. The results are shown in the table below. Do the data present sufficient evidence to suggest a difference between the true mean reaction times for men and women? Use \(\alpha = 0.05\).
|            | Men   | Women |
|------------|-------|-------|
| n          | 50.00 | 50.00 |
| mean (sec) | 3.60  | 3.80  |
| s^2        | 0.18  | 0.14  |
n1 <- 50 # men
n2 <- 50 # women
y1 <- 3.6 # mean (sec)
y2 <- 3.8 # mean (sec)
var1 <- 0.18
var2 <- 0.14
d0 <- 0
alpha <- 0.05
# Using the formula for two large samples (not recommended)
z <- abs((y1 - y2 - d0)/(sqrt((var1/n1) + (var2/n2)))); z
## [1] 2.5
z_critical <- qnorm(1 - alpha/2)
print(paste0("Since |z| = ", z, " > z_critical = ", round(z_critical, 2),
             ", the difference between men and women is statistically significant at the significance level a = 0.05"))
## [1] "Since |z| = 2.5 > z_critical = 1.96, the difference between men and women is statistically significant at the significance level a = 0.05"
# Using t-distribution (equal variances, pooled)
df <- n1 + n2 - 2
sp <- sqrt(((n1 - 1)*var1 + (n2 - 1)*var2)/df)
t <- abs((y1 - y2 - d0) / (sp*(sqrt(1/n1 + 1/n2)))); t
## [1] 2.5
t_critical <- qt(1 - alpha/2, df); t_critical
## [1] 1.984467
print(paste0("Since |t| = ", t, " > t_critical = ", round(t_critical, 2),
             ", the difference between men and women is statistically significant at the significance level a = 0.05"))
## [1] "Since |t| = 2.5 > t_critical = 1.98, the difference between men and women is statistically significant at the significance level a = 0.05"
# Using t-distribution (unequal variances, Welch approximation)
nu <- floor(((var1/n1) + (var2/n2))^2 / (((var1/n1)^2/(n1 - 1)) + (var2/n2)^2/(n2 - 1)))
SE <- sqrt((var1/n1) + (var2/n2))
t <- abs((y1 - y2 - d0) / SE); t
## [1] 2.5
t_critical <- qt(1 - alpha/2, nu); t_critical
## [1] 1.984984
print(paste0("Since |t| = ", t, " > t_critical = ", round(t_critical, 2),
             ", the difference between men and women is statistically significant at the significance level a = 0.05"))
## [1] "Since |t| = 2.5 > t_critical = 1.98, the difference between men and women is statistically significant at the significance level a = 0.05"
For Jones’s political poll, \(N=15\) voters were sampled. We wish to test \(H_0: p = 0.5\) against the alternative, \(H_A: p < 0.5\). The test statistic is Y, the number of sampled voters favoring Jones.
\(RR = \{y \leq 2\}\), \(\alpha\)?
By definition,
\[\alpha = P(type \ I \ error) = P(rejecting \ H_0 \ when \ H_0 \ is \ true)\]
\[= P(value \ of \ test \ statistic \ is \ in \ RR \ when \ H_0 \ is \ true)\]
\[= P(Y \leq 2 \ when \ p = 0.5)\]
Since Y is a binomial random variable with \(N = 15\), if \(H_0\) is true, \(p=0.5\) and we obtain
\[P(Incorrectly \ rejecting \ H_0) = P(Y\leq2|p=0.5)\]
\[\alpha = \sum_{y=0}^2 {15 \choose y} (0.5)^y(0.5)^{15-y} = {15 \choose 0}(0.5)^{15} + {15 \choose 1}(0.5)^{15} + {15 \choose 2}(0.5)^{15} = 0.004\]
\[P(Incorrectly \ rejecting \ H_0) = P(Y\leq5|p=0.5)\]
\[\alpha = \sum_{y=0}^5 {15 \choose y} (0.5)^y(0.5)^{15-y} = {15 \choose 0}(0.5)^{15} + ... + {15 \choose 5}(0.5)^{15} = 0.151\]
\[P(Incorrectly \ accepting \ H_0) = P(Y>2|p=0.3)\]
\[\beta = \sum_{y=3}^{15} {15 \choose y} (0.3)^y(0.7)^{15-y}\]
\[= 1 - \sum_{y=0}^{2} {15 \choose y} (0.3)^y(0.7)^{15-y} = 1-0.127 = 0.873\]
# Direct calculation
# 1. RR = {y <= 2}, p = 0.5, alpha
a1 <- (choose(15,0) + choose(15,1) + choose(15,2))*0.5^15; a1
## [1] 0.003692627
# 2. RR = {y <= 5}, p = 0.5, alpha
a2 <- (choose(15,0) + choose(15,1) + choose(15,2) + choose(15,3) +
choose(15,4) + choose(15,5))*0.5^15; a2
## [1] 0.1508789
# 3. RR = {y <= 2}, p = 0.3, beta
b1 <- 1 - (choose(15,0)*0.3^0*0.7^15 + choose(15,1)*0.3^1*0.7^14 +
choose(15,2)*0.3^2*0.7^13); b1
## [1] 0.8731723
# 4. RR = {y <= 5}, p = 0.3, beta
b2 <- 1 - (choose(15,0)*0.3^0*0.7^15 + choose(15,1)*0.3^1*0.7^14 +
choose(15,2)*0.3^2*0.7^13 + choose(15,3)*0.3^3*0.7^12 +
choose(15,4)*0.3^4*0.7^11 + choose(15,5)*0.3^5*0.7^10); b2
## [1] 0.2783786
df <- round(matrix(c(a1, a2, b1, b2), nrow = 2, byrow = TRUE), 3)
rownames(df) <- c("a_w_p0.5", "b_w_p0.3")
colnames(df) <- c("RR_le_2", "RR_le_5")
knitr::kable(df)
|          | RR_le_2 | RR_le_5 |
|----------|---------|---------|
| a_w_p0.5 | 0.004   | 0.151   |
| b_w_p0.3 | 0.873   | 0.278   |
# Using base R functions
library(knitr)
# Analyzing Y, the number of sampled voters favoring Jones (out of 15),
# for the rejection regions RR1 = {y <= 2} and RR2 = {y <= 5}
RR1 <- list(); RR2 <- list()
# Type I error a when H0 (p = 0.5) is true
# RR1
RR1$a <- round(pbinom(2, size = 15, prob = 0.5), 3)
# RR2
RR2$a <- round(pbinom(5, size = 15, prob = 0.5), 3)
# Type II error b when the true proportion is p = 0.3:
# P(Y > 2 | p = 0.3) = 1 - P(Y <= 2 | p = 0.3)
# RR1
RR1$b <- round(1 - pbinom(2, size = 15, prob = 0.3), 3)
# RR2
RR2$b <- round(1 - pbinom(5, size = 15, prob = 0.3), 3)
RR <- cbind(RR1, RR2)
rownames(RR) <- c("a_w_p0.5", "b_w_p0.3")
colnames(RR) <- c("RR_le_2", "RR_le_5")
kable(RR)
|          | RR_le_2 | RR_le_5 |
|----------|---------|---------|
| a_w_p0.5 | 0.004   | 0.151   |
| b_w_p0.3 | 0.873   | 0.278   |
We cannot reduce both types of error at the same time without increasing \(n\).
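To make the trade-off concrete, here is a small sketch for the Jones poll (the cutoff grid and the n = 30 comparison are illustrative assumptions, not part of the original example): as the rejection cutoff c in RR = {y <= c} grows, \(\alpha\) rises while \(\beta\) falls, and only a larger sample allows both to shrink.
# alpha/beta trade-off for RR = {y <= c} with n = 15 (H0: p = 0.5 vs Ha: p = 0.3)
cutoffs <- 0:7
round(cbind(cutoff = cutoffs,
            alpha = pbinom(cutoffs, 15, 0.5),       # P(reject H0 | p = 0.5)
            beta = 1 - pbinom(cutoffs, 15, 0.3)),   # P(fail to reject | p = 0.3)
      3)
# with n = 30, the largest cutoff keeping alpha <= 0.05 gives a much smaller beta
c30 <- max(which(pbinom(0:30, 30, 0.5) <= 0.05)) - 1
c(cutoff = c30, alpha = pbinom(c30, 30, 0.5), beta = 1 - pbinom(c30, 30, 0.3))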
Example: Suppose \(y\) is a single observation (one data point) from a population with probability density function given by: \[f(y|\theta) = \begin{cases}{\theta y^{\theta -1}, \ 0<y<1,} \\ {0, \ elsewhere} \end{cases}\] Find the most powerful test (MPT), with significance level \(\alpha = 0.05\), for testing the simple null hypothesis \(H_0: \theta = 2\) against the simple alternative hypothesis \(H_A: \theta = 1\).
Solution) By the Neyman–Pearson lemma, the most powerful test rejects for small values of the likelihood ratio: \[\frac{L(\theta_0)}{L(\theta_{a})} = \frac{f(y|\theta_0)}{f(y|\theta_{a})}= \frac{2y^{2-1}}{1y^{1-1}} = 2y \leq k,\] i.e., reject when \(y \leq k^*\), where \(k^* = \frac{k}{2}\). For \(\alpha = 0.05\),
\[\alpha = P(y < k^* \ when \ \theta = 2) = \int_0^{k^*} 2y \ dy = 0.05\] \[[y^2]_{y=0}^{y=k^*} = (k^*)^2 = 0.05\] \[k^* = (0.05)^{1/2} = 0.2236\]
# a = 0.05, theta_0 = 2, theta_a = 1
a <- 0.05
k <- a^(1/2) # rejection cutoff k* = sqrt(alpha)
y <- c(0, 1, 0.01)
theta_0 <- 2
theta_a <- 1
f <- function(y) theta_0*(y^(theta_0 - 1))
fa <- function(y) theta_a*(y^(theta_a - 1))
curve(f, col = "red")
curve(fa, add = TRUE, col = "blue")
abline(v = k)
abline(v = 0, lty = 2)
abline(v = 1, lty = 2)
abline(h = 0, lty = 2)
cord.x <- c(0, seq(0, k, 0.001), k)
cord.y <- c(0, f(seq(0, k, 0.001)), 0)
cord.xx <- c(k, seq(k, 1, 0.001), 1)
cord.yy <- c(0, fa(seq(k, 1, 0.001)), 0)
redtrans <- rgb(255, 0, 0, 100, maxColorValue = 255)
bluetrans <- rgb(0, 0, 255, 100, maxColorValue = 255)
polygon(cord.x,cord.y, col = redtrans) # alpha
polygon(cord.xx,cord.yy, col = bluetrans) # beta
legend("topleft", legend = c("H_0", "H_a"), col = c("red", "blue"), lty = rep(1,2))
For the case \(H_0:\theta = 1\), \(H_a:\theta=2\), and \(\alpha = 0.1\):
\[\frac{L(\theta_0)}{L(\theta_{a})} = \frac{f(y|\theta_0)}{f(y|\theta_{a})}= \frac{1y^{1-1}}{2y^{2-1}} = \frac{1}{2y} \leq k \quad \Longleftrightarrow \quad y \geq \frac{1}{2k}\] For \(\alpha = 0.1\), let \(k^* = \frac{1}{2k}\).
\[\alpha = P(y > k^* \ when \ \theta = 1) = \int_{k^*}^{1} 1 \ dy = 0.1 \\ 1-k^* = 0.1, k^* = 0.9\]
\[\beta = P(y \leq k^* \ when \ \theta = 2) = \int_0^{k^*} 2y \ dy = \int_0^{0.9} 2y \ dy = [y^2]_{0}^{0.9} = 0.81\]
# a = 0.1, theta_0 = 1, theta_a = 2
a <- 0.1
k <- 0.9 # rejection cutoff k*
y <- c(0, 1, 0.01)
theta_0 <- 1
theta_a <- 2
f <- function(y) theta_0*(y^(theta_0 - 1))
fa <- function(y) theta_a*(y^(theta_a - 1))
curve(f, col = "red", ylim = c(0,2))
curve(fa, add = TRUE, col = "blue")
abline(v = k)
abline(v = 0, lty = 2)
abline(v = 1, lty = 2)
abline(h = 0, lty = 2)
cord.x <- c(k, seq(k, 1, 0.001), 1)
cord.y <- c(0, f(seq(k, 1, 0.001)), 0)
redtrans <- rgb(255, 0, 0, 100, maxColorValue = 255)
bluetrans <- rgb(0, 0, 255, 100, maxColorValue = 255)
cord.xx <- c(0, seq(0, k, 0.001), k)
cord.yy <- c(0, fa(seq(0, k, 0.001)), 0)
polygon(cord.x,cord.y, col = redtrans) # alpha
polygon(cord.xx,cord.yy, col = bluetrans) # beta
legend("topleft", legend = c("H_0", "H_a"), col = c("red", "blue"), lty = rep(1,2))
Calculate appropriate sample size when \(H_0: \mu = 15\), \(H_a: \mu = 16\), \(\alpha = \beta = 0.05\), \(\sigma^2\) is approximately 9.
Since \(z_{\alpha} = z_{\beta} = z_{0.05} = 1.645\), \(n = \frac{(z_{\alpha}+z_{\beta})^2 \sigma^2}{(\mu_a - \mu_0)^2} = \frac{(1.645+1.645)^2(9)}{(16-15)^2} = 97.4\)
a <- 0.05
b <- 0.05
mu_0 <- 15
mu_a <- 16
s <- sqrt(9)
d <- (mu_a - mu_0)/s # effect size
za <- qnorm(1 - a) # z_0.05 = 1.645
zb <- qnorm(1 - b)
n <- ((za + zb)^2 * s^2) / (mu_a - mu_0)^2; n
## [1] 97.39956
# using R function
library(pwr)
pwr.norm.test(d = d, sig.level = a, power = 1 - b, alternative = "greater")
##
## Mean power calculation for normal distribution with known variance
##
## d = 0.3333333
## n = 97.39957
## sig.level = 0.05
## power = 0.95
## alternative = greater
|            | Men   | Women |
|------------|-------|-------|
| n          | 50.00 | 50.00 |
| mean (sec) | 3.60  | 3.80  |
| var        | 0.18  | 0.14  |
n1 <- 50 # men
n2 <- 50 # women
y1 <- 3.6 # mean (sec)
y2 <- 3.8 # mean (sec)
var1 <- 0.18
var2 <- 0.14
d0 <- 0
alpha <- 0.05
# Using the formula for two large samples (not recommended)
z <- abs((y1 - y2 - d0)/(sqrt((var1/n1) + (var2/n2)))); z
## [1] 2.5
z_critical <- qnorm(1 - alpha/2); z_critical
## [1] 1.959964
# plot z-distribution
x <- seq(-5, 5, 0.01)
y <- dnorm(x)
plot(x, y, type = 'l')
abline(v = z_critical, col = "red")
cord.x <- c(z_critical, seq(z_critical, 5, 0.01), 5)
cord.y <- c(0, dnorm(seq(z_critical, 5, 0.01)), 0)
polygon(cord.x,cord.y,col = 'red')
abline(v = z, col = "blue")
# Using t-distribution (unequal variances, Welch approximation)
nu <- floor(((var1/n1) + (var2/n2))^2 / (((var1/n1)^2/(n1 - 1)) + (var2/n2)^2/(n2 - 1)))
SE <- sqrt((var1/n1) + (var2/n2))
t <- abs((y1 - y2 - d0) / SE); t
## [1] 2.5
t_critical <- qt(1 - alpha/2, nu); t_critical
## [1] 1.984984
# plot t-distribution
x <- seq(-5, 5, 0.01)
y <- dt(x, nu)
plot(x, y, type = 'l')
abline(v = t_critical, col = "red")
cord.x <- c(t_critical, seq(t_critical, 5, 0.01), 5)
cord.y <- c(0, dt(seq(t_critical, 5, 0.01), nu), 0)
polygon(cord.x,cord.y,col = 'red')
abline(v = t, col = "blue")
Since the test statistic exceeds the critical value, at the \(\alpha = 0.05\) level we conclude that sufficient evidence exists to conclude that the mean reaction times differ for men and women.
The effect of a new treatment was compared to the standard treatment in two separate patient groups. Can we conclude that the new treatment is an improvement, with \(\alpha=0.01\)? Estimate with both the pooled and unpooled procedures.
x <- c(33, 54, 62, 46, 52, 42, 34, 51, 26, 68, 47, 40, 46, 51, 60)
y <- c(65, 61, 37, 47, 45, 53, 53, 69, 49, 42, 40, 67, 46, 43, 51)
data <- cbind(x,y)
rownames(data) <- 1:15
n1 <- length(x)
n2 <- length(y)
mu1 <- mean(x)
mu2 <- mean(y)
sd1 <- sd(x)
sd2 <- sd(y)
a <- 0.01
# unpooled procedure (one-sided)
df <- ((sd1^2/n1) + (sd2^2/n2))^2/((sd1^4/(n1^2*(n1 - 1))) + (sd2^4/(n2^2*(n2 - 1))))
t <- (mu1 - mu2) / sqrt((sd1^2/n1) + (sd2^2/n2)); t
## [1] -0.9495618
t_critical <- qt(a, df, lower.tail = TRUE); t_critical
## [1] -2.469326
# p-value
pt(t, df)
## [1] 0.1752889
# upper bound of 99 % CI
ci <- mu1 - mu2 + qt(1 - a, df)*sqrt((sd1^2/n1) + (sd2^2/n2)); ci
## [1] 5.975161
# pooled procedure (one-sided)
df <- n1 + n2 - 2
sp2 <- (((n1 - 1) * sd1^2) + ((n2 - 1) * sd2^2))/df
t <- (mu1 - mu2) / (sqrt(sp2) * sqrt(1/n1 + 1/n2))
t_critical <- qt(a, df, lower.tail = TRUE)
pt(t, df)
## [1] 0.1752298
# upper bound of 99 % CI
ci <- mu1 - mu2 + qt(0.990, df) * (sqrt(sp2) * sqrt(1/n1 + 1/n2)); ci
## [1] 5.966569
# base R function
# unpooled
t.test(x, y, alternative = "less", conf.level = 0.99)
##
## Welch Two Sample t-test
##
## data: x and y
## t = -0.94956, df = 27.595, p-value = 0.1753
## alternative hypothesis: true difference in means is less than 0
## 99 percent confidence interval:
## -Inf 5.975161
## sample estimates:
## mean of x mean of y
## 47.46667 51.20000
# Pooled
t.test(x, y, alternative = "less", var.equal = TRUE, conf.level = 0.99)
##
## Two Sample t-test
##
## data: x and y
## t = -0.94956, df = 28, p-value = 0.1752
## alternative hypothesis: true difference in means is less than 0
## 99 percent confidence interval:
## -Inf 5.966569
## sample estimates:
## mean of x mean of y
## 47.46667 51.20000
The nerve conductivity data from healthy subjects and from patients with a nerve disorder are summarized below. Is there sufficient evidence to establish that the nerve conductivity speeds are different for the two groups of subjects?
n1 <- 32
mu1 <- 53.994
sd1 <- 0.974
n2 <- 27
mu2 <- 48.594
sd2 <- 2.490
df <- ((sd1^2/n1) + (sd2^2/n2))^2/((sd1^4/(n1^2*(n1 - 1))) + (sd2^4/(n2^2*(n2 - 1))))
t <- (mu1 - mu2) / sqrt((sd1^2/n1) + (sd2^2/n2)); t
## [1] 10.60498
# 99% ci
upper <- (mu1 - mu2) - (qt(0.005, df) * sqrt((sd1^2/n1) + (sd2^2/n2)))
lower <- (mu1 - mu2) + (qt(0.005, df) * sqrt((sd1^2/n1) + (sd2^2/n2)))
upper; lower
## [1] 6.792575
## [1] 4.007425
# p-value (two-sided)
2 * (1 - pt(t, df))
## [1] 4.08562e-12
# base R function (simulating data from two normal distributions, since only
# summary statistics are given; results vary with the random seed)
x <- rnorm(n1, mu1, sd1)
y <- rnorm(n2, mu2, sd2)
t.test(x, y, alternative = "two.sided", conf.level = 0.99)
##
## Welch Two Sample t-test
##
## data: x and y
## t = 11.201, df = 34.183, p-value = 5.501e-13
## alternative hypothesis: true difference in means is not equal to 0
## 99 percent confidence interval:
## 3.982317 6.545930
## sample estimates:
## mean of x mean of y
## 54.01585 48.75172
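Because t.test() needs the raw observations, the call above had to simulate data. As an alternative, a small helper function, written here only as an illustration (it is not part of base R), computes the Welch statistic, degrees of freedom, two-sided p-value, and confidence interval directly from the summary statistics:
# Welch two-sample t-test from summary statistics (illustrative helper)
welch_summary <- function(m1, s1, n1, m2, s2, n2, conf.level = 0.99) {
  se <- sqrt(s1^2/n1 + s2^2/n2)
  df <- (s1^2/n1 + s2^2/n2)^2 / ((s1^2/n1)^2/(n1 - 1) + (s2^2/n2)^2/(n2 - 1))
  t  <- (m1 - m2) / se
  p  <- 2 * pt(-abs(t), df)                                  # two-sided p-value
  ci <- (m1 - m2) + c(-1, 1) * qt(1 - (1 - conf.level)/2, df) * se
  list(t = t, df = df, p.value = p, conf.int = ci)
}
# reproduces the manual calculation above without simulating data
welch_summary(53.994, 0.974, 32, 48.594, 2.490, 27)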
The percentage reductions in heart rate for the standard drug \(x_i\) and the new drug \(y_i\), together with the differences \(z_i = x_i - y_i\), for the 40 experimental subjects are listed in the table. Given the average difference \(\bar z = -2.655\) and the sample standard deviation of the differences \(s = 3.730\), can we conclude that the two drugs differ?
data <- read.csv(file = "http://bisyn.kaist.ac.kr/bis335/example.csv")
data <- data[,-1]
z <- data[,3]
n <- length(z)
mu <- mean(z)
sd <- sd(z)
t <- (sqrt(n)*(mu - 0))/sd; t
## [1] -4.501646
df <- n - 1; df
## [1] 39
pval <- 2*pt(t, df); pval
## [1] 5.948044e-05
# 99% ci
upper <- mu - (qt(0.005, df)*sd/sqrt(n))
lower <- mu + (qt(0.005, df)*sd/sqrt(n))
upper; lower
## [1] -1.057915
## [1] -4.252085
# base R function
t.test(data$x, data$y, paired = TRUE, conf.level = 0.99)
##
## Paired t-test
##
## data: data$x and data$y
## t = -4.5016, df = 39, p-value = 5.948e-05
## alternative hypothesis: true difference in means is not equal to 0
## 99 percent confidence interval:
## -4.252085 -1.057915
## sample estimates:
## mean of the differences
## -2.655
# plot
x <- seq(-15, 15, 0.01)
plot(x, dnorm(x, mu, sd), type = 'l')
abline(v = mu)
abline(v = upper, col = "red")
abline(v = lower, col = "red")
Pain thresholds to electrical shocks were measured for males and females. Is the variability different for males and females?
|      | Males | Females |
|------|-------|---------|
| n    | 14.0  | 10.0    |
| ybar | 16.2  | 14.9    |
| s^2  | 12.7  | 26.4    |
n1 <- 14; n2 <- 10; y1 <- 16.2; y2 <- 14.9; var1 <- 12.7; var2 <- 26.4; alpha <- 0.1
f <- var2 / var1; f
## [1] 2.07874
df1 <- n1 - 1; df1
## [1] 13
df2 <- n2 - 1; df2
## [1] 9
# critical F value for a = 0.1 (two-sided, upper 5 % point)
f_critical <- qf(0.95, df2, df1); f_critical
## [1] 2.714356
# P(F <= f) under H0 (keep f itself, which is plotted below)
p_lower <- pf(f, df2, df1); p_lower
## [1] 0.8881824
# f < f_critical, so the variability is not significantly different
x <- seq(0,5,0.01)
plot(x, df(x, df2, df1), type = 'l', ylab = "f(x)")
abline(v = f_critical, col = "red")
abline(v = f)
# base R function (simulating data from two normal distributions, since only
# summary statistics are given; results vary with the random seed)
x <- rnorm(n1, y1, sqrt(var1))
y <- rnorm(n2, y2, sqrt(var2))
var.test(x, y, alternative = "two.sided", conf.level = 0.90)
##
## F test to compare two variances
##
## data: x and y
## F = 0.28111, num df = 13, denom df = 9, p-value = 0.03827
## alternative hypothesis: true ratio of variances is not equal to 1
## 90 percent confidence interval:
## 0.09224125 0.76303187
## sample estimates:
## ratio of variances
## 0.2811097
In a silicon model of an artery, the flowrates at collapse for the three different stenosis (blocking) levels (L1=0.78, L2=0.71, L3=0.65) were measured. Average collapse flowrates were X1=11.209, X2=15.086, and X3=17.330 for three stenosis models. The sample sizes were n1=11, n2=14, n3=10. Can we conclude that the average collapse flowrates are significantly different?
L1 <- c(10.6, 9.7, 10.2, 14.3, 10.6, 11.0, 13.1, 10.8, 14.3, 10.4, 8.3)
L2 <- c(11.7, 12.7, 12.6, 16.6, 16.3, 14.7, 19.5, 16.8, 15.1, 15.2, 13.2, 14.8, 14.4, 17.6)
L3 <- c(19.6, 15.1, 15.6, 19.2, 18.7, 13.0, 18.9, 18.0, 18.6, 16.6)
L <- unlist(c(L1, L2, L3))
k <- 3 # L1, L2, L3
n <- length(L)
n1 <- length(L1)
n2 <- length(L2)
n3 <- length(L3)
mu <- mean(L)
mu1 <- mean(L1)
mu2 <- mean(L2)
mu3 <- mean(L3)
ss <- sum(L^2)
sst <- ss - n*mu^2
sstr <- (n1 * mu1^2) + (n2 * mu2^2) + (n3 * mu3^2) - (n * mu^2)
sse <- sst - sstr
df <- k - 1 # L1, L2, L3
mstr <- sstr/df
mse <- sse/(n - k)
f <- mstr/mse; f
## [1] 23.5747
pval <- pf(f, k - 1, n - k, lower.tail = FALSE)
stenosis_level <- c(k - 1, sstr, mstr, f, pval)
residual <- c(n - k, sse, mse, NA , NA)
summary_mat <- rbind(stenosis_level, residual)
colnames(summary_mat) <- c("Df", "Sum sq", "Mean sq", "F value", "Pr(>F)")
knitr::kable(summary_mat)
|                | Df | Sum sq   | Mean sq    | F value | Pr(>F) |
|----------------|----|----------|------------|---------|--------|
| stenosis_level | 2  | 204.0202 | 102.010097 | 23.5747 | 5e-07  |
| residual       | 32 | 138.4672 | 4.327101   | NA      | NA     |
# base R function
df <- data.frame(flowrate = c(L1, L2, L3),
stenosis_level = factor(rep(c("L1", "L2", "L3"),
times = c(length(L1), length(L2), length(L3)))))
fit <- aov(flowrate ~ stenosis_level, data = df)
anova(fit)
## Analysis of Variance Table
##
## Response: flowrate
## Df Sum Sq Mean Sq F value Pr(>F)
## stenosis_level 2 204.02 102.010 23.575 5.096e-07 ***
## Residuals 32 138.47 4.327
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
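Since the overall F test rejects, a natural follow-up (not part of the original example) is a pairwise comparison of the three stenosis levels. Such comparisons require a multiple-testing correction, which is the topic of the rest of this section; the Holm adjustment below is just one possible choice. A short sketch reusing the df data frame and fit object from above:
# pairwise t-tests between stenosis levels with Holm-adjusted p-values
pairwise.t.test(df$flowrate, df$stenosis_level, p.adjust.method = "holm")
# Tukey's honest significant differences on the fitted aov object
TukeyHSD(fit)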
p <- seq(0.40, 0.8, 0.05)
beta <- c(0.886, 0.758, 0.581, 0.385, 0.213, 0.094, 0.031, 0.007, 0.001)
power <- c(0.114, 0.242, 0.419, 0.615, 0.787, 0.906, 0.969, 0.993, 0.999)
tbl <- cbind(Probability = p, Beta = beta, Power = power)
knitr::kable(tbl)
| Probability | Beta  | Power |
|-------------|-------|-------|
| 0.40        | 0.886 | 0.114 |
| 0.45        | 0.758 | 0.242 |
| 0.50        | 0.581 | 0.419 |
| 0.55        | 0.385 | 0.615 |
| 0.60        | 0.213 | 0.787 |
| 0.65        | 0.094 | 0.906 |
| 0.70        | 0.031 | 0.969 |
| 0.75        | 0.007 | 0.993 |
| 0.80        | 0.001 | 0.999 |
We will cover this topic in the classification part.
Recall the error table, where \(\alpha = P(Type \ \ I \ \ Error)\) and \(\beta = P(Type \ \ II \ \ Error)\)
In other words, if we perform 1 hypothesis test, the probability of at least 1 false positive is equal to \(\alpha\).
What is the probability of at least 1 false positive if we conduct \(m\) hypothesis tests? Since we conduct \(m\) independent hypothesis tests at the same time, it is given as below.
\[P(Making \ \ an \ \ error) = \alpha\] \[P(Not \ \ making \ \ an \ \ error) = 1 - \alpha\] \[P(Not \ \ making \ \ an \ \ error \ \ in \ \ m \ \ test) = (1 - \alpha)^m\] \[P(making \ \ at \ \ least \ \ 1 \ \ error \ \ in \ \ m \ \ test) = 1 - (1 - \alpha)^m\]
m <- 0:100
a <- 0.05
plot(m, 1 - (1 - a)^m, type = 'l', ylab = "P(At least 1 false positive)")
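For a few concrete values of m:
# probability of at least one false positive among m independent tests at a = 0.05
1 - (1 - a)^c(1, 10, 50, 100) # approximately 0.05, 0.40, 0.92, 0.99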
Let’s assume we are testing \(H^1, H^2, ..., H^m\). In addition, \(m_0\) represents the number of true null hypotheses and \(R\) represents the number of rejected hypotheses. Based on this notation, we can generate the table shown below.
|                  | Called significant | Not called significant | Total   |
|------------------|--------------------|------------------------|---------|
| Null True        | \(V\)              | \(m_0-V\)              | \(m_0\) |
| Alternative True | \(S\)              | \(m_1-S\)              | \(m_1\) |
| Total            | \(R\)              | \(m-R\)                | \(m\)   |
In the table, \(V\) represents the number of false positives and \(S\) represents the number of true positives. In other words, \(V\) is the number of Type I errors and \(m_1-S\) is the number of Type II errors.
When people say “adjusting p-values for the number of hypothesis tests performed”, what they mean is controlling the Type I error rate. There are many ways to deal with this, and it is an active research area, but here we list some well-known methods.
Different Approaches to control Type I Errors
Per comparison error rate (PCER): the expected value of the number of Type I errors over the number of hypotheses, \(PCER = \frac{E(V)}{m}\)
Per-family error rate (PFER): the expected number of Type I errors, \(PFER = E(V)\)
Family-wise error rate (FWER): the probability of at least one type I error, \(FWER = P(V\geq1)\)
False discovery rate (FDR): the expected proportion of Type I errors among the rejected hypotheses, \(FDR = E(\frac{V}{R}|R>0) P(R>0)\)
Positive false discovery rate (pFDR): the rate that discoveries are false, \(pFDR = E(\frac{V}{R}|R>0)\)
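To make these quantities concrete, here is a minimal simulation sketch (the settings — 1000 tests, 100 true effects, an effect size of 3, and an unadjusted 0.05 cutoff — are illustrative assumptions, not from the original text). It repeatedly generates p-values, counts V, S, and R, and estimates the FWER and FDR empirically:
set.seed(1)
m <- 1000; m1 <- 100                                   # total tests and true alternatives
reps <- 200
res <- replicate(reps, {
  truth <- c(rep(FALSE, m - m1), rep(TRUE, m1))        # FALSE = null is true
  z <- rnorm(m, mean = ifelse(truth, 3, 0))            # effect size 3 under the alternative
  p <- 2 * pnorm(-abs(z))                              # two-sided p-values
  reject <- p <= 0.05                                  # naive, unadjusted threshold
  V <- sum(reject & !truth); S <- sum(reject & truth); R <- sum(reject)
  c(V = V, S = S, R = R, FDP = ifelse(R > 0, V / R, 0))
})
rowMeans(res)                 # average V, S, R, and the empirical FDR
mean(res["V", ] >= 1)         # empirical FWER: P(at least one false positive)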
Controlling FWER
There are two general types of FWER corrections:
Single step: equivalent adjustments made to each p-value (Bonferroni)
Sequential: adaptive adjustment made to each p-value (Holm’s Method)
Bonferroni
Bonferroni correction is a very simple method for ensuring that the overall Type I error rate is kept at \(\alpha\) when performing \(m\) independent hypothesis tests. The procedure is as follows: reject any hypothesis with \(p\text{-value}\leq \frac{\alpha}{m}\); equivalently, the adjusted p-values are \(\tilde p_j = \min[m \cdot p_j ,1]\). For example, if we want an experiment-wide Type I error rate of 0.05 when performing 10,000 hypothesis tests, we need a p-value of \(\frac {0.05}{10000} = 5 \times 10^{-6}\) to declare significance (or \(\tilde p_j = \min[10000 \cdot p_j, 1]\)). It is the strictest FWER correction method, but it carries a high probability of Type II errors.
p <- c(0.0008, 0.009, 0.165, 0.205, 0.396, 0.450, 0.641, 0.781, 0.900, 0.993)
# Bonferroni
m <- length(p)
pmin(1, m * p)
## [1] 0.008 0.090 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
p.adjust(p, method = "bonferroni")
## [1] 0.008 0.090 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
# cf pmin, pmax, cummin, and cummax
min(1:5, pi)
## [1] 1
pmin(1:5, pi)
## [1] 1.000000 2.000000 3.000000 3.141593 3.141593
max(1:5, pi)
## [1] 5
pmax(1:5, pi)
## [1] 3.141593 3.141593 3.141593 4.000000 5.000000
cummin(1:5)
## [1] 1 1 1 1 1
cummax(1:5)
## [1] 1 2 3 4 5
Holm’s Method
Holm’s method is the simplest sequential FWER correction method. The procedure is as follows. Order the unadjusted p-values such that \(p_1 \leq p_2 \leq ... \leq p_m\). For control of the FWER at level \(\alpha\), the step-down Holm adjusted p-values are \(\tilde p_j = \min[(m-j+1) \cdot p_j,1]\), where \(j\) is the rank of the p-value by degree of significance. The point here is that we don’t multiply every \(p_j\) by the same factor \(m\). For example, when \(m = 10000\): \(\tilde p_1 = 10000 \cdot p_1\), \(\tilde p_2 = 9999 \cdot p_2\), …, \(\tilde p_m = 1 \cdot p_m\).
p <- c(0.0008, 0.009, 0.165, 0.205, 0.396, 0.450, 0.641, 0.781, 0.900, 0.993)
# Holm's procedure
m <- length(p)
j <- seq_len(m)
o <- order(p)
ro <- order(o)
pmin(1, cummax((m - j + 1L) * p[o]))[ro]
## [1] 0.008 0.081 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
# base R
p.adjust(p, method = "holm")
## [1] 0.008 0.081 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Pitfalls of Controlling FWER
FWER control is appropriate when you want to guard against ANY false positives. However, in many cases, especially when the number of tests is large, it becomes too strict to draw any meaningful conclusion. In these cases, the more relevant quantity to control is the false discovery rate (FDR). FDR is designed to control the proportion of false positives among the set of rejected hypotheses (\(R\)). In the table above, \(FDR = \frac{V}{R}\) and \(FPR = \frac{V}{m_0}\) (FPR: false positive rate).
Benjamini and Hochberg (BH) FDR
To control FDR at level \(\delta\): first, order the unadjusted p-values: \(p_1 \leq p_2 \leq ... \leq p_m\). Then find the test with the highest rank, \(j\), for which the p-value, \(p_j\), is less than or equal to \(\frac{j}{m} \times \delta\). Declare the tests of rank \(1, 2, ..., j\) significant, i.e., reject when \(p_{(j)} \leq \delta \frac{j}{m}\) (equivalently, \(p_{(j)}\frac{m}{j} \leq \delta\)).
p <- c(0.0008, 0.009, 0.165, 0.205, 0.396, 0.450, 0.641, 0.781, 0.900, 0.993)
# BH procedure
m <- length(p)
j <- m:1L
o <- order(p, decreasing = TRUE)
ro <- order(o)
pmin(1, cummin(m/j * p[o]))[ro]
## [1] 0.0080000 0.0450000 0.5125000 0.5125000 0.7500000 0.7500000 0.9157143
## [8] 0.9762500 0.9930000 0.9930000
p.adjust(p, method = "BH")
## [1] 0.0080000 0.0450000 0.5125000 0.5125000 0.7500000 0.7500000 0.9157143
## [8] 0.9762500 0.9930000 0.9930000
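As a side note on the pitfall mentioned above: when many tests are performed, Bonferroni typically recovers far fewer of the true effects than BH at the same nominal level. A small simulation sketch (the settings are illustrative assumptions, reusing the mixture idea from the earlier sketch):
set.seed(2)
m <- 1000; m1 <- 100
truth <- c(rep(FALSE, m - m1), rep(TRUE, m1))          # FALSE = null is true
z <- rnorm(m, mean = ifelse(truth, 3, 0))
pvals <- 2 * pnorm(-abs(z))
# number of true effects detected at level 0.05 after each correction
c(bonferroni = sum(p.adjust(pvals, "bonferroni")[truth] <= 0.05),
  BH         = sum(p.adjust(pvals, "BH")[truth] <= 0.05))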
Benjamini and Yekutieli (BY) FDR
BY correction is very similar to BH correction. It is appropriate when one suspects very strong correlations among tests. Simply replace \(\frac{m}{j}\) by \(q\frac{m}{j}\) where \(q = \sum_{j=1}^{m} \frac{1}{j}\).
p <- c(0.0008, 0.009, 0.165, 0.205, 0.396, 0.450, 0.641, 0.781, 0.900, 0.993)
# BY procedure
m <- length(p)
j <- m:1L
o <- order(p, decreasing = TRUE)
ro <- order(o)
q <- sum(1L/(1L:m))
pmin(1, cummin(q * m/j * p[o]))[ro]
## [1] 0.02343175 0.13180357 1.00000000 1.00000000 1.00000000 1.00000000
## [7] 1.00000000 1.00000000 1.00000000 1.00000000
p.adjust(p, method = "BY")
## [1] 0.02343175 0.13180357 1.00000000 1.00000000 1.00000000 1.00000000
## [7] 1.00000000 1.00000000 1.00000000 1.00000000
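Finally, all of the adjustments above can be compared side by side with p.adjust() on the same vector of p-values:
# one column per adjustment method, applied to the same p-values
round(sapply(c("bonferroni", "holm", "BH", "BY"),
             function(mth) p.adjust(p, method = mth)), 4)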