Notation:
If \(X_1, X_2, ..., X_n\) is a random sample of size \(n\) from a distribution with probability density (or mass) function \(f(x;\theta)\), then the joint probability density (or mass) function of \(X_1, X_2, ..., X_n\) is called the likelihood function \(L(\theta)\). That is, the joint p.d.f. or p.m.f. is: \[L(\theta) = L(\theta;x_1, x_2, ..., x_n) = f(x_1;\theta) \times f(x_2;\theta) \times \cdots \times f(x_n;\theta)\] Note that, for ease of notation, we drop the reference to the sample \(x_1, x_2, ..., x_n\) and write simply \(L(\theta)\) for the likelihood function. Keep in mind, though, that the likelihood \(L(\theta)\) still depends on the sample data.
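As a quick illustration (a minimal sketch, not part of the original notes), the R snippet below evaluates \(L(\theta)\) numerically for an exponential sample with mean parameter \(\theta\); the data vector is hypothetical.
# Likelihood of an exponential sample, f(x; theta) = (1/theta) exp(-x/theta):
# L(theta) is the product of the individual densities evaluated at the data.
x_sample <- c(1.2, 0.4, 2.7, 1.9, 0.8)  # hypothetical data
likelihood <- function(theta) prod(dexp(x_sample, rate = 1/theta))
likelihood(2)  # L(2), the joint density of the sample when theta = 2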
Definition I - Simple and Composite Hypothesis:
If a random sample is taken from a distribution with parameter \(\theta\), a hypothesis is said to be a simple hypothesis if the hypothesis uniquely specifies the distribution of the population from which the sample is taken. Any hypothesis that is not a simple hypothesis is called a composite hypothesis. Example: Suppose \(X_1, X_2, ..., X_n\) is a random sample from an exponential distribution with parameter \(\theta\). Is the hypothesis \(H:\theta = 2\) a simple or a composite hypothesis?
Solution: The p.d.f. of an exponential random variable is: \[f(x) = \frac{1}{\theta} e^{-x/\theta}\] for \(x\geq 0\). Under the hypothesis \(H: \theta = 2\), the p.d.f. becomes: \[f(x) = \frac{1}{2} e^{-x/2}\] for \(x\geq 0\). Because the p.d.f. is uniquely specified under the hypothesis \(H: \theta = 2\), the hypothesis is a simple hypothesis.
If the hypothesis is \(H:\theta > 3\), then infinitely many distributions are consistent with the hypothesis. Since the p.d.f. is not uniquely specified under the hypothesis \(H:\theta > 3\), the hypothesis is a composite hypothesis.
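To see the distinction pictorially, the following sketch (base R graphics; the three values above 3 are chosen arbitrarily) draws the single density fixed by \(H: \theta = 2\) alongside a few of the infinitely many densities consistent with \(H: \theta > 3\).
# The simple hypothesis theta = 2 pins down exactly one density (red),
# while the composite hypothesis theta > 3 leaves a whole family (blue, dashed).
curve(dexp(x, rate = 1/2), from = 0, to = 10, col = "red", ylab = "f(x)")
for (th in c(4, 6, 8)) curve(dexp(x, rate = 1/th), add = TRUE, col = "blue", lty = 2)
legend("topright", legend = c("theta = 2 (simple)", "theta > 3 (composite)"),
       col = c("red", "blue"), lty = c(1, 2))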
Quiz: Is the hypothesis \(H: \mu = 5\) a simple or composite hypothesis? (\(X_1, X_2, ..., X_n\) is a random sample from a normal distribution with mean \(\mu\) and unknown variance \(\sigma^2\); the solution is given at the end of these notes.)
Definition II - The most powerful test:
Consider the test of the simple null hypothesis \(H_0:\theta = \theta_0\) against the simple alternative hypothesis \(H_A:\theta = \theta_a\). Let C and D be critical regions of size \(\alpha\), that is, let: \[\alpha = P(C;\theta_0) \quad \text{and} \quad \alpha = P(D;\theta_0)\] Then, C is a best critical region of size \(\alpha\) if the power of the test at \(\theta = \theta_a\) is the largest among all tests of size \(\alpha\). More formally, C is the best critical region of size \(\alpha\) if, for every other critical region D of size \(\alpha\), we have: \[P(C;\theta_a) \geq P(D;\theta_a)\] that is, C is the best critical region of size \(\alpha\) if the power of C is at least as great as the power of every other critical region D of size \(\alpha\). We then say that the test with critical region C is the most powerful test of size \(\alpha\).
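Both quantities in this definition can be approximated by simulation. The sketch below (all numbers hypothetical: the exponential model, the cutoff 1.4, and the two \(\theta\) values are chosen only for illustration) estimates the size \(P(C;\theta_0)\) and the power \(P(C;\theta_a)\) of a rejection region based on the sample mean.
# Monte Carlo estimate of size and power for an illustrative test of
# H_0: theta = 2 against H_A: theta = 1, rejecting when the sample mean is small.
set.seed(1)
n <- 20      # sample size
B <- 10000   # number of simulated samples
rejection_rate <- function(theta) {
  means <- replicate(B, mean(rexp(n, rate = 1/theta)))
  mean(means < 1.4)  # proportion of samples landing in the critical region
}
rejection_rate(2)  # approximate size alpha = P(C; theta_0)
rejection_rate(1)  # approximate power      = P(C; theta_a)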
The Neyman-Pearson Lemma:
Suppose we have a random sample \(X_1, X_2, ..., X_n\) from a probability distribution with parameter \(\theta\). Then, if C is a critical region of size \(\alpha\) and \(k\) is a constant such that: \[\frac{L(\theta_0)}{L(\theta_a)} \leq k \quad \text{inside the critical region } C\] and: \[\frac{L(\theta_0)}{L(\theta_a)} \geq k \quad \text{outside the critical region } C\] then C is the best, that is, most powerful, critical region for testing the simple null hypothesis \(H_0:\theta = \theta_0\) against the simple alternative hypothesis \(H_A: \theta = \theta_a\).
(For the proof of this lemma, see this page or equivalent materials.)
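In practice, applying the lemma amounts to computing the likelihood ratio for the observed sample and comparing it with \(k\). A minimal sketch (exponential model, hypothetical data, and a hypothetical \(k\)):
# Likelihood ratio L(theta_0) / L(theta_a) for an exponential sample;
# the Neyman-Pearson region rejects H_0 when this ratio is small (<= k).
x_sample <- c(1.2, 0.4, 2.7, 1.9, 0.8)  # hypothetical data
L <- function(theta) prod(dexp(x_sample, rate = 1/theta))
lr <- L(2) / L(1)  # H_0: theta = 2 vs H_A: theta = 1
lr <= 0.5          # TRUE would mean: reject H_0 at this illustrative k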
Example: Suppose \(X\) is a single observation (one data point) from a population with probability density function given by: \[f(x) = \theta x^{\theta -1}\] for \(0<x<1\). Find the test with the best critical region, that is, find the most powerful test, with significance level \(\alpha = 0.05\), for testing the simple null hypothesis \(H_0: \theta =3\) against the simple alternative hypothesis \(H_A: \theta = 2\).
Solution: Because both the null and alternative hypotheses are simple hypotheses, we can apply the Neyman-Pearson Lemma in an attempt to find the most powerful test. The lemma tells us that the ratio of the likelihoods under the null and alternative must be at most some constant \(k\) inside the rejection region. Because we are dealing with just one observation \(X\), the ratio of the likelihoods equals the ratio of the probability density functions, giving us:
\[\frac{L(\theta_0)}{L(\theta_a)} = \frac{3x^{3-1}}{2x^{2-1}} = \frac{3}{2}x \leq k\] That is, the lemma tells us that the rejection region for the most powerful test has the form: \[\frac{3}{2}x \leq k\] or alternatively, since \(\frac{2}{3}k\) is just a new constant \(k^*\), the rejection region for the most powerful test is of the form:
\[x < \frac{2}{3}k = k^*\] Now, it’s just a matter of finding \(k^*\), and our work is done. We want \(\alpha = P(\text{Type I error}) = P(\text{rejecting } H_0 \text{ when } H_0 \text{ is true})\) to equal 0.05. In order for that to happen, the following must hold:
\[\alpha = P(X<k^* \text{ when } \theta = 3) = \int_0^{k^*} 3x^2 \, dx = 0.05\] Doing the integration, we get:
\[\Big[x^3\Big]_{x=0}^{x=k^*} = (k^*)^3 = 0.05\] And, solving for \(k^*\), we get: \[k^* = (0.05)^{1/3} \approx 0.368\] That is, the Neyman-Pearson Lemma tells us that the rejection region of the most powerful test for testing \(H_0:\theta = 3\) against \(H_A:\theta=2\), under the assumed probability distribution, is: \[x < 0.368\] That is, among all possible tests of \(H_0:\theta = 3\) against \(H_A:\theta=2\) based on a single observation \(X\) and with significance level 0.05, this test has the largest possible power under the alternative hypothesis, that is, when \(\theta = 2\).
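The R code below (base graphics) plots the two densities, marks the cutoff \(k^* \approx 0.368\), and shades \(\alpha\) (the area under the null density over the rejection region) and \(\beta\) (the area under the alternative density over the acceptance region):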
a <- 0.05                                     # significance level alpha
k <- a^(1/3)                                  # cutoff k* of the rejection region x < k*
theta_0 <- 3                                  # theta under H_0
theta_a <- 2                                  # theta under H_A
f <- function(x) theta_0*(x^(theta_0 - 1))    # density under H_0
fa <- function(x) theta_a*(x^(theta_a - 1))   # density under H_A
curve(f, from = 0, to = 1, col = "red", xlab = "x", ylab = "f(x)")
curve(fa, add = TRUE, col = "blue")
abline(v = k)                                 # the cutoff k* ~ 0.368
cord.x <- c(0, seq(0, k, 0.001), k)           # x-coordinates of the alpha region
cord.y <- c(0, f(seq(0, k, 0.001)), 0)        # area under f on x < k*
cord.xx <- c(k, seq(k, 1, 0.001), 1)          # x-coordinates of the beta region
cord.yy <- c(0, fa(seq(k, 1, 0.001)), 0)      # area under fa on x > k*
redtrans <- rgb(255, 0, 0, 100, maxColorValue = 255)    # translucent red
bluetrans <- rgb(0, 0, 255, 100, maxColorValue = 255)   # translucent blue
polygon(cord.x, cord.y, col = redtrans)       # alpha
polygon(cord.xx, cord.yy, col = bluetrans)    # beta
legend("topleft", legend = c("H_0", "H_a"), col = c("red", "blue"), lty = 1)
Solution of the Quiz: Is the hypothesis \(H: \mu = 5\) a simple or composite hypothesis? (\(X_1, X_2, ..., X_n\) is a random sample from a normal distribution with mean \(\mu\) and unknown variance \(\sigma^2\))
Solution: The p.d.f. of a normal random variable is: \[f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\Bigg[-\frac{(x-\mu)^2}{2\sigma^2}\Bigg]\] for \(-\infty < x < \infty\), \(-\infty < \mu < \infty\), and \(\sigma > 0\). Under the hypothesis \(H: \mu = 5\), the p.d.f. of a normal random variable is:
\[f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\Bigg[-\frac{(x-5)^2}{2\sigma^2}\Bigg]\] for \(-\infty < x < \infty\), and \(\sigma>0\). In this case, the mean parameter \(\mu = 5\) is uniquely specified in the p.d.f., but the variance \(\sigma^2\) is not. Therefore, the hypothesis \(H:\mu=5\) is a composite hypothesis.