Normal Distribution

  • x : value at which the function is evaluated
  • mean : mean of the distribution. Must be finite.
  • sd : standard deviation of the distribution. Must be finite and non-negative.
  • pseudocode
    • dnorm(x, mean, sd)
      • returns the pdf value at x, where X ~ N(mean, sd^2)
    • pnorm(x, mean, sd, lower.tail=TRUE)
      • returns \(P(X \leq x)\) if lower.tail=TRUE; otherwise \(P(X > x)\)
    • qnorm(p, mean, sd, lower.tail=TRUE)
      • returns the x for which \(P(X \leq x) = p\) if lower.tail=TRUE; otherwise the x for which \(P(X > x) = p\)
    • rnorm(n, mean, sd)
      • draws n random observations of X
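The functions listed above are linked: the two tails sum to one, the quantile function inverts the cumulative distribution function, and the density is the derivative of the cumulative distribution function. The short check below is a sketch with arbitrary illustrative values (mean 10, sd 2, x0 = 12.5), which are assumptions and do not come from the examples in this section.

```r
# Illustrative values only (assumed for this sketch)
mu <- 10; sigma <- 2; x0 <- 12.5

p_lower <- pnorm(x0, mu, sigma)                      # P(X <= x0)
p_upper <- pnorm(x0, mu, sigma, lower.tail = FALSE)  # P(X > x0)

all.equal(p_lower + p_upper, 1)            # the two tails sum to one
all.equal(qnorm(p_lower, mu, sigma), x0)   # qnorm inverts pnorm

# dnorm is the derivative of pnorm: compare with a central difference
h <- 1e-5
num_deriv <- (pnorm(x0 + h, mu, sigma) - pnorm(x0 - h, mu, sigma)) / (2 * h)
all.equal(num_deriv, dnorm(x0, mu, sigma), tolerance = 1e-6)
```

Each all.equal() call should return TRUE; the same identities hold for the chisq and t families.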
ex2 <- seq(-4, 4, length = 100)
plot(ex2, dnorm(ex2,0,1),xlab = 'x', ylab = 'f(x)', type = 'l', main = 'Normal PDF')

plot(ex2, pnorm(ex2,0,1), xlab = 'x', ylab = 'F(x)', type = 'l', main = 'Normal CDF')

# Plotting normal distribution
x <- c(321, 275, 345, 347, 297, 309, 312, 371, 330, 295, 
     299, 365, 378, 387, 295, 322, 292, 270, 321, 277)
mean(x)
## [1] 320.4
sd(x)
## [1] 35.16787
par(mfrow = c(1,2))                               # set space of graph 1x2
ex3 <- seq(200,450,by = .25)
plot(ex3,dnorm(ex3,mean(x),sd(x)),type = 'l', 
     xlab = "Frag Size in bp", ylab = "f(x)",  main = "Restriction Fragments PDF")
points(x,rep(0,length(x)))
plot(ex3,pnorm(ex3,mean(x),sd(x)),type = 'l', 
     xlab = "Frag Size in bp", ylab = "Cum Prob", main = "Restriction Fragment CDF")
points(x,rep(0,length(x)))

par(mfrow = c(1,1))

Example of Normal Distribution I

Suppose that in a microarray experiment, a gene is said to be expressed if its expression level is between 500 and 600 units. Gene expression levels are assumed to be normally distributed with mean 540 units and variance \(25^2\) (that is, a standard deviation of 25 units). What is the probability that a randomly selected gene from this population is an expressed gene?

p1 <- pnorm(600,540,25) - pnorm(500,540,25)
p1
## [1] 0.9370032
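The same probability can be obtained by standardizing the bounds, \(z = (x - \mu)/\sigma\), and using the standard normal distribution. The sketch below assumes the mean 540 and standard deviation 25 used in the call above; the variable names are illustrative.

```r
mu <- 540; sigma <- 25

z_lo <- (500 - mu) / sigma   # -1.6
z_hi <- (600 - mu) / sigma   #  2.4

p_std    <- pnorm(z_hi) - pnorm(z_lo)                      # standard normal scale
p_direct <- pnorm(600, mu, sigma) - pnorm(500, mu, sigma)  # original scale

all.equal(p_std, p_direct)   # both routes give the same probability
```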

Example of Normal Distribution II

Let the distribution of the mean intensities of cDNA spots, corresponding to expressed genes, be normal with mean 750 and standard deviation 50. What is the probability that a randomly selected expressed gene has a spot with a mean intensity (a) less than 825, or (b) greater than 650?

pnorm(825, 750, 50)
## [1] 0.9331928
pnorm(650, 750, 50, lower.tail = FALSE)
## [1] 0.9772499
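A simulation with rnorm() offers a rough cross-check of these two probabilities. This is only a sketch: the seed and the sample size of 100000 are arbitrary choices, and the estimates vary by about 0.001 from run to run.

```r
set.seed(42)                                 # arbitrary seed, for reproducibility
sim <- rnorm(100000, mean = 750, sd = 50)    # simulated mean intensities

est_below_825 <- mean(sim < 825)   # estimates P(X < 825)
est_above_650 <- mean(sim > 650)   # estimates P(X > 650)

# Compare the Monte Carlo estimates with the exact values from pnorm()
rbind(simulated = c(est_below_825, est_above_650),
      exact     = c(pnorm(825, 750, 50),
                    pnorm(650, 750, 50, lower.tail = FALSE)))
```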

Chi-Square Distribution

  • x : value at which the function is evaluated
  • df : degrees of freedom. Must be non-negative.
  • ncp : non-centrality parameter. Must be non-negative; omit it for the ordinary (central) chi-square distribution.
  • pseudocode
    • dchisq(x, df, ncp)
      • returns the pdf value at x, where X ~ chisq(df)
    • pchisq(x, df, ncp, lower.tail=TRUE)
      • returns \(P(X \leq x)\) if lower.tail=TRUE; otherwise \(P(X > x)\)
    • qchisq(p, df, ncp, lower.tail=TRUE)
      • returns the x for which \(P(X \leq x) = p\) if lower.tail=TRUE; otherwise the x for which \(P(X > x) = p\)
    • rchisq(n, df, ncp)
      • draws n random observations of X
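A central chi-square variable with df degrees of freedom is the sum of df squared independent standard normal variables. The simulation below is a sketch of that construction; df = 5, the seed, and the sample size are arbitrary assumptions.

```r
set.seed(1)        # arbitrary seed
df <- 5            # illustrative degrees of freedom
n  <- 20000        # number of simulated chi-square values

# Each row holds df independent N(0, 1) draws; sum their squares row by row
z <- matrix(rnorm(n * df), nrow = n, ncol = df)
q <- rowSums(z^2)

mean(q)                            # should be close to df
var(q)                             # should be close to 2 * df
c(simulated = mean(q <= 3),        # empirical P(X <= 3)
  exact     = pchisq(3, df = df))  # theoretical P(X <= 3)
```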
x <- seq(0,20,by = 0.5)   # the chi-square distribution has support x >= 0
y <- dchisq(x,df = 1) 
plot(x, y, xlab = "x", ylab = "f(x)", type = "l", main = "Chi square distribution PDF") 
y <- dchisq(x,df = 5) 
lines(x,y,col = "red")

pchisq(2,df = 10) #cumulative distribution function
## [1] 0.003659847
pchisq(3,df = 10)
## [1] 0.01857594
1 - pchisq(3,df = 10)
## [1] 0.9814241
pchisq(3,df = 20)
## [1] 4.097501e-06
x <- c(2,4,5,6)
pchisq(x,df = 20)
## [1] 1.114255e-07 4.649808e-05 2.773521e-04 1.102488e-03
x <- seq(0,20,by = 0.5)   # the chi-square distribution has support x >= 0
plot(x, pchisq(x,3), xlab = expression(chi^2), ylab = "F(x)", type = "l", main = "Chi-square CDF")
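qchisq() is not exercised above; its typical use is to look up critical values. The sketch below assumes a 5% upper-tail level, which is an illustrative choice, and also checks that a chi-square variable with 1 degree of freedom is a squared standard normal.

```r
alpha <- 0.05   # illustrative significance level

# Upper-tail critical values for several degrees of freedom
crit <- qchisq(1 - alpha, df = c(1, 5, 10))
crit

# The same values requested directly from the upper tail
crit_upper <- qchisq(alpha, df = c(1, 5, 10), lower.tail = FALSE)
all.equal(crit, crit_upper)

# With df = 1 the critical value equals the squared two-sided normal critical value
all.equal(crit[1], qnorm(1 - alpha / 2)^2)
```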

Student t-Distribution

  • x : value at which the function is evaluated
  • df : degrees of freedom. Must be positive.
  • ncp : non-centrality parameter. Omit it for the ordinary (central) t-distribution.
  • pseudocode
    • dt(x, df, ncp)
      • returns the pdf value at x, where X ~ t(df)
    • pt(x, df, ncp, lower.tail=TRUE)
      • returns \(P(X \leq x)\) if lower.tail=TRUE; otherwise \(P(X > x)\)
    • qt(p, df, ncp, lower.tail=TRUE)
      • returns the x for which \(P(X \leq x) = p\) if lower.tail=TRUE; otherwise the x for which \(P(X > x) = p\)
    • rt(n, df, ncp)
      • draws n random observations of X
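A central t variable with df degrees of freedom can be built as \(Z / \sqrt{V/df}\), where Z is standard normal and V is an independent chi-square variable with df degrees of freedom. The simulation below sketches that construction; df = 5, the seed, and the sample size are arbitrary assumptions.

```r
set.seed(7)        # arbitrary seed
df <- 5            # illustrative degrees of freedom
n  <- 50000        # number of simulated values

z <- rnorm(n)              # standard normal numerator
v <- rchisq(n, df = df)    # independent chi-square denominator
t_sim <- z / sqrt(v / df)  # constructed t(df) draws

# Compare empirical and theoretical cumulative probabilities at x = 2
c(simulated = mean(t_sim <= 2), exact = pt(2, df = df))
```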
x <- seq(-20,20,by = 0.5) 
y <- dt(x,df = 1) 
plot(x, y, xlab = "x", ylab = "f(x)", type = "l", main = "Student t-distribution PDF") 
y <- dt(x,df = 100) 
lines(x,y,col = "red")

# Find the 2.5th and 97.5th percentiles of the Student t distribution with 5 degrees of freedom
qt(c(.025, .975), df = 5)   # 5 degrees of freedom 
## [1] -2.570582  2.570582
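# Draw the cumulative distribution function of the Student t-distribution on your own. (Use the qt() function)
# qt() maps a probability p to its quantile x, so plotting p against qt(p) traces the CDF.
p <- seq(0.001, 0.999, by = 0.001)
plot(qt(p, df = 100), p, type = 'l', xlab = 'x', ylab = "F(x)", main = "CDF")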

# Plot and compare probability density function of standard normal distribution
# and student t-distribution with some values of degree of freedom yourself.
# (dnorm(), dt(), ...)

plot(x, dt(x,df = 100), xlab = "x", ylab = "f(x)", type = "l") 
lines(x, dnorm(x), lty = 2, col = "red", cex = 0.8)
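At df = 100 the two curves are hard to tell apart by eye, so a numeric comparison can help. The sketch below compares 97.5th percentiles and the largest density gap for a few degrees of freedom; the particular df values and the evaluation grid are illustrative choices.

```r
dfs <- c(1, 5, 30, 100, 1000)    # illustrative degrees of freedom

crit_t <- qt(0.975, df = dfs)    # t critical values
crit_z <- qnorm(0.975)           # standard normal critical value

# Largest absolute difference between the t and normal densities on a grid
grid <- seq(-5, 5, by = 0.01)
max_gap <- sapply(dfs, function(d) max(abs(dt(grid, df = d) - dnorm(grid))))

data.frame(df = dfs, t_crit = crit_t, excess = crit_t - crit_z, max_gap = max_gap)
```

The t critical value stays above the normal one for every df and shrinks toward it as df grows, which matches the heavier tails visible at small df.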

Problem - 1

A research scientist reports that mice will live an average of 40 months when their diets are sharply restricted and then enriched with vitamins and proteins. Assuming that the lifetimes of such mice are normally distributed with a standard deviation of 6.3 months, find the probability that a given mouse will live

  • A. More than 32 months
  • B. Less than 28 months
  • C. Between 37 and 49 months.
#1-A
pnorm(32,40,6.3,lower.tail = FALSE)
## [1] 0.8979294
#1-B
pnorm(28,40,6.3,lower.tail = TRUE)   # P(X < 28), approximately 0.0284
#1-C
pnorm(49,40,6.3) - pnorm(37,40,6.3)
## [1] 0.6064669

Problem - 2

The serum cholesterol level X in 14-year-old boys has approximately a normal distribution with mean 170 and standard deviation 30.

  • A. Find the probability that the serum cholesterol level of a randomly chosen 14-year-old boy exceeds 230.
  • B. In a middle school, there are 300 14-year-old boys. Find the probability that at least 8 boys have a serum cholesterol level that exceeds 230. Use a proper normal approximation technique.

#2-A
P2 <- pnorm(230,170,30,lower.tail = FALSE)

#2-B
Numb <- pnorm(8 - 0.5,300*P2,sqrt(300*P2*(1 - P2)),lower.tail = FALSE)
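The count of boys exceeding 230 is binomial with n = 300 and success probability equal to the answer to part A, so the continuity-corrected normal approximation can be checked against the exact binomial tail. This is a sketch; the variable names are illustrative and not part of the original solution.

```r
p_exceed <- pnorm(230, 170, 30, lower.tail = FALSE)   # part A, about 0.0228
n_boys   <- 300

# Normal approximation with continuity correction: P(Y >= 8) ~ P(Z > 7.5)
approx_prob <- pnorm(8 - 0.5,
                     mean = n_boys * p_exceed,
                     sd   = sqrt(n_boys * p_exceed * (1 - p_exceed)),
                     lower.tail = FALSE)

# Exact binomial tail: P(Y >= 8) = P(Y > 7)
exact_prob <- pbinom(7, size = n_boys, prob = p_exceed, lower.tail = FALSE)

c(approx = approx_prob, exact = exact_prob)
```

Because the expected count n * p is only about 6.8, the approximation is rough here, and the two values agree only to within a few hundredths.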

Problem - 3

The average rate of distilled water usage (liters per day) by a laboratory is known to follow a lognormal distribution with parameters \(\mu\)=5 and \(\sigma\)=2. It is important for planning purposes to get a sense of periods of high usage.

  • A. Draw the probability density function of the water usage rate for the given situation.
  • B. What is the probability that, for any given day, more than 50 liters of water are used?
  • C. What is the mean of the average water usage per day in liters?

#3-A
LogN <- dlnorm(0:100,5,2)
plot(0:100,LogN,type = 'l')

#3-B
P3 <- plnorm(50,5,2,lower.tail = FALSE)

#3-C
AVG <- exp(5 + (2^2)/2)
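Both answers can be cross-checked through the underlying normal distribution, since log(X) is normal with mean 5 and standard deviation 2. The sketch below recomputes part B on the log scale and verifies the closed-form mean \(e^{\mu + \sigma^2/2}\) by numerical integration; the integration limits are an assumption chosen to span ten standard deviations around the integrand's peak at \(\mu + \sigma^2 = 9\).

```r
mu <- 5; sigma <- 2

# Part B: P(X > 50) equals P(log X > log 50) under N(mu, sigma^2)
p_direct  <- plnorm(50, mu, sigma, lower.tail = FALSE)
p_via_log <- pnorm(log(50), mu, sigma, lower.tail = FALSE)
all.equal(p_direct, p_via_log)

# Part C: E[X] = E[exp(Y)] with Y ~ N(mu, sigma^2), integrated on the log scale
mean_formula <- exp(mu + sigma^2 / 2)
mean_numeric <- integrate(function(y) exp(y) * dnorm(y, mu, sigma),
                          lower = -11, upper = 29)$value
c(formula = mean_formula, numeric = mean_numeric)
```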