par(mfrow = c(1,3))
dice300 <- sample(6,size = 300,replace = TRUE)
hist(dice300); abline(h = 50, col = "red")
dice3000 <- sample(6,size = 3000,replace = TRUE)
hist(dice3000); abline(h = 500, col = "red")
dice30000 <- sample(6,size = 30000,replace = TRUE)
hist(dice30000); abline(h = 5000, col = "red")
par(mfrow = c(1,1))
table(dice300)
## dice300
## 1 2 3 4 5 6
## 43 51 64 46 46 50
table(dice30000)
## dice30000
## 1 2 3 4 5 6
## 5017 5015 5058 4975 5005 4930
# random number
sample(30:70, size = 27, replace = TRUE) #choose 27 random numbers from 30 to 70
## [1] 35 39 49 33 50 70 47 42 64 62 61 48 33 70 42 36 57 59 67 31 34 52 46
## [24] 31 39 57 38
# flip coin
sample(c("H", "T"), size = 100, replace = TRUE) #to flip a fair coin 100 times
## [1] "T" "T" "H" "H" "H" "H" "T" "H" "T" "H" "H" "H" "H" "H" "H" "H" "H"
## [18] "T" "H" "T" "T" "H" "T" "H" "H" "H" "H" "H" "T" "H" "T" "T" "H" "T"
## [35] "H" "T" "T" "H" "H" "H" "H" "T" "H" "H" "H" "T" "H" "T" "H" "H" "H"
## [52] "H" "H" "T" "H" "H" "H" "H" "H" "T" "H" "H" "H" "H" "H" "T" "T" "T"
## [69] "H" "H" "T" "H" "T" "T" "H" "T" "T" "T" "H" "H" "H" "T" "H" "T" "H"
## [86] "H" "H" "T" "H" "H" "H" "T" "H" "H" "T" "T" "T" "T" "T" "H"
expmf <- dbinom(0:3, 3, 0.5)
excmf <- pbinom(0:3, 3, 0.5)
par(mfrow = c(1,2))
plot(0:3, expmf, type = "p") #point graph
plot(0:3, excmf, type = "p")
par(mfrow = c(1,1))
rbinom(15, 3, 0.5)
## [1] 2 2 1 2 2 0 0 1 1 1 2 3 1 1 2
qbinom(0.25, 3, 0.5)
## [1] 1
qbinom(0.25, 3, 0.5, lower.tail = FALSE)
## [1] 2
The DNA sequence of an organism consists of a long sequence of nucleotides adenine, guanine, cytosine, and thymine. It is known that in a very long sequence of 100 nucleotides, 22 nucleotides are guanine. In a sequence of 10 nucleotides, what is the probability that
# a
dbinom(3, 10, 0.22)
## [1] 0.2244458
# b
pbinom(3, 10, 0.22, lower.tail = FALSE)
## [1] 0.1586739
1 - pbinom(3, 10, 0.22)
## [1] 0.1586739
par(mfrow = c(1,2))
plot(0:10, dnbinom(0:10, 3, 0.6), xlab = 'x', ylab = 'f(x)')
plot(0:10, pnbinom(0:10, 3, 0.6), xlab = 'x', ylab = 'F(x)')
par(mfrow = c(1,1))
par(mfrow = c(1,2))
plot(0:10, dgeom(0:10, 0.6), xlab = 'x', ylab = 'f(x)')
plot(0:10, pgeom(0:10, 0.6), xlab = 'x', ylab = 'F(x)')
par(mfrow = c(1,2))
A scientist is trying to produce a mutation in the genes of Drosophila by treating them with ionizing radiations such as \(\alpha -, \beta -, \gamma -\) or X-ray, which induces small deletions in the DNA of the chromosome. Assume that the probability of producing mutation at each attempt is 0.10.
# a
p <- 0.10
x <- 2
dgeom(x,p)
## [1] 0.081
# b
ex <- 1/0.10
ex
## [1] 10
par(mfrow = c(1,3))
plot(0:10, dpois(0:10, 0.5), xlab = "", ylab = "Prob", type = "p", main = "Lambda 0.5", ylim = c(0, 0.6))
plot(0:10, dpois(0:10, 2), xlab = "", ylab = "Prob", type = "p", main = "Lambda 2", ylim = c(0, 0.6))
plot(0:10, dpois(0:10, 5), xlab = "", ylab = "Prob", type = "p", main = "Lambda 5", ylim = c(0, 0.6))
par(mfrow = c(1,1))
An organism has two genes that may or may not express in a certain period of time. The number of genes expressing in the organism in an interval of time is distributed as Poisson variates with mean 1.5. Find the probability that in an interval
# a
lambda1 <- 1.5
x <- 0
dpois(x,lambda1)
## [1] 0.2231302
# b
x <- 2
dpois(x,lambda1)
## [1] 0.2510214
ex1 <- seq(-4, 4, length = 400)
unifex1 <- dunif(ex1, 0, 2)
plot(ex1, unifex1, type = "l")
The hydrogen bonds linking base pairs together are relatively weak During DNA replication, two strands in the DNA separate along this line of weakness. This mode of replication, in which each replicated double helix contains one original (parental) and one newly synthesized strand (daughter), is called a semiconservative replication. Suppose in an organism, semiconservative replication occurs every half hour. A scientist decides to observe the semiconservative replication randomly. What is the probability that the scientist will have to wait at least 20 minutes to observe the semiconservative replication?
punif(30, 0, 30) - punif(20, 0, 30)
## [1] 0.3333333
x <- seq(0, 30, length = 100)
plot(x,dgamma(x,shape = 2, scale = 1), type = 'l',xlab = "x", ylab = "f(x)",main = "Gamma pdf's")
lines(x,dgamma(x,shape = 2,scale = 2),lty = 2)
lines(x,dgamma(x,shape = 2,scale = 4),lty = 3)
lines(x,dgamma(x,shape = 2,scale = 8),lty = 4)
legend(x = 10,y = .3,paste("Scale=",c(1,2,4,8)),lty = 1:4)
Translation is the process in which mRNA codon sequences are converted into an amino acid sequence. Suppose that in an organism, on average 30 mRNA are being converted per hour in accordance with a Poisson process. What is the probability that it will take more than five minutes before first two mRNA translate?
pgamma(5,shape = 2, scale = 2, lower.tail = FALSE)
## [1] 0.2872975
A gene mutation is a permanent change in the DNA sequence that makes up a gene. Suppose that in an organism, gene mutation occurs at a mean rate of \(\lambda = 1 /3\) per unit time according to a Poisson process. What is the probability that it will take more than five units of time until the second gene mutation occurs?
pgamma(5, shape = 2, scale = 3, lower.tail = FALSE)
## [1] 0.5036683
ex4 <- seq(0, 4, length = 100)
par(mfrow = c(1,2))
plot(ex4, dexp(ex4,1),xlab = "x", ylab = "f(x)", type = "l", main = "Exponential PDF")
plot(ex4, pexp(ex4,1), xlab = "x", ylab = "F(x)", type = "l", main = "Exponential CDF")
par(mfrow = c(1,1))
Cell division is a process by which a cell (parent cell) divides into two cells (daughter cells). Suppose that in an organism, the cell division occurs according to the Poisson process at a mean rate of two divisions per six minutes. What is the probability that it will take at least five minutes for the first division of the cell to take place?
pexp(5,2/6, lower.tail = FALSE)
## [1] 0.1888756
data <- c(0.11, 0.10, 0.10, 0.16, 0.20, 0.32, 0.01, 0.02, 0.07, 0.05, 0.25, 0.14,
0.11, 0.12, 0.08, 0.13, 0.08, 0.14, 0.09, 0.08)
n <- length(data)
x <- seq(0,1,length = 200)
plot(x, dbeta(x, 2, 10),xlab = "prop. of acidic residues", ylab = "f(x)",
main = "Beta PDF for residue data")
points(x = data, y = rep(0,n))
A recent national study showed that approximately 44.7% of college students have used Wikipedia as a source in at least one of their term papers. Let X equal the number of students in a random sample of size n=31 who have used Wikipedia as a source.
1-1. How is X distributed?
1-2. Sketch the p.m.f. using R.
1-3. Sketch the c.m.f. using R.
1-4. Find the probability that X is equal to 17.
1-5. Find the probability that X is at most 13.
1-6. Find the probability that X is bigger than 11.
1-7. Find the probability that X is at least 15.
1-8. Find the probability that X is between 16 and 19, inclusive.
1-9. Give the mean of X, denoted \(E(X)\)
1-10. Give the variance of X.
1-11. Give the standard deviation of X.
1-12. Find \(E(4X + 51.324)\)
#chap 4-1
#1
print("binomial distribution with n = 31, p = 0.447")
## [1] "binomial distribution with n = 31, p = 0.447"
#2
HWpmf <- dbinom(0:31, 31, 0.447)
plot(0:31,HWpmf,type = 'p')
#3
HWcmf <- pbinom(0:31,31,0.447)
plot(0:31,HWcmf,type = 'p')
#4
dbinom(17,31,0.447)
## [1] 0.07532248
#5
pbinom(13,31,0.447)
## [1] 0.451357
#6
pbinom(11,31,0.447,lower.tail = FALSE)
## [1] 0.8020339
#7
pbinom(14,31,0.447,lower.tail = FALSE)
## [1] 0.406024
#8
pb <- dbinom(16:19,31,0.447)
pX <- sum(pb)
pb; pX
## [1] 0.10560875 0.07532248 0.04735464 0.02618995
## [1] 0.2544758
#9
L <- c(0:31)
A <- dbinom(0:31,31,0.447)
Ex <- sum(L*A)
Ex
## [1] 13.857
#10
Varx <- sum(L^2*A) - Ex^2
Varx
## [1] 7.662921
#11
SD <- Varx^(1/2)
SD
## [1] 2.768198
#12
Ex2 <- sum((4*L + 51.32)*A)
Ex2
## [1] 106.748
A DNA sequence of organism A has 100 base pairs(bp), and during DNA replication, they have 50% of mutation probabilities uniformly. And organism B has 10000 bps and have 0.5% of mutation probabilities.
#1
A <- dbinom(0:100,100,0.5)
plot(0:100,A,type = 'p',xlim = range(0:100),ylim = range(seq(0,0.1,0.001)))
#0~100, n=100, p=0.5 binomial distribution plot
B <- dbinom(0:10000,10000,0.005)
plot(0:10000,B,type = 'p',xlim = range(0:100),ylim = range(seq(0,0.1,0.001)))
#binomial distribution plot where n=10000, p=0.005, from 0~100
A40 <- pbinom(39,100,0.5)
B40 <- pbinom(39,10000,0.005)
#2
Lambda1 <- 100*0.5 #lambda = np
Lambda2 <- 10000*0.005
Po40 <- ppois(39,Lambda1)
Po40
## [1] 0.06457037
#3
Po <- dpois(0:100,Lambda1)
plot(0:100,Po,type = 'p',xlim = range(0:100),ylim = range(seq(0,0.1,0.001)))