Probability - Practice

Microarray Quality Control I

In a microarray experiment, a spot is acceptable if it is either perfectly circular or perfectly uniform. In an experiment, a researcher found that there are a total of 20,000 spots. Out of these 20,000 spots, 15,500 are circular, 9,000 are uniform, and 5,000 spots are both circular and uniform. If a spot is chosen randomly, find the probability that the selected spot will be acceptable.

n <- 20000            # n = Total number of spots
c <- 15500            # c = Number of circular spots
u <- 9000             # u = Number of uniform spots
uc <- 5000            # uc = Number of uniform and circular spots
# Inclusion-exclusion: P(C or U) = P(C) + P(U) - P(C and U)
p <- c/n + u/n - uc/n
cat("The probability that a randomly chosen spot is acceptable = ", p, fill = TRUE)

## The probability that a randomly chosen spot is acceptable =  0.975
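
As a cross-check on the inclusion-exclusion step, the acceptable spots can also be counted as three disjoint groups: circular only, uniform only, and both. The following is a minimal sketch; the variable names are illustrative and do not come from the original code.

```r
n_total    <- 20000  # total number of spots
n_circular <- 15500  # circular spots (includes those that are also uniform)
n_uniform  <- 9000   # uniform spots (includes those that are also circular)
n_both     <- 5000   # spots that are both circular and uniform

only_circular <- n_circular - n_both   # circular but not uniform
only_uniform  <- n_uniform - n_both    # uniform but not circular

# The three groups are disjoint, so their counts simply add up
n_acceptable <- only_circular + only_uniform + n_both
p_acceptable <- n_acceptable / n_total
p_acceptable
```

Counting disjoint groups gives 19,500 acceptable spots out of 20,000, which agrees with the inclusion-exclusion result of 0.975.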

Microarray Quality Control II

In a microarray experiment, dye R labels the target cDNA with probability 0.95. The probability that a signal is read from a spot (that is, the target is both labeled and hybridized) is 0.85. What is the probability that a labeled target cDNA hybridizes on a spot?

# A is the event that the target cDNA is labeled with dye R
# B is the event that the target cDNA hybridizes on a spot
# P(B | A) = P(A and B) / P(A)
PA <- 0.95
PAintersectionB <- 0.85
PBgivenA <- PAintersectionB/PA
cat("Probability that the labeled target cDNA hybridizes on a spot = ", PBgivenA, fill = TRUE)

## Probability that the labeled target cDNA hybridizes on a spot =  0.8947368

Microarray Quality Control III

In a microarray experiment, the probability that a target cDNA is dyed with dye G is 0.9. The probability that the cDNA will hybridize on a given spot is 0.85. The probability of getting a signal from a spot is 0.8. Can hybridization and labeling of the target be considered independent?

# PA is the probability that the target is dyed with dye G
# PB is the probability that the cDNA hybridizes on a given spot
# PS is the probability that a signal is obtained from a spot (labeled and hybridized)
PA <- 0.9
PB <- 0.85
PS <- 0.8
# Under independence, P(A and B) would equal P(A) * P(B)
PS1 <- PA * PB
# Compare with a tolerance rather than != to avoid floating-point surprises
if (!isTRUE(all.equal(PS, PS1))) {
  print("Hybridization and labeling of the target are NOT independent")
} else {
  print("Hybridization and labeling of the target are independent")
}

## [1] "Hybridization and labeling of the target are NOT independent"
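
An equivalent way to look at the same question: if labeling and hybridization were independent, the conditional probability of hybridization given labeling would equal the unconditional probability of hybridization. The sketch below makes that comparison; the variable names are illustrative, and the signal probability is taken to be P(labeled and hybridized), as in the problem above.

```r
p_label  <- 0.9   # P(A): target is dyed with dye G
p_hyb    <- 0.85  # P(B): cDNA hybridizes on a given spot
p_signal <- 0.8   # P(A and B): signal obtained from the spot

# P(B | A) = P(A and B) / P(A)
p_hyb_given_label <- p_signal / p_label

# Independence would require P(B | A) == P(B), up to numerical tolerance
independent <- isTRUE(all.equal(p_hyb_given_label, p_hyb))

c(conditional = p_hyb_given_label, marginal = p_hyb)
independent
```

Here the conditional probability (about 0.889) is higher than the marginal 0.85, so knowing that the target is labeled changes the probability of hybridization.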

Growth of bacteria

Let \(X\), the growth rate per minute (per thousand) of the bacteria “TnT”, have the PDF \(f(x)\), where

\(f(x) = \frac{1}{5}e^{-\frac{x}{5}}\), \(0\leq x<\infty\)

Find the expected growth rate per minute of the bacteria “TnT”.

# EX = Expected value, E[X] = integral of x * f(x) over the support
integrand <- function(x){x * (1/5) * exp(-x/5)}
EX <- integrate(integrand, lower = 0, upper = Inf)
cat("EX = ", round(EX$value, 6), fill = TRUE)

## EX =  5
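
For reference, \(f(x)\) is the density of an exponential distribution with rate 1/5, whose mean is 1/rate = 5, and the expected value is the integral of \(x f(x)\) rather than of \(f(x)\) alone (which integrates to 1). The sketch below checks the mean two ways, by numerical integration with R's built-in `dexp` and by a seeded simulation with `rexp`. The seed and sample size are arbitrary choices.

```r
rate <- 1 / 5  # f(x) = rate * exp(-rate * x) for x >= 0

# E[X] = integral of x * f(x) over [0, Inf)
ex_numeric <- integrate(function(x) x * dexp(x, rate = rate),
                        lower = 0, upper = Inf)$value

# Monte Carlo estimate of the same expectation
set.seed(1)
ex_simulated <- mean(rexp(1e5, rate = rate))

c(theoretical = 1 / rate, numeric = ex_numeric, simulated = ex_simulated)
```

The simulated value depends on the seed and sample size, so it should be close to 5 but will not match exactly.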

Problem - I

Three microarray experiments E1, E2, and E3 were conducted. After hybridization, an image of the array with hybridized fluorescent dyes (red and green) is acquired in each experiment. The DNA microchip is divided into several grids, and each grid is of one square microcentimeter. In experiment E1, it was observed that on a randomly selected grid there were two red spots, seven green spots, and five black spots. In experiment E2, there were three red spots, seven green spots, and three black spots in a randomly selected grid, while in experiment E3, it was observed that there were four red spots, eight green spots, and three black spots on a randomly selected grid. A researcher chose a grid randomly, and two spots were chosen randomly from the selected grid. The two chosen spots happened to be one red and one black. What is the probability that these two chosen spots came from

1. experiment E3?
1. experiment E1?

library(prob)

E1 = rep(c("Red", "Green", "Black"), times = c(2, 7, 5))
E2 = rep(c("Red", "Green", "Black"), times = c(3, 7, 3))
E3 = rep(c("Red", "Green", "Black"), times = c(4, 8, 3))

P1 = probspace(urnsamples(E1, size = 2, replace = FALSE, ordered = FALSE))
P2 = probspace(urnsamples(E2, size = 2, replace = FALSE, ordered = FALSE))
P3 = probspace(urnsamples(E3, size = 2, replace = FALSE, ordered = FALSE))

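# Likelihoods P(red and black | experiment): row 4 of each noorder() table is the Black-Red pair
lik <- c(noorder(P1)[4,3], noorder(P2)[4,3], noorder(P3)[4,3])
names(lik) <- c("E1", "E2", "E3")
# Bayes' theorem, assuming each experiment's grid is equally likely to be chosen (prior 1/3)
ans <- lik / sum(lik)
ans[c("E1", "E3")]

##        E1        E3 
## 0.3236246 0.3365696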
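
The question asks for a posterior probability, so Bayes' theorem applies: the chance of drawing one red and one black spot from each grid is weighted by the chance of picking that grid in the first place. The following is a minimal base-R sketch, assuming each of the three experiments is equally likely to be chosen (prior 1/3). The helper `p_red_black` is hypothetical and is not part of the `prob` package.

```r
# P(one red and one black | grid) when two spots are drawn without replacement:
# (ways to pick 1 red and 1 black) / (ways to pick any 2 spots)
p_red_black <- function(red, green, black) {
  red * black / choose(red + green + black, 2)
}

likelihood <- c(E1 = p_red_black(2, 7, 5),
                E2 = p_red_black(3, 7, 3),
                E3 = p_red_black(4, 8, 3))

prior <- rep(1 / 3, 3)  # assumption: grid chosen uniformly among the experiments

# Bayes' theorem: posterior is proportional to likelihood * prior
posterior <- likelihood * prior / sum(likelihood * prior)
round(posterior, 4)
```

Under that equal-prior assumption the posteriors work out to 100/309, 105/309, and 104/309 for E1, E2, and E3, or roughly 0.324, 0.340, and 0.337. A different prior over the experiments would change these values.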

Problem - II

Random variables X and Y are adjacent nucleotides observed in a certain region of the genome of an organism. Sequencing the genome gives the following bivariate probability table, with rows indexed by Y and columns by X.

Y \ X	A	T	C	G
A	0.2	0.1	0.0	0.1
T	0.0	0.1	0.1	0.1
C	0.1	0.0	0.1	0.0
G	0.0	0.1	0.0	0.0

Find

1. \(P(X = T \; or \; A, Y = C)\)
1. \(P(X = A \; or \; T)\)
1. \(P(Y = T)\)
1. Marginal distributions of \(X\) and \(Y\)
1. Conditional probability of \(X = A\) given \(Y = C\). (\(P(X = A | Y = C)\))

library(prob)

x <- c("A", "T", "C", "G")
y <- c("A", "T", "C", "G")
probTable <- matrix(c(0.2, 0.1, 0, 0.1, 0, rep(0.1,3), 0.1, 0, 0.1, 0, 0, 0.1, 0, 0), 
                    nrow = 4, byrow = TRUE)
rownames(probTable) <- y
colnames(probTable) <- x

# a. P(X = T or A, Y = C)
sum(probTable[3, 1:2])

## [1] 0.1

# b. P(X = A or T)
sum(probTable[, 1:2])

## [1] 0.6

# c. P(Y = T)
sum(probTable[2, ])

## [1] 0.3

# d. Marginal distribution of X and Y
margin_X <- colSums(probTable)
margin_y <- rowSums(probTable)
margin_X; margin_y

##   A   T   C   G 
## 0.3 0.3 0.2 0.2

##   A   T   C   G 
## 0.4 0.3 0.2 0.1

# e. conditional probability of X=A given Y=C, P(X=A | Y=C) = P(X=A, Y=C) / P(Y=C)
probTable[3,1] / margin_y[3]

##   C 
## 0.5
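
Part (e) generalizes to the whole conditional distribution of X given Y = C: divide the Y = C row of the joint table by its total. The sketch below rebuilds the table so that it runs on its own and indexes by name rather than by position; `nucleotides` and `cond_X_given_C` are illustrative names.

```r
nucleotides <- c("A", "T", "C", "G")

# Joint table: rows are values of Y, columns are values of X
probTable <- matrix(c(0.2, 0.1, 0.0, 0.1,
                      0.0, 0.1, 0.1, 0.1,
                      0.1, 0.0, 0.1, 0.0,
                      0.0, 0.1, 0.0, 0.0),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(Y = nucleotides, X = nucleotides))

# Sanity check: a joint probability table must sum to 1
stopifnot(isTRUE(all.equal(sum(probTable), 1)))

# P(X = x | Y = C) = P(X = x, Y = C) / P(Y = C) for every x
cond_X_given_C <- probTable["C", ] / sum(probTable["C", ])
cond_X_given_C
```

Given Y = C, X is A or C with probability 0.5 each and never T or G, which matches the value 0.5 found for X = A.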