Experimental Design

Completely Randomized Design

In a completely randomized design, there is only one primary factor under consideration in the experiment. The test subjects are assigned to treatment levels of the primary factor at random. ANOVA for One-way Layout in our lecture.

A fast food franchise is test marketing 3 new menu items. To find out if they the same popularity, 18 franchisee restaurants are randomly chosen for participation in the study.
In accordance with the completely randomized design, 6 of the restaurants are randomly chosen to test market the first new menu item, another 6 for the second menu item, and the remaining 6 for the last menu item.
In the table below, there are the sales figures of the 3 new menu items in the 18 restaurants after a week of test marketing. At .05 level of significance, test whether the mean sales volume for the 3 items are equal.

Item1	Item2	Item3
22	52	16
42	33	24
44	8	19
52	47	18
45	43	34
37	32	39

# Concatenate the data rows into a single vector r.
r <- c(22, 52, 16, 42, 33, 24, 44, 8, 19, 52, 47, 18, 45, 43, 34, 37, 32, 39)

# Assign new variables for the treatment levels and number of observations
f <- c("Item1", "Item2", "Item3")
k <- 3
n <- 6

# Create a vector of treatment factors that corresponds to each element of r
# with the gl function
tm <- gl(k, 1, n * k, factor(f))
tm

##  [1] Item1 Item2 Item3 Item1 Item2 Item3 Item1 Item2 Item3 Item1 Item2
## [12] Item3 Item1 Item2 Item3 Item1 Item2 Item3
## Levels: Item1 Item2 Item3

# Make another temporary dataframe, 'df', with data.frame function, renaming the
# response and treatment factors as 'Sales' and 'Item' respectively.
df <- data.frame(Sales = r, Item = tm)
head(df)

##   Sales  Item
## 1    22 Item1
## 2    52 Item2
## 3    16 Item3
## 4    42 Item1
## 5    33 Item2
## 6    24 Item3

# apply the function aov to a formula that discribes the response 'Sales' by the
# treatment factor 'Item'.
av <- aov(Sales ~ Item, data = df)

# print out the ANOVA table with the summary function.
summary(av)

##             Df Sum Sq Mean Sq F value Pr(>F)
## Item         2  745.4   372.7   2.541  0.112
## Residuals   15 2200.2   146.7

Randomized Block Design

In a randomized block design, there is only one primary factor under consideration in the experiment. Similar test subjects are grouped into blocks. Each block is tested against all treatment levels of the primary factor at random order. This is intended to eliminate possible influence by other extraneous factors. ANOVA for Randomized Block Design in our lecture.

A fast food franchise is test marketing 3 new menu items. To find out if they have the same popularity, 6 franchisee restaurants are randomly chosen for participation in the study.
In accordance with the randomized block design, each restaurant will be test marketing all 3 new menu items. Furthermore, a restaurant will test market only one menu item per week, and it takes 3 weeks to test market all menu items. The testing order of the menu items for each restaurant is randomly assigned as well.
In the table below, the sales figures of the 3 new menu in a restaurant after a week of test marketing. At .05 level of significance, test whether the mean sales volume for the 3 new menu items are all equal.

Item1	Item2	Item3
31	27	24
31	28	31
45	29	46
21	18	48
42	36	46
32	17	40

# Concatenate the data rows into a single vector r2.
r2 <- c(31, 27, 24, 31, 28, 31, 45, 29, 46, 21, 18, 48, 42, 36, 46, 32, 17, 40)

# Assign new variables for the treatment levels and number of control blocks
f <- c("Item1", "Item2", "Item3")
k <- 3
n <- 6

# Create a vector of treatment factors that corresponds to the each element in
# r2 with the gl function
tm <- gl(k, 1, n * k, factor(f))
tm

##  [1] Item1 Item2 Item3 Item1 Item2 Item3 Item1 Item2 Item3 Item1 Item2
## [12] Item3 Item1 Item2 Item3 Item1 Item2 Item3
## Levels: Item1 Item2 Item3

# Similarly, create a vector of blocking factors for each element in the response data r2.
blk <- gl(n, k, k*n)
blk

##  [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6
## Levels: 1 2 3 4 5 6

# Make another temporary dataframe, 'df', with data.frame function, renaming the
# response, treatment factors and block control as 'Sales', 'Item' and 'Block'
# respectively.
df <- data.frame(Sales = r2, Item = tm, Block = blk)
head(df)

##   Sales  Item Block
## 1    31 Item1     1
## 2    27 Item2     1
## 3    24 Item3     1
## 4    31 Item1     2
## 5    28 Item2     2
## 6    31 Item3     2

# Apply the function aov to a formula that describes the response 'Sales' by
# both treatment factor 'Item' and the block control 'Block'. Print out the
# ANOVA table with summary function.
av2 <- aov(Sales ~ Item + Block, data = df)
summary(av2)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## Item         2  538.8  269.39   4.959 0.0319 *
## Block        5  559.8  111.96   2.061 0.1547  
## Residuals   10  543.2   54.32                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two-way Layout Design

In a Two-way Layout Design, there are more than one factors under consideration in the experiment. The test subjects are assigned to treatment levels of every factor combinations at random. In this tutorial, we will consider only a special case, where all groups occur and where there is the same number of observations in each group.

A fast food franchise is test marketing 3 new menu items in both East and West Coasts of continental United States. To find out if they the same popularity, 12 franchisee restaurants from each Coast are randomly chosen for participation in the study.
In accordance with the factorial design, within the 12 restaurants from East Coast, 4 are randomly chosen to test market the first new menu item, another 4 for the second menu item, and the remaining 4 for the last menu item. The 12 restaurants from the West Coast are arranged likewise.
In the table below, the sales figures of the 3 new menu items after a week of test marketing. Each row in the upper table represents the sales figures of 3 different East Coast restaurants. The lower half represents West Coast restaurants. At .05 level of significance, test whether the mean sales volume for the new menu items are all equal. Decide also whether the mean sales volume of the two coastal regions differs.

Item1	Item2	Item3
25	39	36
36	42	24
31	39	28
26	35	29
51	43	42
47	39	36
47	53	32
52	46	33

# Concatenate the data rows into a single vector r3.
r3 <- c(25, 39, 36, 36, 42, 24, 31, 39, 28, 26, 35, 29, 51, 43, 42, 47, 39, 36, 47, 53, 32, 52, 46, 33)

# Assign new variables for the treatment levels and number of observations.
f1 <- c("Item1", "Item2", "Item3")
f2 <- c("East", "West")
k1 <- length(f1)
k2 <- length(f2)
n <- 4

# Create a vector that corresponds to the first treatment level of the response
# data r3 in step 2 element-by-element with the gl function.
tm1 <- gl(k1, 1, n * k1 * k2, factor(f1))
tm1

##  [1] Item1 Item2 Item3 Item1 Item2 Item3 Item1 Item2 Item3 Item1 Item2
## [12] Item3 Item1 Item2 Item3 Item1 Item2 Item3 Item1 Item2 Item3 Item1
## [23] Item2 Item3
## Levels: Item1 Item2 Item3

# Similarly, create a vector that corresponds to the second treatment of the
# response data r3 in step 3.
tm2 <- gl(k2, n * k1, n * k1 * k2, factor(f2))
tm2

##  [1] East East East East East East East East East East East East West West
## [15] West West West West West West West West West West
## Levels: East West

# Apply the function aov to a formula that describes the response r3 by the two
# treatment factors tm1 and tm2 with interaction. Print out the ANOVA table with
# summary function.
av3 <- aov(r3 ~ tm1 * tm2)
summary(av3)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## tm1          2  385.1   192.5   9.554  0.00149 ** 
## tm2          1  715.0   715.0  35.481 1.23e-05 ***
## tm1:tm2      2  234.1   117.0   5.808  0.01132 *  
## Residuals   18  362.8    20.2                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Problem - I

Using ‘coagulation’ dataset in ‘faraway’ packages, answer the following questions

Generate ANOVA table and give conclusion
Test the pairs of means to find which ones are different and by how much from the others.
Try this estimation with Tukey multiple comparison test
Estimate proper sample size to reduce confidence interval in 75%.

library(faraway)
data(coagulation)

# a
ANOVA <- aov(coag~diet,data = coagulation)
summary(ANOVA)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## diet         3    228    76.0   13.57 4.66e-05 ***
## Residuals   20    112     5.6                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# 1-b
df.fl <- summary(ANOVA)[[1]][1,1]
df.R <- summary(ANOVA)[[1]][2,1]
MSE <- summary(ANOVA)[[1]][2,3]
Alpha <- 0.05
t.value <- qt(1 - Alpha/2,df.R,lower.tail = TRUE)

factors <- c('A','B','C','D')
rowID <- c('B-A','C-A','D-A','C-B','D-B','D-C')
colID <- c('diff','lwr','upr')
ttable <- matrix(nrow = 6,ncol = 3,dimnames = list(rowID,colID))

temp = 0
for (i in 1:3)
{
  for (j in i:4)
  {
    if (i != j)
    {
      temp <- temp + 1
      ttable[temp,1] = mean(coagulation$coag[which(coagulation$diet == toString(factors[j]))]) -
        mean(coagulation$coag[which(coagulation$diet == toString(factors[i]))])
      ttable[temp,2] = ttable[temp,1] - 
        t.value*sqrt(MSE*(1/length(which(coagulation$diet == toString(factors[j]))) +
                            1/length(which(coagulation$diet == toString(factors[i])))))
      ttable[temp,3] = ttable[temp,1] + 
        t.value*sqrt(MSE*(1/length(which(coagulation$diet == toString(factors[j]))) +
                            1/length(which(coagulation$diet == toString(factors[i])))))
    }
  }
}

# 1-c
TukeyHSD(ANOVA, conf.level = 0.95)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = coag ~ diet, data = coagulation)
## 
## $diet
##     diff         lwr       upr     p adj
## B-A    5   0.7245544  9.275446 0.0183283
## C-A    7   2.7245544 11.275446 0.0009577
## D-A    0  -4.0560438  4.056044 1.0000000
## C-B    2  -1.8240748  5.824075 0.4766005
## D-B   -5  -8.5770944 -1.422906 0.0044114
## D-C   -7 -10.5770944 -3.422906 0.0001268

# 1-d
CI.table <- matrix(nrow = 6,ncol = 1,dimnames = list(rowID,c('Confidence interval')))
CI.table.re <- matrix(nrow = 6,ncol = 1,dimnames = list(rowID,c('Confidence interval')))
for (i in 1:6)
{
  CI.table[i,1] <- TukeyHSD(ANOVA,conf.level = 0.95)$diet[i,3] -
    TukeyHSD(ANOVA,conf.level = 0.95)$diet[i,2]
  CI.table.re[i,1] <- 0.75*CI.table[i,1]
}

q.value <- qtukey(1 - Alpha,df.fl,df.R)
samsize.table <- matrix(nrow = 6,ncol = 1 ,dimnames = list(rowID,c('Sample size')))
for (i in 1:6)
{
  samsize.table[i,1] <- (64*sqrt(MSE)*q.value^2)/(9*CI.table.re[i,1]^2)
}

Problem - II

Using ‘rats’ dataset in ‘faraway’ packages, answer the following questions

Generate ANOVA table and give conclusion for factors and interaction
Try this estimation with Tukey multiple comparison test
Find which levels of ‘poison’ and ‘treatment’ are significantly different in pairwise difference test

# a
table(rats$poison, rats$treat)

##      
##       A B C D
##   I   4 4 4 4
##   II  4 4 4 4
##   III 4 4 4 4

ratAnova = aov(rats$time~rats$poison*rats$treat, rats)
summary(ratAnova)

##                        Df Sum Sq Mean Sq F value   Pr(>F)    
## rats$poison             2 1.0330  0.5165  23.222 3.33e-07 ***
## rats$treat              3 0.9212  0.3071  13.806 3.78e-06 ***
## rats$poison:rats$treat  6 0.2501  0.0417   1.874    0.112    
## Residuals              36 0.8007  0.0222                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# b
TukeyHSD(ratAnova, which = "rats$poison")

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = rats$time ~ rats$poison * rats$treat, data = rats)
## 
## $`rats$poison`
##             diff        lwr         upr     p adj
## II-I   -0.073125 -0.2020091  0.05575913 0.3583151
## III-I  -0.341250 -0.4701341 -0.21236587 0.0000005
## III-II -0.268125 -0.3970091 -0.13924087 0.0000339

TukeyHSD(ratAnova, which = "rats$treat")

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = rats$time ~ rats$poison * rats$treat, data = rats)
## 
## $`rats$treat`
##            diff         lwr         upr     p adj
## B-A  0.36250000  0.19852116  0.52647884 0.0000047
## C-A  0.07833333 -0.08564550  0.24231217 0.5772283
## D-A  0.22000000  0.05602116  0.38397884 0.0048556
## C-B -0.28416667 -0.44814550 -0.12018783 0.0002333
## D-B -0.14250000 -0.30647884  0.02147884 0.1077087
## D-C  0.14166667 -0.02231217  0.30564550 0.1107678

# c
interaction.plot(rats$poison,rats$treat,rats$time, col = c("green", "red", "yellow", "cyan"))

interaction.plot(rats$treat,rats$poison,rats$time)

Experimental Design - Practice

Bis335

Completely Randomized Design

Randomized Block Design

Two-way Layout Design

Problem - I

Problem - II