Two-way ANOVA

In my previous article I discussed one-way ANOVA. Today we will walk through two-way ANOVA with an example in R.

The usual assumptions of ANOVA also hold for two-way ANOVA, so let us dive directly into building the model.

Model:

We consider a factor A with p levels A1, …, Ap and another factor B with q levels B1, B2, …, Bq. Corresponding to the ith level of A and the jth level of B there are m observations, for every pair (i, j).

The two-way fixed-effects model, after reparametrization, is given by:

yijk = µ + αi + βj + γij + eijk

where i = 1, 2, …, p; j = 1, 2, …, q; k = 1, 2, …, m

yijk = kth observation corresponding to ith level of A and jth level of B.

µ = general effect

αi = additional effect due to ith level of A

βj = additional effect due to jth level of B

γij = interaction effect due to ith level of A and jth level of B

If there is no interaction between the two factors A and B, we set γij = 0 for all i, j in the model.

eijk = error in the model.

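Assumptions:

(i) Σi αi = 0 and Σj βj = 0

(ii) Σi γij = 0 for every j and Σj γij = 0 for every i

(iii) the errors eijk are independent and identically distributed as N(0, σ²)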

In case of 2-way ANOVA we can derive the following relationship:

TSS = SSA + SSB + SS(AB) + SSE

Where TSS is Total Sum of Squares

SSA is the sum of squares due to factor A

SSB is the sum of squares due to factor B

SS(AB) is the sum of squares due to the interaction of factors A and B

SSE is the sum of squares due to error (the residual sum of squares)
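The decomposition above can be checked numerically. The following is a minimal sketch, assuming the built-in ToothGrowth data set (the example analyzed later in this article) with supp as factor A and dose as factor B; the variable names (d, mean_a, mean_ab and so on) are illustrative only. It relies on the design being balanced, i.e. the same number of observations in every cell.

```r
# Sketch: verify TSS = SSA + SSB + SS(AB) + SSE by hand (balanced design assumed)
d <- ToothGrowth
d$dose <- factor(d$dose)              # treat dose as a grouping variable

y       <- d$len
grand   <- mean(y)                    # overall mean
mean_a  <- ave(y, d$supp)             # mean of the row's level of A (supp)
mean_b  <- ave(y, d$dose)             # mean of the row's level of B (dose)
mean_ab <- ave(y, d$supp, d$dose)     # mean of the row's (A, B) cell

TSS  <- sum((y - grand)^2)
SSA  <- sum((mean_a - grand)^2)
SSB  <- sum((mean_b - grand)^2)
SSAB <- sum((mean_ab - mean_a - mean_b + grand)^2)
SSE  <- sum((y - mean_ab)^2)

round(c(TSS = TSS, SSA = SSA, SSB = SSB, SSAB = SSAB, SSE = SSE), 1)
all.equal(TSS, SSA + SSB + SSAB + SSE)   # TRUE when the decomposition holds
```

The four components should agree with the Sum Sq column that summary() reports for aov(len ~ supp * dose).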

The above model is a two-way fixed-effects model (the levels of both factors are fixed). Similarly, we can also build a two-way random-effects model (where the levels of each factor are randomly selected) and a mixed-effects model (where the levels of one factor are fixed and the levels of the other are randomly selected).

Hypothesis

Let us not go deep into the mathematical derivations. In two-way ANOVA we want to test the following hypotheses:

(i) H01: αi = 0 for all i vs H11: at least one inequality in H01

If we fail to reject H01, we have no evidence that factor A has an effect.

(ii) H02: βj = 0 for all j vs H12: at least one inequality in H02

If we fail to reject H02, we have no evidence that factor B has an effect.

(iii) H03: γij = 0 for all i, j vs H13: at least one inequality in H03

If we fail to reject H03, we have no evidence of an interaction between A and B.
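For reference, the three hypotheses are usually tested with F ratios of mean squares. The block below is a sketch of the standard statistics for the balanced fixed-effects model with m observations per cell; the mean-square symbols MSA, MSB, MS(AB) and MSE are notation introduced here and do not appear in the text above.

```latex
\begin{aligned}
MSA &= \frac{SSA}{p-1}, \quad
MSB = \frac{SSB}{q-1}, \quad
MS(AB) = \frac{SS(AB)}{(p-1)(q-1)}, \quad
MSE = \frac{SSE}{pq(m-1)} \\[4pt]
F_A &= \frac{MSA}{MSE} \sim F_{\,p-1,\; pq(m-1)} \quad \text{under } H_{01} \\
F_B &= \frac{MSB}{MSE} \sim F_{\,q-1,\; pq(m-1)} \quad \text{under } H_{02} \\
F_{AB} &= \frac{MS(AB)}{MSE} \sim F_{\,(p-1)(q-1),\; pq(m-1)} \quad \text{under } H_{03}
\end{aligned}
```

Each null hypothesis is rejected at level 0.05 when the corresponding p-value, the upper-tail probability of the observed F ratio, falls below 0.05.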

Now let us take a classic example:

The Effect of Vitamin C on Tooth Growth in Guinea Pigs

Description

Here, we will use the built-in R data set ToothGrowth. It contains data from a study evaluating the effect of vitamin C on tooth growth in guinea pigs. The experiment was performed on 60 guinea pigs, where each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

Question: We want to know if tooth length depends on supp and dose.

R Code:

# Load the built-in data set and store it in a variable

my_data <- ToothGrowth

head(my_data)

Output:

   len supp dose
1  4.2   VC  0.5
2 11.5   VC  0.5
3  7.3   VC  0.5
4  5.8   VC  0.5
5  6.4   VC  0.5
6 10.0   VC  0.5

Note:

Here we have 2 factors and, possibly, their interaction:

(i) Factor 1: The supplement (supp) with 2 levels, VC (ascorbic acid) and OJ (orange juice)

(ii) Factor 2: The dosage (dose) with 3 levels, 0.5, 1 and 2 mg/day

(iii) The interaction effect of factors 1 and 2.

The length of the tooth (len) is the response variable that these two factors may affect.
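The model above assumes the same number of observations, m, in every combination of factor levels. A quick sketch of how this can be checked for ToothGrowth (the names d and cell_counts are illustrative only):

```r
# Sketch: count the observations in each supp x dose cell
d <- ToothGrowth
cell_counts <- table(supp = d$supp, dose = d$dose)
cell_counts

# TRUE when every cell holds the same number of observations (balanced design)
length(unique(as.vector(cell_counts))) == 1
```

With 60 animals spread over 2 x 3 = 6 cells, a balanced design corresponds to m = 10 observations per cell.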

# Check the structure of the data

str(my_data)

Output:

'data.frame':    60 obs. of  3 variables:
 $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
 $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
 $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

Note:

From the above output, we can see that R treats "dose" as a numeric variable. We will convert it to a factor (i.e., a grouping variable) as follows:

# Convert dose to a factor and label the levels "0.5", "1", "2"

my_data$dose <- factor(my_data$dose, levels = c(0.5, 1, 2), labels = c("0.5", "1", "2"))

str(my_data)

Output:

'data.frame':    60 obs. of  3 variables:
 $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
 $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
 $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...
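Why the conversion matters can be seen from the degrees of freedom. The sketch below fits the same formula twice, once with dose left numeric and once as a factor; the object names (fit_numeric, fit_factor, df_numeric, df_factor) are illustrative only. With a numeric dose, R estimates a single slope (1 degree of freedom) instead of comparing the 3 dose groups (2 degrees of freedom).

```r
# Sketch: numeric dose vs. factor dose in the same model formula
fit_numeric <- aov(len ~ supp + dose, data = ToothGrowth)   # dose is numeric here

d <- ToothGrowth
d$dose <- factor(d$dose)
fit_factor <- aov(len ~ supp + dose, data = d)              # dose is a 3-level factor

# Degrees of freedom for supp, dose and Residuals
df_numeric <- summary(fit_numeric)[[1]][["Df"]]   # 1 1 57
df_factor  <- summary(fit_factor)[[1]][["Df"]]    # 1 2 56
rbind(numeric_dose = df_numeric, factor_dose = df_factor)
```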

Note:

Now we can proceed with the analysis in two ways:

  1. Assuming that the two factors act independently, i.e. there is no interaction effect.
  2. Assuming that the two factors do not act independently, i.e. there is an interaction effect.

# Two factor variables are independent (Interaction effect is absent):

aov1 <- aov(len ~ supp + dose, data = my_data)

summary(aov1)

Output:

            Df Sum Sq Mean Sq F value   Pr(>F)
supp         1  205.4   205.4   14.02 0.000429 ***
dose         2 2426.4  1213.2   82.81  < 2e-16 ***
Residuals   56  820.4    14.7
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This model is called the additive model.

Interpret the results

From the ANOVA results, you can conclude the following, based on the p-values and at 5% level of significance:

The p-value of supp is 0.000429 (<0.05), i.e. significant, which indicates that the levels of supp are associated with significantly different mean tooth lengths.

The p-value of dose is < 2e-16 (<0.05), i.e. significant, which indicates that the levels of dose are associated with significantly different mean tooth lengths.

Of the two factors, dose has the smaller p-value and the larger F value, so the evidence for a dose effect is stronger. These results suggest that changing the delivery method (supp) or the dose of vitamin C significantly affects the mean tooth length.
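The F value and Pr(>F) columns of the additive-model table can be reproduced from the sums of squares and degrees of freedom. This is a minimal sketch, not the internals of aov(); the names d, tab, ms, f_stat and p_val are illustrative only.

```r
# Sketch: rebuild the F statistics and p-values of the additive model by hand
d <- ToothGrowth
d$dose <- factor(d$dose)
fit <- aov(len ~ supp + dose, data = d)

tab <- summary(fit)[[1]]               # ANOVA table as a data frame
n   <- nrow(tab)                       # the last row is Residuals

ms     <- tab[["Sum Sq"]] / tab[["Df"]]          # mean square = SS / df
f_stat <- ms[-n] / ms[n]                         # F = MS(effect) / MSE
p_val  <- pf(f_stat, df1 = tab[["Df"]][-n],      # upper-tail F probability
             df2 = tab[["Df"]][n], lower.tail = FALSE)

data.frame(term = trimws(rownames(tab))[-n], F = f_stat, p = p_val)
```

The resulting values should match the F value and Pr(>F) columns printed by summary(fit).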

# Two factor variables are not independent (Interaction effect is present):

# These two calls are equivalent

aov2 <- aov(len ~ supp * dose, data = my_data)

aov2 <- aov(len ~ supp + dose + supp:dose, data = my_data)

summary(aov2)

Output:

            Df Sum Sq Mean Sq F value   Pr(>F)
supp         1  205.4   205.4  15.572 0.000231 ***
dose         2 2426.4  1213.2  92.000  < 2e-16 ***
supp:dose    2  108.3    54.2   4.107 0.021860 *
Residuals   54  712.1    13.2
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Interpret the results

From the ANOVA results, you can conclude the following, based on the p-values and at 5% level of significance:

  • The p-value of supp is 0.000231 (<0.05), i.e. significant, which indicates that the levels of supp are associated with significantly different mean tooth lengths.
  • The p-value of dose is < 2e-16 (<0.05), i.e. significant, which indicates that the levels of dose are associated with significantly different mean tooth lengths.
  • The p-value for the supp:dose interaction is 0.021860 (<0.05), i.e. significant, which indicates that the relationship between dose and tooth length depends on the supp method.
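One way to see what a significant interaction means is to look at the cell means. The sketch below tabulates the mean tooth length for every supp and dose combination and the gap between the two delivery methods at each dose; the names d, cell_means and gap are illustrative only. If the gap changes across doses, the effect of supp depends on dose.

```r
# Sketch: cell means and the OJ - VC gap at each dose
d <- ToothGrowth
d$dose <- factor(d$dose)

cell_means <- tapply(d$len, list(supp = d$supp, dose = d$dose), mean)
round(cell_means, 2)

gap <- cell_means["OJ", ] - cell_means["VC", ]   # difference between methods per dose
round(gap, 2)

# Non-parallel lines in this plot point to an interaction
interaction.plot(x.factor = d$dose, trace.factor = d$supp, response = d$len,
                 xlab = "dose (mg/day)", ylab = "mean len", trace.label = "supp")
```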

The basic assumptions of ANOVA should also be checked here, as discussed in the previous article, before relying on these conclusions.
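As a reminder of what such a check can look like for the interaction model, here is a minimal sketch. The choice of tests is an assumption made here (Shapiro-Wilk for normality of residuals and Bartlett for equal variances across cells); the previous article may have used different ones. The names d, fit, res, sw and bt are illustrative only.

```r
# Sketch: residual-based checks for the two-way model with interaction
d <- ToothGrowth
d$dose <- factor(d$dose)
fit <- aov(len ~ supp * dose, data = d)

res <- residuals(fit)

sw <- shapiro.test(res)                                        # normality of residuals
bt <- bartlett.test(len ~ interaction(supp, dose), data = d)   # equal cell variances

c(shapiro_p = sw$p.value, bartlett_p = bt$p.value)

# Graphical checks: residuals vs fitted values, and a normal Q-Q plot
plot(fit, which = 1)
plot(fit, which = 2)
```

A p-value above 0.05 in either test means that the corresponding assumption is not rejected at the 5% level.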


© Taranga Mukherjee
