# Two-way ANOVA

In my previous article I discussed one-way ANOVA. Today we will walk through two-way ANOVA with an example in R.

The usual assumptions of ANOVA also hold for two-way ANOVA, so let us dive directly into the model.

**Model:**

We consider a factor A with p levels A_{1}, …, A_{p} and another factor B with q levels B_{1}, B_{2}, …, B_{q}. Corresponding to the *i*^{th} level of A and the *j*^{th} level of B there are m observations ∀ *i, j*.

The two-way fixed effects model after reparametrization is given by:

*y_{ijk}* = µ + α_{i} + β_{j} + γ_{ij} + e_{ijk}

where i = 1, 2, …, p; j = 1, 2, …, q; k = 1, 2, …, m

*y_{ijk}* = k^{th} observation corresponding to the i^{th} level of A and the j^{th} level of B

µ= general effect

α_{i} = additional effect due to i^{th} level of A

β_{j} = additional effect due to j^{th} level of B

γ_{ij} = interaction effect due to the i^{th} level of A and the j^{th} level of B

If there is no interaction effect between the 2 factors A & B, then we set γ_{ij} = 0 in the model.

e_{ijk} = error in the model.
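To make the notation concrete, here is a minimal sketch that simulates data from a model of this form and fits it with `aov()`. All numeric values (the general effect, the effects of A and B, the interaction effects and the error standard deviation) are made-up illustrations, and the names `sim`, `alpha`, `beta` and `gamma` are hypothetical.

```r
set.seed(1)                       # reproducible illustration
p <- 2; q <- 3; m <- 10           # levels of A, levels of B, observations per cell

mu    <- 20                       # general effect (hypothetical value)
alpha <- c(-2, 2)                 # effects of A, chosen to sum to zero
beta  <- c(-5, 0, 5)              # effects of B, chosen to sum to zero
# Interaction effects: a p x q matrix whose rows and columns each sum to zero
gamma <- matrix(c(1, -1, 0, 0, -1, 1), nrow = p)

# One row per (i, j, k) combination
sim <- expand.grid(k = 1:m, j = 1:q, i = 1:p)

# y_ijk = mu + alpha_i + beta_j + gamma_ij + e_ijk, with normal errors (sd = 2 assumed)
sim$y <- mu + alpha[sim$i] + beta[sim$j] +
  gamma[cbind(sim$i, sim$j)] + rnorm(nrow(sim), sd = 2)

sim$A <- factor(sim$i)
sim$B <- factor(sim$j)
summary(aov(y ~ A * B, data = sim))
```

Setting every entry of `gamma` to zero gives data from the model without interaction.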

**Assumptions:**
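For reference, the conditions usually attached to a reparametrized fixed-effects model of this kind are side conditions on the effects plus independent normal errors with a common variance. This is the standard textbook statement rather than something derived in this article, and the symbol σ² for the error variance is introduced here only for illustration:

```latex
\sum_{i=1}^{p} \alpha_i = 0, \qquad
\sum_{j=1}^{q} \beta_j = 0, \qquad
\sum_{i=1}^{p} \gamma_{ij} = 0 \;\; \forall j, \qquad
\sum_{j=1}^{q} \gamma_{ij} = 0 \;\; \forall i, \qquad
e_{ijk} \overset{\text{iid}}{\sim} N(0, \sigma^2)
```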

In two-way ANOVA we can derive the following relationship:

**TSS = SSA + SSB + SS(AB) + SSE**

where TSS is the total sum of squares,

SSA is the sum of squares due to factor A,

SSB is the sum of squares due to factor B,

SS(AB) is the sum of squares due to the interaction effect of factors A and B, and

SSE is the sum of squares due to error.
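The decomposition can be checked numerically. The sketch below is only an illustration: it assumes R's built-in `ToothGrowth` data (the data set used in the example later in this article) and a local copy with the hypothetical name `tg`. Because that design is balanced, the four sums of squares reported by `aov()` should add up to the total sum of squares about the grand mean.

```r
# Work on a copy of the built-in data, with dose treated as a factor
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

fit <- aov(len ~ supp * dose, data = tg)
ss  <- summary(fit)[[1]][["Sum Sq"]]    # SSA, SSB, SS(AB), SSE in that order

tss <- sum((tg$len - mean(tg$len))^2)   # total sum of squares about the grand mean

c(TSS = tss, sum_of_parts = sum(ss))
all.equal(tss, sum(ss))
```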

The above model is a 2-way **fixed effects** model (the levels are fixed). Similarly, we can also create a 2-way **random effects** model (where the levels of a factor are randomly selected) and a **mixed effects** model (where the levels of one factor are randomly selected and those of the other are fixed).

**Hypothesis**

Let us not go deep into the mathematical derivations. In two-way ANOVA we want to test the following hypotheses:

(i) H_{01}: α_{i} = 0 ∀ *i* vs H_{11}: at least one inequality in H_{01}

If we fail to reject the null hypothesis, there is no evidence that factor A has an effect.

(ii) H_{02}: β_{j} = 0 ∀ *j* vs H_{12}: at least one inequality in H_{02}

If we fail to reject the null hypothesis, there is no evidence that factor B has an effect.

(iii) H_{03}: γ_{ij} = 0 ∀ *i, j* vs H_{13}: at least one inequality in H_{03}

If we fail to reject the null hypothesis, there is no evidence that the interaction of A and B has an effect.
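Each of these hypotheses is tested with an F ratio: the mean square of the effect divided by the error mean square, compared against an F distribution with the matching degrees of freedom. The following is a minimal sketch of that calculation, assuming R's built-in `ToothGrowth` data (used in the example later in this article); `tg`, `tab`, `f_stat` and `p_val` are illustrative names.

```r
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# ANOVA table rows: supp, dose, supp:dose, Residuals
tab <- summary(aov(len ~ supp * dose, data = tg))[[1]]

df_err <- tab[["Df"]][4]                  # error degrees of freedom
ms     <- tab[["Sum Sq"]] / tab[["Df"]]   # mean squares = SS / df

f_stat <- ms[1:3] / ms[4]                 # F = MS(effect) / MS(error)
p_val  <- pf(f_stat, tab[["Df"]][1:3], df_err, lower.tail = FALSE)

data.frame(effect = c("supp", "dose", "supp:dose"),
           F_value = f_stat, p_value = p_val)
```

The hand-computed values should agree with the `F value` and `Pr(>F)` columns that `summary()` prints for the same model.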

Now let us take a classic example:

*The Effect of Vitamin C on Tooth Growth in Guinea Pigs*

Description

Here, we will be using the built-in R data set named **ToothGrowth**. It contains data from a study evaluating the effect of vitamin C on tooth growth in guinea pigs. The experiment was performed on 60 guinea pigs, where each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

*Question: We want to know if tooth length depends on supp and dose.*

**R Code:**

```
# Import the data and store it in a variable
my_data <- ToothGrowth
head(my_data)
```

Output:

```
   len supp dose
1  4.2   VC  0.5
2 11.5   VC  0.5
3  7.3   VC  0.5
4  5.8   VC  0.5
5  6.4   VC  0.5
6 10.0   VC  0.5
```

**Note:**

Here we have 2 factors:

(i) **Factor 1:** The **Supplement** (supp) with 2 levels, VC (ascorbic acid) and OJ (orange juice)

(ii) **Factor 2:** The **Dosage** (dose) with 3 levels, 0.5, 1 & 2 mg/day

In addition, we can consider the **interaction effect** of factors 1 & 2.

The **Length of the Tooth** (len) is the response variable that may be affected by the 2 factors.
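The model assumes the same number of observations, m, in every combination of the two factors. A quick way to confirm this for the built-in data is to cross-tabulate the two factors; with 60 animals spread over 2 × 3 combinations, each cell should hold 10 observations. The name `cell_counts` is just an illustrative choice.

```r
# Number of observations in each supp x dose cell
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
cell_counts
```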

```
# Check the structure of the data
str(my_data)
```

Output:

```
'data.frame': 60 obs. of 3 variables:
 $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
 $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
 $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
```

**Note:**

From the above output, we can see that R treats "dose" as a numeric variable. We'll convert it to a factor variable (i.e., a grouping variable) as follows:

```
# Convert dose to a factor and recode the levels as "0.5", "1", "2"
my_data$dose <- factor(my_data$dose, levels = c(0.5, 1, 2), labels = c("0.5", "1", "2"))
str(my_data)
```

Output:

```
'data.frame': 60 obs. of 3 variables:
 $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
 $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
 $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...
```
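The conversion matters because `aov()` treats a numeric predictor as a single linear slope, which uses 1 degree of freedom, whereas a factor with 3 levels uses 3 − 1 = 2. The sketch below compares the degrees of freedom under both codings; `fit_num`, `fit_fac` and `tg` are illustrative names, not objects used elsewhere in this article.

```r
# dose left as numeric: modelled as one linear slope (1 df)
fit_num <- aov(len ~ supp + dose, data = ToothGrowth)

# dose as a factor: one parameter per level beyond the first (3 - 1 = 2 df)
tg <- transform(ToothGrowth, dose = factor(dose))
fit_fac <- aov(len ~ supp + dose, data = tg)

df_num <- summary(fit_num)[[1]][["Df"]]
df_fac <- summary(fit_fac)[[1]][["Df"]]

# Columns correspond to supp, dose, Residuals
rbind(numeric_dose = df_num, factor_dose = df_fac)
```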

**Note:**

Now we can proceed with the analysis in two ways:

- Assuming that the two factor variables are **independent**, i.e. their effects are additive and there is no interaction effect.

- Assuming that the two factor variables are **not independent**, i.e. an interaction effect may be present.

```
# Two factor variables are independent (interaction effect is absent)
aov1 <- aov(len ~ supp + dose, data = my_data)
summary(aov1)
```

Output:

```
            Df Sum Sq Mean Sq F value   Pr(>F)
supp         1  205.4   205.4   14.02 0.000429 ***
dose         2 2426.4  1213.2   82.81  < 2e-16 ***
Residuals   56  820.4    14.7
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

This model is called the *additive model*.

**Interpret the results**

From the ANOVA results, you can conclude the following, based on the p-values and a 5% level of significance:

The p-value of **supp** is 0.000429 (**< 0.05**), i.e. **significant**, which indicates that the levels of supp are associated with significantly different mean tooth lengths.

The p-value of **dose** is < 2e-16 (**< 0.05**), i.e. **significant**, which indicates that the levels of dose are associated with significantly different mean tooth lengths.

Of the two factors, dose has the larger F value (82.81 vs 14.02) and accounts for much more of the variation (Sum Sq 2426.4 vs 205.4). These results suggest that changing the delivery method (supp) or the dose of vitamin C significantly affects the mean tooth length.

```
# Two factor variables are not independent (interaction effect is present)
# These two calls are equivalent
aov2 <- aov(len ~ supp * dose, data = my_data)
aov2 <- aov(len ~ supp + dose + supp:dose, data = my_data)
summary(aov2)
```

Output:

```
            Df Sum Sq Mean Sq F value   Pr(>F)
supp         1  205.4   205.4  15.572 0.000231 ***
dose         2 2426.4  1213.2  92.000  < 2e-16 ***
supp:dose    2  108.3    54.2   4.107 0.021860 *
Residuals   54  712.1    13.2
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

**Interpret the results**

From the ANOVA results, you can conclude the following, based on the p-values and a 5% level of significance:

- The p-value of **supp** is 0.000231 (**< 0.05**), i.e. **significant**, which indicates that the levels of supp are associated with significantly different mean tooth lengths.

- The p-value of **dose** is < 2e-16 (**< 0.05**), i.e. **significant**, which indicates that the levels of dose are associated with significantly different mean tooth lengths.

- The p-value for the **interaction** supp:dose is 0.0219 (**< 0.05**), i.e. **significant**, which indicates that the relationship between dose and tooth length depends on the delivery method (supp).
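One way to see what a significant interaction means here is to look at the mean tooth length in each supp × dose cell and at the gap between the two delivery methods at each dose. This is an illustrative sketch only; `tg`, `cell_means` and `gap` are hypothetical names.

```r
tg <- transform(ToothGrowth, dose = factor(dose))

# Mean tooth length for every supp x dose combination
cell_means <- with(tg, tapply(len, list(supp = supp, dose = dose), mean))
round(cell_means, 2)

# Difference between delivery methods at each dose (OJ minus VC)
gap <- cell_means["OJ", ] - cell_means["VC", ]
round(gap, 2)
```

If the gap were roughly constant across doses, the additive model would be adequate; a gap that changes with dose is what the interaction term picks up. The same means can be drawn as lines with `with(tg, interaction.plot(dose, supp, len))`, where non-parallel lines suggest an interaction.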

The basic assumptions of ANOVA, whose checks were discussed in the previous article, should also be verified here before relying on these results.
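As a reminder of what such a check can look like, the sketch below applies two common base-R tests to the interaction model. These are offered as one possible choice and may differ from the checks used in the previous article; `tg`, `norm_test` and `var_test` are illustrative names.

```r
tg   <- transform(ToothGrowth, dose = factor(dose))
aov2 <- aov(len ~ supp * dose, data = tg)

# Normality of the residuals (Shapiro-Wilk test)
norm_test <- shapiro.test(residuals(aov2))

# Equal variances across the six supp x dose cells (Bartlett test)
var_test <- bartlett.test(len ~ interaction(supp, dose), data = tg)

c(shapiro_p = norm_test$p.value, bartlett_p = var_test$p.value)
```

Small p-values in either test would suggest that the corresponding assumption is doubtful. Calling `plot(aov2)` gives the usual residual diagnostic plots as a visual complement.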
