Statistics [15]: Analysis of Variance - F test
In statistics, analysis of variance (ANOVA) is a collection of statistical models, and their associated procedures, in which the observed variance is partitioned into components due to different explanatory variables. In its simplest form, ANOVA gives a statistical test of whether the means of several groups are all equal, and therefore generalizes Student’s two-sample t-test to more than two groups.
One-Way ANOVA
The one-way ANOVA compares the means of the groups we are interested in and determines whether any of those means differ significantly from each other. Specifically, it tests the null hypothesis:
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_r$$
where $\mu_i$ is the mean of group $i$ and $r$ is the number of groups.
ANOVA Model
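The independent variable (grouping variable) that we are interested in is called a factor, and each factor may have two or more levels. In the table below, $A_1, \dots, A_r$ are the levels of factor $A$; $n_i$ is the number of independent tests under level $A_i$; $X_{ij}$ is the result of the $j$-th test under level $A_i$.
level \ test number | $1$ | $2$ | $\cdots$ | $n_i$ |
---|---|---|---|---|
$A_1$ | $X_{11}$ | $X_{12}$ | $\cdots$ | $X_{1n_1}$ |
$A_2$ | $X_{21}$ | $X_{22}$ | $\cdots$ | $X_{2n_2}$ |
$\vdots$ | $\vdots$ | $\vdots$ | | $\vdots$ |
$A_r$ | $X_{r1}$ | $X_{r2}$ | $\cdots$ | $X_{rn_r}$ |
Assume
$$X_{ij} \sim N(\mu_i, \sigma^2), \quad j = 1, \dots, n_i, \quad i = 1, \dots, r,$$
with all $X_{ij}$ independent.
Further, let
$$n = \sum_{i=1}^{r} n_i, \quad \mu = \frac{1}{n}\sum_{i=1}^{r} n_i \mu_i, \quad \alpha_i = \mu_i - \mu, \quad \varepsilon_{ij} = X_{ij} - \mu_i$$
We have
$$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma^2), \quad \sum_{i=1}^{r} n_i \alpha_i = 0$$
The hypothesis could be written as
$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_r = 0$$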
ANOVA Problem
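The hypothesis test problem:
$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_r = 0 \quad \text{vs.} \quad H_1: \text{not all } \alpha_i \text{ are zero}$$
The test statistic:
$$F = \frac{S_A/(r-1)}{S_E/(n-r)}$$
Variance Analysis
$$S_T = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (X_{ij} - \bar{X})^2 = S_E + S_A$$
where
$$\bar{X}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}, \quad \bar{X} = \frac{1}{n}\sum_{i=1}^{r}\sum_{j=1}^{n_i} X_{ij}, \quad n = \sum_{i=1}^{r} n_i$$
Within group sum of squares:
$$S_E = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot})^2$$
Between group sum of squares:
$$S_A = \sum_{i=1}^{r} n_i (\bar{X}_{i\cdot} - \bar{X})^2$$
For $S_E$,
$$\frac{S_E}{\sigma^2} \sim \chi^2(n-r), \quad E(S_E) = (n-r)\sigma^2$$
For $S_A$,
$$E(S_A) = (r-1)\sigma^2 + \sum_{i=1}^{r} n_i \alpha_i^2$$
When hypothesis $H_0$ is true,
$$\frac{S_A}{\sigma^2} \sim \chi^2(r-1)$$
and $S_A$ is independent of $S_E$.
Hence,
$$F = \frac{S_A/(r-1)}{S_E/(n-r)} \sim F(r-1, n-r)$$
The hypothesis $H_0$ is thus rejected for large values of $F$.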
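The decomposition into a between-group and a within-group sum of squares can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `one_way_anova` and the toy data are hypothetical and do not come from the examples in this post.

```python
def one_way_anova(groups):
    """Return (S_A, S_E, F) for a list of groups of observations."""
    n = sum(len(g) for g in groups)   # total number of observations
    r = len(groups)                   # number of levels
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: sum_i n_i * (mean_i - grand_mean)^2
    s_a = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    # Within-group sum of squares: sum_i sum_j (x_ij - mean_i)^2
    s_e = sum((x - m) ** 2
              for g, m in zip(groups, group_means) for x in g)

    f_stat = (s_a / (r - 1)) / (s_e / (n - r))
    return s_a, s_e, f_stat


# Hypothetical toy data: three levels with unequal numbers of tests.
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0], [6.0, 7.0]]
s_a, s_e, f_stat = one_way_anova(toy)

# Total sum of squares, computed directly, to check the decomposition.
all_values = [x for g in toy for x in g]
grand_mean = sum(all_values) / len(all_values)
s_t = sum((x - grand_mean) ** 2 for x in all_values)

print(f"S_A = {s_a:.4f}, S_E = {s_e:.4f}, S_T = {s_t:.4f}, F = {f_stat:.4f}")
```

The printed total should equal the sum of the two components up to rounding, which is a quick sanity check on any hand calculation.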
Example 1
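Comparison of pesticides.
test number \ pesticide | $A_1$ | $A_2$ | $A_3$ | $A_4$ | $A_5$ | $A_6$ |
---|---|---|---|---|---|---|
1 | 87 | 90 | 56 | 55 | 92 | 75 |
2 | 85 | 88 | 62 | 48 | 99 | 72 |
3 | 80 | 87 | | | 95 | 81 |
4 | | 94 | | | 91 | |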
ANOVA
Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|
Between Groups | 3794.500 | 5 | 758.900 | 51.162 | .000 |
Within Groups | 178.000 | 12 | 14.833 | | |
Total | 3972.500 | 17 | | | |
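Since the reported significance is below 0.001, the null hypothesis is rejected: the mean results of the six pesticides differ significantly.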
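The F value of Example 1 can be cross-checked with `scipy.stats.f_oneway`. The sketch below makes one assumption: the 18 observations are assigned to the six pesticides as 3, 4, 2, 2, 4 and 3 values per group, which is the grouping that reproduces the within-group sum of squares 178.000 in the ANOVA table. The variable name `pesticides` is illustrative only.

```python
from scipy import stats

# Example 1 data, one list per pesticide (grouping is an assumption,
# chosen so that the within-group sum of squares equals 178.000).
pesticides = [
    [87, 85, 80],
    [90, 88, 87, 94],
    [56, 62],
    [55, 48],
    [92, 99, 95, 91],
    [75, 72, 81],
]

# f_oneway takes each group as a separate positional argument.
f_stat, p_value = stats.f_oneway(*pesticides)
print(f"F = {f_stat:.3f}, p = {p_value:.2e}")
```

Unequal group sizes are handled directly, so no padding of the shorter groups is needed.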
Example 2
Lifetime of lightbulbs.
level \ test number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
$A_1$ | 1600 | 1610 | 1650 | 1680 | 1700 | 1700 | 1780 | |
$A_2$ | 1500 | 1640 | 1400 | 1700 | 1750 | | | |
$A_3$ | 1640 | 1550 | 1600 | 1620 | 1640 | 1600 | 1740 | 1800 |
$A_4$ | 1510 | 1520 | 1530 | 1570 | 1640 | 1680 | | |
ANOVA
Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|
Between Groups | 39776.456 | 3 | 13258.819 | 1.638 | .209 |
Within Groups | 178088.929 | 22 | 8089.951 | | |
Total | 217865.385 | 25 | | | |
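Since the significance 0.209 is larger than 0.05, the null hypothesis is not rejected: the data show no significant difference in mean lifetime between the four groups.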
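The decision rule "reject for large values of F" can be made concrete by comparing the F value with a critical value of the F distribution. The sketch below takes F and the degrees of freedom from the ANOVA table of Example 2; the significance level 0.05 is an assumption, since the text does not fix one.

```python
from scipy import stats

# Values taken from the ANOVA table of Example 2.
f_stat, df_between, df_within = 1.638, 3, 22
alpha = 0.05  # significance level (an assumption)

# Upper alpha quantile of F(df_between, df_within): the rejection threshold.
critical = stats.f.ppf(1 - alpha, df_between, df_within)
# Survival function: P(F > f_stat) when the null hypothesis is true.
p_value = stats.f.sf(f_stat, df_between, df_within)

print(f"critical value = {critical:.3f}, p = {p_value:.3f}")
print("reject H0" if f_stat > critical else "fail to reject H0")
```

Comparing F with the critical value and comparing the p-value with the significance level are equivalent ways of stating the same decision.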
Two-Way ANOVA
A two-way ANOVA is used to estimate how the mean of a quantitative variable changes according to the levels of two categorical variables. Use a two-way ANOVA when we want to know how two independent variables, in combination, affect a dependent variable.
ANOVA Model
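Assume there are two factors $A$ and $B$ with levels $A_1, \dots, A_r$ and $B_1, \dots, B_s$ respectively. Then there are $rs$ different combinations $(A_i, B_j)$. If $t$ independent tests are performed under each combination, $rst$ tests are needed in total.
$A$ \ $B$ | $B_1$ | $B_2$ | $\cdots$ | $B_s$ |
---|---|---|---|---|
$A_1$ | $X_{111}, \dots, X_{11t}$ | $X_{121}, \dots, X_{12t}$ | $\cdots$ | $X_{1s1}, \dots, X_{1st}$ |
$A_2$ | $X_{211}, \dots, X_{21t}$ | $X_{221}, \dots, X_{22t}$ | $\cdots$ | $X_{2s1}, \dots, X_{2st}$ |
$\vdots$ | $\vdots$ | $\vdots$ | | $\vdots$ |
$A_r$ | $X_{r11}, \dots, X_{r1t}$ | $X_{r21}, \dots, X_{r2t}$ | $\cdots$ | $X_{rs1}, \dots, X_{rst}$ |
Assume
$$X_{ijk} \sim N(\mu_{ij}, \sigma^2), \quad i = 1, \dots, r, \quad j = 1, \dots, s, \quad k = 1, \dots, t,$$
with all $X_{ijk}$ independent.
Let
$$\mu = \frac{1}{rs}\sum_{i=1}^{r}\sum_{j=1}^{s} \mu_{ij}, \quad \mu_{i\cdot} = \frac{1}{s}\sum_{j=1}^{s} \mu_{ij}, \quad \mu_{\cdot j} = \frac{1}{r}\sum_{i=1}^{r} \mu_{ij}$$
$$\alpha_i = \mu_{i\cdot} - \mu, \quad \beta_j = \mu_{\cdot j} - \mu, \quad \gamma_{ij} = \mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu$$
We have
$$\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$$
where $\gamma_{ij}$ reflects the joint effect of the combination $(A_i, B_j)$ on the experiment results beyond the two separate effects and is called the interaction effect, while $\alpha_i$ and $\beta_j$ are called main effects.
Further,
$$\sum_{i=1}^{r} \alpha_i = 0, \quad \sum_{j=1}^{s} \beta_j = 0, \quad \sum_{i=1}^{r} \gamma_{ij} = 0, \quad \sum_{j=1}^{s} \gamma_{ij} = 0$$
In summary,
$$X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}, \quad \varepsilon_{ijk} \sim N(0, \sigma^2)$$
and all $\varepsilon_{ijk}$ are independent.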
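Hypothesis
$$H_{01}: \alpha_1 = \alpha_2 = \cdots = \alpha_r = 0$$
$$H_{02}: \beta_1 = \beta_2 = \cdots = \beta_s = 0$$
$$H_{03}: \gamma_{ij} = 0 \text{ for all } i, j$$
Variance Analysis
$$S_T = \sum_{i=1}^{r}\sum_{j=1}^{s}\sum_{k=1}^{t} (X_{ijk} - \bar{X})^2 = S_E + S_A + S_B + S_{A \times B}$$
where
$$\bar{X} = \frac{1}{rst}\sum_{i,j,k} X_{ijk}, \quad \bar{X}_{ij\cdot} = \frac{1}{t}\sum_{k} X_{ijk}, \quad \bar{X}_{i\cdot\cdot} = \frac{1}{st}\sum_{j,k} X_{ijk}, \quad \bar{X}_{\cdot j\cdot} = \frac{1}{rt}\sum_{i,k} X_{ijk}$$
$$S_E = \sum_{i,j,k} (X_{ijk} - \bar{X}_{ij\cdot})^2, \quad S_A = st\sum_{i} (\bar{X}_{i\cdot\cdot} - \bar{X})^2, \quad S_B = rt\sum_{j} (\bar{X}_{\cdot j\cdot} - \bar{X})^2$$
$$S_{A \times B} = t\sum_{i,j} (\bar{X}_{ij\cdot} - \bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot j\cdot} + \bar{X})^2$$
Let
$$MS_A = \frac{S_A}{r-1}, \quad MS_B = \frac{S_B}{s-1}, \quad MS_{A \times B} = \frac{S_{A \times B}}{(r-1)(s-1)}, \quad MS_E = \frac{S_E}{rs(t-1)}$$
Then, when the corresponding hypothesis is true,
$$F_A = \frac{MS_A}{MS_E} \sim F(r-1, rs(t-1)), \quad F_B = \frac{MS_B}{MS_E} \sim F(s-1, rs(t-1)), \quad F_{A \times B} = \frac{MS_{A \times B}}{MS_E} \sim F((r-1)(s-1), rs(t-1))$$
The expectation would be
$$E(MS_E) = \sigma^2, \quad E(MS_A) = \sigma^2 + \frac{st}{r-1}\sum_{i} \alpha_i^2, \quad E(MS_B) = \sigma^2 + \frac{rt}{s-1}\sum_{j} \beta_j^2$$
$$E(MS_{A \times B}) = \sigma^2 + \frac{t}{(r-1)(s-1)}\sum_{i,j} \gamma_{ij}^2$$
so each hypothesis is rejected for large values of the corresponding $F$.
Summary,
Source | Sum of Squares | DOF | Mean Square | $F$ value |
---|---|---|---|---|
Factor $A$ | $S_A$ | $r-1$ | $MS_A$ | $F_A = MS_A / MS_E$ |
Factor $B$ | $S_B$ | $s-1$ | $MS_B$ | $F_B = MS_B / MS_E$ |
Interaction $A \times B$ | $S_{A \times B}$ | $(r-1)(s-1)$ | $MS_{A \times B}$ | $F_{A \times B} = MS_{A \times B} / MS_E$ |
Error | $S_E$ | $rs(t-1)$ | $MS_E$ | |
Total | $S_T$ | $rst-1$ | | |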
Example 3
Influence of fertilizer and seeds on the production.
seeds \ production \ fertilizer | $B_1$ | $B_2$ | $B_3$ | $B_4$ |
---|---|---|---|---|
$A_1$ | 173,172 | 174,176 | 177,179 | 172,173 |
$A_2$ | 175,173 | 178,177 | 174,175 | 170,171 |
$A_3$ | 177,175 | 174,174 | 174,173 | 169,169 |
ANOVA
Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|
Seeds | 8.083 | 2 | 4.042 | 4.409 | .037 |
Fertilizer | 90.833 | 3 | 30.278 | 33.030 | .000 |
Seeds * Fertilizer | 51.917 | 6 | 8.653 | 9.439 | .001 |
Error | 11.000 | 12 | .917 | | |
Total | 161.833 | 23 | | | |
The effects of fertilizer (significance below 0.001) and of the seeds * fertilizer interaction (0.001) are highly significant. The effect of seeds is significant at the 0.05 level (0.037), but it is much weaker than that of fertilizer.
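The sums of squares and F values in the ANOVA table of Example 3 can be reproduced with NumPy. This is a sketch for a balanced design only (the same number of tests in every cell); the array layout, with axes ordered as seeds, fertilizer, repeated test, and all variable names are choices made here for illustration.

```python
import numpy as np

# Example 3 data, shape (3 seeds, 4 fertilizers, 2 tests per combination).
x = np.array([
    [[173, 172], [174, 176], [177, 179], [172, 173]],
    [[175, 173], [178, 177], [174, 175], [170, 171]],
    [[177, 175], [174, 174], [174, 173], [169, 169]],
], dtype=float)
r, s, t = x.shape

grand = x.mean()
mean_a = x.mean(axis=(1, 2))   # one mean per seed
mean_b = x.mean(axis=(0, 2))   # one mean per fertilizer
mean_ab = x.mean(axis=2)       # one mean per (seed, fertilizer) cell

# Sums of squares for the two main effects, the interaction and the error.
s_a = s * t * np.sum((mean_a - grand) ** 2)
s_b = r * t * np.sum((mean_b - grand) ** 2)
s_ab = t * np.sum((mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
s_e = np.sum((x - mean_ab[:, :, None]) ** 2)

# Each F value divides a mean square by the error mean square.
ms_e = s_e / (r * s * (t - 1))
f_a = (s_a / (r - 1)) / ms_e
f_b = (s_b / (s - 1)) / ms_e
f_ab = (s_ab / ((r - 1) * (s - 1))) / ms_e

print(f"seeds:       SS = {s_a:.3f}, F = {f_a:.3f}")
print(f"fertilizer:  SS = {s_b:.3f}, F = {f_b:.3f}")
print(f"interaction: SS = {s_ab:.3f}, F = {f_ab:.3f}")
print(f"error:       SS = {s_e:.3f}")
```

For unbalanced data these simple formulas no longer give an orthogonal decomposition, and a regression-based ANOVA routine would be the safer choice.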