Statistics [14]: Chi-Squared Test
Basics of the $\chi^2$ test.
Pearson Chi-Squared Test
Pearson’s $\chi^2$ test is a statistical test applied to sets of categorical data to evaluate how likely it is that any observed difference between the sets arose by chance.
It tests a null hypothesis stating that the frequency distribution of certain events observed in a sample is consistent with a particular theoretical distribution. The events considered must be mutually exclusive and have total probability 1.
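To be specific, assume the population $X$ follows the discrete distribution

$$P(X = a_i) = p_i, \quad i = 1, 2, \dots, k.$$

Now make $n$ independent observations, and let the frequencies of the values $a_1, \dots, a_k$ be $n_1, \dots, n_k$ respectively. Then

$$\chi^2 = \sum_{i=1}^{k} \frac{(n_i - np_i)^2}{np_i}$$

approximately follows the $\chi^2$ distribution with $k-1$ degrees of freedom. (This can be proved using the central limit theorem, which can be found here.)

If there are $r$ parameters that need to be estimated from the sample, so that each $p_i$ is replaced by its estimate $\hat{p}_i$, then

$$\chi^2 = \sum_{i=1}^{k} \frac{(n_i - n\hat{p}_i)^2}{n\hat{p}_i}$$

approximately follows the $\chi^2$ distribution with $k-1-r$ degrees of freedom.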
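As a minimal sketch of the statistic described above, the function below sums (observed - expected)² / expected over all classes and returns it with the degrees of freedom. The function name `pearson_chi2`, its `estimated_params` argument, and the die-roll counts are all hypothetical and chosen only for illustration.

```python
def pearson_chi2(observed, probs, estimated_params=0):
    """Return (statistic, degrees_of_freedom) for Pearson's goodness-of-fit test.

    observed         -- observed frequency of each class
    probs            -- hypothesised probability of each class (should sum to 1)
    estimated_params -- number of parameters estimated from the same sample
    """
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    n = sum(observed)
    # Sum of (observed - expected)^2 / expected, with expected = n * p
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
    # One degree of freedom is lost to the constraint sum(observed) == n,
    # and one more for every estimated parameter.
    dof = len(observed) - 1 - estimated_params
    return stat, dof


# Made-up data: 60 rolls of a die, tested against a fair die.
counts = [8, 12, 9, 11, 10, 10]
stat, dof = pearson_chi2(counts, [1 / 6] * 6)
print(f"chi2 = {stat:.3f}, dof = {dof}")
```

The statistic is then compared with a critical value of the $\chi^2$ distribution with `dof` degrees of freedom, taken from a lookup table.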
Example 1
In 1910, Rutherford and Geiger counted the particles emitted by a radioactive material. They made 2608 observations, each over a 7.5 s interval, and recorded the number of particles reaching a certain area. The record is given below: 10094 particles were recorded in total, $n_i$ is the number of intervals in which exactly $i$ particles were observed, and $n\hat{p}_i$ is the corresponding expected count under the fitted Poisson distribution.

$i$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | $\geq 10$ |
---|---|---|---|---|---|---|---|---|---|---|---|
$n_i$ | 57 | 203 | 383 | 525 | 532 | 408 | 273 | 139 | 45 | 27 | 16 |
$n\hat{p}_i$ | 54 | 211 | 407 | 525 | 508 | 394 | 254 | 140 | 68 | 29 | 17 |
Test whether the data follows the Poisson distribution.
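Solution. Let the number of particles in one observation be the random variable $X$. As there are 2608 observations, the probability that a given particle falls within a particular observation is $p = 1/2608$. As 10094 particles were recorded, the distribution of $X$ is

$$P(X = i) = \binom{10094}{i} p^i (1-p)^{10094-i}, \quad i = 0, 1, \dots, 10094.$$

On the other hand, according to the Poisson theorem, $P(X = i) \approx \frac{\lambda^i}{i!} e^{-\lambda}$, where $\lambda = 10094\,p = 10094/2608 \approx 3.87$. Hence, the expected counts are

$$n\hat{p}_i = 2608 \cdot \frac{\lambda^i}{i!} e^{-\lambda}, \quad i = 0, 1, \dots, 9,$$

with the remaining probability mass assigned to the class $i \geq 10$.

Therefore, the statistic under the observation is

$$\chi^2 = \sum_{i} \frac{(n_i - n\hat{p}_i)^2}{n\hat{p}_i} \approx 12.9.$$

There are 11 classes and one estimated parameter ($\lambda$), so the number of degrees of freedom is $11 - 1 - 1 = 9$. Using the lookup table, and taking the 0.05 significance level as an example,

$$\chi^2_{0.05}(9) \approx 16.92 > 12.9.$$

Therefore, the hypothesis that the data follow the Poisson distribution cannot be rejected.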
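The calculation in Example 1 can be reproduced with a short script. This is only a sketch: it uses the observed counts from the table, estimates the Poisson rate as 10094/2608, and lumps all outcomes of 10 or more particles into the last class. The 0.05 significance level and the hard-coded critical value 16.919 (from a standard $\chi^2$ table, 9 degrees of freedom) are assumptions made for illustration.

```python
import math

# Observed counts for i = 0, 1, ..., 9 and a final class for i >= 10
observed = [57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 16]
n = sum(observed)      # number of 7.5 s intervals (2608)
lam = 10094 / n        # estimated Poisson rate: total particles / intervals

# Poisson probabilities for i = 0..9; the last class takes the remaining mass
probs = [lam ** i * math.exp(-lam) / math.factorial(i) for i in range(10)]
probs.append(1.0 - sum(probs))

expected = [n * p for p in probs]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 11 classes, minus 1, minus 1 estimated parameter (lambda)
dof = len(observed) - 1 - 1

# Assumed critical value: chi-squared table, alpha = 0.05, 9 degrees of freedom
CRITICAL_005_DOF9 = 16.919

print(f"lambda = {lam:.4f}, chi2 = {chi2:.2f}, dof = {dof}")
print("reject Poisson hypothesis:", chi2 > CRITICAL_005_DOF9)
```

Because the script uses unrounded expected counts, its statistic can differ slightly from a value computed from the rounded counts in the table, but the conclusion is unchanged.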
Independence Test
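The $\chi^2$ test can also be used to test the correlation between different properties. Suppose properties $A$ and $B$ take the levels $A_1, \dots, A_s$ and $B_1, \dots, B_t$, and the observed counts are arranged as follows.

$A$ \ $B$ | $B_1$ | $\cdots$ | $B_t$ | SUM |
---|---|---|---|---|
$A_1$ | $n_{11}$ | $\cdots$ | $n_{1t}$ | $n_{1\cdot}$ |
$\vdots$ | $\vdots$ | | $\vdots$ | $\vdots$ |
$A_s$ | $n_{s1}$ | $\cdots$ | $n_{st}$ | $n_{s\cdot}$ |
SUM | $n_{\cdot 1}$ | $\cdots$ | $n_{\cdot t}$ | $n$ |

Then, under the hypothesis that $A$ and $B$ are independent,

$$\chi^2 = \sum_{i=1}^{s} \sum_{j=1}^{t} \frac{\left(n_{ij} - n_{i\cdot} n_{\cdot j} / n\right)^2}{n_{i\cdot} n_{\cdot j} / n}$$

approximately follows the $\chi^2$ distribution with $(s-1)(t-1)$ degrees of freedom.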
Example 2
Test the independence of income with respect to the number of kids in a family.
kids \ income | 0-1 | 1-2 | 2-3 | $\geq 3$ | SUM |
---|---|---|---|---|---|
0 | 2161 | 3577 | 2184 | 1636 | 9558 |
1 | 2755 | 5081 | 2222 | 1052 | 11110 |
2 | 936 | 1753 | 640 | 306 | 3635 |
3 | 225 | 419 | 96 | 38 | 778 |
4 | 39 | 98 | 31 | 14 | 182 |
SUM | 6116 | 10928 | 5173 | 3046 | 25263 |
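Solution. Here there are 5 rows and 4 columns, so the number of degrees of freedom is $(5-1)(4-1) = 12$. With the expected count of each cell given by (row sum) × (column sum) / 25263, the statistic computed from the table is

$$\chi^2 = \sum_{i}\sum_{j} \frac{\left(n_{ij} - n_{i\cdot} n_{\cdot j} / n\right)^2}{n_{i\cdot} n_{\cdot j} / n} \approx 568.6.$$

As $\chi^2 \approx 568.6 \gg \chi^2_{0.001}(12) \approx 32.91$, the $p$ value is far smaller than 0.001. Thus the hypothesis of independence is rejected: the income of a family is strongly related to the number of kids.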
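The statistic for Example 2 can be checked with a few lines of plain Python. This is a sketch rather than a definitive implementation: the counts are copied from the table, each expected count is computed as (row sum) × (column sum) / total, and the critical value 32.909 (standard $\chi^2$ table, significance level 0.001, 12 degrees of freedom) is hard-coded as an assumption.

```python
# Rows: number of kids (0..4); columns: income brackets, as in the table
table = [
    [2161, 3577, 2184, 1636],
    [2755, 5081, 2222, 1052],
    [936, 1753, 640, 306],
    [225, 419, 96, 38],
    [39, 98, 31, 14],
]

row_sums = [sum(row) for row in table]
col_sums = [sum(col) for col in zip(*table)]
n = sum(row_sums)

chi2 = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        # Expected count if the two properties were independent
        exp = row_sums[i] * col_sums[j] / n
        chi2 += (obs - exp) ** 2 / exp

# (rows - 1) * (columns - 1)
dof = (len(table) - 1) * (len(table[0]) - 1)

# Assumed critical value: chi-squared table, alpha = 0.001, 12 degrees of freedom
CRITICAL_0001_DOF12 = 32.909

print(f"n = {n}, chi2 = {chi2:.1f}, dof = {dof}")
print("reject independence:", chi2 > CRITICAL_0001_DOF12)
```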