Statistics [14]: Chi-Squared Test


Basics of the $\chi^2$ test.


Pearson Chi-Squared Test

Pearson's $\chi^2$ test is a statistical test applied to sets of categorical data to evaluate how likely it is that any observed difference between the sets arose by chance.

It tests a null hypothesis stating that the frequency distribution of certain events observed in a sample is consistent with a particular theoretical distribution. The events considered must be mutually exclusive and have total probability 1.

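To be specific, assume the population $X$ follows the discrete distribution

$$P(X = a_i) = p_i, \quad i = 1, \dots, k.$$

Now make $n$ independent observations, and let the frequencies of the values $a_1, \dots, a_k$ be $n_1, \dots, n_k$ respectively. Then

$$\chi^2 = \sum_{i=1}^{k} \frac{(n_i - n p_i)^2}{n p_i}$$

approximately follows the $\chi^2$ distribution with $k - 1$ degrees of freedom when $n$ is large. (This can be proved using the central limit theorem.)

If the $p_i$ depend on $r$ unknown parameters that need to be estimated from the data, giving estimates $\hat{p}_i$, then

$$\chi^2 = \sum_{i=1}^{k} \frac{(n_i - n \hat{p}_i)^2}{n \hat{p}_i}$$

approximately follows the $\chi^2$ distribution with $k - 1 - r$ degrees of freedom.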
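The statistic above is simple enough to compute by hand, but a small helper makes the bookkeeping explicit. The following is a minimal sketch in plain Python; the function name `pearson_chi2`, its parameter `n_estimated` (the number of estimated parameters), and the die counts are illustrative assumptions, not part of the original text.

```python
def pearson_chi2(observed, probs, n_estimated=0):
    """Return (statistic, degrees of freedom) for Pearson's chi-squared test.

    observed    -- observed frequencies n_1, ..., n_k
    probs       -- hypothesised probabilities p_1, ..., p_k (should sum to 1)
    n_estimated -- number of parameters estimated from the data
    """
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    n = sum(observed)
    # Sum of (n_i - n p_i)^2 / (n p_i) over all categories.
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
    df = len(observed) - 1 - n_estimated
    return stat, df


# Hypothetical counts for 60 rolls of a die (made up for illustration only).
counts = [8, 12, 9, 11, 10, 10]
stat, df = pearson_chi2(counts, [1 / 6] * 6)
print(stat, df)  # statistic is 1.0 up to floating-point rounding, df is 5
```

The returned statistic is then compared against a quantile of the $\chi^2$ distribution with `df` degrees of freedom, taken from a lookup table or a statistics library.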

Example 1

In 1910, Rutherford and Geiger counted the $\alpha$ particles emitted by a radioactive material. They made 2608 observations, each over a 7.5 s interval, and recorded the number of particles reaching a certain area in each interval. The record is given below: they recorded 10094 particles in total, and $n_k$ is the number of intervals in which exactly $k$ particles were observed.

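| Particles $k$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | $\geq 10$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Observed $n_k$ | 57 | 203 | 383 | 525 | 532 | 408 | 273 | 139 | 45 | 27 | 16 |
| Expected $n p_k$ | 54 | 211 | 407 | 525 | 508 | 394 | 254 | 140 | 68 | 29 | 17 |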

Test whether the data follows the Poisson distribution.

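Solution. Let the random variable $X$ be the number of particles in one observation. As there are 2608 observations, the probability that a given particle falls within a particular observation is $1/2608$. And as 10094 particles were recorded, the distribution of $X$ is binomial:

$$P(X = k) = \binom{10094}{k} \left(\frac{1}{2608}\right)^{k} \left(1 - \frac{1}{2608}\right)^{10094 - k}.$$

On the other hand, according to the Poisson theorem, $P(X = k) \approx \frac{\lambda^k}{k!} e^{-\lambda}$, where $\lambda = 10094 / 2608 \approx 3.87$. Hence, the expected frequencies are

$$n p_k = 2608 \cdot \frac{3.87^k}{k!} e^{-3.87}, \quad k = 0, 1, \dots, 9,$$

with the remaining probability assigned to the category $k \geq 10$. These are the values in the third row of the table.

Therefore, writing $n_k$ for the observed frequencies in the second row, the statistic under the observation would be

$$\chi^2 = \sum_{k} \frac{(n_k - n p_k)^2}{n p_k} \approx 12.9.$$

There are 11 categories and one parameter ($\lambda$) is estimated from the data, so the degrees of freedom are $11 - 1 - 1 = 9$. Using the lookup table, $\chi^2_{0.05}(9) = 16.919 > 12.9$.

Therefore, the hypothesis that the data follow the Poisson distribution is not rejected at the 5% significance level.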
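The arithmetic in Example 1 can be checked with a short script. This is a sketch under stated assumptions: the counts are read off the table above with the last category taken as "10 or more", $\lambda$ is estimated as $10094/2608$ without rounding, and the 5% critical value for 9 degrees of freedom is hard-coded from a standard table. Variable names are illustrative.

```python
import math

# Observed frequencies for k = 0, 1, ..., 9 and a final category "k >= 10".
observed = [57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 16]
n = sum(observed)               # number of observations (2608)
total_particles = 10094         # total count reported in the example
lam = total_particles / n       # estimated Poisson mean, about 3.87

# Poisson probabilities for k = 0..9; the last category takes the remainder.
probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(10)]
probs.append(1.0 - sum(probs))

expected = [n * p for p in probs]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 11 categories, minus 1, minus 1 estimated parameter (lambda).
df = len(observed) - 1 - 1
critical_5pct = 16.919          # chi-squared table value for df = 9, alpha = 0.05

print(f"lambda = {lam:.4f}, chi2 = {stat:.3f}, df = {df}")
print("reject at 5%" if stat > critical_5pct else "do not reject at 5%")
```

Because $\lambda$ is not rounded here, the expected counts can differ from the table's rounded values by a unit or so, and the statistic differs slightly in the second decimal place; the conclusion is unaffected.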


Independence Test

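The $\chi^2$ statistic can also be used to test whether two properties are independent. Suppose property $A$ has levels $A_1, \dots, A_a$, property $B$ has levels $B_1, \dots, B_b$, and among $n$ observations $n_{ij}$ of them fall in both $A_i$ and $B_j$, as in the following table.

| $A \backslash B$ | $B_1$ | $\cdots$ | $B_b$ | SUM |
|---|---|---|---|---|
| $A_1$ | $n_{11}$ | $\cdots$ | $n_{1b}$ | $n_{1\cdot}$ |
| $\vdots$ | $\vdots$ | | $\vdots$ | $\vdots$ |
| $A_a$ | $n_{a1}$ | $\cdots$ | $n_{ab}$ | $n_{a\cdot}$ |
| SUM | $n_{\cdot 1}$ | $\cdots$ | $n_{\cdot b}$ | $n$ |

Then, under the hypothesis that $A$ and $B$ are independent,

$$\chi^2 = \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{(n \, n_{ij} - n_{i\cdot} n_{\cdot j})^2}{n \, n_{i\cdot} n_{\cdot j}}$$

approximately follows the $\chi^2$ distribution with $(a-1)(b-1)$ degrees of freedom.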
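To make the independence statistic concrete, here is a minimal sketch that compares each cell with its expected count under independence, (row total × column total) / grand total, which is algebraically the same as the usual closed form. The function name `independence_chi2` and the 2×2 table are hypothetical and serve only as an illustration.

```python
def independence_chi2(table):
    """Return (statistic, degrees of freedom) for a contingency table.

    table -- list of rows, each a list of observed cell counts
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            # Expected count if the two properties were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (n_ij - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical 2x2 table (made up for illustration only).
toy = [[10, 20],
       [20, 10]]
stat, df = independence_chi2(toy)
print(stat, df)  # every expected count is 15, so the statistic is 100/15; df is 1
```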

Example 2

Test the independence of income with respect to the number of kids in a family.

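| kids \ income | 0-1 | 1-2 | 2-3 | $\geq 3$ | SUM |
|---|---|---|---|---|---|
| 0 | 2161 | 3577 | 2184 | 1636 | 9558 |
| 1 | 2755 | 5081 | 2222 | 1052 | 11110 |
| 2 | 936 | 1753 | 640 | 306 | 3635 |
| 3 | 225 | 419 | 96 | 38 | 778 |
| 4 | 39 | 98 | 31 | 14 | 182 |
| SUM | 6116 | 10928 | 5173 | 3046 | 25263 |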

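Solution. The table has 5 rows and 4 columns, so the statistic has $(5-1)(4-1) = 12$ degrees of freedom. Comparing each cell with (row sum × column sum) / 25263 and summing the scaled squared differences gives $\chi^2 \approx 568.57$.

As $\chi^2_{0.001}(12) = 32.909 < 568.57$, the $p$-value is far smaller than 0.001; thus, the income of a family is strongly related to the number of kids.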
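The numbers in Example 2 can be reproduced with the script below. It is a sketch, not the author's computation: it uses the algebraically equivalent form $\chi^2 = n\left(\sum_{i,j} \frac{n_{ij}^2}{n_{i\cdot} n_{\cdot j}} - 1\right)$, and it evaluates the $p$-value with the closed-form tail of the $\chi^2$ distribution that holds only for an even number of degrees of freedom. The helper name `chi2_sf_even_df` is an assumption introduced here.

```python
import math

# Rows: number of kids (0, 1, 2, 3, 4); columns: the four income brackets.
table = [
    [2161, 3577, 2184, 1636],
    [2755, 5081, 2222, 1052],
    [936, 1753, 640, 306],
    [225, 419, 96, 38],
    [39, 98, 31, 14],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# chi2 = n * (sum of n_ij^2 / (row total * column total) - 1)
ratio_sum = sum(
    table[i][j] ** 2 / (row_totals[i] * col_totals[j])
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)
stat = n * (ratio_sum - 1.0)
df = (len(row_totals) - 1) * (len(col_totals) - 1)


def chi2_sf_even_df(x, df):
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df % 2 != 0:
        raise ValueError("closed form only valid for even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(stat, df)
print(f"chi2 = {stat:.2f}, df = {df}, p-value = {p_value:.3g}")
```

The resulting $p$-value is many orders of magnitude below 0.001, consistent with the conclusion drawn from the lookup table.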

