Chi Squared Tests: Fitting a theoretical distribution

Chi Squared Tests: Understanding the Basics

Chi Squared tests are used to determine if there’s a significant association between two categorical variables or to determine if data follows a particular theoretical distribution.
The Chi Squared distribution is a family of curves indexed by the degrees of freedom. It takes only non-negative values and is skewed to the right; as the degrees of freedom increase, it becomes increasingly symmetrical and approaches the shape of a normal distribution.
The Null Hypothesis (H_0) in the context of a Chi Squared test for goodness of fit is that the data follows the specified theoretical distribution.
The Alternative Hypothesis (H_1) is that the data does not follow the specified theoretical distribution.
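The point above about the shape of the distribution can be checked by simulation. The sketch below uses only the Python standard library. It draws Chi Squared values as sums of squared standard normal variables (the textbook definition) and compares the sample skewness with the known theoretical skewness sqrt(8/df). The function names are illustrative and do not come from any library.

```python
import random
import statistics


def simulate_chi_squared(df, n_samples, rng):
    """Draw n_samples values from a Chi Squared distribution with df degrees of freedom.

    Uses the definition: a Chi Squared variable with df degrees of freedom is the
    sum of the squares of df independent standard normal variables.
    """
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(n_samples)]


def sample_skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values, mean)
    return statistics.fmean((v - mean) ** 3 for v in values) / sd ** 3


rng = random.Random(0)  # fixed seed so the run is reproducible
results = {}
for df in (1, 5, 30):
    draws = simulate_chi_squared(df, 20_000, rng)
    results[df] = (statistics.fmean(draws), sample_skewness(draws))
    theory = (8 / df) ** 0.5  # theoretical skewness of a Chi Squared(df) variable
    print(f"df={df:2d}  mean={results[df][0]:6.2f}  "
          f"skewness={results[df][1]:5.2f}  (theory {theory:.2f})")
```

The sample mean should sit close to df (the mean of a Chi Squared distribution equals its degrees of freedom), and the skewness should shrink as df grows. This shrinking skewness is the move towards a symmetrical, normal-like shape.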

The Chi Squared Test Statistic (Χ^2) is calculated using the formula:

Χ^2 = Σ [ (O_i - E_i)^2 / E_i ]

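where Σ denotes summation over all k categories, O_i is the observed frequency in category i and E_i is the expected frequency in category i.
Observed frequencies, O_i, are counted from the sample data. Expected frequencies, E_i, are the counts that the theoretical distribution predicts for a sample of the same size: E_i = n × p_i, where n is the total number of observations and p_i is the probability of category i under the theoretical distribution. Any parameters of the distribution that are not specified in advance (for example, the mean of a Poisson distribution) are estimated from the sample data.
For valid results, every expected frequency should be at least 5; if any are smaller, adjacent categories are combined until this holds.
The degrees of freedom (df) for a Chi Squared goodness-of-fit test are df = k - 1 - m, where k is the number of categories after any combining and m is the number of parameters estimated from the sample data. For example, m = 1 when the mean of a Poisson distribution is estimated from the sample; when all parameters are specified in advance, m = 0 and df = k - 1.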
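The calculation steps above can be sketched end to end in Python (standard library only). Expected frequencies come first, then small categories are combined, then the statistic and the degrees of freedom are computed. This is a minimal illustration under several assumptions:

- The data are fitted to a Poisson model whose mean is estimated from the data.
- The usual adjustment df = k - 1 - m applies, with m = 1 here.
- The observed counts are hypothetical.
- The helper names (poisson_probabilities, combine_small_categories and so on) are made up for this example.
- The merging rule is only one reasonable way to combine categories: it sweeps left to right and folds any leftover tail into the last group.

```python
import math


def poisson_probabilities(lam, max_value):
    """P(X = x) for x = 0..max_value-1, then a final tail category P(X >= max_value)."""
    probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(max_value)]
    probs.append(1.0 - sum(probs))
    return probs


def expected_frequencies(probabilities, n):
    """E_i = n * p_i for each category."""
    return [n * p for p in probabilities]


def combine_small_categories(observed, expected, minimum=5.0):
    """Merge adjacent categories left to right until every expected frequency >= minimum.

    This is one simple merging rule; in practice you decide which neighbours to merge.
    """
    obs_out, exp_out = [], []
    o_acc = e_acc = 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= minimum:  # this group is large enough, so close it
            obs_out.append(o_acc)
            exp_out.append(e_acc)
            o_acc = e_acc = 0.0
    if e_acc > 0 or o_acc > 0:  # leftover tail that never reached the minimum
        if exp_out:
            obs_out[-1] += o_acc
            exp_out[-1] += e_acc
        else:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
    return obs_out, exp_out


def chi_squared_statistic(observed, expected):
    """X^2 = sum of (O_i - E_i)^2 / E_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(k, m=0):
    """df = k - 1 - m (k categories after combining, m parameters estimated from the data)."""
    return k - 1 - m


# Hypothetical counts of 0, 1, 2, 3, 4, 5 and "6 or more" events in 100 intervals.
observed = [12, 27, 29, 19, 9, 4, 0]
n = sum(observed)
# Mean estimated from the sample (m = 1); the data contain no values above 5,
# so the "6 or more" category contributes nothing to this estimate.
lam = sum(x * o for x, o in enumerate(observed)) / n

expected = expected_frequencies(poisson_probabilities(lam, 6), n)
obs_c, exp_c = combine_small_categories(observed, expected)
stat = chi_squared_statistic(obs_c, exp_c)
df = degrees_of_freedom(len(obs_c), m=1)
print(f"lambda={lam:.2f}  categories={len(obs_c)}  X^2={stat:.3f}  df={df}")
```

With these counts the expected frequencies for "5" and "6 or more" are both below 5. They are merged into a single "5 or more" class, leaving k = 6 categories and df = 6 - 1 - 1 = 4.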

The test is one-tailed: only large values of Χ^2 indicate a poor fit. The calculated test statistic is therefore compared with the upper-tail critical value from the Chi Squared distribution table for the appropriate degrees of freedom and significance level (for example, 5%).
If the test statistic is greater than the critical value, the Null Hypothesis is rejected.
If the test statistic is less than or equal to the critical value, there is insufficient evidence to reject the Null Hypothesis, so we conclude that the data are consistent with the theoretical distribution. This does not prove that the data follow it.
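The decision rule can be sketched as a lookup in a small table of critical values. This is a minimal illustration, not a full implementation. The table holds the standard upper 5% critical values for 1 to 10 degrees of freedom. The die-rolling counts are hypothetical, and the function name decide is made up for this example.

```python
# Upper-tail 5% critical values of the Chi Squared distribution, as printed in standard tables.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                      6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def chi_squared_statistic(observed, expected):
    """X^2 = sum of (O_i - E_i)^2 / E_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def decide(statistic, df, table=CRITICAL_5_PERCENT):
    """Compare X^2 with the tabulated critical value for df degrees of freedom."""
    critical = table[df]
    if statistic > critical:
        return "reject H_0"
    return "do not reject H_0"  # insufficient evidence against the theoretical distribution


# Hypothetical: 120 rolls of a die, testing H_0 "the die is fair" (E_i = 120 * 1/6 = 20).
observed = [22, 17, 20, 26, 15, 20]
expected = [20] * 6
stat = chi_squared_statistic(observed, expected)
df = len(observed) - 1  # no parameters estimated, so df = k - 1 = 5
print(f"X^2={stat:.2f}  critical={CRITICAL_5_PERCENT[df]}  -> {decide(stat, df)}")
```

Here Χ^2 = 3.7, well below the 5% critical value of 11.070 for 5 degrees of freedom, so H_0 is not rejected. If SciPy is available, scipy.stats.chi2.ppf(0.95, df) gives the same critical values. scipy.stats.chisquare(f_obs, f_exp, ddof=m) returns the statistic together with a p-value; its ddof argument reduces the degrees of freedom from k - 1 to k - 1 - ddof.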

One limitation of this test is that the data must be categorical; continuous data must first be grouped into classes.
The test also assumes that the observations are independent and randomly sampled from the population.
Because every expected frequency must be at least 5, categories with small expected frequencies have to be combined, which loses detail. If many expected frequencies fall below 5, the Chi Squared distribution is a poor approximation to the distribution of the test statistic and the results can be misleading.
Finally, even when a significant association is found, the Chi Squared test does not describe the nature or strength of that relationship.
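The first limitation, that continuous data must be grouped into categories, can be sketched as follows. This is a minimal illustration, assuming a normal model whose mean and standard deviation are chosen purely for illustration. The helper names group_into_classes and class_probabilities are hypothetical. statistics.NormalDist is part of the Python standard library (3.8 and later). Each class probability is taken as the difference of the cumulative distribution function at the class boundaries.

```python
import bisect
import math
from statistics import NormalDist


def group_into_classes(values, boundaries):
    """Count values falling in each class [boundaries[i], boundaries[i+1])."""
    counts = [0] * (len(boundaries) - 1)
    for v in values:
        i = bisect.bisect_right(boundaries, v) - 1  # index of the class containing v
        if not 0 <= i < len(counts):
            raise ValueError(f"{v} lies outside the class boundaries")
        counts[i] += 1
    return counts


def class_probabilities(boundaries, cdf):
    """p_i = F(b_{i+1}) - F(b_i) for a continuous distribution with CDF F."""
    return [cdf(hi) - cdf(lo) for lo, hi in zip(boundaries, boundaries[1:])]


# Hypothetical measurements; open-ended outer classes so every value is counted.
values = [3.2, 12.5, 18.0, 25.1, 31.7, 9.9, 20.0]
boundaries = [-math.inf, 10, 20, 30, math.inf]
observed = group_into_classes(values, boundaries)

# Assumed model: normal with mean 15 and standard deviation 10 (illustrative values).
probs = class_probabilities(boundaries, NormalDist(mu=15, sigma=10).cdf)
expected = [len(values) * p for p in probs]
print(observed, [round(e, 2) for e in expected])
```

Only seven values are used to keep the example short. A real test would need enough data for every expected frequency to reach 5, or classes would have to be merged as described above. If the normal mean and standard deviation were estimated from the data, m = 2 would be subtracted when computing the degrees of freedom.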