Goodness of fit test – A Level Further Mathematics OCR Revision

Goodness of Fit Test Fundamentals

The goodness of fit test is a statistical hypothesis test of how well observed data fit a theoretical model.
The test investigates whether the observed frequency distribution differs significantly from a predicted distribution.
It is a form of chi-squared test and is widely used when the data are expected to follow a particular distribution.
The Null Hypothesis (H0) in a goodness of fit test states that there is no significant difference between the observed and expected data - that is, the data follow the specified distribution.
The Alternative Hypothesis (H1) is that there is a significant difference between the observed and expected data - that is, the data do not follow the specified distribution.

Start by recording the observed frequencies for each category or class of the data.
Determine the expected frequencies for each category following the theoretical distribution under the null hypothesis.
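Calculate the test statistic: for each category, square the difference between the observed and expected frequencies and divide by the expected frequency, then add these values over all categories, i.e. X² = Σ (O − E)² / E.
Under the null hypothesis, this test statistic approximately follows a chi-squared distribution.
The number of degrees of freedom is usually the number of categories minus one, reduced by one more for each parameter of the distribution that has been estimated from the data.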
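
The calculation steps above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function names and the die-roll frequencies are hypothetical, chosen only to show the arithmetic, and the model assumed under the null hypothesis is a fair six-sided die.

```python
def chi_squared_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(num_categories, estimated_parameters=0):
    """Categories minus one, minus any parameters estimated from the data."""
    return num_categories - 1 - estimated_parameters


# Hypothetical data: 120 rolls of a die, counts for faces 1 to 6.
observed = [25, 17, 15, 23, 24, 16]

# Under H0 (fair die) every face has the same expected frequency.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

statistic = chi_squared_statistic(observed, expected)
df = degrees_of_freedom(len(observed))

print(f"test statistic = {statistic:.3f}, degrees of freedom = {df}")
```

For these made-up counts each expected frequency is 20, the statistic comes to about 5.0, and there are 5 degrees of freedom because no parameters were estimated from the data.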

If the calculated test statistic (the chi-squared value) is less than the critical value from the chi-squared distribution at the chosen significance level, there is not enough evidence to reject the null hypothesis.
When the test statistic is greater than the critical value, the null hypothesis is rejected, indicating a significant difference between observed and expected data.
Rejection of the null hypothesis suggests that the data do not follow the theoretical distribution.
Bear in mind, this test only indicates if there’s a significant difference; it doesn’t necessarily specify what the difference is.
Always interpret the results in the context of the specific problem or dataset you’re working on. A low chi-squared value is consistent with a good fit of your data to the theoretical model, whereas a high value suggests a poor fit.
Beware of overreliance on tests, including the goodness of fit test. Failing to reject the null hypothesis does not prove that the model is correct; a good fit only suggests the model is plausible under the specified assumptions.
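
The comparison of the test statistic with the critical value can be sketched as follows. This is an illustrative sketch only: the `decide` helper and the two example statistics are hypothetical, and the critical values are the usual 5% upper-tail figures from standard chi-squared tables for 1 to 5 degrees of freedom.

```python
# 5% upper-tail critical values of the chi-squared distribution,
# keyed by degrees of freedom (taken from standard statistical tables).
CRITICAL_VALUES_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def decide(statistic, df, table=CRITICAL_VALUES_5_PERCENT):
    """Compare the test statistic with the critical value for df."""
    critical = table[df]
    if statistic > critical:
        return "reject H0"
    return "do not reject H0"


# Hypothetical statistics, both with 5 degrees of freedom.
print(decide(5.0, 5))   # 5.0 is below 11.070
print(decide(12.3, 5))  # 12.3 is above 11.070
```

With 5 degrees of freedom, a statistic of 5.0 falls below the critical value, so the null hypothesis is not rejected, while a statistic of 12.3 exceeds it and leads to rejection at the 5% level.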