Goodness of fit test
Goodness of Fit Test Fundamentals
- The goodness of fit test is a statistical hypothesis test of how well observed data fit a theoretical model.
- The test investigates whether the observed frequency distribution differs significantly from a predicted distribution.
- It is a form of the Chi-Square test and is widely used when the data are expected to follow a particular distribution.
- The Null Hypothesis (H0) in a goodness of fit test states that there is no significant difference between the observed and expected data - that is, the data follow the specified distribution.
- The Alternative Hypothesis (H1) is that there is a significant difference between the observed and expected data - that is, the data do not follow the specified distribution.
Calculating the Goodness of Fit
- Start by recording the observed frequency for each category or class of the data.
- Determine the expected frequencies for each category following the theoretical distribution under the null hypothesis.
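- Calculate the test statistic: for each category, square the difference between the observed frequency (O) and the expected frequency (E) and divide by the expected frequency, then add up these values over all categories, giving χ² = Σ (O − E)² / E.
- If the null hypothesis is true, this test statistic approximately follows a Chi-Square distribution, provided the expected frequencies are not too small (a common rule of thumb is at least 5 in each category).
- The number of degrees of freedom is usually the number of categories minus one; subtract one more for each parameter of the distribution that was estimated from the data.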
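The calculation steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function name `chi_square_statistic` and the die-roll counts are made up for the example, and the expected frequencies assume a fair six-sided die.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: counts of each face from 60 rolls of a six-sided die.
observed = [8, 9, 12, 11, 6, 14]

# Under H0 (a fair die) each face is expected 60 / 6 = 10 times.
expected = [sum(observed) / len(observed)] * len(observed)

statistic = chi_square_statistic(observed, expected)

# Six categories and no parameters estimated from the data, so df = 6 - 1.
degrees_of_freedom = len(observed) - 1

print(f"chi-square = {statistic:.2f}, df = {degrees_of_freedom}")
# prints: chi-square = 4.20, df = 5
```

Here the squared differences are 4, 1, 4, 1, 16 and 16; dividing each by 10 and adding gives 4.2. In practice a library routine such as `scipy.stats.chisquare` is commonly used for the same calculation.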
Interpreting the Goodness of Fit Test Results
- If the calculated test statistic (the Chi-Square value) is less than the critical value from the Chi-Square distribution at the chosen significance level, there is not enough evidence to reject the null hypothesis.
- When the test statistic is greater than the critical value, the null hypothesis is rejected, indicating a significant difference between observed and expected data.
- Rejection of the null hypothesis suggests the data do not follow the theoretical distribution.
- Bear in mind, this test only indicates if there’s a significant difference; it doesn’t necessarily specify what the difference is.
- Always interpret the results in the context of the specific problem or dataset you’re working on. A low Chi-Square value indicates a good fit of your data to the theoretical model, whereas a high value suggests a poor fit.
- Beware of overreliance on tests, including the goodness of fit test. Failing to reject the null hypothesis does not prove that the data follow the theoretical distribution. A good model fit doesn’t confirm the hypotheses involved but just suggests the model is plausible under the specified assumptions.
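The decision rule described above can be sketched as follows. This is an illustrative example under stated assumptions: the function name `reject_null` and the two test statistics are hypothetical, the significance level is taken to be 5%, and the critical values are copied from a standard Chi-Square table for 1 to 5 degrees of freedom only.

```python
# Upper-tail critical values of the Chi-Square distribution at the 5%
# significance level, keyed by degrees of freedom (from a standard table).
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def reject_null(statistic, df, table=CRITICAL_VALUES_05):
    """Return True when the test statistic exceeds the critical value."""
    if df not in table:
        raise ValueError(f"no critical value stored for df = {df}")
    return statistic > table[df]


# Hypothetical results: two test statistics, each with 5 degrees of freedom.
for statistic in (4.2, 15.3):
    decision = "reject H0" if reject_null(statistic, 5) else "fail to reject H0"
    print(f"chi-square = {statistic}: {decision}")
```

With 5 degrees of freedom the critical value is 11.070, so a statistic of 4.2 gives no grounds to reject the null hypothesis while 15.3 does. A small lookup table keeps the sketch dependency-free; a statistics library (for example `scipy.stats.chi2.ppf`) can supply critical values for other significance levels and degrees of freedom.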